The evaluation endpoint scores one submitted simulation result.

Single payload

Evaluate one official Strategy Arena simulation result at a time.

Benchmark adjusted

Compare scenario utility against the server-owned Benchmark Hold manifest.

Replay friendly

Return score, diagnostics, and eligibility fields that a client can display.

Endpoint

POST /v1/families/strategy-arena/evaluations
The request is expected to contain a simulation payload returned from the official Strategy Arena simulation profile.
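Here is a minimal sketch of building that request with Python's standard library. The request is constructed but not sent. Several details are assumptions that this page does not document: the base URL is a placeholder, the body is assumed to be JSON sent with a `Content-Type: application/json` header, and no authentication headers are shown. The helper name `build_evaluation_request` is hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL: substitute the host of your deployment.
BASE_URL = "https://api.example.com"
# Path taken from the documented endpoint.
EVALUATIONS_PATH = "/v1/families/strategy-arena/evaluations"


def build_evaluation_request(
    simulation_payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Wrap one official simulation payload in a POST request (not sent).

    Assumption: the endpoint accepts the payload as a JSON body.
    Authentication, if required, is not shown here.
    """
    body = json.dumps(simulation_payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + EVALUATIONS_PATH,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Usage (actually sends the request; needs a reachable server):
# with urllib.request.urlopen(build_evaluation_request(payload)) as resp:
#     result = json.load(resp)
#     print(result["data"]["score"])
```

Because each request carries exactly one payload, evaluating several simulation results means sending several requests.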

Determinism

The router resolves the server-owned benchmark manifest for the resolved dataset. In public beta, clients do not supply a competing benchmark.
Evaluation uses the current beta scoring formula version. The same payload evaluated under the same rule version should produce the same score, including when the evaluation is repeated.
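Because identical payloads under the same rule version should score identically, a client could cache scores keyed on both. The sketch below does that under stated assumptions. It assumes the client already knows the rule version as a string, since this page does not name a response field for the formula version. The `evaluate` callable and the helper names are also hypothetical. Canonical JSON (sorted keys, compact separators) makes payloads with the same content but different key order produce the same key.

```python
import hashlib
import json

# In-memory cache: key -> score. A real client might persist this.
_score_cache: dict[str, float] = {}


def evaluation_cache_key(simulation_payload: dict, rule_version: str) -> str:
    """Return a stable key for (payload, rule version).

    rule_version is a hypothetical client-side value; the docs do not
    name where it comes from.
    """
    canonical = json.dumps(simulation_payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{rule_version}\n{canonical}".encode("utf-8")).hexdigest()


def cached_score(simulation_payload: dict, rule_version: str, evaluate) -> float:
    """Call evaluate(payload) only on a cache miss; evaluate is hypothetical."""
    key = evaluation_cache_key(simulation_payload, rule_version)
    if key not in _score_cache:
        _score_cache[key] = evaluate(simulation_payload)
    return _score_cache[key]
```

Including the rule version in the key means a formula change automatically misses the cache instead of returning a stale score.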

Returned score

The endpoint returns data.score as the public beta leaderboard number:

$$\mathrm{score} = 100 \times \frac{\mathrm{scenarioResult.utilityScore}}{\max(\mathrm{benchmarkUtility},\ 0.0001)}$$

This is the current beta UI sort key.
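The formula can be mirrored client-side, for example to sanity-check a displayed value against the benchmark utility shown in the diagnostics. The sketch below is for that purpose only, because the server's data.score remains authoritative. The function and constant names are hypothetical. The 0.0001 floor comes from the formula above.

```python
# Floor applied to the benchmark utility, taken from the documented formula.
BENCHMARK_UTILITY_FLOOR = 0.0001


def beta_score(scenario_utility: float, benchmark_utility: float) -> float:
    """Recompute the beta score: 100 * utility / max(benchmark, 0.0001)."""
    return 100.0 * scenario_utility / max(benchmark_utility, BENCHMARK_UTILITY_FLOOR)
```

For example, a scenario utility of 0.06 against a benchmark utility of 0.04 gives a score of about 150. The floor means a zero or negative benchmark utility never divides by zero or flips the sign of the score. A tiny denominator can, however, produce a very large score.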

Diagnostics

The response also includes:

Benchmark context

Benchmark utility and benchmark-adjusted score breakdown.

Risk and return

Annualized return, max drawdown, CVaR, turnover, and execution cost.

Utility details

Utility breakdown fields that explain why the score moved.

Gate diagnostics

Eligibility result and the CapitalScore row for robustness review.
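Only data.score is named explicitly on this page, so a client can separate it from the diagnostics without hard-coding diagnostic field names. The sketch below assumes that the diagnostic groups listed above sit as sibling keys under `data`. That layout is an assumption, and the helper name is hypothetical.

```python
from typing import Any


def split_score_and_diagnostics(response: dict[str, Any]) -> tuple[float, dict[str, Any]]:
    """Separate data.score from every other field under data.

    Assumption: diagnostics (benchmark context, risk and return, utility
    details, gate diagnostics) are the remaining keys of data; their
    names are not hard-coded here.
    """
    data = response["data"]
    score = float(data["score"])
    diagnostics = {key: value for key, value in data.items() if key != "score"}
    return score, diagnostics
```

Passing unknown keys through unchanged keeps the display code working if the beta adds or renames diagnostic fields.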

Not included

Public v1 evaluation does not expose ContestScore, duplicate-aware contest logic, or leaderboard assembly.
ContestScore is a separate reward and duplicate-aware participation concept, and the duplicate-aware contest logic behind it is not part of the public evaluation endpoint.
Leaderboard assembly remains a UI or thin-client concern in beta.
Public beta clients should sort rows with data.score and show diagnostics beside the score.
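A thin client can assemble a beta leaderboard by ordering evaluation responses on data.score. The sketch below assumes that a higher score ranks higher, which this page implies but does not state. It applies no ContestScore or duplicate-aware logic. The function name is hypothetical.

```python
from typing import Any


def sort_leaderboard_rows(responses: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order evaluation responses by data.score, highest first.

    Assumption: higher scores rank higher. sorted() stays stable with
    reverse=True, so rows with equal scores keep their input order.
    """
    return sorted(responses, key=lambda r: r["data"]["score"], reverse=True)
```

Each sorted row still carries its full `data` object, so the diagnostics can be rendered beside the score.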

Scoring and benchmarks

How beta score, CapitalScore, ContestScore, and official evaluation differ.

Leaderboard and evidence

What the beta UI should display per row.

API errors

Stable problem details and client handling rules.