Single payload
Evaluate one official Strategy Arena simulation result at a time.
Benchmark adjusted
Compare scenario utility against the server-owned Benchmark Hold manifest.
Replay friendly
Return score, diagnostics, and eligibility fields that a client can display.
Endpoint
Determinism
Simulation payload
The submitted body should be the official simulation payload returned by the
Strategy Arena simulation profile.
Benchmark manifest
The router resolves the server-owned benchmark manifest for the resolved
dataset. Clients do not provide a competing benchmark in public beta.
Scoring formula version
Evaluation uses the current beta scoring formula version. The same payload
and rule version should produce the same score.
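A client can lean on this determinism to avoid re-submitting identical work. The following is a minimal sketch, not part of the official API: `evaluation_cache_key` and `scores_match` are hypothetical helper names, and the formula version is assumed to be available to the client as a plain string.

```python
import hashlib
import json


def evaluation_cache_key(payload: dict, formula_version: str) -> str:
    """Build a stable key from a simulation payload and a formula version.

    Hypothetical client-side helper; the server does not define this key.
    """
    # sort_keys and fixed separators make the serialization independent of
    # dict insertion order and whitespace.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    material = f"{formula_version}\n{canonical}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def scores_match(first: dict, second: dict) -> bool:
    """Check that two evaluation responses carry the same data.score."""
    return first["data"]["score"] == second["data"]["score"]
```

Two payloads with the same content but different key order map to the same cache key, so a stored response can be reused or compared against a fresh one with `scores_match`.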
Returned score
The endpoint returns data.score as the public beta leaderboard number.
This is the current beta UI sort key.
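As a rough illustration of client-side sorting, the sketch below orders a list of evaluation responses by data.score. It assumes that a higher score ranks first, which this page does not state, and `sort_rows` is a hypothetical helper name. Only the `data.score` path comes from this page; any other fields on a row are left untouched.

```python
from __future__ import annotations


def sort_rows(responses: list[dict]) -> list[dict]:
    """Return evaluation responses ordered by data.score.

    Assumption: a higher score is better, so the list is sorted descending.
    """
    return sorted(responses, key=lambda row: row["data"]["score"], reverse=True)
```

Because `sorted` is stable, rows with equal scores keep their original relative order.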
Diagnostics
The response also includes:
Benchmark context
Benchmark utility and benchmark-adjusted score breakdown.
Risk and return
Annualized return, max drawdown, CVaR, turnover, and execution cost.
Utility details
Utility breakdown fields that explain why the score moved.
Gate diagnostics
Eligibility result and the CapitalScore row for robustness review.
Not included
Public v1 evaluation does not expose:
Contest ranking
ContestScore is a separate reward and duplicate-aware participation concept.
Duplicate penalties
Duplicate-aware contest logic is not part of the public evaluation endpoint.
Cross-submission leaderboard ownership
Leaderboard assembly remains a UI or thin-client concern in beta.
Public beta clients should sort rows with data.score and show diagnostics beside the score.
Related pages
Scoring and benchmarks
How beta score, CapitalScore, ContestScore, and official evaluation differ.
Leaderboard and evidence
What the beta UI should display per row.
API errors
Stable problem details and client handling rules.
