Evaluations - AlphaEngine

The evaluation endpoint scores one submitted simulation result.

Single payload

Evaluate one official Strategy Arena simulation result at a time.

Benchmark adjusted

Compare scenario utility against the server-owned Benchmark Hold manifest.

Replay friendly

Return score, diagnostics, and eligibility fields that a client can display.

Endpoint

POST /v1/families/strategy-arena/evaluations

The request body should be a simulation payload returned by the official Strategy Arena simulation profile.
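As a rough illustration, the request can be built in Python with only the standard library. This is a minimal sketch: the host `https://api.example.com`, the JSON content type, and the placeholder payload are assumptions, and authentication is omitted. Only the method and path come from this page.

```python
import json
import urllib.request

EVALUATION_PATH = "/v1/families/strategy-arena/evaluations"  # path from this page


def build_evaluation_request(base_url: str, simulation_payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one simulation payload."""
    body = json.dumps(simulation_payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + EVALUATION_PATH,
        data=body,
        method="POST",
        # Assumption: the API accepts JSON; authentication headers are omitted.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and placeholder payload; substitute the payload returned
# by the official Strategy Arena simulation profile.
request = build_evaluation_request("https://api.example.com", {"placeholder": True})
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would send it. The sketch stops short of that so it can run without network access.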

Determinism

Simulation payload

The submitted body should be the official simulation payload returned by the Strategy Arena simulation profile.

Benchmark manifest

The router looks up the server-owned benchmark manifest for the resolved dataset. In public beta, clients cannot supply a competing benchmark.

Scoring formula version

Evaluation uses the current beta scoring formula version. The same payload and rule version should produce the same score.

Taken together, resubmitting the same simulation payload under the same formula version should return the same score.
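A client that wants to spot-check this property could group repeated evaluations by payload and flag any that disagree. The sketch below is one possible approach, not part of the API. The function names and the canonical-JSON fingerprint are assumptions, and it presumes the scoring formula version did not change between calls.

```python
import hashlib
import json


def payload_fingerprint(simulation_payload: dict) -> str:
    """Hash a payload's canonical JSON form so identical payloads share a key."""
    canonical = json.dumps(simulation_payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_score_mismatches(evaluations: list[tuple[dict, float]]) -> list[str]:
    """Return fingerprints whose repeated evaluations reported different scores."""
    first_seen: dict[str, float] = {}
    mismatched: list[str] = []
    for payload, score in evaluations:
        key = payload_fingerprint(payload)
        # Compare against the first score recorded for this payload.
        if key in first_seen and first_seen[key] != score and key not in mismatched:
            mismatched.append(key)
        first_seen.setdefault(key, score)
    return mismatched
```

Each tuple pairs a submitted payload with the `data.score` returned for it. An empty result means every repeated payload scored identically.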

Returned score

The endpoint returns data.score as the public beta leaderboard number:

\mathrm{score} = 100 \times \frac{\mathrm{scenarioResult.utilityScore}} {\max(\mathrm{benchmarkUtility}, 0.0001)}

This is the current beta UI sort key.
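The formula can be reproduced client-side as a sanity check. This is a minimal sketch, and the function and constant names are illustrative rather than part of the API. The `max` term floors the denominator at 0.0001, so a zero or negative benchmark utility never causes a division by zero.

```python
BENCHMARK_FLOOR = 0.0001  # denominator floor taken from the formula on this page


def beta_score(utility_score: float, benchmark_utility: float) -> float:
    """Benchmark-adjusted score: 100 * utility / max(benchmark, floor)."""
    return 100.0 * utility_score / max(benchmark_utility, BENCHMARK_FLOOR)


# A scenario utility of 0.12 against a benchmark utility of 0.10 scores about 120.
print(beta_score(0.12, 0.10))

# With a benchmark utility of 0, the floor is used as the denominator instead.
print(beta_score(0.05, 0.0))
```

Under this formula, a score of about 100 means the scenario matched the benchmark utility. Higher values mean it exceeded the benchmark.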

Diagnostics

The response also includes:

Benchmark context

Benchmark utility and benchmark-adjusted score breakdown.

Risk and return

Annualized return, max drawdown, CVaR, turnover, and execution cost.

Utility details

Utility breakdown fields that explain why the score moved.

Gate diagnostics

Eligibility result and the CapitalScore row for robustness review.

Not included

Public v1 evaluation does not expose:

Contest ranking

ContestScore is a separate concept that covers rewards and duplicate-aware participation.

Duplicate penalties

Duplicate-aware contest logic is not part of the public evaluation endpoint.

Cross-submission leaderboard ownership

Leaderboard assembly remains a UI or thin-client concern in beta.

Public beta clients should sort rows with data.score and show diagnostics beside the score.
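A thin client could assemble its rows along these lines. This is a minimal sketch: only `data.score` comes from this page, the `label` field is a hypothetical display name, and sorting highest score first is an assumed direction.

```python
def sort_leaderboard_rows(evaluation_responses: list[dict]) -> list[dict]:
    """Order evaluation responses by data.score, highest first (assumed direction)."""
    return sorted(
        evaluation_responses,
        key=lambda response: response["data"]["score"],
        reverse=True,
    )


# Hypothetical responses; "label" is an illustrative client-side field.
rows = sort_leaderboard_rows([
    {"label": "strategy-a", "data": {"score": 96.5}},
    {"label": "strategy-b", "data": {"score": 131.2}},
    {"label": "strategy-c", "data": {"score": 104.0}},
])
for row in rows:
    print(row["label"], row["data"]["score"])
```

Python's `sorted` is stable, so rows with equal scores keep their original relative order.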

Related pages

Scoring and benchmarks

How beta score, CapitalScore, ContestScore, and official evaluation differ.

Leaderboard and evidence

What the beta UI should display per row.

API errors

Stable problem details and client handling rules.
