# Evaluations

> Deterministic beta evaluation over a submitted simulation result.

The evaluation endpoint scores one submitted simulation result per request and
returns the score together with diagnostics and eligibility fields.

<CardGroup cols={3}>
  <Card title="Single payload" icon="file-check">
    Evaluate one official Strategy Arena simulation result at a time.
  </Card>

  <Card title="Benchmark adjusted" icon="chart-line">
    Compare scenario utility against the server-owned Benchmark Hold manifest.
  </Card>

  <Card title="Replay friendly" icon="rotate-ccw">
    Return score, diagnostics, and eligibility fields that a client can display.
  </Card>
</CardGroup>

## Endpoint

```http
POST /v1/families/strategy-arena/evaluations
```

The request body should be the simulation payload returned by the official
Strategy Arena simulation profile.
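As a minimal sketch of what a client call might look like, the helper below builds the `POST` request without sending it. The base URL `https://api.example.com`, the JSON content type, and the function name `build_evaluation_request` are assumptions for illustration only; authentication is omitted because this page does not describe it.

```python
import json
import urllib.request

# Hypothetical base URL (assumption); substitute the real API host.
BASE_URL = "https://api.example.com"
# Path taken from the endpoint definition above.
EVALUATIONS_PATH = "/v1/families/strategy-arena/evaluations"


def build_evaluation_request(
    simulation_payload: dict, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying the payload as JSON."""
    # Assumption: the endpoint accepts a JSON-encoded body.
    body = json.dumps(simulation_payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + EVALUATIONS_PATH,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the request easy to inspect or log first.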

## Determinism

<AccordionGroup>
  <Accordion title="Simulation payload">
    The submitted body should be the official simulation payload returned by the
    Strategy Arena simulation profile.
  </Accordion>

  <Accordion title="Benchmark manifest">
    The router looks up the server-owned benchmark manifest for the resolved
    dataset. In public beta, clients cannot supply their own benchmark.
  </Accordion>

  <Accordion title="Scoring formula version">
    Evaluation uses the current beta scoring formula version. The same payload
    and scoring formula version should produce the same score.
  </Accordion>
</AccordionGroup>

As long as the benchmark manifest and scoring formula version are unchanged,
re-evaluating the same simulation payload should return the same score.
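A client could spot-check this property with a small helper like the one below. This is a sketch, not part of the API: `scores_are_repeatable` and the `evaluate` callable are hypothetical names, and the helper only assumes that each response exposes the score at `data.score`.

```python
from typing import Any, Callable, Mapping


def scores_are_repeatable(
    evaluate: Callable[[Mapping[str, Any]], Mapping[str, Any]],
    payload: Mapping[str, Any],
    runs: int = 3,
) -> bool:
    """Evaluate the same payload `runs` times and compare the returned scores.

    `evaluate` is any callable (hypothetical here) that submits the payload
    and returns the parsed response body as a mapping.
    """
    if runs < 2:
        raise ValueError("runs must be at least 2 to compare scores")
    scores = [evaluate(payload)["data"]["score"] for _ in range(runs)]
    # Exact equality is intentional: the page describes identical scores.
    return all(score == scores[0] for score in scores)
```

A `False` result would suggest that the payload changed between calls or that the scoring formula version or benchmark manifest moved on the server.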

## Returned score

The endpoint returns `data.score` as the public beta leaderboard number:

$$
\mathrm{score} = 100 \times
\frac{\mathrm{scenarioResult.utilityScore}}
{\max(\mathrm{benchmarkUtility}, 0.0001)}
$$

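This is the current beta UI sort key. The denominator is floored at 0.0001, so a
zero or negative benchmark utility cannot cause a division by zero or flip the
sign of the score.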
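For clients that want to reproduce the number locally, the formula above can be sketched as a plain function. The names `beta_score` and `BENCHMARK_UTILITY_FLOOR` are illustrative assumptions; the server-side implementation is not shown on this page.

```python
# Floor taken from the formula above; the constant name is an assumption.
BENCHMARK_UTILITY_FLOOR = 0.0001


def beta_score(utility_score: float, benchmark_utility: float) -> float:
    """Return 100 * utility_score / max(benchmark_utility, floor)."""
    denominator = max(benchmark_utility, BENCHMARK_UTILITY_FLOOR)
    return 100.0 * utility_score / denominator
```

For example, a scenario utility of 0.12 against a benchmark utility of 0.10 gives a score of about 120, meaning the strategy delivered roughly 20% more utility than the benchmark.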

## Diagnostics

The response also includes:

<CardGroup cols={2}>
  <Card title="Benchmark context" icon="scale">
    Benchmark utility and benchmark-adjusted score breakdown.
  </Card>

  <Card title="Risk and return" icon="chart-line">
    Annualized return, max drawdown, CVaR, turnover, and execution cost.
  </Card>

  <Card title="Utility details" icon="list-checks">
    Utility breakdown fields that explain why the score moved.
  </Card>

  <Card title="Gate diagnostics" icon="shield">
    Eligibility result and the CapitalScore row for robustness review.
  </Card>
</CardGroup>

## Not included

Public v1 evaluation does not expose:

<AccordionGroup>
  <Accordion title="Contest ranking">
    ContestScore is a separate concept that covers rewards and duplicate-aware participation.
  </Accordion>

  <Accordion title="Duplicate penalties">
    Duplicate-aware contest logic is not part of the public evaluation endpoint.
  </Accordion>

  <Accordion title="Cross-submission leaderboard ownership">
    Leaderboard assembly remains a UI or thin-client concern in beta.
  </Accordion>
</AccordionGroup>

<Callout>
  Public beta clients should sort rows with `data.score` and show diagnostics
  beside the score.
</Callout>
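A thin client could assemble its leaderboard along the following lines. This is a sketch under two assumptions: each row is a parsed evaluation response exposing `data.score`, and higher scores rank first. The function name `sort_rows_by_score` is hypothetical.

```python
from typing import Any, Mapping, Sequence


def sort_rows_by_score(
    rows: Sequence[Mapping[str, Any]],
) -> list[Mapping[str, Any]]:
    """Return a new list of rows ordered by data.score, highest first.

    Assumption: a higher score is better. Python's sort is stable, so rows
    with equal scores keep their original relative order.
    """
    return sorted(rows, key=lambda row: row["data"]["score"], reverse=True)
```

Because the input rows are left untouched, the diagnostics on each row remain available for display beside the score.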

## Related pages

<CardGroup cols={2}>
  <Card title="Scoring and benchmarks" icon="chart-line" href="/arena/scoring-and-benchmarks">
    How beta score, CapitalScore, ContestScore, and official evaluation differ.
  </Card>

  <Card title="Leaderboard and evidence" icon="table" href="/arena/leaderboard-and-evidence">
    What the beta UI should display per row.
  </Card>

  <Card title="API errors" icon="triangle-alert" href="/api/errors">
    Stable problem details and client handling rules.
  </Card>
</CardGroup>
