Scoring and Benchmarks

AlphaEngine scores strategies on evidence, not raw APY.

Public beta ranking uses `score`, a benchmark-adjusted display number returned by the evaluation endpoint.

Utility

Per-scenario utility combines return, drawdown, tail risk, turnover, and explicit execution cost:

U = w_r \mu - w_{dd}\mathrm{MDD} - w_{tail}\mathrm{CVaR} - w_{to}\mathrm{TO} - w_{cost}\mathrm{Cost}

Default MVP weights:

Weight	Value	Meaning
`w_r`	`1.0`	return contribution
`w_dd`	`0.6`	drawdown penalty
`w_tail`	`2.0`	tail-risk penalty
`w_to`	`0.3`	turnover penalty
`w_cost`	`1.0`	explicit cost penalty

Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.
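As a rough illustration, the utility formula with the default MVP weights can be sketched in Python. The `ScenarioMetrics` container and its field names are hypothetical and not part of the backend schema, and the sketch assumes drawdown, CVaR, turnover, and cost are all expressed as non-negative magnitudes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Default MVP weights, copied from the table above.
DEFAULT_WEIGHTS = {"w_r": 1.0, "w_dd": 0.6, "w_tail": 2.0, "w_to": 0.3, "w_cost": 1.0}


@dataclass(frozen=True)
class ScenarioMetrics:
    """Hypothetical per-scenario inputs; names are illustrative only."""
    mean_return: float   # mu
    max_drawdown: float  # MDD, assumed to be a non-negative magnitude
    cvar: float          # CVaR, assumed to be a non-negative loss magnitude
    turnover: float      # TO
    cost: float          # explicit execution cost


def scenario_utility(m: ScenarioMetrics,
                     weights: Optional[Mapping[str, float]] = None) -> float:
    """U = w_r*mu - w_dd*MDD - w_tail*CVaR - w_to*TO - w_cost*Cost."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    return (w["w_r"] * m.mean_return
            - w["w_dd"] * m.max_drawdown
            - w["w_tail"] * m.cvar
            - w["w_to"] * m.turnover
            - w["w_cost"] * m.cost)
```

With these weights a unit of tail risk costs twice as much as a unit of return earns, so a strategy with a high headline return can still end up with low or negative utility.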

Beta score

The public beta evaluation endpoint returns score:

\mathrm{score} = 100 \times \frac{\mathrm{strategyUtility}}{\max(\mathrm{benchmarkUtility}, 0.0001)}

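This is the current UI sort key. It compares the submitted simulation utility to the benchmark utility for the resolved dataset, so a score of 100 means the strategy's utility equals the benchmark's. The 0.0001 floor keeps the denominator positive when benchmark utility is zero or negative. The response also includes diagnostics such as annualized return, max drawdown, CVaR, turnover, execution cost, benchmark utility, eligibility, and CapitalScore.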
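A minimal sketch of the score formula follows. The function name `beta_score` is illustrative and not taken from the endpoint implementation.

```python
BENCHMARK_FLOOR = 0.0001  # denominator floor from the formula above


def beta_score(strategy_utility: float, benchmark_utility: float) -> float:
    """score = 100 * strategyUtility / max(benchmarkUtility, 0.0001)."""
    return 100.0 * strategy_utility / max(benchmark_utility, BENCHMARK_FLOOR)


if __name__ == "__main__":
    print(beta_score(0.06, 0.04))   # strategy utility is 1.5x the benchmark
    print(beta_score(0.06, -0.01))  # benchmark utility below the floor
```

When benchmark utility is at or below the floor, the denominator is 0.0001 and the score becomes very large in magnitude, which is one reason to read it alongside the diagnostics rather than on its own.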

Benchmark Hold

Benchmark Hold is the comparison baseline for the same dataset and scored window. It anchors the beta score so a strategy is measured against simple principal-token carry rather than an isolated return number.

CapitalScore

CapitalScore is the robust allocation diagnostic:

\mathrm{CapitalScore} = \operatorname{mean}(U_s) - \kappa \cdot \operatorname{std}(U_s) - \frac{c}{\sqrt{T_{\mathrm{eff}}}}

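The \operatorname{std}(U_s) term penalizes fragile strategies whose utility varies across scenarios, and the c / \sqrt{T_{\mathrm{eff}}} term penalizes small samples. In beta, show CapitalScore as a secondary diagnostic unless the product team explicitly chooses it as the main leaderboard ranking. Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.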
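The CapitalScore formula can be sketched as below. This page does not give values for \kappa or c, and does not say how T_eff is computed or which standard-deviation convention is used, so the sketch takes all three as caller-supplied parameters and assumes the population standard deviation.

```python
import math
import statistics
from typing import Sequence


def capital_score(utilities: Sequence[float], kappa: float, c: float,
                  t_eff: float) -> float:
    """mean(U_s) - kappa * std(U_s) - c / sqrt(T_eff).

    kappa, c and t_eff are placeholders supplied by the caller;
    their real values are not stated on this page.
    """
    if not utilities:
        raise ValueError("at least one scenario utility is required")
    if t_eff <= 0:
        raise ValueError("t_eff must be positive")
    mean_u = statistics.fmean(utilities)
    std_u = statistics.pstdev(utilities)  # population std: an assumption
    return mean_u - kappa * std_u - c / math.sqrt(t_eff)
```

Two strategies with the same mean utility separate under this formula: the one with more dispersion across scenarios, or with a smaller effective sample, receives the lower CapitalScore.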

ContestScore

ContestScore is separate from allocation logic. It supports contest rewards and duplicate-aware participation scoring. It must not influence CapitalScore. Source: ranking-evaluation/docs/development-log.md.

Official private evaluation

The internal official-market score combines hidden scenario scores:

\mathrm{officialMarketScore} = 0.65 \cdot \mathrm{RealSisterScore} + 0.20 \cdot p_{25}(\mathrm{PerturbationScores}) + 0.15 \cdot p_{25}(\mathrm{StressScores})

Here p_{25} denotes the 25th percentile of the listed scores. The current implementation expects one real-sister score, five perturbation scores, and five stress scores. Source: ranking-evaluation/docs/development-log.md.
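The weighted combination can be sketched as follows. The percentile convention is an assumption: this sketch uses linear interpolation between order statistics (the `inclusive` method of Python's `statistics.quantiles`), which for five values returns the second-smallest one. The internal implementation may use a different rule, and the function names here are illustrative.

```python
import statistics
from typing import Sequence

EXPECTED_SCENARIOS = 5  # five perturbation and five stress scores, per the text above


def p25(scores: Sequence[float]) -> float:
    """25th percentile with linear interpolation (assumed convention)."""
    return statistics.quantiles(scores, n=4, method="inclusive")[0]


def official_market_score(real_sister: float,
                          perturbation: Sequence[float],
                          stress: Sequence[float]) -> float:
    """0.65 * RealSisterScore + 0.20 * p25(perturbation) + 0.15 * p25(stress)."""
    if len(perturbation) != EXPECTED_SCENARIOS or len(stress) != EXPECTED_SCENARIOS:
        raise ValueError("expected five perturbation scores and five stress scores")
    return 0.65 * real_sister + 0.20 * p25(perturbation) + 0.15 * p25(stress)
```

Using a low percentile instead of the mean means a single strong perturbation or stress scenario cannot lift the combined score.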

Eligibility gates

Eligibility is separate from ranking. Current backend gates include:

positive-utility fraction at least 0.70,
max drawdown no more than 0.35,
mean turnover no more than 0.25,
p95 runtime no more than 10_000 ms,
budget and config validity,
finite numeric values.

Eligibility failures should be explicit and fixable.
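One way to make failures explicit is to return a message per failed gate instead of a single boolean. The sketch below uses the thresholds listed above. The `GateInputs` record and its field names are hypothetical, and budget and config validity is reduced to a single flag because the underlying checks are not described on this page.

```python
import math
from dataclasses import dataclass

# Thresholds copied from the gate list above.
MIN_POSITIVE_UTILITY_FRACTION = 0.70
MAX_DRAWDOWN = 0.35
MAX_MEAN_TURNOVER = 0.25
MAX_P95_RUNTIME_MS = 10_000


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical input record; not the backend schema."""
    positive_utility_fraction: float
    max_drawdown: float
    mean_turnover: float
    p95_runtime_ms: float
    budget_and_config_valid: bool


def eligibility_failures(x: GateInputs) -> list[str]:
    """Return one readable message per failed gate; an empty list means eligible."""
    numeric = {
        "positive_utility_fraction": x.positive_utility_fraction,
        "max_drawdown": x.max_drawdown,
        "mean_turnover": x.mean_turnover,
        "p95_runtime_ms": x.p95_runtime_ms,
    }
    failures = [f"{name} is not a finite number"
                for name, value in numeric.items() if not math.isfinite(value)]
    if not failures:  # threshold comparisons are only meaningful on finite values
        if x.positive_utility_fraction < MIN_POSITIVE_UTILITY_FRACTION:
            failures.append(f"positive_utility_fraction {x.positive_utility_fraction} "
                            f"is below {MIN_POSITIVE_UTILITY_FRACTION}")
        if x.max_drawdown > MAX_DRAWDOWN:
            failures.append(f"max_drawdown {x.max_drawdown} exceeds {MAX_DRAWDOWN}")
        if x.mean_turnover > MAX_MEAN_TURNOVER:
            failures.append(f"mean_turnover {x.mean_turnover} exceeds {MAX_MEAN_TURNOVER}")
        if x.p95_runtime_ms > MAX_P95_RUNTIME_MS:
            failures.append(f"p95_runtime_ms {x.p95_runtime_ms} exceeds {MAX_P95_RUNTIME_MS}")
    if not x.budget_and_config_valid:
        failures.append("budget or config is invalid")
    return failures
```

Each message names the gate and the threshold it missed, so a submitter can see what to change before resubmitting.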

Related pages

Leaderboard and evidence

See how rows should present score, diagnostics, artifacts, and eligibility.

Eligibility, not allocation

Keep capital eligibility separate from allocation promises.

Evaluations API

Inspect the implemented evaluation endpoint behavior.
