AlphaEngine scores strategies on evidence, not raw APY.
Public beta ranking uses score, a benchmark-adjusted display number returned by the evaluation endpoint.

Utility

Per-scenario utility combines return, drawdown, tail risk, turnover, and explicit execution cost:

$$U = w_r \mu - w_{dd}\,\mathrm{MDD} - w_{tail}\,\mathrm{CVaR} - w_{to}\,\mathrm{TO} - w_{cost}\,\mathrm{Cost}$$

Default MVP weights:
| Weight | Value | Meaning |
| --- | --- | --- |
| w_r | 1.0 | return contribution |
| w_dd | 0.6 | drawdown penalty |
| w_tail | 2.0 | tail-risk penalty |
| w_to | 0.3 | turnover penalty |
| w_cost | 1.0 | explicit cost penalty |
Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.
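
As an illustration, the utility formula with the default MVP weights can be sketched in Python. The names `ScenarioMetrics` and `scenario_utility` are hypothetical and not taken from the backend. The sketch also assumes every penalty input is passed as a non-negative magnitude in the same units as the return.

```python
from dataclasses import dataclass

# Default MVP weights, copied from the table above.
DEFAULT_WEIGHTS = {
    "w_r": 1.0,     # return contribution
    "w_dd": 0.6,    # drawdown penalty
    "w_tail": 2.0,  # tail-risk penalty
    "w_to": 0.3,    # turnover penalty
    "w_cost": 1.0,  # explicit cost penalty
}


@dataclass(frozen=True)
class ScenarioMetrics:
    """Hypothetical container for one scenario's inputs (assumed field names)."""
    mean_return: float   # mu
    max_drawdown: float  # MDD, as a positive magnitude (assumption)
    cvar: float          # CVaR, as a positive magnitude (assumption)
    turnover: float      # TO
    cost: float          # explicit execution cost


def scenario_utility(m: ScenarioMetrics, w: dict = DEFAULT_WEIGHTS) -> float:
    """U = w_r*mu - w_dd*MDD - w_tail*CVaR - w_to*TO - w_cost*Cost."""
    return (
        w["w_r"] * m.mean_return
        - w["w_dd"] * m.max_drawdown
        - w["w_tail"] * m.cvar
        - w["w_to"] * m.turnover
        - w["w_cost"] * m.cost
    )


# Made-up inputs: 0.20 - 0.06 - 0.04 - 0.015 - 0.005, roughly 0.08.
example = ScenarioMetrics(0.20, 0.10, 0.02, 0.05, 0.005)
print(scenario_utility(example))
```

With these weights the tail-risk term dominates: one unit of CVaR costs twice as much utility as one unit of return adds.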

Beta score

The public beta evaluation endpoint returns score:

$$\mathrm{score} = 100 \times \frac{\mathrm{strategyUtility}}{\max(\mathrm{benchmarkUtility}, 0.0001)}$$

This is the current UI sort key. It compares the submitted simulation utility to the benchmark utility for the resolved dataset. The response also includes diagnostics such as annualized return, max drawdown, CVaR, turnover, execution cost, benchmark utility, eligibility, and CapitalScore.
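
A minimal sketch of the score formula follows. The function name `beta_score` and its parameter names are assumptions for illustration; only the formula and the 0.0001 floor come from the text above.

```python
def beta_score(strategy_utility: float,
               benchmark_utility: float,
               floor: float = 0.0001) -> float:
    """score = 100 * strategyUtility / max(benchmarkUtility, floor)."""
    return 100.0 * strategy_utility / max(benchmark_utility, floor)


# Made-up utilities: a strategy at 0.08 against a benchmark at 0.05
# scores about 160, i.e. 1.6 times the benchmark.
print(beta_score(0.08, 0.05))

# When the benchmark utility is at or below the floor, the denominator
# is clamped to 0.0001, so the score becomes very large in magnitude.
print(beta_score(0.08, -0.02))
```

The second call shows why the benchmark utility diagnostic is worth reading next to the score: a clamped denominator inflates the number without the strategy itself improving.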

Benchmark Hold

Benchmark Hold is the comparison baseline for the same dataset and scored window. It anchors the beta score so a strategy is measured against simple principal-token carry rather than an isolated return number.

CapitalScore

CapitalScore is the robust allocation diagnostic:

$$\mathrm{CapitalScore} = \operatorname{mean}(U_s) - \kappa \cdot \operatorname{std}(U_s) - \frac{c}{\sqrt{T_{\mathrm{eff}}}}$$

It penalizes fragile strategies and small samples. It should be shown as a secondary diagnostic in beta unless product explicitly chooses it as the main leaderboard behavior. Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.
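
The formula can be sketched as below. This page does not state values for κ and c, so the sketch takes them as required arguments. It also assumes a population standard deviation and treats T_eff as a positive effective sample size. The function name `capital_score` is hypothetical.

```python
import math
import statistics
from typing import Sequence


def capital_score(utilities: Sequence[float],
                  kappa: float,
                  c: float,
                  t_eff: float) -> float:
    """mean(U_s) - kappa * std(U_s) - c / sqrt(T_eff).

    Assumptions: std is the population standard deviation, and
    t_eff is a positive effective sample size.
    """
    if not utilities:
        raise ValueError("at least one scenario utility is required")
    if t_eff <= 0:
        raise ValueError("t_eff must be positive")
    mean_u = statistics.fmean(utilities)
    std_u = statistics.pstdev(utilities)
    return mean_u - kappa * std_u - c / math.sqrt(t_eff)


# Made-up values. Both strategies have mean utility 0.10, but the
# second is more dispersed across scenarios and is penalized for it.
stable = capital_score([0.10, 0.10, 0.10, 0.10], kappa=1.0, c=0.1, t_eff=4)
fragile = capital_score([0.00, 0.20, 0.00, 0.20], kappa=1.0, c=0.1, t_eff=4)
print(stable, fragile)
```

The last term shrinks as T_eff grows, so the same utilities earn a higher CapitalScore when backed by a larger effective sample.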

ContestScore

ContestScore is separate from allocation logic. It supports contest rewards and duplicate-aware participation scoring. It must not influence CapitalScore. Source: ranking-evaluation/docs/development-log.md.

Official private evaluation

The internal official-market score combines hidden scenario scores:

$$\mathrm{officialMarketScore} = 0.65 \cdot \mathrm{RealSisterScore} + 0.20 \cdot p_{25}(\mathrm{PerturbationScores}) + 0.15 \cdot p_{25}(\mathrm{StressScores})$$

The current implementation expects one real-sister score, five perturbation scores, and five stress scores. Source: ranking-evaluation/docs/development-log.md.
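
A rough sketch of this combination follows. The percentile interpolation method is not specified here, so the example assumes the "inclusive" linear-interpolation quartile from Python's `statistics` module; the real implementation may differ. The function names are hypothetical.

```python
import statistics
from typing import Sequence

EXPECTED_SCENARIOS = 5  # five perturbation and five stress scores


def p25(scores: Sequence[float]) -> float:
    """25th percentile, using inclusive linear interpolation (assumption)."""
    return statistics.quantiles(scores, n=4, method="inclusive")[0]


def official_market_score(real_sister: float,
                          perturbation: Sequence[float],
                          stress: Sequence[float]) -> float:
    """0.65 * RealSister + 0.20 * p25(Perturbation) + 0.15 * p25(Stress)."""
    if len(perturbation) != EXPECTED_SCENARIOS:
        raise ValueError("expected five perturbation scores")
    if len(stress) != EXPECTED_SCENARIOS:
        raise ValueError("expected five stress scores")
    return 0.65 * real_sister + 0.20 * p25(perturbation) + 0.15 * p25(stress)


# Made-up scores. With five values and the inclusive method, p25 is the
# second-smallest value: 70 for perturbation and 50 for stress here.
score = official_market_score(
    80.0,
    [60.0, 70.0, 75.0, 90.0, 95.0],
    [40.0, 50.0, 55.0, 65.0, 70.0],
)
print(score)  # 0.65*80 + 0.20*70 + 0.15*50, roughly 73.5
```

Taking the 25th percentile rather than the mean means a strategy has to hold up in its weaker perturbation and stress scenarios to score well.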

Eligibility gates

Eligibility is separate from ranking. Current backend gates include:
  • positive-utility fraction at least 0.70,
  • max drawdown no more than 0.35,
  • mean turnover no more than 0.25,
  • p95 runtime no more than 10_000 ms,
  • budget and config validity,
  • finite numeric values.
Eligibility failures should be explicit and fixable.
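
One way to make failures explicit is to return a list of human-readable reasons instead of a single boolean. The sketch below uses the thresholds listed above; the `GateInputs` type, its field names, and the message wording are assumptions, not the backend's actual schema.

```python
import math
from dataclasses import dataclass

# Thresholds from the gate list above.
MIN_POSITIVE_UTILITY_FRACTION = 0.70
MAX_DRAWDOWN = 0.35
MAX_MEAN_TURNOVER = 0.25
MAX_P95_RUNTIME_MS = 10_000


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical inputs to the gate check (assumed field names)."""
    positive_utility_fraction: float
    max_drawdown: float
    mean_turnover: float
    p95_runtime_ms: float
    budget_valid: bool
    config_valid: bool


def eligibility_failures(g: GateInputs) -> list[str]:
    """Return one message per failed gate; an empty list means eligible."""
    checks = [
        ("positive_utility_fraction", g.positive_utility_fraction,
         g.positive_utility_fraction >= MIN_POSITIVE_UTILITY_FRACTION,
         "at least 0.70"),
        ("max_drawdown", g.max_drawdown,
         g.max_drawdown <= MAX_DRAWDOWN, "no more than 0.35"),
        ("mean_turnover", g.mean_turnover,
         g.mean_turnover <= MAX_MEAN_TURNOVER, "no more than 0.25"),
        ("p95_runtime_ms", g.p95_runtime_ms,
         g.p95_runtime_ms <= MAX_P95_RUNTIME_MS, "no more than 10000"),
    ]
    failures: list[str] = []
    for name, value, passes, requirement in checks:
        if not math.isfinite(value):
            # NaN and infinity are reported separately from threshold misses.
            failures.append(f"{name} must be a finite number, got {value}")
        elif not passes:
            failures.append(f"{name} is {value}, must be {requirement}")
    if not g.budget_valid:
        failures.append("budget is invalid")
    if not g.config_valid:
        failures.append("config is invalid")
    return failures


print(eligibility_failures(GateInputs(0.80, 0.40, 0.10, 5000.0, True, True)))
```

Each message names the metric, its observed value, and the limit, so a submitter can see what to change. Because the function never reads the score, it also keeps eligibility separate from ranking.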

Leaderboard and evidence

See how rows should present score, diagnostics, artifacts, and eligibility.

Eligibility, not allocation

Keep capital eligibility separate from allocation promises.

Evaluations API

Inspect the implemented evaluation endpoint behavior.