Public beta ranking uses `score`, a benchmark-adjusted display number returned by the evaluation endpoint.
Utility
Per-scenario utility combines return, drawdown, tail risk, turnover, and explicit execution cost. Default MVP weights:

| Weight | Value | Meaning |
|---|---|---|
| w_r | 1.0 | return contribution |
| w_dd | 0.6 | drawdown penalty |
| w_tail | 2.0 | tail-risk penalty |
| w_to | 0.3 | turnover penalty |
| w_cost | 1.0 | explicit cost penalty |

Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.
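As a rough illustration of how the default weights could be applied, the sketch below assumes a simple linear form in which return adds to utility and each penalty is subtracted. The functional form, the `ScenarioMetrics` type, its field names, and the sign conventions (penalties passed as non-negative magnitudes) are all assumptions made for this example. Only the weight names and values come from the table above, and the design doc remains the authority on the actual formula.

```python
from dataclasses import dataclass

# Default MVP weights, copied from the table above.
DEFAULT_WEIGHTS = {
    "w_r": 1.0,     # return contribution
    "w_dd": 0.6,    # drawdown penalty
    "w_tail": 2.0,  # tail-risk penalty
    "w_to": 0.3,    # turnover penalty
    "w_cost": 1.0,  # explicit cost penalty
}


@dataclass(frozen=True)
class ScenarioMetrics:
    """Hypothetical per-scenario inputs; names and units are assumptions."""
    ret: float        # return over the scored window
    drawdown: float   # drawdown as a non-negative fraction
    tail_risk: float  # tail-risk magnitude (for example CVaR), non-negative
    turnover: float   # turnover, non-negative
    cost: float       # explicit execution cost, non-negative


def scenario_utility(m: ScenarioMetrics, w: dict = DEFAULT_WEIGHTS) -> float:
    # ASSUMPTION: linear combination, return minus weighted penalties.
    return (
        w["w_r"] * m.ret
        - w["w_dd"] * m.drawdown
        - w["w_tail"] * m.tail_risk
        - w["w_to"] * m.turnover
        - w["w_cost"] * m.cost
    )


example = ScenarioMetrics(ret=0.10, drawdown=0.05, tail_risk=0.02,
                          turnover=0.10, cost=0.01)
utility = scenario_utility(example)
```

Under this assumed form, the tail-risk weight of 2.0 means a unit of tail risk costs twice as much utility as a unit of return earns, which is the main thing the weights table conveys.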
Beta score
The public beta evaluation endpoint returns `score`.
This is the current UI sort key. It compares the submitted simulation utility to
the benchmark utility for the resolved dataset.
The response also includes diagnostics such as annualized return, max drawdown,
CVaR, turnover, execution cost, benchmark utility, eligibility, and CapitalScore.
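To make the "UI sort key" role concrete, here is a minimal sketch of ordering leaderboard rows by `score` while carrying eligibility and CapitalScore along as display-only diagnostics. The row shape and the field names `id`, `eligible`, and `capital_score` are hypothetical and are not taken from the endpoint's actual response schema.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def sort_for_beta_ui(rows: List[Row]) -> List[Row]:
    # Sort by the beta score only, highest first. Eligibility and
    # CapitalScore are left untouched so they can be displayed as
    # diagnostics without affecting the order.
    return sorted(rows, key=lambda r: r["score"], reverse=True)


# Hypothetical rows; all field names other than "score" are assumptions.
rows = [
    {"id": "a", "score": 1.2, "eligible": False, "capital_score": 0.4},
    {"id": "b", "score": 3.5, "eligible": True, "capital_score": 0.9},
    {"id": "c", "score": 2.1, "eligible": True, "capital_score": 0.1},
]
ordered_ids = [r["id"] for r in sort_for_beta_ui(rows)]
```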
Benchmark Hold
Benchmark Hold is the comparison baseline for the same dataset and scored window. It anchors the beta score so a strategy is measured against simple principal-token carry rather than an isolated return number.
CapitalScore
CapitalScore is the robust allocation diagnostic. It penalizes fragile strategies and small samples. It should be shown as a secondary diagnostic in beta unless product explicitly chooses it as the main leaderboard behavior.
Source: ranking-evaluation/docs/p0-evaluation-design-v1.md.
ContestScore
ContestScore is separate from allocation logic. It supports contest rewards and duplicate-aware participation scoring. It must not influence CapitalScore.
Source: ranking-evaluation/docs/development-log.md.
Official private evaluation
The internal official-market score combines hidden scenario scores. The current implementation expects one real-sister score, five perturbation scores, and five stress scores.
Source: ranking-evaluation/docs/development-log.md.
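The expected input shape (one real-sister score, five perturbation scores, five stress scores) can be checked before any combination step runs. The sketch below covers only that shape check. The `HiddenScenarioScores` type and `validate_hidden_scores` function are hypothetical names, and the combination formula is deliberately left out because it is not stated here.

```python
import math
from dataclasses import dataclass
from typing import Sequence

EXPECTED_PERTURBATION_SCORES = 5
EXPECTED_STRESS_SCORES = 5


@dataclass(frozen=True)
class HiddenScenarioScores:
    """Hypothetical container; field names are assumptions."""
    real_sister: float             # exactly one real-sister score
    perturbation: Sequence[float]  # expected length: 5
    stress: Sequence[float]        # expected length: 5


def validate_hidden_scores(s: HiddenScenarioScores) -> None:
    """Raise ValueError if the score set does not match the expected shape."""
    if len(s.perturbation) != EXPECTED_PERTURBATION_SCORES:
        raise ValueError(
            f"expected {EXPECTED_PERTURBATION_SCORES} perturbation scores, "
            f"got {len(s.perturbation)}"
        )
    if len(s.stress) != EXPECTED_STRESS_SCORES:
        raise ValueError(
            f"expected {EXPECTED_STRESS_SCORES} stress scores, "
            f"got {len(s.stress)}"
        )
    all_scores = [s.real_sister, *s.perturbation, *s.stress]
    if not all(math.isfinite(v) for v in all_scores):
        raise ValueError("all hidden scenario scores must be finite")
```

Failing fast on a wrong count keeps a partially populated scenario set from silently producing an official score.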
Eligibility gates
Eligibility is separate from ranking. Current backend gates include:
- positive-utility fraction at least `0.70`,
- max drawdown no more than `0.35`,
- mean turnover no more than `0.25`,
- p95 runtime no more than `10_000` ms,
- budget and config validity,
- finite numeric values.
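The gates above can be read as a set of independent checks that yield a list of failures instead of a score. The following is a minimal sketch using the listed thresholds. The `GateInputs` type, its field names, and the failure labels are hypothetical, and budget and config validity is reduced to a single precomputed boolean for brevity.

```python
import math
from dataclasses import dataclass
from typing import List

# Thresholds taken from the gate list above.
MIN_POSITIVE_UTILITY_FRACTION = 0.70
MAX_DRAWDOWN = 0.35
MAX_MEAN_TURNOVER = 0.25
MAX_P95_RUNTIME_MS = 10_000


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical gate inputs; field names are assumptions."""
    positive_utility_fraction: float
    max_drawdown: float
    mean_turnover: float
    p95_runtime_ms: float
    budget_and_config_valid: bool


def failed_gates(x: GateInputs) -> List[str]:
    """Return the names of failed gates; an empty list means eligible."""
    numeric = {
        "positive_utility_fraction": x.positive_utility_fraction,
        "max_drawdown": x.max_drawdown,
        "mean_turnover": x.mean_turnover,
        "p95_runtime_ms": x.p95_runtime_ms,
    }
    # Check finiteness first: threshold comparisons against NaN are
    # always False and would hide the real problem.
    failures = [f"non_finite:{name}" for name, v in numeric.items()
                if not math.isfinite(v)]
    if failures:
        return failures

    if x.positive_utility_fraction < MIN_POSITIVE_UTILITY_FRACTION:
        failures.append("positive_utility_fraction")
    if x.max_drawdown > MAX_DRAWDOWN:
        failures.append("max_drawdown")
    if x.mean_turnover > MAX_MEAN_TURNOVER:
        failures.append("mean_turnover")
    if x.p95_runtime_ms > MAX_P95_RUNTIME_MS:
        failures.append("p95_runtime_ms")
    if not x.budget_and_config_valid:
        failures.append("budget_or_config")
    return failures
```

Values exactly at a threshold pass in this sketch, matching the "at least" and "no more than" wording. Returning failure names instead of a boolean keeps the result independent of ranking, so a row can still be sorted by `score` while showing why it is ineligible.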
Related pages
Leaderboard and evidence
See how rows should present score, diagnostics, artifacts, and eligibility.
Eligibility, not allocation
Keep capital eligibility separate from allocation promises.
Evaluations API
Inspect the implemented evaluation endpoint behavior.
