How well-calibrated are these projections?
When a model says "65% chance," the actual win rate should be 65%. Below is the honest version of how close each model gets: per-model status, the size of any known miscalibration, and the correction already applied. Bin charts land as each sport's resolve-worker exposes a public endpoint.
Per-model status
CBB · Win probability
Instrumented · publishing chart soon
PAV-isotonic calibrator on top of the v2 model, refit weekly. Latest CV Brier 0.22428 across 3,242 pairs. Calibrator's weak point is the 0.10–0.20 predicted-probability bin where training data is thin (n≈23).
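A minimal sketch of that calibration step, assuming scikit-learn's IsotonicRegression (which implements PAV) and synthetic stand-in data; the fold count, seed, and data are illustrative, not the production refit job.

```python
# Hedged sketch: isotonic (PAV) calibration over raw model probabilities,
# scored with cross-validated Brier. All data below is synthetic.
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
raw_probs = rng.uniform(0.05, 0.95, 3242)                     # stand-in for v2 model output
outcomes = (rng.uniform(size=3242) < raw_probs).astype(int)   # synthetic resolved results

brier_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(raw_probs):
    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    calibrator.fit(raw_probs[train_idx], outcomes[train_idx])
    calibrated = calibrator.predict(raw_probs[test_idx])
    brier_scores.append(brier_score_loss(outcomes[test_idx], calibrated))

print(f"CV Brier: {np.mean(brier_scores):.5f}")
```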
Weather · Temperature bands
Instrumented · publishing chart soon
EMOS v1 is mildly overconfident above 30% predicted probability. The live trading layer caps confidence at 0.40 in response. Public charts will show predicted-vs-actual band rates per city and per band.
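The cap itself is a small transform; below is one plausible shape for it, under the assumption that clamped probability mass is redistributed to bands that still have headroom. The function name and redistribution scheme are ours for illustration, not the production trading layer.

```python
# Hedged sketch: clamp any temperature-band probability at a cap, then give
# the trimmed mass back to uncapped bands in proportion to their headroom.
import numpy as np

def cap_band_probs(band_probs, cap: float = 0.40):
    p = np.asarray(band_probs, dtype=float)
    capped = np.minimum(p, cap)
    excess = p.sum() - capped.sum()   # probability mass trimmed off
    headroom = cap - capped           # room left under the cap, per band
    if excess > 0 and headroom.sum() > 0:
        capped = capped + headroom * min(1.0, excess / headroom.sum())
    return capped

bands = np.array([0.05, 0.10, 0.55, 0.20, 0.10])   # illustrative EMOS v1 output
print(cap_band_probs(bands))                        # sums to 1, no band above 0.40
```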
NBA · Totals
Instrumented · publishing chart soon
Live totals residual GBM brings out-of-sample MAE from 18.2 to 11.2 (–39%) on a 1,220-game holdout. Vegas-prior τ-blend ships on top to anchor early-game projections.
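A sketch of what a τ-blend can look like here: the weight on the Vegas total decays as game time elapses, so pregame projections sit on the book number and late-game projections sit on the model. The exponential decay form and the τ value are illustrative assumptions, not the shipped parameters.

```python
# Hedged sketch: blend a live model total with the Vegas prior, trusting
# Vegas early and the model more as minutes elapse.
import math

def blend_total(model_total: float, vegas_total: float,
                minutes_elapsed: float, tau: float = 12.0) -> float:
    w_vegas = math.exp(-minutes_elapsed / tau)   # prior weight decays over game time
    return w_vegas * vegas_total + (1 - w_vegas) * model_total

print(blend_total(238.0, 224.5, minutes_elapsed=0.0))   # pregame: pure Vegas anchor
print(blend_total(238.0, 224.5, minutes_elapsed=24.0))  # halftime: mostly model
```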
NBA · Championship
Awaiting season data
Playoff-series model uses dual-Elo + in-series state. Two full seasons (2024, 2025) backfilled to tune the playoff_total_bias_correction. Calibration chart lands when the 2026 playoffs accumulate enough resolved series.
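For intuition on the in-series state, here is a minimal best-of-7 recursion under the simplifying assumption of a single fixed per-game win probability; the dual-Elo inputs and game-to-game adjustments the real model uses are not shown.

```python
# Hedged sketch: series win probability from a fixed per-game probability
# plus in-series state (current wins on each side).
from functools import lru_cache

@lru_cache(maxsize=None)
def series_win_prob(p_game: float, wins_a: int = 0, wins_b: int = 0, need: int = 4) -> float:
    if wins_a == need:
        return 1.0
    if wins_b == need:
        return 0.0
    return (p_game * series_win_prob(p_game, wins_a + 1, wins_b, need)
            + (1 - p_game) * series_win_prob(p_game, wins_a, wins_b + 1, need))

print(series_win_prob(0.60))        # fresh series
print(series_win_prob(0.60, 1, 2))  # down 1-2: in-series state shifts the odds
```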
CFB · Playoff
Awaiting season data
Committee-proxy seeding learned on 2014–2024 CFP history. Out-of-sample seed accuracy + bracket-walk Brier will land here when the 2026 season starts and we have live committee comparisons.
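Both metrics reduce to small computations once games resolve; a sketch on invented placeholder data (seedings, probabilities, and results below are all made up):

```python
# Hedged sketch: seed accuracy vs the committee, and a bracket-walk Brier
# that scores each bracket game's predicted win probability against its result.
predicted_seeds = [1, 2, 3, 5, 4, 6, 8, 7]   # model's seeding, illustrative
committee_seeds = [1, 2, 4, 3, 5, 6, 7, 8]   # committee's seeding, illustrative

seed_accuracy = sum(p == c for p, c in zip(predicted_seeds, committee_seeds)) / len(committee_seeds)

# (pregame win prob for team A, 1 if team A won) for each game as the bracket resolves
bracket_games = [(0.81, 1), (0.67, 1), (0.55, 0), (0.62, 1)]
bracket_brier = sum((p - y) ** 2 for p, y in bracket_games) / len(bracket_games)

print(f"seed accuracy: {seed_accuracy:.2f}, bracket-walk Brier: {bracket_brier:.4f}")
```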
NFL · Season wins
Live
Off-season ratings calibrated against DraftKings 2026 win totals: 25 of 32 teams within 1.5 wins. Median team within 0.8 wins. The full per-team table is in our internal Vegas-comparison doc; cleaning a public version now.
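The check behind those numbers is simple to reproduce; a sketch on placeholder figures (the real per-team numbers live in the internal doc):

```python
# Hedged sketch: count teams within 1.5 wins of the book line and take the
# median absolute gap. Values below are made up, not our ratings.
import statistics

model_wins = {"KC": 11.2, "BUF": 11.0, "DET": 10.1, "CAR": 5.9}   # illustrative
vegas_wins = {"KC": 11.5, "BUF": 10.5, "DET": 10.5, "CAR": 7.5}   # book win totals

gaps = [abs(model_wins[t] - vegas_wins[t]) for t in model_wins]
within = sum(g <= 1.5 for g in gaps)
print(f"{within} of {len(gaps)} within 1.5 wins; median gap {statistics.median(gaps):.1f}")
```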
What "calibrated" means here
Pick a model. Group every one of its predictions by predicted probability -- say in 10pp buckets. For each bucket, compute the actual rate at which the event happened. Plot those points. A perfectly calibrated model has every bucket sitting on the diagonal: where you predicted 60% you got 60%. We'll publish those plots per model here, with bin counts and rolling 30/60/90-day windows. The audit query runs Mondays at 10am PT in cbb_weekly_health_check_worker; parallel weather and NBA audits run nightly.
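A minimal version of that bucket audit, on synthetic data standing in for a model's resolved predictions; the bin width matches the 10pp buckets described above, and everything else is a stand-in.

```python
# Hedged sketch: reliability bins with counts, mean predicted probability,
# and actual outcome rate per bin.
import numpy as np

def calibration_bins(predicted, outcomes, n_bins=10):
    edges = np.linspace(0.0, 1.0, n_bins + 1)                     # 10pp bin edges
    idx = np.clip(np.digitize(predicted, edges) - 1, 0, n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((edges[b], edges[b + 1], int(mask.sum()),
                         float(predicted[mask].mean()), float(outcomes[mask].mean())))
    return rows  # (bin_lo, bin_hi, n, mean_predicted, actual_rate)

rng = np.random.default_rng(1)
p = rng.uniform(size=5000)                    # synthetic predictions
y = (rng.uniform(size=5000) < p).astype(int)  # outcomes drawn to be well calibrated
for lo, hi, n, pred, actual in calibration_bins(p, y):
    print(f"[{lo:.1f}, {hi:.1f})  n={n:4d}  predicted={pred:.3f}  actual={actual:.3f}")
```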
Why we publish the misses
The two biggest model corrections we've shipped this year -- the EMOS-v1 overconfidence cap on weather and the live NBA totals GBM replacement -- both came out of the calibration process catching the model lying. The framework is the moat. The bin charts are how we prove it's real.