The Marginbeta
Watching · Open research

What we're watching

Cells, hypotheses, and open research questions the model is tracking but doesn't yet have the empirical backing to publish as picks. Each entry shows the current sample, what would promote it to a live pick, and what would kill it.

The confidence pill on each card is honest. "Low" means we're still collecting data or the CI is wide enough that the cell could be noise. "Low-medium" means the directional signal is real but the sample needs more time, or the juice math is unknown. We won't mark anything "medium" on this page -- by the time it earns that label, it's on the picks page.

NBA

NBA · Totals
low-medium

NBA totals: the 5-10pp edge band wins more than the 10pp+ band

Our NBA totals model is well-calibrated in its middle and overconfident at its tail. The 5-10pp edge band is the actual edge; the 10pp+ band is noise.

Evidence so far
n=81 · WR 71.6% · CI [61.0, 80.3] · breakeven 52.4% · ROI +45.2%

Post-2026-04-07 (playoff model launch). 686 opps logged in the 5-10pp band in the last 24h, so the slate has constant action.
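The CI figures on these cards are Wilson score intervals (the 10pp+ card below names the Wilson lower bound explicitly). A minimal sketch reproducing this card's interval from 58 wins in 81, assuming the standard 95% z of 1.96:

```python
import math

def wilson_ci(wins, n, z=1.96):
    """95% Wilson score interval for a binomial win rate."""
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_ci(58, 81)  # 58/81 = 71.6% WR, as on the card above
print(f"CI [{lo:.1%}, {hi:.1%}]")  # → CI [61.0%, 80.3%]
```

The promote criterion "CI lower bound stays > 52.4%" is just `lo > 0.524` on the running sample.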

Promote if
  • Sample reaches n=150 in the playoff-only cohort with WR ≥ 55%
  • CI lower bound stays > 52.4% (DK breakeven)
  • Pattern replicates in regular-season post-fix data
Retire if
  • Forward WR drops below 55% on the next n=50 picks
  • CI lower bound crosses below breakeven
Next checkpoint: Weekly -- sample grows ~2-3 picks/day in the band
Sources (3)
  • predictions table query, 2026-05-13
  • blog_edges/claims/2026-05-13-week-20-rerun.json (2026-W20-02)
  • memory: project_nba_playoff_model_2026_04_07
NBA · Totals
low confidence

NBA totals 10pp+: borderline, CI lower bound 3.2pp below breakeven

The 10pp+ edge band is the cell the model claims to be most confident in, but 60 days of resolved data put the Wilson lower bound 3.2pp below breakeven. Either it's noise that mean-reverts, or the next month of data separates it.

Evidence so far
n=276 · WR 55.1% · CI [49.2, 60.8] · breakeven 52.4% · ROI +10.5%

Wilson lower bound 49.2% is 3.2pp below breakeven. Don't bet it; the +10.5% pooled ROI is real for the period but the CI says noise is plausible.
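Breakeven on these cards is just the implied probability of the juice; a quick sketch of the standard American-odds conversion (not project code), applied to the numbers above:

```python
def breakeven(american_odds):
    """Win rate needed to break even at a given American price."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

print(f"{breakeven(-110):.1%}")      # → 52.4% (the DK breakeven on the card)
wilson_lower = 0.492                  # lower bound from the card
print(wilson_lower > breakeven(-110))  # → False: noise is plausible, don't bet
```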

Promote if
  • 30 more days of data move the CI lower bound above 52.4%
  • Same pattern shows up in regular-season-only cohort
Retire if
  • CI lower bound drops below 48% after 100 more picks
  • Pattern remains 'high-edge underperforms low-edge' -- which promotes the 5-10pp story over this one
Next checkpoint: Weekly -- at current pace ~40 new picks/week
Sources (1)
  • predictions table query, 2026-05-13 (60d window)

CBB

CBB · ATS
low-medium

CBB DK road-favorite -1.5: 73.3% cover on holdout, juice unknown

Heavy road favorites (PEA ≥3.0) in college baseball cover -1.5 at 73.3% on our 2026 holdout (n=225). Wilson CI [67.2%, 78.7%], year-stable. The unknown is DK juice on heavy road -1.5 -- breakeven is ~65.5% at -190 and 71.4% at -250. We need 2-3 weeks of forward DK spread snapshots to compute the actual edge.

Evidence so far
n=225 · WR 73.3% · CI [67.2, 78.7] · breakeven 65.5%

Cover rate is real and monotonic. The CBB DK resolver shipped this week, which means the forward DK spread snapshot clock just started. Until then, no live picks from this cell.
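Once snapshots land, the juice question is arithmetic. A sketch of the implied edge at a few DK prices, assuming (and it is only an assumption) that the 73.3% holdout cover rate carries forward:

```python
def breakeven(american_odds):
    """Implied break-even win rate for a negative American price."""
    return -american_odds / (-american_odds + 100)

cover_rate = 0.733  # 2026 holdout, n=225
for price in (-180, -190, -220, -250):
    edge = cover_rate - breakeven(price)
    print(f"{price}: breakeven {breakeven(price):.1%}, edge {edge:+.1%}")
```

At -190 the breakeven is 65.5% and at -250 it is 71.4%, matching the card; the edge thins from roughly +9pp at -180 to under +2pp at -250, which is why consistent -250+ pricing kills the cell.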

Promote if
  • Three weeks of forward DK spread snapshots show cover rate holds at 70%+
  • Implied juice at point of bet typically falls in -180 to -220 range (where breakeven is 64-68%)
Retire if
  • Forward cover rate drops below 65%
  • Snapshot data shows DK consistently posts -250+ on these games (juice eats the edge)
Next checkpoint: 2026-06-03 (3 weeks after CBB resolver fix shipped)
Sources (2)
  • memory: project_cbb_ats_road_favorite_2026_05_12
  • blog_edges/findings/2026-05-13-pipeline-fixes.md (resolver fix)
CBB · Totals
low confidence

CBB totals: model is biased high on its highest projections

LGBM totals bias drifts from +0.41 runs in the pred≤11 band to +0.93 runs in the pred≥15 band on a 527-game holdout. That's a calibration patch candidate, orthogonal to the park signal we're already tracking.

Evidence so far
n=527

Not a win-rate cell -- this is a model-bias diagnostic. Bias correction lands as a model refit, not as a betting cell directly. Worth flagging because the magnitude is +0.93 runs on high-total games, which is meaningful for over/under calls.
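A per-band intercept correction is the simplest version of the patch. A sketch, where the two outer biases come from the holdout and the linear ramp across the middle band is purely a hypothetical fill (the holdout doesn't report a mid-band bias):

```python
def debias_total(pred):
    """Subtract a per-band bias from an LGBM total projection (runs).

    +0.41 at pred <= 11 and +0.93 at pred >= 15 are the holdout
    estimates; the linear interpolation between them is an assumption.
    """
    if pred <= 11:
        bias = 0.41
    elif pred >= 15:
        bias = 0.93
    else:
        bias = 0.41 + (pred - 11) / 4 * (0.93 - 0.41)
    return pred - bias

print(f"{debias_total(16.0):.2f}")  # → 15.07
```

The actual fix is a refit, as noted above; this is just the shape of the correction being tested.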

Promote if
  • Fit a per-band intercept correction, refit, holdout MAE drops by ≥0.05 runs
  • Live forward MAE on high-total games tightens to within 0.3 runs
Retire if
  • Refit doesn't move holdout MAE, or the bias appears to be a sample artifact
Next checkpoint: Next CBB session (after CWS, ~mid-June)
Sources (1)
  • memory: project_cbb_totals_band_bias_2026_05_12

WEATHER

Weather · Calibration
low confidence

Weather threshold raise: did 0.45 actually fix the overconfidence?

We raised MIN_MODEL_P from 0.40 to 0.45 this morning because the 475-pred audit said the model is +8.1pp overconfident at the floor. After 24h of post-fix data, avg predicted_prob on new opps should rise to at least 0.45. After 7 days, holdout Brier should improve. If neither happens, we've got more diagnostic work to do.

Evidence so far
n=475 · WR 23.8% · CI [20.0, 28.0] · breakeven 23.9%

Pre-fix data. The post-fix scan hadn't run when this entry was written. The whole point of this watching item is the empirical confirmation.
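The Brier gate is mean squared error on the probabilities; a minimal sketch with made-up outcomes:

```python
def brier(probs, outcomes):
    """Brier score: mean squared error of probability forecasts (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# An uninformative always-0.5 forecaster scores 0.25;
# the promote gate below asks for <= 0.18 on the next 100 resolved predictions.
print(brier([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # → 0.25
```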

Promote if
  • Post-fix avg predicted_prob ≥ 0.45 and rising over 7 days
  • Holdout Brier on the next 100 resolved predictions ≤ 0.18
Retire if
  • Post-fix calibration still shows ≥5pp structural overconfidence on n=100 → climatology shrinkage layer becomes urgent
Next checkpoint: 2026-05-20 (one week post-fix audit)
Sources (3)
  • blog_edges/findings/2026-05-13-pipeline-fixes.md
  • blog_edges/claims/2026-05-13-week-20.json (2026-W20-01)
  • memory: project_kalshi_weather_calibration_audit_2026_05_12

MLB

MLB · F5
low confidence

MLB F5 totals: forward audit due May 21

The MLB First-5 totals signal looked real on the cheat-mode backtest (n=1141, gap≥1.0 WR 59.5%), and the snapshotter has been collecting live KXMLBF5 data since 2026-05-07. The May 21 audit is the gate: does live data confirm the backtest, or did we overfit?

Evidence so far
breakeven 52.4%

23,319 snapshot rows captured 2026-05-07 through 2026-05-13. No live scanner yet by design -- the audit decides whether to wire one.
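The promote/retire thresholds on this card reduce to a simple decision rule. A sketch of the audit gate (the function name and the gray-zone "keep watching" branch between 50% and 56% are my framing; the thresholds themselves are from the card):

```python
def f5_audit_gate(wins, n_picks):
    """Forward-audit decision for the gap >= 1.0 cell (sketch)."""
    if n_picks < 80:
        return "keep watching"  # sample too small to decide
    wr = wins / n_picks
    if wr >= 0.56:
        return "promote"        # live data confirms the backtest
    if wr < 0.50:
        return "retire"         # overfit
    return "keep watching"      # gray zone: extend the audit

print(f5_audit_gate(48, 85))  # 56.5% on n=85 → promote
```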

Promote if
  • Forward audit shows live WR ≥ 56% on n=80+ picks in the gap≥1.0 cell
  • Pattern matches the cheat-mode backtest at similar significance
Retire if
  • Forward audit WR drops below 50% -- overfit, retire the cell
Next checkpoint: 2026-05-21 (forward audit per project_mlb_f5_day1_2026_05_07)
Sources (2)
  • memory: project_mlb_f5_day1_2026_05_07
  • scripts/cbb_cws_montecarlo.py (related infra)