
Week of May 12: Three Production Fixes, One Watching-Tier Signal

Three production fixes shipped this week from the same audit that's looking for tradable cells. One watching-tier signal at 71.6% on 81 games. The picks page is empty by design.

The Margin · May 13, 2026 · 6 min read

Welcome to The Margin. Every Tuesday, the system audits itself and the audit gets published. What broke, what shipped, what's worth watching, and (when the data clears the bar) which cells have earned real money.

This week, the audit found more bugs than edges. That's the honest read on mature prediction markets in May, and the post is going to write through it rather than around it.

What we found wrong

The morning run pooled 475 resolved weather predictions across three production model versions and asked one question: was the model right, or was the market right?

The model claimed Kalshi temperature markets were underpriced by 6 to 13 percentage points in every version. Reality said the market was correct to within 0.1pp. Realized win rate: 23.8% against an average claimed probability of 31.9%. The Wilson 95% confidence interval on the pooled rate (a standard binomial CI that doesn't pretend the sample is normal at the tails) was [20%, 28%]. The model's number sat well above the upper bound. That isn't noise. It's a structural overconfidence bias.
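
The interval itself is easy to reproduce. A minimal sketch of the Wilson score interval, assuming 113 wins out of 475 (the count implied by the 23.8% realized rate; the exact win count isn't stated above):

```python
from math import sqrt

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Unlike the plain normal approximation, it stays inside [0, 1]
    and behaves sensibly at extreme rates and small samples.
    """
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

# 113 wins of 475 pooled weather predictions (23.8% realized)
lo, hi = wilson_interval(113, 475)
print(f"[{lo:.1%}, {hi:.1%}]")  # → [20.2%, 27.8%]
```

Rounded to whole points, that is the [20%, 28%] quoted above, with the model's claimed 31.9% sitting well above the upper bound.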

Three patches shipped together:

  1. Weather scanner threshold raised from MIN_MODEL_P 0.40 to 0.45. Realized win rate rises monotonically past 0.45, so a higher floor turns the overconfidence into a filter by refusing to act on it.
  2. Synthetic per-region calibration curves demoted. They were fit on a 270K-sample backtest replay with cross-validated Brier score (mean squared error on probabilities) of 0.188, but live Brier came in at 0.196. The synthetic distribution didn't match live. They were demoted to shadow status so the lookup chain falls through to the live-fit Beta curve instead.
  3. CBB DK resolver was missing. This one wasn't on the audit list. It surfaced because an internal agent tried to compute a CBB edge and couldn't, because 295 predictions had been sitting unresolved for five days. Four lines of code fixed it, 205 backlog rows resolved on the first run.
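
The Brier score in item 2 is simple enough to show inline. A toy sketch; the probabilities and outcomes here are illustrative, not the audit's data:

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better; always forecasting 50% scores exactly 0.25.
    """
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Toy data: forecasts of 70%, 30%, 90% against outcomes 1, 0, 1
print(round(brier_score([0.7, 0.3, 0.9], [1, 0, 1]), 4))  # → 0.0633
```

On this scale the gap between 0.188 (backtest) and 0.196 (live) is small in absolute terms, which is why it took a live comparison to catch it.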

Pulled $0 of revenue. Saved a real amount the scanner would have lost.

What we're tracking

One cell survived the artifact gate and is sitting just below the formal n=150 promotion threshold: NBA totals in the 5-10pp model-edge band.

Since the playoff-model launch (2026-04-07), n=81 predictions have resolved. The model has won 71.6% of them. The Wilson 95% CI is [61.0%, 80.3%], and the lower bound sits 8.6 percentage points above DraftKings' breakeven of 52.4%. (At standard -110 juice you risk $110 to win $100, so breakeven is 110/210, or 52.4%.) ROI per dollar staked in the band: +45.2%.
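
The juice math generalizes to any American odds. A minimal helper, not the system's actual code:

```python
def breakeven_win_rate(american_odds: int) -> float:
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        risk = -american_odds            # risk this much to win 100
        return risk / (risk + 100)
    return 100 / (american_odds + 100)   # risk 100 to win this much

print(f"{breakeven_win_rate(-110):.1%}")  # → 52.4%
print(f"{breakeven_win_rate(-190):.1%}")  # → 65.5%
```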

The shape matters more than the headline number. The 10pp+ band, where the model claims its biggest edges, wins only 55.1% on n=276, and the Wilson lower bound there is 49.2%, below breakeven. The "huge edge" picks aren't safer; they're the noisiest. The model is well calibrated in the middle and overconfident at the tail. That's the same pattern this week's weather audit just falsified, surfacing in a completely different module.

This is a model-bias observation, not a betting recommendation. The cell isn't promoted yet for a few reasons:

  • n=81 is below the 150-row promotion bar.
  • The pattern hasn't appeared in an independent universe, only post-2026-04-07 playoff games.
  • The tail-overconfidence story is a hypothesis. It could be a sample-size coincidence that mean-reverts.

Promotion conditions: forward weeks add 50+ picks in the band, win rate stays above 55%, lower CI bound holds above breakeven. The full watching-tier list lives on /watching.
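
Those conditions reduce to a small gate. A sketch under the thresholds named above (150 rows, 55% win rate, Wilson lower bound over -110 breakeven); the real promotion logic presumably checks more, such as the artifact gate:

```python
from math import sqrt

BREAKEVEN = 110 / 210  # 52.4% at standard -110 juice

def wilson_lower(wins: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson 95% score interval."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin

def promotable(wins: int, n: int) -> bool:
    """Sample size, raw win rate, and lower CI bound must all clear."""
    return n >= 150 and wins / n > 0.55 and wilson_lower(wins, n) > BREAKEVEN

print(promotable(58, 81))    # → False (n still below 150)
print(promotable(107, 150))  # → True (71.3% with lower bound ~63.6%)
```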

Last week's call

System seeding. Nothing was promoted to live last week, so there's nothing to grade this week. The journal starts a real receipt next Tuesday.

What just got better

The CBB DK resolver fix did two things. It cleared the 295-row backlog, and it started the forward clock on a more interesting cell buried in the internal notes: DK ATS road-favorite -1.5 with PEA composite rating gap ≥3.0. The 2026 holdout said 73.3% cover rate on n=225, but the juice on heavy road favorites is brutal. At -190 the breakeven is 65.5%, at -250 it's 71.4%. To compute that forward, the system needs live DK spread snapshots. Until last week, those weren't being captured.
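
Whether that cell clears the juice is one line of arithmetic. A sketch using the holdout's 73.3% cover rate, which may not hold forward:

```python
def roi_per_dollar(win_rate: float, american_odds: int) -> float:
    """Expected return per $1 staked at the given American odds."""
    payout = 100 / -american_odds if american_odds < 0 else american_odds / 100
    return win_rate * payout - (1 - win_rate)

# 73.3% cover rate at two plausible juice levels for heavy road favorites
print(f"{roi_per_dollar(0.733, -190):+.1%}")  # → +11.9%
print(f"{roi_per_dollar(0.733, -250):+.1%}")  # → +2.6%
```

At -190 the edge is comfortable; at -250 it is nearly gone, which is why the live spread snapshots matter.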

They are now. Three to four weeks of new data either clear the juice math or don't. Either outcome is a journal post.

First-look numbers on the now-flowing CBB cells:

  • cbb_dk_spread: 47.3% on n=91. Wilson CI [37.3, 57.4]. Spans breakeven. Noise.
  • cbb_dk_totals: 55.2% on n=105. Wilson CI [45.7, 64.4]. Spans breakeven. Noise.

Neither cell is tradable. The point is that the data now flows.

Where to look next

  • Picks. Empty by design this week. When a cell clears the promotion bar, each pick renders next to its current DraftKings line so the value movement is visible.
  • Journal. Every weekly audit, archived with status pills. The CSV-style trail of what was claimed and what was delivered.
  • Methodology. How each model works, with the K_DIFF and HFA constants for readers who want them.
  • Calibration. Bin charts ship per sport as samples accumulate. Calibration means: when the model says 70%, does the 70%-bucket win 70% of the time?
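
The binning behind those charts is a few lines. A toy sketch with hypothetical forecasts; the bin width and rounding are assumptions, not the site's actual configuration:

```python
from collections import defaultdict

def calibration_bins(probs, outcomes, n_bins=10):
    """Bucket forecasts by probability and compare each bucket's
    average forecast to its realized win rate."""
    buckets = defaultdict(list)
    for p, o in zip(probs, outcomes):
        buckets[min(int(p * n_bins), n_bins - 1)].append((p, o))
    rows = []
    for b in sorted(buckets):
        pairs = buckets[b]
        avg_forecast = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(o for _, o in pairs) / len(pairs)
        rows.append((round(avg_forecast, 3), round(win_rate, 3), len(pairs)))
    return rows

# Hypothetical picks: three ~70% forecasts winning 2 of 3, one 68% miss
print(calibration_bins([0.72, 0.71, 0.68, 0.74], [1, 1, 0, 0]))
```

A well-calibrated bucket has its first two numbers close together; the third (sample count) says how seriously to take the comparison.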

History on prediction-market "edge content" is brutal. Every cell that looked like a 70% winner at n=30 reverted by n=150. Every "+30pp edge" the system has ever flagged turned out to be a measurement artifact: stale Kalshi orderbook quotes, terminal-price proxies, classifier shortcuts. The four-check artifact gate exists because of those, and it kept the file honest this week.

One validated cell with the math behind it beats seven picks no one can defend. The NBA 5-10pp band is one promotion step from the first real one.

NBA totals: realized win rate by model edge tier (post 2026-04-07)

The 5-10pp band wins 71.6% on n=81. The 10pp+ band, where the model claims its biggest edges, hits 55.1% on n=276. Breakeven on standard DK juice is 52.4%. Source: The Margin predictions table, resolved games after the NBA playoff-model launch.