The Margin (beta)
2026-W20 · agent findings

Run date: 2026-05-13 · 1 claim emitted

Week 2026-W20 — Blog edge findings

Generated: 2026-05-13 09:10 MDT · Cadence: weekly (1-2 posts) · Run type: first run; no prior claims to resolve.

Summary

Live findings: 0. The validated-cell whitelist matched 0 markets that ALSO passed the empirical backing gate and the 4-check artifact filter this week. CFB is offseason (kickoff 2026-08-29). MLB F5 markets aren't yet posting (forward-audit on the F5 cell is dated 2026-05-21 in project_mlb_f5_day1_2026_05_07). NBA team-totals only has n=40 resolved — too small for a claim. Weather post-fix predictions FAIL their own calibration check (see Finding 1 below).

Recommended post: ONE falsification piece this week (Finding 1). It's the strongest honest story available and connects to a known memory entry (project_kalshi_weather_calibration_audit_2026_05_12), turning recent internal-research output into reader-facing content.

Finding 1 — Falsification — "Our weather temp model is overconfident; the market is right"

Kind: falsification / calibration retro (NOT a live pick).

Hook for the post: "We built our own ensemble weather model. Our model says Kalshi temp markets are systematically underpriced by 8 points. We've run the experiment 475 times. Here's why we're not betting our own model."

The data

Resolved weather temp predictions, post-2026-04-19 NWS-resolver fix (all earlier predictions were corrupted by the low-vs-high bug per project_weather_resolve_bug_2026_04_19):

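Model version        | n   | Wins | Realized WR | Avg market price | Avg model prob | Model − Reality
---------------------|-----|------|-------------|------------------|----------------|----------------------
pre_emos             | 122 | 22   | 18.0%       | 19.5%            | 31.5%          | +13.5pp
emos_v1              | 174 | 44   | 25.3%       | 24.6%            | 31.8%          | +6.5pp
emos_v2_skill        | 118 | 27   | 22.9%       | 25.6%            | 32.5%          | +9.6pp
emos_v2_obs_features | 20  | 7    | 35.0%       | 30.5%            | 26.1%          | −8.9pp (n too small)
(null mv)            | 41  | 13   | 31.7%       | 26.3%            | 34.3%          | +2.6pp
Total (post-fix)     | 475 | 113  | 23.8%       | 23.9%            | 31.9%          | +8.1pp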
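The derived columns can be re-checked from the raw counts. The following is a minimal sketch, with the per-version numbers copied from the table; the tuple layout and the helper names `realized_wr_pct` and `gap_pp` are illustrative assumptions, not the production query.

```python
# Rows copied from the table: (version, n, wins, avg_market_pct, avg_model_pct).
ROWS = [
    ("pre_emos", 122, 22, 19.5, 31.5),
    ("emos_v1", 174, 44, 24.6, 31.8),
    ("emos_v2_skill", 118, 27, 25.6, 32.5),
    ("emos_v2_obs_features", 20, 7, 30.5, 26.1),
    ("(null mv)", 41, 13, 26.3, 34.3),
]


def realized_wr_pct(n, wins):
    """Realized win rate in percent, rounded to one decimal like the table."""
    return round(100.0 * wins / n, 1)


def gap_pp(a_pct, b_pct):
    """Difference between two percentages, in percentage points."""
    return round(a_pct - b_pct, 1)


total_n = sum(row[1] for row in ROWS)
total_wins = sum(row[2] for row in ROWS)
pooled_wr = realized_wr_pct(total_n, total_wins)

for version, n, wins, market, model in ROWS:
    wr = realized_wr_pct(n, wins)
    print(
        f"{version:22s} WR={wr:5.1f}%  "
        f"model-reality={gap_pp(model, wr):+.1f}pp  "
        f"model-market={gap_pp(model, market):+.1f}pp"
    )
print(f"pooled: n={total_n} wins={total_wins} WR={pooled_wr}%")
```

Printing the model-minus-market gap next to the model-minus-reality gap keeps the two comparisons from being conflated: the first is what the model claims as edge, the second is how wrong the model turned out to be.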

What this means

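  • In the three versions with n > 100 (pre_emos, emos_v1, emos_v2_skill), our model's average probability ran 6.9 to 12.0 points above the average market price and 6.5 to 13.5 points above the realized win rate. emos_v2_obs_features points the other way, but on n=20.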
  • Reality (475 resolved bets): the market was correct to 0.1pp. The market wasn't underpriced — our model was overconfident.
  • This is the same pattern as the "cheap-YES +815% ROI" artifact (feedback_cheap_yes_artifact_2026_05_07): the model emits high probabilities, the price says "no it won't," reality agrees with the price. Three model versions, same direction, similar magnitude (6.5 to 13.5pp). That's a model problem, not a market problem.
  • Wilson 95% CI on the pooled 23.8% (n=475): roughly [0.20, 0.28]. The model's claimed 31.9% sits well above the upper bound — not noise, structural overconfidence.
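The interval quoted in the last bullet can be reproduced with the standard Wilson score formula. This is a minimal sketch using only the pooled counts from the table (113 wins in 475); the function name `wilson_interval` is an assumption for illustration, not the project's own helper.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


lo, hi = wilson_interval(113, 475)
model_prob = 0.319    # pooled average model probability from the table
market_price = 0.239  # pooled average market price from the table

print(f"Wilson 95% CI: [{lo:.3f}, {hi:.3f}]")
print("model above upper bound:", model_prob > hi)
print("market inside interval:", lo <= market_price <= hi)
```

The same check applied to the average market price shows the contrast the post is built on: the price falls inside the interval while the model's average probability does not.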

Why it's a good blog post

  1. Counter-narrative. Most edge-detection content is "look at this alpha." This is "here's our model failing in production and what we learned." Trust-builder.
  2. It connects to a falsification we already documented internally (project_kalshi_weather_calibration_audit_2026_05_12 flagged the +30pp finding as likely artifact). The post can walk through the chain: model finds edge → backtest confirms → live resolution refutes → here's the diagnosis.
  3. Reader-actionable framing: "Why we treat 'this market is underpriced' as a hypothesis to test, not a signal to trade."

Suggested angles to dig into

  • Why does the gap narrow from 13.5pp to 6.5pp and then widen again to 9.6pp across model versions, without ever closing? Calibration improvements helped but didn't fix the structural bias.
  • Calibration plot: pred-prob bin vs realized WR. If the pooled numbers hold within bins, the visual should show the model's confidence curve diverging from y=x in the 20-40% bin.
  • The MAD-recentering shipped 2026-04-06 (project_weather_trust_recenter_2026_04_06) — did it help? Cut by details->>'forecast_source' to see.
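The calibration-plot angle comes down to binning predictions by model probability and comparing each bin's mean prediction with its realized win rate. A minimal sketch of that binning follows; the function name, the `(model_prob, won)` pair layout, and the toy data are all hypothetical and are not drawn from the predictions table.

```python
from collections import defaultdict


def calibration_table(preds, n_bins=10):
    """Group (model_prob, won) pairs into equal-width probability bins.

    Returns rows of (bin_low, n, mean_model_prob, realized_wr), sorted by bin.
    """
    bins = defaultdict(list)
    for prob, won in preds:
        idx = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 joins the top bin
        bins[idx].append((prob, won))
    rows = []
    for idx in sorted(bins):
        members = bins[idx]
        n = len(members)
        mean_prob = sum(p for p, _ in members) / n
        realized = sum(1 for _, w in members if w) / n
        rows.append((idx / n_bins, n, mean_prob, realized))
    return rows


# Toy data for illustration only: NOT real predictions.
toy = [
    (0.32, False), (0.35, True), (0.31, False), (0.38, False),
    (0.22, False), (0.25, True), (0.28, False),
    (0.55, True),
]
rows = calibration_table(toy)
for bin_low, n, mean_prob, realized in rows:
    print(f"bin {bin_low:.1f}+: n={n} mean_model={mean_prob:.2f} realized={realized:.2f}")
```

A well-calibrated model would put `mean_model` and `realized` close together in every bin with enough samples. Running the same function once per forecast source would give the cut suggested in the last bullet.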

Sources

  • predictions table query, executed 2026-05-13 09:08 MDT
  • project_kalshi_weather_calibration_audit_2026_05_12.md
  • feedback_cheap_yes_artifact_2026_05_07.md
  • project_weather_resolve_bug_2026_04_19.md (the 2026-04-19 cutoff)
  • project_weather_trust_recenter_2026_04_06.md

Artifact checklist

  • Exogenous resolution. Predictions resolved against NWS observations via the post-2026-04-19 fixed resolver. feedback_terminal_price_proxy_never is not violated.
  • No close_at proxy used — this analysis is on resolved predictions, not Kalshi orderbook timing.
  • Post-fix data only. predicted_at >= 2026-04-19. The 289 predictions corrupted by the NWS low-vs-high bug are excluded.
  • Sample size. n=475 pooled, n=174 on the largest single model version. The pooled Wilson 95% CI spans roughly ±4pp.

Slate notes (what was rejected and why)

Cell | Live opps (24h) | Resolved n | Verdict
-----|-----------------|------------|--------
CFB home dog edge≥5 pickem-7 | 0 | n/a | Offseason; kickoff 2026-08-29
CFB edge≥10 pickem-7/14-21 | 0 | n/a | Offseason
Weather EMOS post-fix | 7 today | 475 | FAIL empirical: model overconfident +8.1pp; turned into Finding 1
NBA totals overreaction | needs live in-game | n/a | Pre-scan data can't surface this; needs /live watcher signal
NBA team-totals Vegas-divergence gate | 22 NBA opps total | 40 | FAIL n: n=40 < 150, Wilson CI [0.42, 0.71] spans BE 0.519
CBB CWS futures | 0 | n/a | CWS markets not posted yet (selection late May)
CBB ATS road-fav -1.5 (paper) | DK CBB spread present | 0 resolved | Resolver hasn't run on cbb_dk_spread; data-pipeline issue (see below)
MLB KXMLBF5 winner | 0 | 0 | F5 markets not in opportunities slate; forward-audit dated 2026-05-21
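The two reasons given in the NBA team-totals verdict (n below 150, interval spanning breakeven) can be expressed as one gate. This is a sketch under stated assumptions: the thresholds are the ones quoted in the table, but the function names are hypothetical, the rule "the whole interval must sit above breakeven" is an assumed reading of "spans BE", and the win count in the usage line is a placeholder because the source reports only n=40 and the interval.

```python
import math

MIN_N = 150        # sample-size floor quoted in the verdict column
BREAKEVEN = 0.519  # breakeven win rate quoted in the verdict column


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = wins / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def claim_gate(wins, n, breakeven=BREAKEVEN, min_n=MIN_N):
    """Return (passed, reasons); pass needs enough samples and a CI above breakeven."""
    reasons = []
    if n < min_n:
        reasons.append(f"n={n} < {min_n}")
    lo, hi = wilson_interval(wins, n)
    if lo <= breakeven:
        reasons.append(f"Wilson CI [{lo:.2f}, {hi:.2f}] does not clear BE {breakeven}")
    return (not reasons), reasons


# Hypothetical win count: the source gives n=40 but not the number of wins.
passed, reasons = claim_gate(wins=23, n=40)
print(passed, reasons)
```

Returning the reasons alongside the boolean keeps the verdict column self-explaining: a rejected cell states which check it failed.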

Data-pipeline issue surfaced this run

cbb_dk_spread and cbb_dk_totals have 295 predictions but 0 resolved. The resolver isn't processing DK CBB predictions. This blocks the ATS road-favorite cell from ever producing a hit-rate claim until fixed. Not a blog topic — internal-fix ticket.

Likely lives in polyedge/workers/resolve_worker.py — the working tree shows it's currently modified (git status flagged it). Worth checking that the in-progress changes don't drop DK CBB resolution.
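A stalled resolver of this kind shows up as a group with predictions but no resolutions, which is cheap to check on every run. The sketch below works on plain dictionaries; the `source` and `resolved` keys and the function name are hypothetical stand-ins, since the actual schema of the predictions table is not shown here, and the rows are toy data.

```python
from collections import Counter


def unresolved_sources(rows, min_predictions=1):
    """Return sources that have predictions but zero resolved ones."""
    total = Counter()
    resolved = Counter()
    for row in rows:
        total[row["source"]] += 1
        if row["resolved"]:
            resolved[row["source"]] += 1
    return sorted(
        source
        for source, count in total.items()
        if count >= min_predictions and resolved[source] == 0
    )


# Toy rows for illustration only; key names are assumptions.
rows = [
    {"source": "cbb_dk_spread", "resolved": False},
    {"source": "cbb_dk_spread", "resolved": False},
    {"source": "cbb_dk_totals", "resolved": False},
    {"source": "weather_emos", "resolved": True},
    {"source": "weather_emos", "resolved": False},
]
stalled = unresolved_sources(rows)
print(stalled)
```

Raising `min_predictions` avoids flagging a source that only just started emitting predictions and has had no time to resolve.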

What didn't make it (and why)

  • NBA spread edges from the live slate (KXNBASPREAD-26MAY13CLEDET-* cluster: 5 of the top 22 NBA opps were the same CLE@DET game with different point lines). These are model-vs-market spread snapshots with no whitelist cell behind them. The user's memory bans proposing edges from edge > X filters alone, without a validated pattern. Skipped.
  • MLB total Unders (3 of top 5 MLB edges). MLB totals had a catastrophic falsification audit (project_kalshi_mlb_totals_overhaul_2026_04_04 — no significant results at 70 games). Not on whitelist. Skipped.
  • CBB DK spread edges +31% on Mercer/GT — CBB regular-season DK isn't on the whitelist per project_cbb_cws_pivot_2026_05_12 (pivot to CWS futures). Skipped despite headline size.