Journal · 2026-05-13-pipeline-fixes

2026-05-13-pipeline-fixes · agent findings

Run date: 2026-05-13-pipeline-fixes · 0 claims emitted

Pipeline fixes — 2026-05-13

Companion to 2026-05-13-week-20.md (the blog findings file). These are the production-side changes that came out of the audit.

Shipped

1. CBB DK resolver — `polyedge/workers/resolve_worker.py`

Problem: cbb_dk_spread, cbb_dk_totals, cbb_dk_fade had no entries in the _RESOLVERS dispatch dict. 295 predictions accumulated unresolved over ~5 days. This blocked the ATS road-favorite cell from ever surfacing a hit rate.

Fix: Added _resolve_cbb_dk_score() that settles DK CBB spread/totals predictions against cbb_game_snapshots.home_score / away_score. cbb_dk_fade routes to the existing Kalshi resolver (its slugs are KXNCAABBGAME-*). Hand-verified math on 4 cases (UCLA-Oregon under 15.5 won, Georgia-LSU under 12.5 lost, Stanford-NCSU under 15.5 won, USC-Bama under 11.5 lost) — all correct.

Result: 205 of 304 backlog predictions resolved on first run (91 spread, 105 totals, 5 totals pushes, 9 fade-Kalshi). 94 remain pending (2026-05-12/13 games whose scores haven't propagated to cbb_game_snapshots).

First-look hit rates:

Module	n	WR	Avg pred	Avg market
`cbb_dk_spread`	91	47.3%	81.2%	52.2%
`cbb_dk_totals`	105	55.2%	79.3%	53.2%

These have small n and the spread model is well below DK breakeven (~52.4%). Don't act on them yet; resolver will keep filling. Re-audit once n≥150 per market type.

2. Weather MIN_MODEL_P 0.40 → 0.45 — `polyedge/workers/weather_early_open_worker.py`

Problem: Live audit (n=475 resolved post-2026-04-19 predictions) showed +8.1pp structural model overconfidence across all three production EMOS versions. Scanner was emitting structurally-losing opportunities.

Fix: Raised threshold from 0.40 to 0.45. Realized WR rises monotonically past 0.45, so the new floor uses the overconfidence by refusing to act on it.

Risk surface: None added — strictly tightens an existing gate. Reversible by editing the constant.

3. Synthetic per-region calibration curves demoted to shadow

Problem: Per-region Beta curves (commit 3fa5824 / 2525507) fit on synthetic backtest replay (n=270K) had CV Brier 0.1881 but live 475-pred Brier is 0.196. The synthetic distribution doesn't match live.

Fix: Renamed calibration:weather:region_central and calibration:weather:region_northwest keys in system_settings to shadow_calibration:weather:<name>_synthetic_demoted_2026_05_13. The load_calibration_curve() lookup chain now falls through to calibration:weather (which is fit on LIVE emos_v1 + emos_v2_skill predictions, not synthetic).

Reversible: Rename keys back to restore.

Cache caveat: running uvicorn has _CURVE_CACHE populated with old curves. A launchctl kickstart -k gui/501/com.polyedge.v2 on next deploy will clear it. No emergency restart required — the threshold raise alone protects.

Deferred (P0 #3 from the audit)

Scanner-side climatology shrinkage layer

Recommendation from weather-researcher agent: Wire a calibrated_prob = 0.75 * raw + 0.25 * climatology_prob layer into the scanner using GHCN historical climatology as the prior (NOT Kalshi market price, which creates circularity per feedback_terminal_price_proxy_never).

Why deferred: Real surgery — touches 4+ model_prob emission sites in polyedge/modules/weather/scanner.py (lines 1840, 2522, 2587, 2952), each feeding downstream gates (edge, MIN_MODEL_P, rejection logic). The helper exists (compute_climatology() in backtest_bucket.py, _gauss_bucket_prob() in both backtest_bucket.py and bucket_model.py) but wiring it in correctly without breaking the live scanner needs a proper PR with unit tests.

Scope estimate: ~1-2 hours of focused work + test coverage. Best landed in a dedicated session, not bolted on at session end.

Why the deferral is safe: The threshold raise (0.40 → 0.45) plus the synthetic-curve demotion together accomplish most of what shrinkage would. The scanner now refuses to act on the overconfident bin AND falls through to the live-fit Beta curve. Whatever residual overconfidence remains is bounded by the 0.45 floor.

Not a bug (mistakenly suspected)

MLB F5 scanner — pre-production by design

Snapshotter is running normally (23,319 rows captured 2026-05-07 through 2026-05-13 09:17, includes today). There is no production F5 scanner because the forward-audit is scheduled 2026-05-21. After the audit, if the F5 ridge model holds up, a scanner gets wired then. Until then, F5 markets won't appear in opportunities and that's intentional.

Validation steps after deploy

lsof -ti :7842 → kill → restart uvicorn on canonical port. This clears _CURVE_CACHE so the demoted curves stop being served.
Watch the next weather scan: number of opps should drop (more filtered by MIN_MODEL_P 0.45).
After 24h, query: SELECT COUNT(*) FILTER (WHERE actual_outcome IS NOT NULL), AVG(market_price), AVG(predicted_prob) FROM predictions WHERE module='weather' AND predicted_at > now() - interval '24h'; — avg pred should drop toward 0.45 floor; if it stays >0.50, that signals the live-fit Beta curve is doing real work.
After 7 days, re-run scripts/fit_weather_calibration_from_live.py on the post-fix data. If holdout Brier now improves (it should — the 0.45 floor removes the worst-calibrated bin), the deferred climatology shrinkage may not be needed.

Files changed

polyedge/workers/resolve_worker.py (+88 lines: _resolve_cbb_dk_score + dispatch)
polyedge/workers/weather_early_open_worker.py (+9 lines: threshold raise + comment)
scripts/fit_weather_calibration_from_live.py (new, 159 lines)
docs/blog_edges/findings/2026-05-13-pipeline-fixes.md (this file)
system_settings table: 2 keys renamed to shadow_*