Pipeline fixes — 2026-05-13
Companion to 2026-05-13-week-20.md (the blog findings file). These are
the production-side changes that came out of the audit.
Shipped
1. CBB DK resolver — polyedge/workers/resolve_worker.py
Problem: cbb_dk_spread, cbb_dk_totals, cbb_dk_fade had no entries
in the _RESOLVERS dispatch dict. 295 predictions accumulated unresolved
over ~5 days. This blocked the ATS road-favorite cell from ever surfacing
a hit rate.
Fix: Added _resolve_cbb_dk_score() that settles DK CBB spread/totals
predictions against cbb_game_snapshots.home_score / away_score.
cbb_dk_fade routes to the existing Kalshi resolver (its slugs are
KXNCAABBGAME-*). Hand-verified math on 4 cases (UCLA-Oregon under 15.5
won, Georgia-LSU under 12.5 lost, Stanford-NCSU under 15.5 won, USC-Bama
under 11.5 lost) — all correct.
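The settlement logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in resolve_worker.py: the prediction fields (`module`, `market`, `side`, `line`), the string outcomes, and every function name are assumptions made for the example. Only the `home_score` / `away_score` columns and the module names come from the text.

```python
from typing import Callable, Dict

def settle_spread(home_score: int, away_score: int, side: str, line: float) -> str:
    """Settle a spread pick. `line` is the handicap on the picked side,
    e.g. -3.5 when backing a favorite, +3.5 when backing an underdog."""
    margin = home_score - away_score if side == "home" else away_score - home_score
    adjusted = margin + line
    if adjusted == 0:
        return "push"
    return "won" if adjusted > 0 else "lost"

def settle_total(home_score: int, away_score: int, side: str, line: float) -> str:
    """Settle an over/under pick against the combined final score."""
    total = home_score + away_score
    if total == line:
        return "push"
    hit = total > line if side == "over" else total < line
    return "won" if hit else "lost"

def resolve_cbb_dk_score(pred: dict, snapshot: dict) -> str:
    """Hypothetical stand-in for the score-based resolver."""
    home, away = snapshot.get("home_score"), snapshot.get("away_score")
    if home is None or away is None:
        return "pending"  # final score has not propagated yet
    settle = settle_spread if pred["market"] == "spread" else settle_total
    return settle(home, away, pred["side"], pred["line"])

# Dispatch by module name. The cbb_dk_fade -> Kalshi route is omitted here.
RESOLVERS: Dict[str, Callable[[dict, dict], str]] = {
    "cbb_dk_spread": resolve_cbb_dk_score,
    "cbb_dk_totals": resolve_cbb_dk_score,
}

def resolve(pred: dict, snapshot: dict) -> str:
    resolver = RESOLVERS.get(pred["module"])
    if resolver is None:
        # The failure mode from the Problem paragraph: a module with no
        # dispatch entry is never settled and silently stays pending.
        return "pending"
    return resolver(pred, snapshot)
```

Returning an explicit "push" keeps pushes out of both the win and loss counts, which matters when hit rates are computed from small samples.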
Result: 210 of 304 backlog predictions resolved on first run
(91 spread, 105 totals, 5 totals pushes, 9 fade-Kalshi). 94 remain
pending (2026-05-12/13 games whose scores haven't propagated to
cbb_game_snapshots).
First-look hit rates:
| Module | n | Win rate | Avg predicted prob | Avg market price |
|---|---|---|---|---|
| cbb_dk_spread | 91 | 47.3% | 81.2% | 52.2% |
| cbb_dk_totals | 105 | 55.2% | 79.3% | 53.2% |
Both samples are small, and the spread model's win rate sits well below DK breakeven (~52.4%). Don't act on either yet; the resolver will keep filling in results. Re-audit once n≥150 per market type.
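As a rough check on why these numbers are not yet actionable, the breakeven figure and the sampling noise at this n can be worked out directly. The sketch below assumes standard -110 pricing (which gives the ~52.4% breakeven quoted above) and a normal approximation to the binomial. Both are assumptions for illustration, and the function names are hypothetical.

```python
import math

def breakeven_prob(american_odds: int) -> float:
    """Win probability needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def z_vs_breakeven(win_rate: float, n: int, p0: float) -> float:
    """z-score of an observed win rate against a reference rate p0,
    using the normal approximation to the binomial."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (win_rate - p0) / se

p0 = breakeven_prob(-110)  # assumed pricing; 110 / 210

for module, win_rate, n in [("cbb_dk_spread", 0.473, 91),
                            ("cbb_dk_totals", 0.552, 105)]:
    z = z_vs_breakeven(win_rate, n, p0)
    print(f"{module}: win rate {win_rate:.1%}, z = {z:+.2f}")
```

Under these assumptions both z-scores fall inside ±1.96, so neither module is statistically distinguishable from breakeven at the current sample size.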
2. Weather MIN_MODEL_P 0.40 → 0.45 — polyedge/workers/weather_early_open_worker.py
Problem: Live audit (n=475 resolved post-2026-04-19 predictions) showed +8.1pp structural model overconfidence across all three production EMOS versions. Scanner was emitting structurally-losing opportunities.
Fix: Raised the threshold from 0.40 to 0.45. Realized WR rises monotonically past 0.45, so the new floor sidesteps the overconfidence by refusing to act on the bin where calibration is worst.
Risk surface: None added — strictly tightens an existing gate. Reversible by editing the constant.
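The audit quantities behind this change can be sketched as follows: overconfidence as mean predicted probability minus realized win rate, win rate per predicted-probability bin, and the gate itself. This is an illustration only. The function names are hypothetical, and whether the real gate uses `>=` or `>` is an assumption.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List, Sequence, Tuple

MIN_MODEL_P = 0.45  # raised from 0.40

def overconfidence_pp(preds: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean predicted probability minus realized win rate, in percentage points."""
    return 100 * (mean(preds) - mean(outcomes))

def win_rate_by_bin(preds: Sequence[float], outcomes: Sequence[int],
                    edges: Sequence[float]) -> Dict[Tuple[float, float], Tuple[int, float]]:
    """Group outcomes into half-open bins [edges[i], edges[i+1]) and return
    (count, realized win rate) per bin. Values outside the edges are skipped."""
    buckets: Dict[Tuple[float, float], List[int]] = {}
    for p, y in zip(preds, outcomes):
        i = bisect_right(edges, p) - 1
        if i < 0 or i >= len(edges) - 1:
            continue
        buckets.setdefault((edges[i], edges[i + 1]), []).append(y)
    return {k: (len(v), mean(v)) for k, v in sorted(buckets.items())}

def passes_gate(model_p: float, min_model_p: float = MIN_MODEL_P) -> bool:
    """Assumed gate semantics: emit only when model_p is at or above the floor."""
    return model_p >= min_model_p
```

Running `win_rate_by_bin` on the resolved live predictions with edges such as `[0.40, 0.45, 0.50, ...]` is one way to confirm that realized WR keeps rising above the new floor.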
3. Synthetic per-region calibration curves demoted to shadow
Problem: Per-region Beta curves (commit 3fa5824 / 2525507) fit on
synthetic backtest replay (n=270K) had CV Brier 0.1881 but live 475-pred
Brier is 0.196. The synthetic distribution doesn't match live.
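For reference, the Brier score compared here is the mean squared error between predicted probability and the 0/1 outcome, so lower is better and a constant 0.5 forecast scores 0.25. A minimal sketch follows; the function name is illustrative and not taken from the codebase.

```python
from typing import Sequence

def brier_score(preds: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not preds or len(preds) != len(outcomes):
        raise ValueError("preds and outcomes must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(preds)
```

Scoring the synthetic-fit curves and the live-fit curve on the same resolved live predictions gives a like-for-like comparison, which the cross-validated figure on synthetic replay does not.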
Fix: Renamed calibration:weather:region_central and
calibration:weather:region_northwest keys in system_settings to
shadow_calibration:weather:<name>_synthetic_demoted_2026_05_13. The
load_calibration_curve() lookup chain now falls through to
calibration:weather (which is fit on LIVE emos_v1 + emos_v2_skill
predictions, not synthetic).
Reversible: Rename keys back to restore.
Cache caveat: the running uvicorn process still has _CURVE_CACHE
populated with the old curves. A launchctl kickstart -k gui/501/com.polyedge.v2
on next deploy will clear it. No emergency restart is required, because the
threshold raise alone is enough protection in the meantime.
Deferred (P0 #3 from the audit)
Scanner-side climatology shrinkage layer
Recommendation from weather-researcher agent: Wire a
calibrated_prob = 0.75 * raw + 0.25 * climatology_prob layer into the
scanner using GHCN historical climatology as the prior (NOT Kalshi
market price, which creates circularity per
feedback_terminal_price_proxy_never).
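The recommended blend is a fixed-weight linear shrinkage toward the climatological prior. A minimal sketch, assuming both inputs are probabilities for the same bucket; the function name and the validation are illustrative and not part of the scanner.

```python
def shrink_to_climatology(raw_prob: float, climatology_prob: float,
                          weight_raw: float = 0.75) -> float:
    """calibrated_prob = weight_raw * raw + (1 - weight_raw) * climatology_prob."""
    for name, value in (("raw_prob", raw_prob),
                        ("climatology_prob", climatology_prob),
                        ("weight_raw", weight_raw)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return weight_raw * raw_prob + (1.0 - weight_raw) * climatology_prob

# Example: a raw 0.48 with a 0.30 climatological prior shrinks to about 0.435,
# which would fall below a 0.45 MIN_MODEL_P floor applied after shrinkage.
calibrated = shrink_to_climatology(0.48, 0.30)
```

Because the shrunk value feeds downstream gates such as edge and MIN_MODEL_P, the order in which shrinkage and gating are applied changes which opportunities survive.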
Why deferred: Real surgery — touches 4+ model_prob emission sites
in polyedge/modules/weather/scanner.py (lines 1840, 2522, 2587, 2952),
each feeding downstream gates (edge, MIN_MODEL_P, rejection logic). The
helper exists (compute_climatology() in backtest_bucket.py,
_gauss_bucket_prob() in both backtest_bucket.py and bucket_model.py)
but wiring it in correctly without breaking the live scanner needs a
proper PR with unit tests.
Scope estimate: ~1-2 hours of focused work + test coverage. Best landed in a dedicated session, not bolted on at session end.
Why the deferral is safe: The threshold raise (0.40 → 0.45) plus the synthetic-curve demotion together accomplish most of what shrinkage would. The scanner now refuses to act on the overconfident bin AND falls through to the live-fit Beta curve. Whatever residual overconfidence remains is bounded by the 0.45 floor.
Not a bug (mistakenly suspected)
MLB F5 scanner — pre-production by design
Snapshotter is running normally (23,319 rows captured 2026-05-07
through 2026-05-13 09:17, includes today). There is no production
F5 scanner yet because the forward-audit is scheduled for 2026-05-21.
If the F5 ridge model holds up in that audit, a scanner will be wired
in afterwards. Until then, F5 markets won't appear in opportunities,
and that's intentional.
Validation steps after deploy
- lsof -ti :7842 → kill → restart uvicorn on canonical port. This clears _CURVE_CACHE so the demoted curves stop being served.
- Watch the next weather scan: number of opps should drop (more filtered by MIN_MODEL_P 0.45).
- After 24h, query:
  SELECT COUNT(*) FILTER (WHERE actual_outcome IS NOT NULL), AVG(market_price), AVG(predicted_prob) FROM predictions WHERE module='weather' AND predicted_at > now() - interval '24h';
  Avg pred should drop toward the 0.45 floor; if it stays >0.50, that signals the live-fit Beta curve is doing real work.
- After 7 days, re-run scripts/fit_weather_calibration_from_live.py on the post-fix data. If holdout Brier now improves (it should, since the 0.45 floor removes the worst-calibrated bin), the deferred climatology shrinkage may not be needed.
Files changed
- polyedge/workers/resolve_worker.py (+88 lines: _resolve_cbb_dk_score + dispatch)
- polyedge/workers/weather_early_open_worker.py (+9 lines: threshold raise + comment)
- scripts/fit_weather_calibration_from_live.py (new, 159 lines)
- docs/blog_edges/findings/2026-05-13-pipeline-fixes.md (this file)
- system_settings table: 2 keys renamed to shadow_*