Introducing the Field of 64 Projector (and What It Can't Do Yet)
A new page on The Margin predicts which 64 teams make the NCAA baseball tournament and how the committee is likely to seed them. It's data-only, no editor in the loop. It agrees with Baseball America on 45 of 64 teams, and the 19 where it doesn't are where this gets useful.
What just shipped
There's a new page on the site: /cbb-bracket. It predicts the 64-team NCAA Division I baseball tournament — which teams make it, who hosts the 16 regional sites, and how the 1/2/3/4 seeds line up in each regional. It updates every three days.
Pick from three sources via the toggle: McConnellMargin (this site's projection), Baseball America, and D1Baseball. All three are full Field-of-64 brackets. Ours is the only one built by a model with no human editor.
In development, not hardened
A few caveats up front:
- This is v1. The pipeline works end-to-end and the backtest numbers are honest, but eight years of historical brackets is a thin training base for any selection-style model. Numbers will move.
- No editors. Baseball America and D1Baseball are written by people who watch the games, talk to coaches, and apply the feel layer the committee actually uses. Our model has none of that. It reads a resume vector — RPI, quadrant records, road record, last-15, conference record, strength of schedule — and predicts. That's it.
- The gap to the editorial baseline. Bracketologists publicly hit ~85-90% of the actual field. Our leave-one-year-out backtest sits at 76%, roughly 9 to 14 points behind. The missing piece isn't the math; it's the soft knowledge.
We're publishing it anyway because pure-data output is genuinely rare in college baseball. If our model and Baseball America agree, that's chalk. If they don't, one of them is wrong — and you can read the resume yourself and decide which.
How the prediction gets made
Three stages, each backtested against the actual brackets from 2016-2024 (skipping the 2020 COVID year):
Field selection. A gradient-boosted classifier scores every D1 team's resume on Selection Day. Top 64 by probability = the projected field. RPI is the top feature by a wide margin; everything else is secondary. Leave-one-year-out hit rate across eight historical seasons: 76% (standard deviation 3.5 points). Best year 2018 at 81%, worst 2021 at 69%.
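The leave-one-year-out loop behind that hit rate can be sketched roughly as below. This is an illustration, not the site's pipeline: `fit_stub` and `score_stub` are hypothetical stand-ins for training and applying the gradient-boosted classifier (the stub simply ranks by RPI), the team names and numbers are made up, and the field is shrunk to two teams so the result is easy to check by hand.

```python
from statistics import mean

def hit_rate(projected, actual):
    """Share of the actual field that the projection also contains."""
    return len(set(projected) & set(actual)) / len(actual)

def leave_one_year_out(seasons, fit, score, field_size=64):
    """seasons: {year: {"resumes": {team: features}, "field": set_of_teams}}.

    For each year: fit on every other year, score the held-out year's
    resumes, keep the top `field_size` teams, compare to the real field.
    """
    rates = {}
    for held_out, season in seasons.items():
        training = {y: s for y, s in seasons.items() if y != held_out}
        model = fit(training)
        ranked = sorted(
            season["resumes"],
            key=lambda team: score(model, season["resumes"][team]),
            reverse=True,
        )
        rates[held_out] = hit_rate(ranked[:field_size], season["field"])
    return rates

# Hypothetical stand-ins: a "model" that ignores training data and ranks by RPI.
def fit_stub(training):
    return None

def score_stub(model, features):
    return features["rpi"]

# Toy data (made up): three teams per year, two-team field.
toy = {
    2023: {"resumes": {"A": {"rpi": 0.62}, "B": {"rpi": 0.58}, "C": {"rpi": 0.55}},
           "field": {"A", "B"}},
    2024: {"resumes": {"A": {"rpi": 0.60}, "B": {"rpi": 0.51}, "C": {"rpi": 0.57}},
           "field": {"A", "B"}},
}

rates = leave_one_year_out(toy, fit_stub, score_stub, field_size=2)
average = mean(rates.values())
print(rates, average)
```

The point of holding out a whole season at a time is that teams from the year being scored never appear in training, so the hit rate reflects how the model would have done on a bracket it had not seen.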
National seeding. A second model ranks the projected field for the top-16 national seeds. Backtest overlap with the actual top-16 seed line: 73%.
Regional placement. A constraint solver assigns 2/3/4 seeds to each regional with two rules: same-conference teams go in different regionals, and 3/4 seeds get placed by geography. When fed the actual 2024 field, the solver identified 16 of 16 regional hosts correctly. Where it weakens is the 3-and-4 seed groupings, which depend on geographic preferences, attendance projections, and TV-window decisions that aren't in any public dataset. Baseball America and D1Baseball only match each other on the full 4-team regional group about 5% of the time, so this is less a model failure than the noisy part of the committee's job.
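To make the two placement rules concrete, here is a toy sketch. It is not the solver the site runs: `place_line`, the dictionary layout, the placeholder teams and conferences, and the flat x/y coordinates are all assumptions for illustration, and the brute-force search only works at this tiny scale (a real 16-regional problem would need a proper assignment or constraint solver).

```python
from itertools import permutations
from math import hypot

def place_line(regionals, teams, use_geography):
    """Assign one team from `teams` to each regional, in place.

    Hard rule: a team may not join a regional that already holds a team
    from its conference. Soft rule (only when use_geography is True):
    minimise total straight-line distance from each team to its site.
    Brute force over every ordering, so toy-sized inputs only.
    """
    best, best_cost = None, float("inf")
    for order in permutations(teams):
        pairs = list(zip(regionals, order))
        if any(t["conf"] in r["confs"] for r, t in pairs):
            continue  # conference clash, reject this assignment
        if use_geography:
            cost = sum(hypot(r["xy"][0] - t["xy"][0], r["xy"][1] - t["xy"][1])
                       for r, t in pairs)
        else:
            cost = 0.0  # any conflict-free assignment is acceptable
        if cost < best_cost:
            best, best_cost = pairs, cost
    if best is None:
        raise ValueError("no conflict-free assignment exists")
    for r, t in best:
        r["teams"].append(t["name"])
        r["confs"].add(t["conf"])
    return regionals

# Placeholder hosts, teams, conferences, and coordinates (all made up).
regionals = [
    {"teams": ["Host1"], "confs": {"X"}, "xy": (0, 0)},
    {"teams": ["Host2"], "confs": {"Y"}, "xy": (10, 0)},
]
two_seeds = [
    {"name": "Two1", "conf": "X", "xy": (1, 0)},
    {"name": "Two2", "conf": "Z", "xy": (9, 0)},
]
three_seeds = [
    {"name": "Three1", "conf": "W", "xy": (8, 1)},
    {"name": "Three2", "conf": "V", "xy": (2, 1)},
]

place_line(regionals, two_seeds, use_geography=False)   # conference rule only
place_line(regionals, three_seeds, use_geography=True)  # conference rule + distance
for r in regionals:
    print(r["teams"])
```

In the toy run, Two1 shares conference X with Host1, so it is pushed to the second regional even though it sits closer to the first; the 3 seeds have no conflicts and simply go to the nearer site.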
Today's projection vs Baseball America vs D1Baseball
For the May 13 read:
- We agree with Baseball America on 45 of 64 teams and 11 of 16 national seeds.
- We agree with D1Baseball on 46 of 64 teams and 11 of 16 national seeds.
- Baseball America and D1Baseball agree with each other on 55 of 64 teams and 14 of 16 national seeds. They share editorial conventions our model doesn't.
The bigger disagreements are at the host line. We have Nebraska, Coastal Carolina, Kansas, and USC as top-16 hosts. Baseball America has Florida, Florida State, West Virginia, and Oregon there instead. Both lists are defensible from the resume; what tips the scales editorially is record vs. ranked opponents, conference reputation, and momentum into late May. Our model doesn't weight those the way a human does.
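Agreement counts like the ones above are plain set overlaps, and the teams worth a second look are the set differences. A minimal sketch, assuming each bracket is stored as a set of team names (the `agreement` helper, the source keys, and the four-team placeholder fields are hypothetical, not the site's data):

```python
from itertools import combinations

def agreement(brackets):
    """Return {(source_a, source_b): shared team count} for every pair of sources."""
    return {
        (a, b): len(brackets[a] & brackets[b])
        for a, b in combinations(sorted(brackets), 2)
    }

# Placeholder four-team "fields" (made up) standing in for the 64-team brackets.
brackets = {
    "model": {"T1", "T2", "T3", "T4"},
    "ba": {"T1", "T2", "T5", "T6"},
    "d1": {"T1", "T2", "T5", "T7"},
}

counts = agreement(brackets)
only_model = brackets["model"] - brackets["ba"]  # in our field, not in theirs
only_ba = brackets["ba"] - brackets["model"]     # in their field, not in ours
print(counts, sorted(only_model), sorted(only_ba))
```

The same function works for the national-seed comparison if each set holds only the top 16.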
What it doesn't do
Two limits to name:
- No tournament simulation. This artifact predicts the field, not who wins. The existing CWS odds page handles Omaha. A regional-round forward simulator that bridges the two — and applies injury and rotation adjustments at the game level — is on the roadmap, not in this ship.
- No editorial layer. When a coach quits, a team gets hot, or a star pitcher goes down, the model doesn't know. The resume vector freezes the data at the snapshot date and carries it forward unchanged until the next refresh.
What's coming
The next selection-relevant moment is the actual bracket release on Selection Monday, May 25. Between now and then, expect two more refreshes (Friday, Monday) as conference tournament results come in. After Selection Day, the same page pivots to displaying the real bracket and we move into round-by-round simulation.
Until then: read the resumes yourself. The model's just doing the same.