Why Most 2018 World Cup Models Were Wrong (And What We Learned)

Croatia reached the final after being priced at about 2.4% to win the tournament. Russia made the quarter-finals at under 1%. The misses weren't all bad luck: some were structural. Here's what we changed.

OddsIQ Team · April 26, 2026 · 9 min read

world-cup · calibration · model-history · transparency

Russia 2018 was one of the most upset-heavy World Cups in modern memory.

Germany, the defending champions, exited in the group stage. Croatia, ranked 20th in the world by FIFA, reached the final. Russia, the lowest-ranked team in the tournament by FIFA ranking, made the quarter-finals. The eventual winners, France, were rated a top-five team but came through a bracket that had collapsed around them.

Almost every major statistical forecast missed badly on multiple teams. We're going to walk through what went wrong, what was structural in the models vs what was unpredictable, and how the 2018 misses shaped what we built for 2026.

We weren't running OddsIQ in 2018 — this is retrospective analysis using public records and our own internal post-mortems on our predecessor model. Where we make claims about specific team probabilities, those are reconstructed from pre-tournament odds and from FiveThirtyEight's published 2018 forecast (their model is publicly documented).

The five biggest model failures

A summary of what most pre-tournament models got wrong in 2018:

1. Germany's group-stage exit. Pre-tournament: ~14-16% to win the World Cup, a top-two favourite with most bookmakers. Result: out in the group stage with one win and two losses. Most models rated this a 2-3% probability event.

2. Croatia's run to the final. Pre-tournament: 2-3% to win, ~10% to reach the final. Result: lost the final to France. Their actual probability of reaching the final was almost certainly underestimated by every public model.

3. Russia's quarter-final run. Pre-tournament: under 1% to reach the quarter-final. Result: beat Spain in the round of 16 (massive upset), narrowly lost to Croatia in the QF.

4. Spain and Argentina underperformance. Both had top-five pre-tournament probabilities. Both went out in the round of 16. Between them they had something like a 35% chance of reaching the semi-finals; neither did.

5. England's run to the semi-finals. Pre-tournament: ~5% probability to reach the semi-finals. Result: reached the semi-finals. Less of a "miss" than the others — England were genuinely a strong team rated reasonably by models — but their bracket path opened up in ways no model predicted.

What was actually predictable

Some of these "misses" were just variance. The 2018 World Cup was a single tournament, and even a perfectly calibrated model would still have given Germany roughly a 5% chance of exiting in the group stage. Run that probability over 1,000 simulated tournaments and you'd expect Germany to go out early in about 50 of them. 2018 was one of those 50.

The same applies to Russia's run, partially. Most models gave Russia under a 1% chance of reaching the quarter-final. But across 32 teams, some low-ranked team is going to overperform in almost every tournament. That's how variance works.
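To put a rough number on that, here is a minimal Python sketch of the chance that at least one long shot comes in. The ten outsiders and their 3% figure are hypothetical, and treating the teams as independent is a simplifying assumption (teams in the same half of a bracket are not independent).

```python
def prob_at_least_one(probabilities):
    """Chance that at least one of several independent long shots comes in."""
    miss_all = 1.0
    for p in probabilities:
        miss_all *= 1.0 - p  # every long shot has to fail
    return 1.0 - miss_all


# Hypothetical field: ten outsiders, each given 3% to reach the quarter-finals.
outsiders = [0.03] * 10
print(f"At least one outsider gets there: {prob_at_least_one(outsiders):.1%}")

# Expected number of early exits for one 5% event over 1,000 simulated tournaments.
print(f"Expected early exits in 1,000 runs: {1000 * 0.05:.0f}")
```

Each run is unlikely on its own, but with enough outsiders in the field it becomes unremarkable that one of them goes deep.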

If the only mistake had been bad single-tournament luck, our learning would be: "the model was probably fine, the dice rolled badly." Move on.

But there were structural problems too.

What was NOT just variance

Three patterns of error showed up across the 2018 results that suggested the models had real flaws:

Overconfidence in defending champions

Germany got priced like 2014 Germany. The market and most models assigned them a top-two probability based on their historical pedigree and 2014 victory. The actual 2018 squad was clearly weaker than 2014 — the defensive transition was incomplete, several key 2014 players had retired or declined, and qualifying performances had been mediocre.

The lesson: recency-weighted form needs to dominate reputation. Most models in 2018 were giving too much weight to long-run team strength and not enough to current-state evidence.
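One simple way to let recent form dominate is exponential decay over a team's match history. The sketch below is illustrative only: the function name, the half-life of 8 matches, and the 0-to-1 performance scores are all assumptions, not the scheme that any 2018 model or OddsIQ actually uses.

```python
def recency_weighted_form(results, half_life=8.0):
    """Weighted mean of per-match performance scores, newest match first.

    results[0] is the most recent match. A match played `half_life`
    games ago counts half as much as the latest one.
    """
    if not results:
        raise ValueError("need at least one result")
    weights = [0.5 ** (age / half_life) for age in range(len(results))]
    return sum(w * r for w, r in zip(weights, results)) / sum(weights)


# Hypothetical team: strong older results (1.0), poor recent ones (0.0).
history = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(f"Plain average:    {sum(history) / len(history):.3f}")
print(f"Recency-weighted: {recency_weighted_form(history):.3f}")
```

A shorter half-life makes the rating react faster to a declining squad, at the cost of more noise from individual results.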

Underrating CONMEBOL teams' tournament resilience

The 2018 overperformers, Croatia and Russia, were both UEFA teams, so this tournament alone doesn't make the case. The two leading CONMEBOL sides, Argentina and Brazil, underperformed by less than the headlines suggested: Brazil reached the QF (about expected) and Argentina reached the R16 (slightly under expected but within range).

But the real CONMEBOL story across multiple recent World Cups is consistent overperformance vs Elo rating. From 1998 to 2022, South American teams have collectively done meaningfully better than their Elo would predict in tournament settings. By 2018, this pattern was clear in the data, but most public models weren't applying systematic confederation adjustments.

The lesson: historical confederation patterns are real signal, even if they're philosophically uncomfortable. Treating all Elo-equivalent teams as equally likely to perform leaves money on the table.
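As a rough illustration of what a confederation adjustment can look like, the sketch below adds a fixed nudge to a team's rating before applying the standard Elo expectancy formula. The 20-point value and the function names are hypothetical. The article does not state the actual size of any adjustment.

```python
# Hypothetical nudge in Elo points, applied in tournament settings only.
CONFEDERATION_NUDGE = {"CONMEBOL": 20.0}


def elo_win_expectancy(rating_a, rating_b):
    """Standard Elo expected score for team A against team B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def adjusted_rating(rating, confederation, nudges=CONFEDERATION_NUDGE):
    """Rating plus the confederation nudge (zero if none is defined)."""
    return rating + nudges.get(confederation, 0.0)


def win_probability(rating_a, conf_a, rating_b, conf_b):
    """Elo expectancy computed on confederation-adjusted ratings."""
    return elo_win_expectancy(
        adjusted_rating(rating_a, conf_a), adjusted_rating(rating_b, conf_b)
    )


# Two teams with identical raw Elo, one from each confederation.
print(f"{win_probability(1800, 'CONMEBOL', 1800, 'UEFA'):.3f}")
```

A nudge this small barely moves a single match, but it compounds across every simulated round of a tournament.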

Underweighting bracket variance for favourites

The four leading pre-tournament favourites (Germany, Brazil, Spain, Argentina) collectively had something like a 70% probability of producing the eventual winner, by most pre-tournament models. None of them won.

This wasn't just unlucky — it reflected a structural bias in how some models computed tournament probabilities. Multi-stage tournaments compound risk in ways that the headline "win probability" sometimes hides.

A team that's a 60% favourite in each of 5 single-elimination matches has only a 7.8% chance of running the bracket. Many 2018 models were anchoring on overall strength estimates rather than path-by-path risk, which inflated favourite probabilities.

The lesson: Monte Carlo simulation that fully respects bracket variance is materially different from closed-form probability calculations. The simulator naturally shows you what closed-form math sometimes hides.
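The compounding arithmetic is easy to check. The sketch below computes the chance of winning five straight matches at 60% each, then confirms it with a small simulation. The function names and the simulation size are illustrative choices.

```python
import random


def run_the_table_exact(p_win, rounds):
    """Probability of winning every one of `rounds` independent matches."""
    return p_win ** rounds


def run_the_table_simulated(p_win, rounds, n_sims=100_000, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    clean_runs = 0
    for _ in range(n_sims):
        # all() stops at the first lost match, like a knockout exit.
        if all(rng.random() < p_win for _ in range(rounds)):
            clean_runs += 1
    return clean_runs / n_sims


print(f"Exact:     {run_the_table_exact(0.6, 5):.4f}")
print(f"Simulated: {run_the_table_simulated(0.6, 5):.4f}")
```

With a fixed per-match probability the two numbers agree, so the simulation adds nothing here. It earns its keep when the opponent in each round depends on who survived the earlier rounds, which is hard to write down in closed form.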

What we changed

The model we run today incorporates direct lessons from 2018:

Explicit confederation adjustments. South American teams get a small positive nudge in tournament settings based on their historical overperformance. African teams have collectively under-performed Elo and get a small negative nudge. CONCACAF teams get a small home boost when the tournament is in their region. These adjustments are tuned to fit historical data and are openly disclosed in our methodology.

Recency-weighted form — currently being added in our V2 model (shipping shortly). The current model uses long-run Elo without aggressive recency decay; the V2 model explicitly weights the most recent 10-15 internationals more heavily. This addresses the "defending champion premium" issue directly.

Monte Carlo as the primary engine. We run 50,000 daily simulations of the full tournament rather than computing probabilities analytically. This naturally surfaces bracket variance and produces the "favourite caps out at 12-18%" tournament probabilities that closed-form models often inflate.

Path-difficulty visibility. We expose stage-by-stage probabilities for each team (reach R16, reach QF, etc.) so you can see where their probability comes from. A team with a 12% championship probability and an "easy" path looks very different from a team with 12% and a brutal path.
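To make the path-difficulty point concrete, here is a minimal sketch of a knockout simulator that reports stage-by-stage probabilities. It is not the OddsIQ engine: the 8-team bracket, the ratings, and the use of plain Elo expectancy as a knockout win probability (ignoring draws, extra time, and penalties) are all assumptions for illustration.

```python
import random


def win_prob(rating_a, rating_b):
    """Standard Elo expectancy, used here as a knockout win probability."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def stage_probabilities(ratings, bracket, n_sims=20_000, seed=0):
    """Estimate, for each team, the probability of winning each round.

    `bracket` lists teams in draw order: positions 0 and 1 meet first,
    the winner plays the winner of positions 2 and 3, and so on.
    """
    n_teams = len(bracket)
    if n_teams < 2 or n_teams & (n_teams - 1):
        raise ValueError("bracket size must be a power of two")
    n_rounds = n_teams.bit_length() - 1
    rng = random.Random(seed)
    wins = {team: [0] * n_rounds for team in bracket}
    for _ in range(n_sims):
        alive = list(bracket)
        for rnd in range(n_rounds):
            survivors = []
            for a, b in zip(alive[0::2], alive[1::2]):
                p_a = win_prob(ratings[a], ratings[b])
                winner = a if rng.random() < p_a else b
                wins[winner][rnd] += 1
                survivors.append(winner)
            alive = survivors
    return {team: [w / n_sims for w in counts] for team, counts in wins.items()}


# Hypothetical bracket: A and B are rated equally, but A's half of the
# draw is weak and B's half is strong.
ratings = {"A": 1900, "W1": 1500, "W2": 1500, "W3": 1500,
           "B": 1900, "S1": 1850, "S2": 1850, "S3": 1850}
bracket = ["A", "W1", "W2", "W3", "B", "S1", "S2", "S3"]

for team, probs in stage_probabilities(ratings, bracket).items():
    print(team, [f"{p:.1%}" for p in probs])
```

Each list gives the probability of winning the first round, the semi-final, and the final. Teams A and B have the same rating, yet their title probabilities differ because of who they are likely to meet on the way.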

Where we'd still get 2018 wrong

Honesty requires admitting which 2018 mistakes our current model would still have made:

Russia's quarter-final run — our model would still have priced Russia at under 1% pre-tournament. Their actual run was driven by bracket luck and a single huge upset over Spain. Our model handles upsets probabilistically, but a quarter-final by the lowest-ranked team in the tournament would still be classified as a low-probability event.

This is correct. Russia's run wasn't predictable; it was just within the range of low-probability events that happen.

Germany's group exit: our model would have had Germany around 5-7% to exit the group stage in 2018. Given the actual squad strength and form, this is in the right neighbourhood. The model would have treated the exit as unlikely but plausible; predicting it specifically wasn't possible.

Croatia's final run — our model would have had Croatia at 4-5% to reach the final, vs the 10% the result implies. So we'd have been closer than most 2018 models, but still under the realised value. Whether our number is right or whether Croatia got slightly lucky in their bracket is genuinely unknowable from a single tournament.

What we're publishing in 2026

Two commitments for 2026:

Pre-tournament prediction transparency. Every team's full probability path will be on our World Cup 2026 page before kickoff. After the tournament, we'll publish a retro showing where we were right and where we missed.

Calibration analysis post-tournament. We'll compute the calibration of our predictions across all 104 matches. If our 70% predictions hit at 70% and our 30% predictions hit at 30%, the model was honest. If those bins are systematically off, we'll publish how and why.
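A calibration check of this kind can be sketched in a few lines: group forecasts into probability bins and compare each bin's average forecast with how often the predicted outcome actually happened. The function name, the ten equal-width bins, and the toy data are assumptions for illustration, not the published OddsIQ procedure.

```python
def calibration_table(predictions, outcomes, n_bins=10):
    """Compare mean forecast with observed hit rate in each probability bin.

    predictions: forecast probabilities in [0, 1]
    outcomes:    1 if the predicted event happened, else 0
    """
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(predictions, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, hit))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue  # skip empty bins rather than report a fake rate
        rows.append({
            "bin_low": index / n_bins,
            "bin_high": (index + 1) / n_bins,
            "count": len(members),
            "mean_forecast": sum(p for p, _ in members) / len(members),
            "hit_rate": sum(hit for _, hit in members) / len(members),
        })
    return rows


# Toy data: ten 70% forecasts (7 hits) and ten 30% forecasts (3 hits).
predictions = [0.7] * 10 + [0.3] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 7
for row in calibration_table(predictions, outcomes):
    print(row)
```

With only 104 matches, some bins will hold a handful of forecasts, so a modest gap between forecast and hit rate in a single bin is weak evidence either way.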

This is the part of forecasting that nobody does because it's uncomfortable. We're committing to it now so there's a public record before the results come in.

The lesson that matters most

Across all the model improvements, the single biggest lesson from 2018 is this: don't be confident in tournament outcomes.

Knockout football is governed by variance. Even excellent models confidently make wrong predictions. The right response isn't to keep refining the model in pursuit of certainty — it's to communicate the uncertainty more clearly.

When our 2026 model says "France 12.4% to win," that's not a prediction that France will win. It's a prediction that France winning is one of the more likely outcomes in a high-variance event. France losing is consistent with the same prediction.

The 2018 World Cup taught the forecasting community a lesson we'd been resistant to: even the most disciplined models will make seemingly bad calls in single tournaments. The fix is calibration over many tournaments, not omniscience over one.

We'll find out how we did in July.

OddsIQ provides AI analysis, not financial or betting advice. Past performance does not guarantee future results. Gamble responsibly: BeGambleAware, GamCare, GamStop.
