Why xG Is Overrated for Predicting Upsets

Expected goals revolutionised football analysis. It also produces some of the worst upset predictions in betting. Here's why xG falls down in the matches you actually care about.

OddsIQ Team · April 15, 2026 · 8 min read

methodology · xg · model-limitations · contrarian

Expected goals — xG — is the most important football statistic of the last 15 years. It quantified what every analyst already knew (some shots are better than others, some defences concede higher-quality chances than others) into a number you can put on a chart.

It's a real advance. We use xG-derived inputs in our model. Every credible analytics outfit does.

But xG has a specific failure mode, and it shows up exactly where the most casual football fans interact with statistical analysis: predicting upsets.

If you've ever seen a "Burnley dominated the xG and lost 0-3" hot take, you've seen the failure. This post is about what's actually happening when xG and the result diverge, why that happens more often than people expect, and what it means for anyone using xG-based predictions.

What xG is

xG estimates the probability that a given shot becomes a goal, based on factors like distance, angle, body part, defensive pressure, and assist type. Sum across a match's shots and you get the expected goals each team should have scored, on average, given the chances they created.
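To make the "sum across a match's shots" step concrete, here is a minimal sketch. The shot values are made up for illustration, and treating shots as independent is a simplifying assumption rather than something any particular xG provider guarantees.

```python
from math import prod

# Hypothetical per-shot xG values for one team in one match (illustrative only).
shot_xg = [0.05, 0.12, 0.31, 0.08, 0.76, 0.03]

def match_xg(shots):
    """Match xG: the sum of the individual shot probabilities."""
    return sum(shots)

def prob_at_least_one_goal(shots):
    """Chance of scoring at least once, assuming shots are independent."""
    return 1 - prod(1 - p for p in shots)

print(f"match xG: {match_xg(shot_xg):.2f}")
print(f"P(at least one goal): {prob_at_least_one_goal(shot_xg):.2f}")
```

The total is an average over many hypothetical replays of those shots. The second function is a reminder that the same shots also imply a real chance of scoring nothing at all.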

A team that creates 1.8 xG and concedes 0.6 xG should win comfortably most of the time. They had better and more chances.

That's the average. The variance is huge.

Where the variance comes from

A 0.3 xG shot has a 30% probability of becoming a goal — averaged across all shots ever taken with similar characteristics. In any specific instance, the shot either goes in or doesn't. There is no "30% goal." There's a binary outcome.

Across 90 minutes, a team takes maybe 12-15 shots. Each is a probabilistic event. The realised goals can vary widely from the expected total.

A simple example: a team with 10 shots, each at 0.15 xG, has 1.5 expected goals. The probability they score:

Zero goals: 19.7%
One goal: 34.7%
Two goals: 27.6%
Three goals: 13.0%
Four or more: 5.0%

Most likely outcome is one goal. But scoring zero is nearly 1 in 5. Scoring three or more is also nearly 1 in 5. The xG number (1.5) hides this distribution.
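The percentages above come from a binomial distribution: 10 independent shots, each with a 15% chance of going in. A short sketch that reproduces them (real shots are neither identical nor fully independent, so treat this as the simplified example it is):

```python
from math import comb

def goal_distribution(n_shots, p_goal):
    """Binomial probability of scoring k goals from n identical, independent shots."""
    return [
        comb(n_shots, k) * p_goal**k * (1 - p_goal) ** (n_shots - k)
        for k in range(n_shots + 1)
    ]

dist = goal_distribution(10, 0.15)
for k in range(4):
    print(f"{k} goals: {dist[k]:.1%}")
print(f"4 or more: {sum(dist[4:]):.1%}")
```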

Now combine two teams with their own distributions. The probability of any specific scoreline starts to look much fuzzier than the xG totals suggest.

The upset specifically

Consider Burnley vs Manchester City. xG totals at full time: Burnley 0.4, Man City 2.1.

City should win comfortably most of the time. Their xG advantage is real. But "most of the time" isn't always.

Run that match 100 times in your head:

City wins by 2+ goals: maybe 50 times
City wins by 1: maybe 25 times
Draw: maybe 15 times
Burnley wins: maybe 10 times

That last category isn't impossible. It happens about 10% of the time. Burnley scores their one chance, City misses everything, and the scoreboard reads 1-0 to Burnley.

When this happens, the post-match analysis writes itself: "Burnley won despite being dominated on xG. Their tactical organisation and clinical finishing overcame possession dominance."

This is the wrong frame. Burnley didn't "overcome" anything. They got lucky in a specific match. The 10% outcome happened. xG was a perfectly fine prediction; the realised outcome just landed in the tail of the distribution.
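The "run it 100 times" exercise can be sketched in a few lines. This version assumes each side's goals follow an independent Poisson distribution with a mean equal to its xG total. That is a common simplification, not a description of any specific model, and the `outcome_probs` helper is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals when goals are Poisson with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

def outcome_probs(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities from independent Poissons."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

burnley, level, city = outcome_probs(0.4, 2.1)
print(f"Burnley win {burnley:.1%}, draw {level:.1%}, City win {city:.1%}")
```

Under these assumptions the split lands in the same region as the rough counts above, with the underdog's share a little lower. The exact numbers depend on the modelling choices; the point is that the underdog's share is never zero.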

Why xG-only predictions miss this

Models that use xG as their primary input have a specific failure mode for upset matches.

They underestimate variance. Quoting "City has 2.1 xG vs Burnley's 0.4" sounds like a confident prediction of a City win. But the underlying probability of any single goal materialising is much fuzzier than the xG totals suggest. A 2-1 underdog upset isn't a failure of xG — it's a normal outcome that xG itself implies.

They underweight defensive concentration. Some teams are deliberately defensive. Burnley under Sean Dyche didn't try to dominate possession or chances; they tried to stay compact, deny the opposition good chances, and counter once or twice with whatever they had. Their xG totals look terrible because that's the system. But their results were better than xG predicted, consistently, across many seasons.

A pure xG-based prediction model treats a match in which Burnley concede only 0.4 xG as if Burnley were just lucky to keep things that low. In reality, Burnley engineered it. The opposition didn't get good chances because Burnley's defensive structure prevented them. Predicting their next match from their attacking xG alone, without giving proper weight to the xG they concede, gets you wrong answers.

They miss situational factors. A team's xG average over a season tells you about their typical performance. A specific match might be unusual. A team playing for a draw against superior opposition will produce different patterns than the same team chasing a win. Pure xG models don't capture this without explicit context features.
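The point about defensive structure can be illustrated with the simplest form of attack and defence weighting: scale a team's attacking xG by how much xG the opponent usually concedes, relative to the league average. Every number and name below is hypothetical, and this is a generic textbook-style sketch rather than our production model.

```python
LEAGUE_AVG_XG = 1.4  # hypothetical league-average xG per team per match

def strength(xg_per_match, league_avg=LEAGUE_AVG_XG):
    """Ratio to league average: above 1 means more xG than a typical side."""
    return xg_per_match / league_avg

def expected_goals(attack_xg_for, opponent_xg_against, league_avg=LEAGUE_AVG_XG):
    """Expected goals for an attack once the opponent's defence is weighted in."""
    attack = strength(attack_xg_for, league_avg)
    defence = strength(opponent_xg_against, league_avg)
    return league_avg * attack * defence

# A favourite creating 2.0 xG per match (illustrative) against two opponents.
vs_average_defence = expected_goals(2.0, 1.4)   # opponent concedes the league average
vs_compact_defence = expected_goals(2.0, 1.0)   # opponent concedes less than average
print(f"{vs_average_defence:.2f} vs {vs_compact_defence:.2f}")
```

A model that ignores the second argument forecasts the same output against both opponents, which is the mistake described above.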

What we use instead

Our model uses xG-derived inputs (specifically, attack and defence quality estimates that lean on xG-style chance quality), but it builds them into a Dixon-Coles Poisson framework that explicitly models the variance.

This matters. When we say "City 75% to win, 15% to draw, 10% Burnley win," we're not collapsing those percentages into a winner prediction. We're saying this is the probability distribution. The 10% Burnley win isn't a flaw — it's an honest part of the forecast.
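For readers who want to see the shape of such a forecast, here is a sketch of a Dixon-Coles style scoreline grid: independent Poisson goals with the published low-score adjustment applied to 0-0, 1-0, 0-1 and 1-1. The `rho` value and the use of raw xG totals as goal means are illustrative assumptions, not our fitted parameters.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    return exp(-mean) * mean**k / factorial(k)

def dc_tau(h, a, lam, mu, rho):
    """Dixon-Coles adjustment factor for the four low-scoring results."""
    if h == 0 and a == 0:
        return 1 - lam * mu * rho
    if h == 0 and a == 1:
        return 1 + lam * rho
    if h == 1 and a == 0:
        return 1 + mu * rho
    if h == 1 and a == 1:
        return 1 - rho
    return 1.0

def scoreline_grid(lam, mu, rho, max_goals=10):
    """grid[h][a] = P(home scores h, away scores a), renormalised after truncation."""
    grid = [
        [poisson_pmf(h, lam) * poisson_pmf(a, mu) * dc_tau(h, a, lam, mu, rho)
         for a in range(max_goals + 1)]
        for h in range(max_goals + 1)
    ]
    total = sum(sum(row) for row in grid)
    return [[p / total for p in row] for row in grid]

def draw_prob(grid):
    return sum(grid[i][i] for i in range(len(grid)))

# Home mean 2.1, away mean 0.4, rho = -0.1: all illustrative values.
grid = scoreline_grid(2.1, 0.4, rho=-0.1)
print(f"P(draw) = {draw_prob(grid):.1%}, P(0-1 away win) = {grid[0][1]:.1%}")
```

The output is a full grid of scoreline probabilities. Win, draw and loss figures are sums over that grid, so the underdog's share is always part of the forecast.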

If Burnley wins, our model wasn't "wrong." Our model said 10%. That's a reasonable probability that gets realised about 1 time in 10. If we say 10% on a thousand similar matches and the underdog wins 100 times, our model is calibrated.

If we never said 10% on those matches and instead said 2%, then we'd be wrong.
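The calibration check described here is easy to sketch: group forecasts by stated probability and compare each group's average forecast with how often the event actually happened. The function name and the toy data below are hypothetical.

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 goes in the top bin
        bins[idx].append((p, y))
    rows = []
    for items in bins:
        if items:
            mean_p = sum(p for p, _ in items) / len(items)
            freq = sum(y for _, y in items) / len(items)
            rows.append((mean_p, freq, len(items)))
    return rows

# Toy data matching the example above: 1,000 forecasts of 10%, 100 underdog wins.
forecasts = [0.10] * 1000
outcomes = [1] * 100 + [0] * 900
for mean_p, freq, n in calibration_table(forecasts, outcomes):
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

A calibrated forecaster shows forecast and observed columns that roughly agree in every bin. Saying 2% on the same matches would show up as a 2% forecast against a 10% observed rate.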

The "xG dominance" fallacy

A specific bad-faith argument shows up in football discourse: "Team X dominated the xG but lost. Therefore xG is broken / clutch matters / there's something the data can't see."

The first two are wrong; the third has a kernel of truth.

xG isn't broken. A team with 1.8 xG in a match is more likely to win than a team with 0.6 xG, on average across many matches. The single match outcome doesn't refute this — it's a draw from the distribution.

Clutch is mostly noise. "Clinical finishers" sometimes exist for a season or two and then regress to the mean. The narratives outlast the underlying truth. Some shots in some moments do matter more (penalties, shootouts, late equalisers in tight games), but most "clutch" patterns disappear in larger samples.

Some things xG can't see, sometimes. A team that's deliberately playing for a draw, parking the bus, soaking up pressure to score on a counter — that team's xG profile underrepresents what they're trying to do. They'd rather have 0.4 xG with a goal than 1.4 xG with no goal. Pure xG misses this strategy.

This is a real limitation of xG, but it's not "xG is broken." It's "xG is a measure of chance quality; it doesn't capture intent or strategic context."

How to think about xG correctly

Three rules of thumb for using xG without being misled.

Treat xG totals as expected averages, not predictions. "City has 2.1 xG and Burnley has 0.4 xG" doesn't predict the score; it describes what the chances created suggest should happen on average. The actual match has variance.

Look at xG over many matches, not single matches. A team's underlying xG performance over 10-15 matches is much more meaningful than any single match. If a team consistently outperforms or underperforms their xG over a long period, that's signal. If it's just a few matches, it's likely noise.

Don't use xG alone to predict upsets. When the xG suggests a one-sided match, the upset scenario is real but harder to see in the numbers. Combining xG with other inputs (defensive style, recent form, tactical context) gets you closer to a useful prediction.
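The second rule can be made concrete with a rough calculation. If a team's total goals over a run of matches were Poisson with a mean equal to its total xG (an approximation, since real goal counts are slightly less dispersed), the typical per-match gap between goals and xG shrinks with the square root of the number of matches.

```python
from math import sqrt

def per_match_noise(xg_per_match, n_matches):
    """Approximate standard deviation of (goals - xG) per match over n matches,
    assuming total goals are Poisson with mean equal to total xG."""
    total_xg = xg_per_match * n_matches
    return sqrt(total_xg) / n_matches

# A side averaging 1.5 xG per match (illustrative figure).
for n in (1, 5, 15, 38):
    print(f"{n:>2} matches: about ±{per_match_noise(1.5, n):.2f} goals per match")
```

Over a single match the noise is of the same order as the xG figure itself, which is why one result says so little. Over a season it is small enough that a persistent gap deserves a closer look.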

Where xG genuinely shines

To be clear: xG remains the best widely-available football statistic for many things.

Evaluating players (a striker scoring above their xG over a sustained period may be finishing more clinically than the average shooter, or benefiting from shot details the xG model doesn't capture)
Diagnosing team performance (a team underperforming xG over a season has an attacking issue worth investigating)
Finding undervalued teams in betting markets (xG-based models can find xG-rich teams whose results haven't yet caught up)

Where xG falls down is in single-match upset prediction. That's a narrow but visible failure mode that's worth understanding.

Where to look next

How our Premier League model works — what we use beyond xG
How to read a calibration chart — checking whether predictions match reality
Methodology page — full technical breakdown

xG is a great statistic. It's also one that gets overinterpreted constantly. The fix isn't to abandon it; it's to remember that probabilities are distributions, single matches are draws, and upsets you call "xG dominance failures" are usually just the model behaving exactly as expected.

OddsIQ provides AI analysis, not financial or betting advice. Past performance does not guarantee future results. Gamble responsibly: BeGambleAware, GamCare, GamStop.
