xGaura Methodology

Section 1 — Overview

1. What the Model Is and Why We Built It

xGaura's prediction engine is a Poisson distribution model calibrated on expected goals (xG) data, adjusted for team strength using Elo ratings and for venue using a home advantage coefficient. The model produces a full scoreline probability matrix for each fixture, from which 1X2 probabilities, over/under rates, BTTS frequencies, and most likely exact scores are derived.

We built a mathematical model rather than a subjective one because football prediction is fundamentally a probability problem. No single method eliminates uncertainty — football is too inherently random for that. What a well-calibrated model does is identify situations where the bookmaker's implied probability diverges meaningfully from the evidence-based probability. That divergence is the edge. Capturing it consistently over a large sample of bets is what produces a positive long-term return.

The model is recalibrated weekly using rolling 38-match windows for each team. It is transparent: the full methodology is set out on this page, and its performance is tracked publicly on the tipster leaderboard.

Section 2 — Inputs

2. Expected Goals (xG) as the Primary Input

The model uses expected goals (xG) as its primary input rather than actual goals scored and conceded. This is the most important methodological decision in the model and it is worth explaining clearly.

Actual goals contain significant random variance. A shot that strikes the post, a goalkeeper making an exceptional one-handed save, a deflection that wrong-foots a defender — these are low-probability events that determine whether a shot results in a goal, but they carry no information about the quality of the chance itself. Over a small sample of matches, actual goals can diverge substantially from the underlying quality of play.

xG removes this variance by measuring shot quality rather than shot outcomes. Each shot is assigned a probability between 0 and 1 based on:

Location: distance and angle from goal.
Assist type: cross, through ball, set piece, rebound.
Body part: foot (dominant or weak) or head.
Shot type: open play, counter-attack, penalty, free kick.
Game state: whether the team was ahead, level or behind at the time of the shot.

A shot from the six-yard box centrally off a through ball might carry an xG of 0.75 — meaning it would be expected to result in a goal 75% of the time from that position under those conditions. Summing all shots in a match gives the team's total xG for that game. Over a 38-match rolling window, a team's average xG per game is a significantly more stable and predictive measure of their true attacking quality than goals scored.
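The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not xGaura's production code: the function names and the shot values are hypothetical, and the per-shot probabilities are assumed to come from an upstream xG provider.

```python
from statistics import mean


def match_xg(shot_probabilities):
    """Total xG for one match: the sum of each shot's scoring probability."""
    return sum(shot_probabilities)


def rolling_xg_average(match_totals, window=38):
    """Average xG per game over the most recent `window` matches.

    `match_totals` is assumed to be ordered oldest to newest.
    """
    return mean(match_totals[-window:])


# Hypothetical shot values for a single match (illustrative, not real data).
shots = [0.75, 0.08, 0.12, 0.03, 0.31]
print(round(match_xg(shots), 2))
```

Feeding the per-match totals into `rolling_xg_average` gives the per-team input that the rest of the model works from.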

The practical implication: a team currently scoring 2.2 goals per game from 1.4 xG per game is very likely overperforming through finishing luck and can be expected to regress. A team scoring 1.0 goals per game from 2.1 xG per game is underperforming and is likely to improve. The model captures this by using xG averages rather than goal averages as inputs, which improves prediction accuracy, particularly early in a season when sample sizes are small.

Section 3 — The Poisson Formula

3. The Poisson Distribution Formula

The Poisson distribution models the probability of a given number of events occurring in a fixed interval, given a known average rate of occurrence. Applied to football, it models the number of goals a team will score in a 90-minute match given their expected scoring rate.

The Poisson probability mass function is:

$$P(k \text{ goals}) = \frac{\lambda^{k} \, e^{-\lambda}}{k!}$$

Where λ (lambda) is the expected number of goals for that team in that match (derived from their xG average as adjusted by the model), k is the specific number of goals we are calculating the probability for (0, 1, 2, 3…), and e is Euler's number (≈ 2.71828).
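As a quick illustration of the formula, the sketch below evaluates the Poisson probability for a few goal counts. The value λ = 1.5 is a hypothetical example, not a figure taken from the model.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


lam = 1.5  # hypothetical expected goals for one team
for k in range(4):
    print(k, round(poisson_pmf(k, lam), 4))
```

With this rate, a single goal is the most likely outcome, followed by two goals and then a blank.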

We calculate this for each team independently for k = 0 through 8 (covering >99.9% of all real match outcomes). For each home score h and away score a, the joint probability of that exact scoreline is:

$$P(h\text{–}a) = P_{\text{home}}(h) \times P_{\text{away}}(a)$$

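This produces a 9×9 scoreline probability matrix, with one row and one column for each goal count from 0 to 8. The 81 cells sum to approximately 1.0; the small shortfall is the probability of scorelines in which either team scores more than 8 goals. Each cell represents the probability of a specific exact scoreline.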
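A minimal sketch of the matrix construction follows, taking k = 0 through 8 for each side and multiplying the two independent distributions. The λ values are hypothetical and the function names are illustrative only.

```python
import math

MAX_GOALS = 8  # goal counts 0..8 for each team


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def scoreline_matrix(lam_home, lam_away, max_goals=MAX_GOALS):
    """matrix[h][a] is the probability of the exact scoreline h-a."""
    home = [poisson_pmf(h, lam_home) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, lam_away) for a in range(max_goals + 1)]
    return [[p_h * p_a for p_a in away] for p_h in home]


matrix = scoreline_matrix(1.6, 1.1)  # hypothetical lambdas
total = sum(sum(row) for row in matrix)
print(len(matrix), len(matrix[0]), round(total, 4))
```

The total falls fractionally short of 1 because scorelines beyond 8 goals are left out of the grid.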

Section 4 — Elo Adjustment

4. Elo Ratings and Strength Adjustment

Raw xG averages do not fully account for the quality of opponents faced. A team averaging 2.2 xG per game against weak defences will face a very different challenge against a top-four side. The Elo system corrects for this.

Elo is a rating system originally developed for chess, widely adapted for team sports. Each team carries a rating that is updated after every match. If the home team wins, their Elo increases and the away team's decreases, with the magnitude of the change determined by the result relative to expectation — beating a much stronger side earns more Elo than beating a weak one. The system converges over time to an accurate measure of team strength.
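For readers unfamiliar with the mechanics, here is a sketch of the standard Elo update. The expected-score formula is the conventional one; the K-factor of 20 is an assumption made for illustration, since this page does not state the value xGaura uses or whether any further adjustments are applied.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation for side A against side B (0 to 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(home, away, home_result, k_factor=20.0):
    """Return updated (home, away) ratings.

    home_result: 1.0 for a home win, 0.5 for a draw, 0.0 for an away win.
    k_factor is a hypothetical value, not xGaura's calibrated setting.
    """
    delta = k_factor * (home_result - expected_score(home, away))
    return home + delta, away - delta


# An underdog winning gains more than a favourite winning.
print(update_elo(1400, 1600, 1.0))
print(update_elo(1600, 1400, 1.0))
```

The rating change is zero-sum: whatever one side gains, the other loses.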

xGaura uses the Elo differential between the two teams to scale the raw xG averages. Specifically:

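$$\lambda_{\text{home}} = \text{xG}_{\text{home, avg}} \times \left(1 + \alpha \times \frac{\Delta\text{Elo}}{400}\right)$$

$$\lambda_{\text{away}} = \text{xG}_{\text{away, avg}} \times \left(1 - \alpha \times \frac{\Delta\text{Elo}}{400}\right)$$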

Where ΔElo is the home team's Elo minus the away team's Elo, and α is the model's Elo sensitivity coefficient (calibrated at 0.18 in v2.4). A home team with a 200-point Elo advantage over the away side will have their expected goals increased by approximately 9%, and the away team's decreased by 9%. This ensures the model correctly penalises mismatched fixtures without relying solely on recent form.
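The 200-point example can be checked with a short sketch. The xG averages and Elo ratings below are hypothetical; only α = 0.18 and the divisor of 400 come from the formulas above.

```python
ALPHA = 0.18  # Elo sensitivity coefficient stated for v2.4


def elo_adjusted_lambdas(xg_home_avg, xg_away_avg, elo_home, elo_away, alpha=ALPHA):
    """Scale raw xG averages by the Elo differential (home minus away)."""
    scale = alpha * (elo_home - elo_away) / 400
    return xg_home_avg * (1 + scale), xg_away_avg * (1 - scale)


# Hypothetical inputs: a home side rated 200 Elo points above the visitors.
lam_home, lam_away = elo_adjusted_lambdas(1.50, 1.20, 1700, 1500)
print(round(lam_home, 3), round(lam_away, 3))
```

Here the scale factor is 0.18 × 200 / 400 = 0.09, so the home rate rises by 9% and the away rate falls by 9%.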

Section 5 — Home Advantage

5. Home Advantage Coefficient

Home advantage is one of the most consistently documented phenomena in football. Across Europe's top five leagues over the past decade, home sides win approximately 46% of matches, draw 26%, and lose 28%. This asymmetry has several proposed causes — crowd noise affecting referee decisions, travel fatigue for away sides, familiarity with the pitch surface — but whatever the mechanism, the empirical effect is clear and stable.

xGaura applies league-specific home advantage multipliers: the home team's λ is scaled up and the away team's λ is scaled down. These multipliers are calibrated per competition:

League	Home Win %	Home xG Multiplier	Away xG Multiplier
Premier League	45.2%	1.08	0.94
La Liga	47.1%	1.10	0.92
Serie A	44.8%	1.07	0.95
Bundesliga	43.6%	1.06	0.96
Ligue 1	46.3%	1.09	0.93

The home advantage multiplier is applied after the Elo adjustment. It is re-estimated at the start of each season as new data accumulates, and updated mid-season if the calibration period produces a sufficient sample. Neutral venue matches (e.g. UCL finals) are flagged and the home advantage coefficient is set to 1.0.
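A sketch of this step, using the multipliers from the table above, might look as follows. The function name and input λ values are hypothetical, and treating a neutral venue as a multiplier of 1.0 for both sides is an assumption; the text only states that the home advantage coefficient is set to 1.0.

```python
# (home multiplier, away multiplier) per league, taken from the table above.
HOME_ADVANTAGE = {
    "Premier League": (1.08, 0.94),
    "La Liga": (1.10, 0.92),
    "Serie A": (1.07, 0.95),
    "Bundesliga": (1.06, 0.96),
    "Ligue 1": (1.09, 0.93),
}


def apply_home_advantage(lam_home, lam_away, league, neutral_venue=False):
    """Apply venue multipliers to lambdas that are already Elo-adjusted."""
    if neutral_venue:
        home_mult, away_mult = 1.0, 1.0  # assumption: both sides unscaled
    else:
        home_mult, away_mult = HOME_ADVANTAGE[league]
    return lam_home * home_mult, lam_away * away_mult


print(apply_home_advantage(1.635, 1.092, "Premier League"))
```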

Section 6 — Scoreline Matrix

6. Generating the Scoreline Probability Matrix

With λ values for home and away teams calculated after Elo and home advantage adjustments, the model computes the Poisson probability for each team scoring 0 through 8 goals. The resulting 9×9 matrix (81 cells) covers every scoreline from 0–0 to 8–8. In practice, scores above 5 goals for either team have a combined probability of less than 0.2% and are included for completeness rather than practical relevance.

The four most probable scorelines for each fixture are displayed on the Mathematical Predictions page. The highest-probability cell is highlighted in blue (the "hot" cell), the second and third most likely in a lighter highlight (the "warm" cells). These are descriptive outputs of the model — they are not tips. The probability of any individual exact scoreline is inherently low. For value bets derived from the scoreline matrix, see the Correct Scores page.
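Picking the most probable scorelines is a simple sort over the matrix cells. The sketch below assumes independent Poisson scoring as described earlier; the λ values and function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def scoreline_matrix(lam_home, lam_away, max_goals=8):
    """matrix[h][a] is the probability of the exact scoreline h-a."""
    home = [poisson_pmf(h, lam_home) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, lam_away) for a in range(max_goals + 1)]
    return [[p_h * p_a for p_a in away] for p_h in home]


def top_scorelines(matrix, n=4):
    """Return the n most probable ((home, away), probability) pairs."""
    cells = [((h, a), p) for h, row in enumerate(matrix) for a, p in enumerate(row)]
    cells.sort(key=lambda cell: cell[1], reverse=True)
    return cells[:n]


top = top_scorelines(scoreline_matrix(1.6, 1.1))  # hypothetical lambdas
for (h, a), p in top:
    print(f"{h}-{a}: {p:.1%}")
```

Even the single most likely scoreline in this example sits at roughly 12%, which illustrates why exact scores are low-probability outcomes.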

Section 7 — Derived Markets

7. Deriving 1X2, Over/Under, BTTS and HT Probabilities

From the scoreline matrix, all market probabilities are derived by summing the appropriate cells:

1X2: P(Home Win) = sum of all cells where h > a. P(Draw) = sum of all cells where h = a. P(Away Win) = sum of all cells where a > h.

Over/Under 2.5: P(Over 2.5) = sum of all cells where h + a ≥ 3. P(Under 2.5) = sum of all cells where h + a ≤ 2. The same logic applies for 1.5, 3.5 and 4.5 lines by adjusting the threshold.

BTTS Yes: P(BTTS Yes) = 1 − P(home scores 0) − P(away scores 0) + P(both score 0). Equivalently: the sum of all cells where h ≥ 1 and a ≥ 1.

Double Chance: P(1X) = P(Home Win) + P(Draw). P(X2) = P(Away Win) + P(Draw). P(12) = P(Home Win) + P(Away Win) = 1 − P(Draw).

HT/FT: Derived from the two-phase model described in Section 9 below.
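The cell-summing rules for the full-time markets can be sketched as a single pass over the matrix. The function names and λ values are hypothetical, and HT/FT is left out here because it depends on the separate half-time sub-model. Note that with a grid capped at 8 goals, the three 1X2 sums total very slightly less than 1.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def scoreline_matrix(lam_home, lam_away, max_goals=8):
    """matrix[h][a] is the probability of the exact scoreline h-a."""
    home = [poisson_pmf(h, lam_home) for h in range(max_goals + 1)]
    away = [poisson_pmf(a, lam_away) for a in range(max_goals + 1)]
    return [[p_h * p_a for p_a in away] for p_h in home]


def derive_markets(matrix, line=2.5):
    """Sum matrix cells into 1X2, over/under, BTTS and double chance."""
    m = {"home": 0.0, "draw": 0.0, "away": 0.0, "over": 0.0, "under": 0.0, "btts_yes": 0.0}
    for h, row in enumerate(matrix):
        for a, p in enumerate(row):
            m["home" if h > a else "draw" if h == a else "away"] += p
            m["over" if h + a > line else "under"] += p
            if h >= 1 and a >= 1:
                m["btts_yes"] += p
    m["1X"] = m["home"] + m["draw"]
    m["X2"] = m["away"] + m["draw"]
    m["12"] = m["home"] + m["away"]
    return m


markets = derive_markets(scoreline_matrix(1.6, 1.1))  # hypothetical lambdas
print({name: round(p, 3) for name, p in markets.items()})
```

Passing `line=1.5`, `3.5` or `4.5` gives the other over/under markets from the same matrix.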

Section 8 — Value Identification

8. Identifying Value Bets

A tip is published when the model's probability for an outcome meaningfully exceeds the bookmaker's implied probability. The edge is calculated as:

          Edge % = (Model Probability − Implied Probability) × 100
        

Where Implied Probability = 1 / decimal odds, with both probabilities expressed as decimals. A bookmaker offering odds of 1.85 on an outcome is implying a probability of 1 / 1.85 = 0.541 (54.1%). If the model calculates a probability of 0.62 (62%) for the same outcome, the edge is (0.62 − 0.541) × 100 = 7.9%.

Tips are published where the edge exceeds our minimum threshold (currently 4% for standard markets, 2% for high-confidence Double Chance selections). Tips below this threshold exist in the model output but are not surfaced on the site, as the edge is too small to be statistically meaningful over a realistic betting sample.
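The edge calculation and threshold check can be sketched as follows. The function names are hypothetical; the 4% default mirrors the standard-market threshold quoted above, and the code makes no attempt to remove the bookmaker's margin from the implied probability.

```python
def implied_probability(decimal_odds):
    """Bookmaker's implied probability from decimal odds."""
    return 1.0 / decimal_odds


def edge_pct(model_prob, decimal_odds):
    """Edge in percentage points; model_prob is a decimal such as 0.62."""
    return (model_prob - implied_probability(decimal_odds)) * 100


def should_publish(model_prob, decimal_odds, min_edge_pct=4.0):
    """True when the edge exceeds the minimum threshold."""
    return edge_pct(model_prob, decimal_odds) > min_edge_pct


print(round(edge_pct(0.62, 1.85), 1))
print(should_publish(0.62, 1.85))
```

For a Double Chance selection, the same check would be called with `min_edge_pct=2.0`.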

The expected value of any bet is calculated as:

          EV = (Model Prob × Net Profit) − ((1 − Model Prob) × Stake)
        

A positive EV bet is one where the mathematical expectation over a large sample is profitable. The ROI calculator allows you to compute EV for any bet using your own model probability and the available odds.
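A worked version of the EV formula is sketched below, assuming Net Profit means the stake multiplied by (decimal odds − 1). The function name and the 10-unit stake are illustrative only.

```python
def expected_value(model_prob, decimal_odds, stake=1.0):
    """Expected profit per bet: win branch minus loss branch."""
    net_profit = stake * (decimal_odds - 1.0)
    return model_prob * net_profit - (1.0 - model_prob) * stake


# Model probability 0.62 at odds of 1.85 with a 10-unit stake.
print(round(expected_value(0.62, 1.85, stake=10.0), 2))
```

In this example the win branch contributes 0.62 × 8.50 = 5.27 and the loss branch 0.38 × 10 = 3.80, for an expected profit of 1.47 units per bet. When the model probability equals the implied probability, the EV is zero.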

Section 9 — Half Time Sub-Model

9. The Half Time Sub-Model

Half Time predictions use a dedicated first-half sub-model rather than a scaled-down version of the full-match model. This distinction matters. Scaling a 90-minute model to 45 minutes would produce distorted probabilities because the first half of a football match has a distinct statistical fingerprint.

First-half scoring rates are lower than second-half rates across all five leagues. Teams are more defensive early, managers rarely make tactical changes before the break, and pressing intensity tends to peak in the opening 15 minutes before declining. Applying a full-match model at half the rate would underestimate the probability of a 0–0 half-time score and overestimate the probability of goals.

The HT sub-model ingests each team's first-half xG average over the rolling 38-match window — not their overall match xG — and applies the same Poisson formula with a separate home advantage coefficient calibrated specifically on first-half data. The output is a HT scoreline matrix from which HT 1X2 probabilities and HT/FT double result probabilities are derived.

For HT/FT markets, the model uses a two-phase sequential calculation: the first phase generates the half-time state, and the second phase generates the full-time state conditioned on the half-time state. For example, if the home team leads by one goal at HT, the second-half model starts from a one-goal home advantage rather than from a level score.
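The sketch below shows a simplified version of the two-phase idea: each half gets its own Poisson rates, and the full-time result is built by adding second-half goals to each possible half-time score. It assumes second-half scoring rates do not change with the half-time score, which is a simplification; how xGaura's second phase adjusts for game state is not specified on this page. All λ values and names are hypothetical.

```python
import math
from collections import defaultdict


def poisson_pmf(k, lam):
    """Probability of exactly k goals given an expected rate lam."""
    return (lam ** k) * math.exp(-lam) / math.factorial(k)


def result(home_goals, away_goals):
    """H, D or A for a given score."""
    return "H" if home_goals > away_goals else "D" if home_goals == away_goals else "A"


def ht_ft_probabilities(lam_h1, lam_a1, lam_h2, lam_a2, max_goals=8):
    """Probability of each (HT result, FT result) pair.

    lam_h1/lam_a1 are first-half rates, lam_h2/lam_a2 second-half rates.
    """
    goals = range(max_goals + 1)
    first = [(h, a, poisson_pmf(h, lam_h1) * poisson_pmf(a, lam_a1)) for h in goals for a in goals]
    second = [(h, a, poisson_pmf(h, lam_h2) * poisson_pmf(a, lam_a2)) for h in goals for a in goals]
    out = defaultdict(float)
    for h1, a1, p_ht in first:
        for h2, a2, p_2h in second:
            # The full-time score starts from the half-time score.
            out[(result(h1, a1), result(h1 + h2, a1 + a2))] += p_ht * p_2h
    return dict(out)


ht_ft = ht_ft_probabilities(0.7, 0.5, 0.9, 0.6)  # hypothetical rates
for pair, p in sorted(ht_ft.items(), key=lambda kv: -kv[1]):
    print(pair, round(p, 3))
```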

Section 10 — Accuracy & Limitations

10. Model Accuracy, Limitations and Version History

Verified Accuracy by Market

The following accuracy figures are based on all published tips since January 2023, verified against actual match results:

Market	Tips Published	Win Rate	ROI	Avg Odds
1X2 Match Result	847	74%	+11.2%	1.88
BTTS Yes / No	812	76%	+13.4%	1.82
Double Chance	798	82%	+8.6%	1.44
Over / Under 2.5	1,204	77%	+14.8%	1.74
Half Time Result	641	71%	+9.1%	2.08
HT / FT Double Result	488	58%	+12.4%	3.62
Correct Scores	412	18%	+16.8%	7.40

Known Limitations

Team news lag: The model does not automatically ingest confirmed team selections or injury news. A key striker's late withdrawal, a surprise tactical shift, or a goalkeeper change can materially affect a match without the model knowing. Tips are reviewed manually before publication but there is always a window where unpublished team news could affect accuracy. Always check team lineups before placing any bet.

Small sample variance: At the start of each season, rolling averages include a proportion of last-season data. The model handles this by weighting recent matches more heavily, but predictions in August and September carry wider uncertainty than those in February and March.
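One common way to weight recent matches more heavily is an exponentially decaying average, sketched below. This is an illustration only: the page does not describe the weighting scheme xGaura uses, and the decay value of 0.97 is a hypothetical choice.

```python
def weighted_xg_average(match_xg, decay=0.97):
    """Exponentially weighted xG average.

    match_xg is ordered oldest to newest; the newest match has weight 1
    and each step back in time multiplies the weight by `decay`.
    """
    n = len(match_xg)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * x for w, x in zip(weights, match_xg)) / sum(weights)


# Hypothetical per-match xG totals, oldest first.
print(round(weighted_xg_average([1.1, 0.9, 1.8, 2.0]), 3))
```

Setting `decay=1.0` recovers the plain rolling average.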

Rare events: Red cards, own goals and penalties introduce variance that the Poisson model does not account for. These events are genuinely unpredictable and no statistical model can fully price them in.

Bookmaker margin: Implied probabilities used in edge calculations are taken from the best available odds across major European bookmakers. The model compares against the best price, not the market average. Always use an odds comparison service to verify you are getting the best available price before placing.

Version History

v2.4 (current): Elo sensitivity coefficient recalibrated to 0.18 (from 0.22 in v2.3). Half-time sub-model home advantage coefficients updated with 2023–24 season data. Over/Under minimum edge threshold reduced from 5% to 4% following back-testing showing sufficient sample at lower thresholds.

v2.3: First-half xG sub-model introduced, replacing the scaled full-match model for HT predictions. Significant improvement in HT accuracy (+8.4 percentage points).

v2.2: xG replaced actual goals as primary model input. Largest single accuracy improvement in the model's history (+6.1 percentage points overall).

v2.1: League-specific home advantage coefficients introduced, replacing a single global coefficient.

v2.0: Elo rating system integrated.

v1.x: Original Poisson model using raw goals averages without Elo or home advantage adjustment (2020–2022).
