2026 FIFA World Cup

Prediction model — design and methodology

Premise and approach

I wanted to do a clean statistical build-out of a real-world situation during this Utah chapter, but my original plan to try and predict snowfall in LCC proved to be much more complex than expected. As such, given the upcoming World Cup and my personal interests in soccer, I decided to think through and build out a model for the tournament, assisted, as usual, by the beast Claude Code.

The work contained here has strong contribution from Claude - I would not have been able to get anywhere near this far without its help. At the same time, I've contributed key insights and advancements where Claude would otherwise have fallen into a loop, lost track of the overall objective, or simply given up with data-limitation excuses. It goes to show what the current state of Human-AI collaboration looks like.

Philosophy

Sports analytics is a rich field with some very smart people involved, and I don't necessarily see this beating out commercial models or highly-specialized pundits. What I do see this as is a clean bottom-up statistical framework for thinking about a team-sport tournament, and I hope that the statistical methodology framed below does the Yale Department of Statistics justice.

In our model, we're not just picking a "winner" for each match-up - instead, we're modeling the underlying expected number of goals each team will score, then using that to feed a prediction for each matchup. We'll build this model off the [Dixon-Coles model](https://www.ajbuckeconbikesail.net/wkpapers/Airports/MVPoisson/soccer_betting.pdf), a well-established model for soccer that modifies an underlying Poisson distribution model for goals scored. The work below adds a couple of additional pieces to simply fitting parameters of the Dixon-Coles model to historical international matchups, summarized in the schematic below.

[Schematic: model overview for a match between team i and team j at a neutral venue (γ = 1).
Fitted parameters per team: α (attack strength, scalar), β (defensive weakness, scalar), u (attack style, ℝ⁴ vector), v (defence style, ℝ⁴ vector).
Expected goals (deterministic): log μij = log αi + log βj + ui·vj and log νij = log αj + log βi + uj·vi, where ui·vj is the style residual (zero when styles are neutral, positive/negative otherwise).
Goals scored (random variables): X ~ Pois(μij), Y ~ Pois(νij).
Joint distribution (corrected): P̃(x, y) = τ(x, y; ρ) · Pois(x; μ) · Pois(y; ν), with the Dixon-Coles τ correction on low-scoring cells; ρ ≤ 0 inflates the 0-0 probability and is re-estimated jointly in stage 2.
Outcomes, summed over the 10×10 scoreline grid: P(win i) = Σ_{x>y} P̃, P(draw) = Σ_{x=y} P̃, P(win j) = Σ_{x<y} P̃.
Fitting: stage 1 estimates α, β, ρ by MLE with weights wk = e^(−φd) · sk · δ^n; stage 2 estimates u, v, ρ by MLE + L2.]

The Poisson foundation

We begin with all of our modeling by understanding the Poisson distribution, which models the occurrence of rare, independent events (already an approximation, since goals are, by their very nature, very dependent on previous goals). Suspending our disbelief, if we model a specific team $i$ as expected to score $\mu$ goals against team $j$, then the probability of actually scoring exactly $x$ goals is:

$$P(X = x) = \frac{e^{-\mu}\,\mu^x}{x!}$$
[Figure: Poisson probability mass function over 0 to 10 goals. At μ = 1.5: mode = 1, P(0) = 0.223, P(1) = 0.335.]

The parameter $\mu$ is what we are trying to estimate. It is not a fixed constant: it depends on who is playing and their relative strengths. In our naive model, we define $\mu$ as a product of three factors:

$$\mu_{ij} = \alpha_i \cdot \beta_j \cdot \gamma$$

$\alpha_i$: attack strength of the scoring team
$\beta_j$: defensive weakness of the conceding team (higher = leakier)
$\gamma$: home advantage multiplier (~1.2 at club level; 1.0 at neutral sites)

The product form models how these factors compound - if $\alpha_i = 1.5$ (a strong attack, 50% above average) and $\beta_j = 1.4$ (a very leaky defense, 40% above average), the expected goals rate is $1.5 \times 1.4 = 2.1$ (110% above the average rate), not $1.5 + 1.4 - 1 = 1.9$ (90% above the average rate). Previous work with football data shows that the multiplicative model works better than additive alternatives.

The expected goals for the other team are definitionally symmetric:

$$\mu_{ji} = \alpha_j \cdot \beta_i$$

With these two rates, written $\mu = \mu_{ij}$ and $\nu = \mu_{ji}$ from here on, we can apply the Poisson model to set up an independent joint bivariate distribution. Again, this is a simplification - obviously, a team up a goal will often play more conservatively, changing the math, but we're coming up with an approximate model, not a perfect description.

Under independence, the joint probability of a specific scoreline $(x, y)$ is just the product of the two marginal Poisson probabilities:

$$P(X = x,\; Y = y) = \frac{e^{-\mu}\,\mu^x}{x!} \cdot \frac{e^{-\nu}\,\nu^y}{y!}$$
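Example scoreline grid under independence:

Team i: $\alpha_i = 1.50$, $\beta_i = 1.00$, so $\mu = \alpha_i \cdot \beta_j = 1.50$ xG
Team j: $\alpha_j = 1.00$, $\beta_j = 1.00$, so $\nu = \alpha_j \cdot \beta_i = 1.00$ xG

| Team j goals (y) \ Team i goals (x) | 0 | 1 | 2 | 3 | 4 | 5+ |
|---|---|---|---|---|---|---|
| 0 | 8.2% | 12.3% | 9.2% | 4.6% | 1.7% | 0.7% |
| 1 | 8.2% | 12.3% | 9.2% | 4.6% | 1.7% | 0.7% |
| 2 | 4.1% | 6.2% | 4.6% | 2.3% | 0.9% | 0.3% |
| 3 | 1.4% | 2.1% | 1.5% | 0.8% | 0.3% | 0.1% |
| 4 | 0.3% | 0.5% | 0.4% | 0.2% | 0.1% | <0.1% |
| 5+ | 0.1% | 0.1% | 0.1% | <0.1% | <0.1% | <0.1% |

P(i wins) = 48.8%, P(draw) = 26.0%, P(j wins) = 25.2%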

To get outcome probabilities, we sum over all scorelines in the appropriate region:

$$\begin{aligned} P(\text{win}_i) &= \sum_{x > y} P(X=x, Y=y) \\[4pt] P(\text{draw}) &= \sum_{x = y} P(X=x, Y=y) \\[4pt] P(\text{win}_j) &= \sum_{x < y} P(X=x, Y=y) \end{aligned}$$

For all practical purposes we can truncate at ten goals per team, which captures over 99.9% of the probability mass when both rates are at most about 3, and still over 99% as $\mu, \nu$ approach 4.

For World Cup matches played at neutral sites, which is how we treat the entire 2026 tournament, we set $\gamma = 1$.
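The grid-and-sum calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function names (`poisson_pmf`, `scoreline_grid`, `outcome_probs`) are hypothetical, and the rates are the example values $\mu = 1.5$ and $\nu = 1.0$ from the grid above.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def scoreline_grid(mu: float, nu: float, max_goals: int = 10):
    """grid[x][y] = P(X = x, Y = y) under independence, for 0..max_goals."""
    return [
        [poisson_pmf(x, mu) * poisson_pmf(y, nu) for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


def outcome_probs(grid):
    """Sum the grid into (P(i wins), P(draw), P(j wins))."""
    win_i = draw = win_j = 0.0
    for x, row in enumerate(grid):
        for y, p in enumerate(row):
            if x > y:
                win_i += p
            elif x == y:
                draw += p
            else:
                win_j += p
    return win_i, draw, win_j


grid = scoreline_grid(mu=1.5, nu=1.0)
win_i, draw, win_j = outcome_probs(grid)
print(f"P(i wins)={win_i:.3f}  P(draw)={draw:.3f}  P(j wins)={win_j:.3f}")
```

With these rates the three sums should land close to the 48.8% / 26.0% / 25.2% split quoted for the example grid, and the truncated grid holds essentially all of the probability mass.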


The Dixon-Coles correction

The independent Poisson model has a known failure mode: it systematically underestimates the frequency of low-scoring draws (0-0 and 1-1) and overestimates 1-0 and 0-1 results. The reason is that real matches have game states. When a match is 0-0, both teams are often pushing harder to break the deadlock: the game opens up, but also tightens tactically. When one team goes ahead, they may sit back and protect, suppressing further scoring. These dynamics create a mild dependence between the two teams' goals precisely at low-scoring outcomes, which the independence assumption cannot capture.

Dixon and Coles addressed this with an elegant correction. Rather than replacing the Poisson model entirely, they multiply the joint probability by a correction factor $\tau$ that only modifies the four cells where $x \leq 1$ and $y \leq 1$:

$$\tilde{P}(x, y) = \tau(x, y,\; \mu, \nu,\; \rho) \cdot P(x, y)$$

$$\tau(x,y) = \begin{cases} 1 - \rho\mu\nu & (x,y) = (0,0) \\[2pt] 1 + \rho\mu & (x,y) = (0,1) \\[2pt] 1 + \rho\nu & (x,y) = (1,0) \\[2pt] 1 - \rho & (x,y) = (1,1) \\[2pt] 1 & \text{otherwise} \end{cases}$$

The single parameter $\rho \leq 0$ controls the magnitude of the adjustment. With $\rho < 0$, the factors $\tau(0,0) = 1 - \rho\mu\nu > 1$ and $\tau(1,1) = 1 - \rho > 1$ inflate the 0-0 and 1-1 probabilities, and the matching probability mass is taken from the 1-0 and 0-1 cells. The corrected probabilities still sum to one across all scorelines by construction: the four adjustments are $e^{-\mu-\nu}$ times $-\rho\mu\nu$, $+\rho\mu\nu$, $+\rho\mu\nu$ and $-\rho\mu\nu$, which cancel.
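As a quick check of the $\tau$ formula, here is a small sketch that applies the correction to an independent Poisson grid and compares it with the uncorrected one. The function names and the example values ($\mu = 1.5$, $\nu = 1.0$, $\rho = -0.11$) are illustrative assumptions, not the project's code.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def tau(x: int, y: int, mu: float, nu: float, rho: float) -> float:
    """Dixon-Coles factor; differs from 1 only when x <= 1 and y <= 1."""
    if (x, y) == (0, 0):
        return 1.0 - rho * mu * nu
    if (x, y) == (0, 1):
        return 1.0 + rho * mu
    if (x, y) == (1, 0):
        return 1.0 + rho * nu
    if (x, y) == (1, 1):
        return 1.0 - rho
    return 1.0


def dc_grid(mu: float, nu: float, rho: float, max_goals: int = 10):
    """grid[x][y] = tau(x, y) * Pois(x; mu) * Pois(y; nu)."""
    return [
        [tau(x, y, mu, nu, rho) * poisson_pmf(x, mu) * poisson_pmf(y, nu)
         for y in range(max_goals + 1)]
        for x in range(max_goals + 1)
    ]


base = dc_grid(1.5, 1.0, rho=0.0)    # rho = 0 recovers the independent model
adj = dc_grid(1.5, 1.0, rho=-0.11)   # example negative rho

for cell in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x, y = cell
    print(cell, f"{base[x][y]:.4f} -> {adj[x][y]:.4f}")
```

Following the formula, a negative $\rho$ raises the (0,0) and (1,1) cells, lowers (0,1) and (1,0), and leaves the grid total unchanged.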

Empirically, fits on European league data give $\rho \approx -0.13$. International football, which features fewer high-scoring matches and more conservative defensive tactics, tends to produce slightly more negative values. Our Stage 1 fit on 1,908 WC-team matches yielded $\rho \approx -0.11$, consistent with this range (the Stage 2 refit gives about $-0.14$). We treat $\rho$ as a free parameter and estimate it jointly with the team strength parameters.


Composite data decay

This is one of the main ways that we refine the naive model that simply trains the alpha and beta parameters of the Dixon-Coles setup with historical data. The most naive approach, of course, would be to treat each game as contributing the same amount of signal. Obviously, that doesn't make sense, so the slightly less naive approach is to weight each match result's contribution to the model fitting by a time-decay factor. Instead of using only that, our model combines time decay with two additional signals into a composite weight for how much each match informs the training:

$$w_k = \underbrace{e^{-\phi \, d_k}}_{\text{calendar decay}} \;\cdot\; \underbrace{s_k}_{\text{competition quality}} \;\cdot\; \underbrace{\delta^{\,n_k}}_{\text{manager epochs}}$$

Each factor captures a distinct mechanism by which a historical result becomes more or less representative of the current team.

Calendar decay

The first factor, $e^{-\phi d_k}$, is standard exponential decay over the $d_k$ days elapsed since match $k$. Even with the same manager and squad, teams evolve: tactics develop, individual form rises and falls, set-piece routines change. We set the half-life to 600 days (~20 months), i.e. $\phi = \ln 2 / 600 \approx 0.00116$ per day, meaning a result from 600 days ago receives half the weight of a result from today. This is calibrated for international football, where squad turnover is slower and tactical systems are more stable than at club level.

Competition quality

The second factor, $s_k$, reflects that not all matches are equally diagnostic. A win over a top-10 opponent at a major tournament tells you much more about a team's current ceiling than a friendly against a mid-table qualifier. We weight matches by competition tier:

| Competition | Weight |
|---|---|
| FIFA World Cup | 1.0 |
| Continental championships (Euro, Copa América, AFCON, Asian Cup) | 0.9 |
| Nations Leagues (UEFA, CONCACAF) | 0.65 |
| World Cup qualifiers | 0.6 |
| International friendlies | 0.3 |

For the major footballing nations with abundant high-quality data, friendly results end up contributing very little total weight. For smaller nations whose few World Cup appearances are separated by decades, qualifier and friendly matches remain important training signal.

Manager epoch discount

The third factor, $\delta^{n_k}$, penalizes results from previous managerial regimes. Here $n_k$ is the number of manager changes that have occurred on either team between match $k$ and today, and $\delta \in (0, 1)$ is a per-transition discount.

A new manager represents a partial reset of tactical identity: the same eleven players will press differently, organize differently, and execute set pieces differently. Results from a prior regime are still informative because individual player quality persists, but they are weaker guides to what this team will do today. We use $\delta = 0.85$, meaning each manager change reduces a match's effective weight by 15%. This is intentionally moderate: the calendar and competition-quality components already do most of the down-weighting work, so the epoch factor handles the residual discontinuity that smooth decay misses without over-penalizing teams with recent transitions.

As an example: a 2-1 win from three seasons ago, with two manager changes since then, receives $0.85^2 \approx 0.72$ of the weight it would have had under the current manager, on top of whatever calendar and competition discounts already apply.
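Putting the three active factors together, the composite weight can be sketched as below. The constants come from the text (600-day half-life, the competition tiers, $\delta = 0.85$); the function name `match_weight` and the dictionary keys are hypothetical labels chosen for this illustration.

```python
import math

HALF_LIFE_DAYS = 600.0
PHI = math.log(2) / HALF_LIFE_DAYS   # decay rate per day implied by the half-life
DELTA = 0.85                         # per-manager-change discount

# Competition tiers from the table above; the keys are illustrative labels.
COMPETITION_WEIGHT = {
    "world_cup": 1.0,
    "continental": 0.9,
    "nations_league": 0.65,
    "wc_qualifier": 0.6,
    "friendly": 0.3,
}


def match_weight(days_ago: float, competition: str, manager_changes: int) -> float:
    """w_k = exp(-phi * d_k) * s_k * delta ** n_k."""
    calendar = math.exp(-PHI * days_ago)
    quality = COMPETITION_WEIGHT[competition]
    epochs = DELTA ** manager_changes
    return calendar * quality * epochs


# A World Cup match exactly one half-life ago, same manager: weight 0.5.
print(match_weight(600, "world_cup", 0))
# A friendly played today but two managers ago: 0.3 * 0.85**2.
print(match_weight(0, "friendly", 2))
```

Because the factors multiply, an old friendly from a previous regime is discounted three times over, which is why such matches contribute little for teams with plenty of recent competitive data.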

Player continuity (designed, pending data)

The full specification includes a fourth factor, $c_k^{\,\alpha}$, measuring the importance-weighted overlap between the squad that played in match $k$ and the current national squad (the exponent $\alpha$ here is separate from the attack strength $\alpha_i$):

$$c_k = \frac{\displaystyle\sum_{i \;\in\; \text{squad}_k} \text{imp}(i) \cdot \mathbf{1}[\text{active now}]}{\displaystyle\sum_{i \;\in\; \text{squad}_k} \text{imp}(i)}$$

$\text{imp}(i)$: importance score for player $i$: caps × position weight
$\mathbf{1}[\text{active now}]$: 1 if player $i$ is still in the current squad, 0 otherwise

A match where nine of eleven starters are still active has $c_k \approx 0.9$; a result from six years ago featuring players who have since retired might have $c_k \approx 0.2$. This factor is designed and its weighting is defined, but it is not yet incorporated into the current fit, pending historical squad roster data collection. The current model runs on calendar decay, competition quality, and manager epochs only.
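The continuity ratio is simple to compute once rosters and importance scores exist. The sketch below assumes importance scores are already available as a dictionary; the player labels and numbers are made up for illustration, and `continuity` is a hypothetical helper name.

```python
def continuity(squad_k, current_squad, importance):
    """Importance-weighted share of match k's squad that is still active."""
    total = sum(importance[p] for p in squad_k)
    if total == 0:
        return 0.0  # no usable importance information for this match
    active = sum(importance[p] for p in squad_k if p in current_squad)
    return active / total


# Made-up example: importance = caps x position weight, precomputed.
importance = {"A": 50.0, "B": 30.0, "C": 20.0}
squad_k = ["A", "B", "C"]     # players who featured in match k
current_squad = {"A", "C"}    # players still in today's squad

c_k = continuity(squad_k, current_squad, importance)
print(c_k)  # B has left, so 70 of 100 importance points remain
```

In the composite weight this ratio would then be raised to the exponent before multiplying into $w_k$.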


Model architecture

The estimation pipeline runs in three sequential stages, each a residual layer on top of the prior.

Stage 1 fits one scalar attack and one scalar defence parameter per team from historical match outcomes alone — the classic Dixon-Coles MLE. It captures how many goals each team typically scores and concedes across all opponents.

Stage 2 adds a bilinear style interaction between every team pair, also trained from match outcomes. It captures matchup-specific edges: whether the way team $i$ attacks creates a systematic advantage against the way team $j$ defends, beyond what raw scalar strength predicts.

Stage 3 introduces a player attribute correction derived from current squad ratings. It captures a complementary signal: given who is actually in the squad today, does the current roster composition suggest an adjustment to what historical match data alone predicts? Stages 1 and 2 are retrospective — they answer how a team has performed. Stage 3 is prospective — it answers how strong this particular group of players looks right now.

Each stage is fit with prior-stage parameters frozen, learning only the residual that earlier stages cannot explain. Stages 1 and 2 are fully estimated and active. Stage 3 is designed and implemented in parallel — its integration into the live simulation is pending confirmed 23-man squad announcements ahead of the tournament.

[Figure: pipeline schematic.
Input, historical match data with competition weights: World Cup × 1.0, continental championships × 0.9, World Cup qualifiers × 0.6, international friendlies × 0.3.
Weighting, composite time decay: calendar drift e^(−φd) (older results receive less weight), manager epochs δ^n (discount per regime change since the match), player continuity c^α (importance-weighted overlap of the squad with today's).
Stage 1, DC base (MLE): αᵢ attack strength and βᵢ defensive weakness, one scalar each per team; ρ low-score correlation; parameters frozen before stage 2.
Stage 2, style layer (bilinear residual): attack style uᵢ ∈ ℝ⁴ and defence style vⱼ ∈ ℝ⁴, latent vectors trained from match outcomes; interaction uᵢ·vⱼ is a signed scalar residual on log expected goals.
Prediction, per-match scoreline distribution: log μᵢⱼ = log αᵢ + log βⱼ + uᵢ·vⱼ, Dixon-Coles P̃(x, y), P(win / draw / loss) summed over the 10×10 scoreline grid.
Simulation, Monte Carlo tournament: N = 100,000 independent full-bracket draws; group stage (sample scorelines → points → tiebreakers); knockouts (extra time draw, then penalty coin flip if level); output P(champion / finalist / upset) per team.]

Stage 1 — DC base layer

The first stage fits one scalar attack parameter $\alpha_i$ and one scalar defence parameter $\beta_i$ per team. These are estimated from historical match scorelines via maximum likelihood, with no shot-level data required. The composite decay weights ensure that stale, low-quality, or pre-regime-change matches contribute minimally to the final values.

The fit runs on 1,908 matches between the 48 WC-qualified teams from January 2010 onward. Fitted globals: $\hat{\rho} \approx -0.11$, home advantage $\hat{\gamma} \approx 1.17$ (neutralized at the 2026 tournament). The scalar parameters are then frozen before Stage 2 begins.


Stage 2 — Bilinear style layer

Scalar $\alpha_i, \beta_j$ summarize team quality as single numbers. They cannot capture matchup-specific dynamics: whether France's defensive block is particularly effective against teams that prefer wide diagonal balls, or less effective against high-tempo pressing sides. That information is invisible to a scalar model.

Each team is assigned a $K$-dimensional attack style vector $\mathbf{u}_i$ and defence style vector $\mathbf{v}_i$. Their dot product adds a signed scalar residual on top of the DC log-expected-goals:

$$\log \mu_{ij} = \underbrace{\log \alpha_i + \log \beta_j}_{\text{Stage 1}} \;+\; \underbrace{\mathbf{u}_i \cdot \mathbf{v}_j}_{\text{style residual}}$$

$\mathbf{u}_i \in \mathbb{R}^K$: attack style vector for team $i$
$\mathbf{v}_j \in \mathbb{R}^K$: defence style vector for team $j$
$K = 4$: latent style dimensions

The dot product is positive when team $i$'s attacking style creates a systematic advantage against team $j$'s defensive shape, and negative when that style is at a disadvantage. With $K = 4$ dimensions the model can represent up to four independent axes of stylistic variation. In practice, the average matchup adjustment is about 2%; the largest is approximately +57% (Spain attacking France), a pattern visible across multiple recent encounters.

Style vectors are initialized with small Gaussian noise ($\sigma = 0.05$) to break the zero-gradient saddle at the origin, then estimated by maximizing the DC likelihood with L2 regularization ($\lambda = 2.0$) on $\mathbf{u}_i, \mathbf{v}_j$. The Stage 1 attack and defence parameters remain frozen throughout, while $\rho$ is re-estimated. Stage 2 converged in 116 iterations; the refined correlation parameter is $\hat{\rho} \approx -0.14$.
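To make the bilinear term concrete, the sketch below computes expected goals with and without a style residual. The vectors and strengths are invented for illustration (they are not fitted values), and `expected_goals` is a hypothetical helper. Since the residual lives on the log scale, a dot product of $u \cdot v$ corresponds to a multiplicative adjustment of $e^{u \cdot v}$, so a +57% adjustment means $u \cdot v = \ln 1.57 \approx 0.45$.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def expected_goals(alpha_i, beta_j, u_i, v_j):
    """mu_ij = exp(log alpha_i + log beta_j + u_i . v_j)."""
    return math.exp(math.log(alpha_i) + math.log(beta_j) + dot(u_i, v_j))


# Invented K = 4 style vectors for illustration only.
u_i = [0.5, -0.2, 0.1, 0.3]
v_j = [0.6, 0.1, -0.4, 0.7]

baseline = expected_goals(1.4, 0.9, [0.0] * 4, [0.0] * 4)  # style-neutral
styled = expected_goals(1.4, 0.9, u_i, v_j)

adjustment = styled / baseline - 1.0
print(f"u.v = {dot(u_i, v_j):.3f}, adjustment = {adjustment:+.1%}")
```

With zero style vectors the expression reduces to the Stage 1 product $\alpha_i \beta_j$, which is why the layer is described as a residual.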


Stage 3 — Player attribute correction

Stages 1 and 2 draw on a single data source: historical match outcomes. They answer how a team has performed. Player attribute data answers a different question: how strong is this team right now, given who is actually in the squad?

These two signals are complementary but not redundant. A team that surged in form two years ago but has since lost key players will have inflated DC parameters relative to its current ability. A team that quietly upgraded its squad during a flat qualifying campaign will be underrated. A young nation with sparse international history but a generation of technically developed players currently dominating European club football has almost no training signal in the DC fit, so its $\alpha, \beta$ will be conservative by default. Neither the scalar parameters nor the style vectors are designed to catch any of this: their training signal is entirely retrospective.

Stage 3 introduces a correction term $\gamma_{ij}$ (not to be confused with the home advantage multiplier $\gamma$) derived from a separate linear model trained on current player ratings, then centered to remove systematic bias before being applied to the DC output.

Squad embeddings

Player ratings are sourced from the EA FC 26 database, which covers 18,400+ players and includes all 48 WC-qualified nations. For each team, the squad is divided into four positional groups (goalkeeper, defenders, midfielders, forwards) and the mean attribute vector is computed within each group. The full team embedding concatenates these four group means:

$$\mathbf{e}_i = \bigl[\,\bar{\mathbf{x}}_{\text{GK}},\;\bar{\mathbf{x}}_{\text{DEF}},\;\bar{\mathbf{x}}_{\text{MID}},\;\bar{\mathbf{x}}_{\text{FWD}}\,\bigr] \;\in\; \mathbb{R}^{4M}$$

where $M$ is the number of attributes per group. This preserves positional structure: a team with elite defenders and average midfielders produces a different embedding from one with the reverse imbalance, even if the squad-wide averages are identical. Data-sparse nations (Kenya, Honduras, Jordan) where DC parameters are least reliable are fully covered, since every FIFA-registered player has ratings regardless of league visibility.

The linear model

For each historical match $k$, the attribute model independently predicts log goals for team $i$ against team $j$ as a linear function of both teams' positional embeddings:

$$\hat{y}_{ij} = \mathbf{w}^{\top} \bigl[\mathbf{e}_{i}^{\,\text{atk}};\;\mathbf{e}_{j}^{\,\text{def}}\bigr] + b$$

where $\mathbf{e}_{i}^{\,\text{atk}}$ selects team $i$'s midfield and forward embeddings and $\mathbf{e}_{j}^{\,\text{def}}$ selects team $j$'s goalkeeper and defender embeddings. The training target is $\log(X_k + 0.5)$, where $X_k$ is the observed goals scored. The log offset maps zero-goal outcomes to a finite value and keeps the target in the same scale as the DC log-expected-goals outputs. The model is fit by ridge regression using the same composite decay weights as Stage 1, meaning recent, high-quality matches dominate the fit here too.
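A dependency-free sketch of the weighted ridge fit is shown below. It solves the normal equations directly with a small Gaussian elimination, which is fine for toy sizes; a real fit would more likely use a linear algebra library. The function names, the choice to leave the intercept unpenalized, and the toy data are all assumptions made for this illustration.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def weighted_ridge(features, goals, weights, penalty):
    """Fit y = w.x + b to y = log(goals + 0.5) with per-match weights.

    Assumption for this sketch: the intercept b is not penalized.
    """
    X = [list(row) + [1.0] for row in features]   # append intercept column
    y = [math.log(g + 0.5) for g in goals]        # training target
    p = len(X[0])
    A = [[sum(w * r[a] * r[c] for w, r in zip(weights, X)) for c in range(p)]
         for a in range(p)]
    for a in range(p - 1):                        # ridge term, slopes only
        A[a][a] += penalty
    rhs = [sum(w * r[a] * t for w, r, t in zip(weights, X, y)) for a in range(p)]
    coef = solve(A, rhs)
    return coef[:-1], coef[-1]


# Toy data: one feature, two matches, equal weights.
w_ols, b_ols = weighted_ridge([[0.0], [1.0]], [0, 2], [1.0, 1.0], penalty=0.0)
w_ridge, b_ridge = weighted_ridge([[0.0], [1.0]], [0, 2], [1.0, 1.0], penalty=1.0)
print(w_ols, b_ols, w_ridge)
```

In the model described above each feature row would be the concatenation of team $i$'s attacking embedding and team $j$'s defensive embedding, and the weights would be the composite decay weights $w_k$.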

Centering and application

The linear model is not trusted for its absolute level: it lacks Poisson structure, has no temporal decay within matches, and is trained using current ratings projected back onto historical matches where squads may have differed substantially. What it is trusted for is its relative signal — which matchups look better or worse given current squad composition, compared to the average.

For each training match $k$, compute the residual between the linear prediction and the Stage 1+2 DC output (this $\delta_k$ is unrelated to the manager discount $\delta$):

$$\delta_k = \hat{y}_k - \log \mu_k^{\scriptscriptstyle\mathrm{DC}}$$

The mean residual $\bar{\delta}$ is the systematic offset between the two models, reflecting both scale differences and the fact that current ratings are an imperfect proxy for historical squad quality. Subtracting it from each matchup's residual leaves only the matchup-relative signal:

$$\gamma_{ij} = \hat{y}_{ij} - \log \mu_{ij}^{\scriptscriptstyle\mathrm{DC}} - \bar{\delta}$$

By construction, $\gamma$ has zero mean across training data. Applying it does not inflate or deflate aggregate expected goals; it only shifts individual matchups relative to the DC baseline. The final expected goals for any WC 2026 match are:

$$\log \mu_{ij}^{\text{final}} = \log \mu_{ij}^{\scriptscriptstyle\mathrm{DC}} + \lambda \cdot \gamma_{ij}$$

The shrinkage factor $\lambda \in (0,\,1]$ (distinct from the Stage 2 L2 penalty $\lambda = 2.0$) governs how much weight to place on the attribute correction relative to historical outcomes. It is calibrated on held-out matches. A smaller $\lambda$ defers more to the historical record; a larger one gives more weight to current roster composition. The Dixon-Coles $\tau$ correction is applied to the final $\mu_{ij}^{\text{final}}$, not the intermediate DC output, so the low-score cell adjustment remains consistent with the corrected expected goals.
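The centering and shrinkage steps can be sketched as follows. This assumes the correction is the centered residual, i.e. the linear prediction minus the DC log expected goals minus the mean residual, which is the reading under which $\gamma$ has zero mean over the training matches. The unweighted mean, the function names, and the toy numbers are also assumptions for illustration.

```python
import math


def centered_corrections(y_hat, log_mu_dc):
    """Return (gammas, mean_residual) for paired predictions.

    residual_k = y_hat_k - log_mu_dc_k; gamma_k = residual_k - mean residual.
    Assumption: a plain (unweighted) mean over the training matches.
    """
    residuals = [yh - lm for yh, lm in zip(y_hat, log_mu_dc)]
    mean_residual = sum(residuals) / len(residuals)
    return [r - mean_residual for r in residuals], mean_residual


def final_expected_goals(log_mu_dc, gamma, shrink):
    """mu_final = exp(log_mu_dc + shrink * gamma), with shrink in (0, 1]."""
    return math.exp(log_mu_dc + shrink * gamma)


# Toy numbers: three training matches.
y_hat = [0.9, 0.1, 0.5]        # linear-model predictions of log goals
log_mu_dc = [0.4, 0.0, 0.1]    # Stage 1+2 log expected goals

gammas, delta_bar = centered_corrections(y_hat, log_mu_dc)
print(delta_bar, gammas)
print(final_expected_goals(log_mu_dc[0], gammas[0], shrink=0.5))
```

A matchup whose residual sits above the mean gets its expected goals nudged up, one below the mean gets nudged down, and the shrinkage factor scales how far.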


Tournament simulation

With the full scoreline distribution $\tilde{P}(x, y)$ available for any match between any two teams, we simulate the entire tournament structure by Monte Carlo. Each iteration proceeds as follows:

  1. Group stage. For each of the 72 group-stage matches (12 groups of four, six matches per group), sample a scoreline from $\tilde{P}(x, y)$. Accumulate group points (3 for a win, 1 for a draw, 0 for a loss) and rank teams by points, then goal difference, then goals scored, then head-to-head result.

  2. Knockout rounds. Draws are resolved by extra time (modeled as a second independent Poisson draw at half the regular rate) and, if still level, a penalty shootout modeled as a coin flip.

  3. Repeat. Run $N = 100{,}000$ full simulations. The reported probability for any event is simply its frequency across all simulations. One hundred thousand draws is sufficient for stable estimates down to roughly 0.1% probability, where the standard error is about 0.01 percentage points.

Before the simulation loop, the full $48 \times 48$ matrix of expected goals is precomputed once and reused across all draws, keeping per-simulation cost low.
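The knockout rule in step 2 can be sketched for a single tie as below. For brevity this uses independent Poisson grids rather than the Dixon-Coles corrected distribution, and the function names, seed, and rates are assumptions for illustration, not the project's simulator.

```python
import math
import random
from itertools import accumulate


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def make_sampler(mu, nu, max_goals=10):
    """Precompute a truncated scoreline grid; return a function drawing (x, y)."""
    cells = [(x, y) for x in range(max_goals + 1) for y in range(max_goals + 1)]
    cum = list(accumulate(poisson_pmf(x, mu) * poisson_pmf(y, nu) for x, y in cells))
    return lambda rng: rng.choices(cells, cum_weights=cum, k=1)[0]


def advance_probability(mu, nu, n_sims, seed=0):
    """Estimate P(team i advances) from a knockout tie by simulation."""
    rng = random.Random(seed)
    regular = make_sampler(mu, nu)
    extra = make_sampler(mu / 2, nu / 2)   # extra time at half the regular rate
    wins = 0
    for _ in range(n_sims):
        x, y = regular(rng)
        if x == y:                          # level after 90 minutes
            ex, ey = extra(rng)
            x, y = x + ex, y + ey
        if x == y:                          # still level: shootout as a coin flip
            wins += rng.random() < 0.5
        else:
            wins += x > y
    return wins / n_sims


p_advance = advance_probability(mu=1.5, nu=1.0, n_sims=20_000, seed=1)
print(f"P(i advances) ~ {p_advance:.3f}")
```

Building the sampler once per matchup mirrors the precomputation described above: the grid is fixed, so only the random draws happen inside the loop.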


Parameter estimation

Stage 1 (DC base). Attack and defence parameters $\alpha_i, \beta_j$ and correlation $\rho$ are estimated by maximizing the composite-weighted DC log-likelihood:

$$\mathcal{L}_1 = \sum_k w_k \cdot \log \tilde{P}(x_k, y_k \mid \alpha_{i_k}, \beta_{j_k}, \rho)$$

Log-link reparameterization ($\alpha_i = e^{a_i}$, $\beta_i = e^{b_i}$) ensures positivity without box constraints. Scale is fixed by constraining $\mathrm{mean}(\{a_i\}) = 0$, so attack ratings are relative to the 48-team geometric mean.
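A minimal sketch of the weighted objective, under a few stated assumptions: matches are treated as neutral-venue (the home advantage term is omitted), the mean-zero constraint is imposed by centering the log attack values inside the function, and the names and data layout are hypothetical. The returned value is the negative of the weighted log-likelihood, which is the form a minimizer such as `scipy.optimize.minimize` with `method="L-BFGS-B"` would expect.

```python
import math


def dc_log_prob(x, y, mu, nu, rho):
    """log of tau(x, y) * Pois(x; mu) * Pois(y; nu)."""
    if (x, y) == (0, 0):
        tau = 1.0 - rho * mu * nu
    elif (x, y) == (0, 1):
        tau = 1.0 + rho * mu
    elif (x, y) == (1, 0):
        tau = 1.0 + rho * nu
    elif (x, y) == (1, 1):
        tau = 1.0 - rho
    else:
        tau = 1.0
    log_pois_x = x * math.log(mu) - mu - math.lgamma(x + 1)
    log_pois_y = y * math.log(nu) - nu - math.lgamma(y + 1)
    return math.log(tau) + log_pois_x + log_pois_y


def neg_log_likelihood(a, b, rho, matches):
    """Negative weighted DC log-likelihood.

    a, b    : per-team log attack and log defence values (alpha = exp(a)).
    matches : iterable of (i, j, x, y, w) with team indices, goals and weight.
    """
    mean_a = sum(a) / len(a)                  # enforce mean(a) = 0 by centering
    total = 0.0
    for i, j, x, y, w in matches:
        mu = math.exp(a[i] - mean_a + b[j])   # expected goals for team i
        nu = math.exp(a[j] - mean_a + b[i])   # expected goals for team j
        total += w * dc_log_prob(x, y, mu, nu, rho)
    return -total


# Two equal teams, one 1-0 result with full weight.
nll = neg_log_likelihood([0.0, 0.0], [0.0, 0.0], 0.0, [(0, 1, 1, 0, 1.0)])
print(nll)
```

The weight $w_k$ simply scales each match's contribution, so a match with weight 0.5 counts as half an observation.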

Stage 2 (style vectors). With $\alpha_i, \beta_j$ frozen, style vectors and a refined $\rho$ are estimated by maximizing the same DC likelihood with a bilinear expected goals term, plus an L2 penalty:

$$\mathcal{L}_2 = \sum_k w_k \cdot \log \tilde{P}(x_k, y_k \mid \mathbf{u}, \mathbf{v}, \rho) \;-\; \frac{\lambda}{2}\!\left(\|\mathbf{U}\|_F^2 + \|\mathbf{V}\|_F^2\right)$$

Analytical gradients for both the Poisson terms and the DC correction cells are passed directly to L-BFGS-B, avoiding finite-difference approximation over the 385-parameter space. Vectors are initialized with small Gaussian noise ($\sigma = 0.05$) to break the zero-gradient saddle at the origin.
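For a single match, the analytic gradient with respect to team $i$'s attack style vector follows from the chain rule: the derivative of the log-probability with respect to $\log \mu$ is $x - \mu$ plus a term from $\tau$ in the cells where $\tau$ depends on $\mu$, and $\partial \log \mu / \partial \mathbf{u}_i = \mathbf{v}_j$. The sketch below is one way to write that and check it against a finite difference; the function names and test values are assumptions, and the L2 penalty would add a further $-\lambda \mathbf{u}_i$ to the gradient of $\mathcal{L}_2$.

```python
import math


def dc_log_prob(x, y, mu, nu, rho):
    """log of tau(x, y) * Pois(x; mu) * Pois(y; nu)."""
    tau = {(0, 0): 1.0 - rho * mu * nu, (0, 1): 1.0 + rho * mu,
           (1, 0): 1.0 + rho * nu, (1, 1): 1.0 - rho}.get((x, y), 1.0)
    return (math.log(tau)
            + x * math.log(mu) - mu - math.lgamma(x + 1)
            + y * math.log(nu) - nu - math.lgamma(y + 1))


def dlogp_dlogmu(x, y, mu, nu, rho):
    """Derivative of dc_log_prob with respect to log(mu)."""
    grad = x - mu                                   # Poisson part
    if (x, y) == (0, 0):                            # tau = 1 - rho*mu*nu
        grad += -rho * mu * nu / (1.0 - rho * mu * nu)
    elif (x, y) == (0, 1):                          # tau = 1 + rho*mu
        grad += rho * mu / (1.0 + rho * mu)
    return grad                                     # other cells: tau has no mu


def log_prob_given_u(x, y, base_log_mu, u_i, v_j, nu, rho):
    """Match log-probability as a function of team i's attack style vector."""
    mu = math.exp(base_log_mu + sum(a * b for a, b in zip(u_i, v_j)))
    return dc_log_prob(x, y, mu, nu, rho)


def grad_u_i(x, y, base_log_mu, u_i, v_j, nu, rho):
    """Analytic gradient of the match log-probability with respect to u_i."""
    mu = math.exp(base_log_mu + sum(a * b for a, b in zip(u_i, v_j)))
    g = dlogp_dlogmu(x, y, mu, nu, rho)
    return [g * v for v in v_j]


# Finite-difference check on the first coordinate (illustrative values).
args = dict(base_log_mu=math.log(1.2), v_j=[0.2, 0.1, -0.1, 0.05], nu=0.9, rho=-0.12)
u = [0.1, -0.2, 0.0, 0.3]
eps = 1e-6
u_plus, u_minus = [u[0] + eps] + u[1:], [u[0] - eps] + u[1:]
numeric = (log_prob_given_u(0, 0, u_i=u_plus, **args)
           - log_prob_given_u(0, 0, u_i=u_minus, **args)) / (2 * eps)
analytic = grad_u_i(0, 0, u_i=u, **args)[0]
print(numeric, analytic)
```

Only team $i$'s expected goals depend on $\mathbf{u}_i$ in a given match, so the opponent's rate enters here only through the $\tau$ term.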

Stage 3 (attribute correction). The ridge coefficient vector $\mathbf{w}$ is fit by minimizing composite-weighted squared error between predicted and observed log goals. The ridge penalty is cross-validated on held-out matches. The centering constant $\bar{\delta}$ and shrinkage factor $\lambda$ are computed on the training split.


Limitations

Temporal alignment in Stage 3. The current implementation trains the attribute correction using today's FC 26 ratings projected back onto historical matches, where the actual squads may have differed significantly. The correct approach aligns each match with the FIFA ratings edition closest to its date. Historical editions (FIFA 15–23) are available; this alignment is in progress. The centering step absorbs the systematic component of the mismatch, but matchup-specific noise remains.

Player continuity not yet active. The $c_k^{\,\alpha}$ squad-overlap component of the composite weight is designed but pending historical squad roster data. It would specifically correct cases where a team's historical record is dominated by a generation of players since retired, which calendar decay handles only loosely.

Style vectors are team-level, not player-level. The bilinear layer captures aggregate stylistic matchup effects but cannot represent individual player interactions: whether a particular striker's movement exploits a specific centre-back's positional tendencies, or whether a midfield pair creates unusual press resistance against a specific pressing shape.

No within-match dynamics. The Poisson model assumes a constant goal rate throughout the match. In reality, the rate shifts with game state — a team trailing by one in the 75th minute plays very differently from a team level at half-time. The Dixon-Robinson extension addresses this but requires modelling the full score path rather than just the final result.

Penalty shootouts as coin flips. The model treats shootouts as 50-50. Historical shootout records, specialist penalty takers, and goalkeeper save rates are all partially predictive and worth incorporating as the knockout rounds approach.