Nathan Grigg

World Cup Scenario Planning

Earlier this week, I put together a quick Jupyter notebook to figure out who plays which third-place team in the Round of 32.

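I parsed the Wikipedia table into a Pandas DataFrame with 20 columns: twelve Boolean columns, A through L, that say whether each group's third-place team advances, and eight columns, vA through vL, that name the third-place team each of those group winners would face.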

For example, this is the row of the table assuming that the third place finishers from groups A, C, H, and K fail to advance (with True/False shortened to T/F for brevity):

A B C D E F G H I J K L vA vB vD vE vG vI vK vL
F T F T T T T F T T F T 3E 3G 3B 3D 3J 3F 3L 3I
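
To make the layout concrete, here is a minimal sketch that rebuilds just that one example row as a one-row DataFrame. The names groups, slots, and example are made up for illustration and are not from the notebook; the real df has one such row for every combination in the Wikipedia table.

```python
import pandas as pd

# Illustrative names only; the notebook's own parsing code is not shown here.
groups = list("ABCDEFGHIJKL")
slots = ["vA", "vB", "vD", "vE", "vG", "vI", "vK", "vL"]

# The example row from the text, with T/F expanded back to True/False.
advances = "F T F T T T T F T T F T".split()
opponents = "3E 3G 3B 3D 3J 3F 3L 3I".split()

row = {g: (flag == "T") for g, flag in zip(groups, advances)}
row.update(dict(zip(slots, opponents)))

example = pd.DataFrame([row])
print(example.shape)                # (1, 20)
print(sum(row[g] for g in groups))  # 8 third-place teams advance
```

As a sanity check on the row itself, the eight groups marked True are exactly the eight groups that show up as 3X opponents.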

Since then, I have been whittling down the options with the following Python code, where df is the name of the DataFrame.

from itertools import combinations
from collections import Counter
import pandas as pd

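IN = 'FBEID'   # groups whose third-place team must advance
OUT = 'H'      # groups whose third-place team must not advance
ORDER = 'CAG'  # undetermined groups in ranking order, lowest-ranked first
ix = True
for x in IN:
    ix &= df[x]
for x in OUT:
    ix &= ~df[x]
for x, y in combinations(ORDER, 2):
    # x implies y: x cannot advance unless the higher-ranked y does
    ix &= (~df[x] | df[y])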

print(f'Options remaining: {sum(ix)}')
counters = []
index = ['vA', 'vB', 'vD', 'vE', 'vG', 'vI', 'vK', 'vL']
for x in index:
    counters.append(Counter(df[ix][x]))
dff = pd.DataFrame(counters, index=index).fillna(0).astype(int)
print(dff.replace(0, ""))

The ix variable is a Boolean mask that selects rows of df. For each character of IN, the corresponding column must be True. For each character of OUT, the corresponding column must be False. Finally, for the undetermined teams in ORDER, which are sorted from lowest to highest current ranking, it cannot be that a lower-ranked team gets in while a higher-ranked team does not. In Boolean logic terms, this is “x implies y”, which is the same as “(not x) or y.”
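
The implication trick is easy to check on a toy table. The sketch below is not the real data: toy is a made-up DataFrame holding every True/False combination of three columns named C, A, and G, and the mask starts as an all-True Series rather than a bare True, which is a small variation on the code above.

```python
from itertools import combinations
import pandas as pd

# Toy data: all eight True/False combinations for three hypothetical columns.
toy = pd.DataFrame(
    [(c, a, g) for c in (False, True) for a in (False, True) for g in (False, True)],
    columns=list("CAG"),
)

mask = pd.Series(True, index=toy.index)
for x, y in combinations("CAG", 2):  # pairs (C, A), (C, G), (A, G)
    mask &= ~toy[x] | toy[y]          # x implies y

survivors = toy[mask]
print(survivors)
```

Only four of the eight rows survive: nobody in, G alone, A and G, or all three. Those are exactly the combinations in which no lower-ranked team gets in ahead of a higher-ranked one.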

Then sum(ix) gives the number of selected rows, that is, the number of scenarios still possible. Finally, I arranged everything into a new table that has the vX labels as its rows, the 3X labels as its columns, and as its values the number of remaining scenarios in which that matchup occurs:

Options remaining: 8

   3E 3C 3J 3G 3B 3D 3I 3A 3F 3L 3K
vA  7  1                           
vB        1  7                     
vD              8                  
vE                 8               
vG        2           2  4         
vI                          8      
vK  1                 3        4   
vL        1           3           4
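
The count of 8 can be cross-checked without the Wikipedia table at all. The following is a rough sketch that assumes only what the table layout implies, namely that exactly 8 of the 12 third-place teams advance; the names scenarios and appearances are illustrative. It cannot say who plays whom, only which sets of groups are still possible.

```python
from collections import Counter
from itertools import combinations

GROUPS = "ABCDEFGHIJKL"
IN, OUT, ORDER = "FBEID", "H", "CAG"

scenarios = []
for advancing in combinations(GROUPS, 8):  # 8 of 12 third-place teams advance
    s = set(advancing)
    if not set(IN) <= s or s & set(OUT):
        continue
    # A group earlier in ORDER may advance only if every later one does too.
    if any(x in s and y not in s for x, y in combinations(ORDER, 2)):
        continue
    scenarios.append(s)

print(len(scenarios))  # 8

# How many remaining scenarios include each still-undetermined group.
appearances = Counter(g for s in scenarios for g in s if g in "ACGJKL")
print(sorted(appearances.items()))
# [('A', 4), ('C', 1), ('G', 7), ('J', 4), ('K', 4), ('L', 4)]
```

These totals agree with the column sums in the table above: 3C appears once, 3A four times, 3G seven times, and 3J, 3K, and 3L four times each.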

You can see from this that one of the most disruptive scenarios left would be for 3C to play against 1A, but there is only one scenario left in which Scotland makes it through at all, which is that every single match today goes their way. That would mean Croatia losing by 3 or more, Congo failing to win, and some kind of lopsided win between Austria and Algeria.