Earlier this week, I put together a quick Jupyter notebook to figure out who plays which third-place team in the Round of 32.
I parsed the Wikipedia table into a pandas DataFrame with 20 columns:

- Twelve columns named 'A' through 'L', with values True or False, indicating whether the third-place team from that group qualified for the Round of 32.
- Eight columns named 'vA', 'vB', etc., for the eight first-place teams that are slated to play a third-place finisher. The values of these columns are 3A, 3B, etc., giving the matchup that results from the set of third-place finishers determined by the first twelve columns.
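Since exactly eight of the twelve third-place teams advance, the True/False half of the table should have C(12, 8) = 495 rows. Here is a minimal sketch that rebuilds just those twelve columns. The vX columns come from Wikipedia and aren't reproduced, and the names `GROUPS` and `scenarios` are my own.

```python
from itertools import combinations
from math import comb

import pandas as pd

GROUPS = 'ABCDEFGHIJKL'

# One row per possible set of eight qualifying third-place teams.
rows = [
    {g: g in qualified for g in GROUPS}
    for qualified in combinations(GROUPS, 8)
]
scenarios = pd.DataFrame(rows, columns=list(GROUPS))

print(len(scenarios))                  # 495
print(comb(12, 8))                     # 495
print(scenarios.sum(axis=1).unique())  # [8]: every row has exactly eight True values
```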
For example, this is the row of the table assuming that the third-place finishers from groups A, C, H, and K fail to advance (with True/False shortened to T/F for brevity):
| A | B | C | D | E | F | G | H | I | J | K | L | vA | vB | vD | vE | vG | vI | vK | vL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | T | F | T | T | T | T | F | T | T | F | T | 3E | 3G | 3B | 3D | 3J | 3F | 3L | 3I |
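Each row should satisfy a couple of invariants: the eight opponents are exactly the eight qualifying groups, each used once, and (I'm assuming) no group winner is paired with the third-place team from its own group. A small sketch that checks both on the example row; `row_is_consistent` is a hypothetical helper, not something from the notebook.

```python
GROUPS = 'ABCDEFGHIJKL'
WINNERS = ['vA', 'vB', 'vD', 'vE', 'vG', 'vI', 'vK', 'vL']

# The example row above, with T/F expanded back to booleans.
row = dict(zip(GROUPS, [c == 'T' for c in 'FTFTTTTFTTFT']))
row.update(zip(WINNERS, ['3E', '3G', '3B', '3D', '3J', '3F', '3L', '3I']))


def row_is_consistent(row):
    """Opponents must be exactly the qualifying groups, with no same-group pairing (assumed rule)."""
    qualified = {g for g in GROUPS if row[g]}
    opponents = [row[w][1] for w in WINNERS]  # '3E' -> 'E'
    uses_each_qualifier_once = sorted(opponents) == sorted(qualified)
    no_rematch = all(row[w][1] != w[1] for w in WINNERS)  # 'vA' must not face '3A'
    return uses_each_qualifier_once and no_rematch


print(row_is_consistent(row))  # True
```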
I've been whittling down the options with the following Python code, where `df` is the DataFrame:
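```python
from itertools import combinations
from collections import Counter

import pandas as pd

IN = 'FBEID'   # groups whose third-place team has already qualified
OUT = 'H'      # groups whose third-place team has already been eliminated
ORDER = 'CAG'  # undecided groups, sorted by current ranking, lowest first

# Boolean mask over the rows of df; start with every scenario selected.
ix = pd.Series(True, index=df.index)
for x in IN:
    ix &= df[x]
for x in OUT:
    ix &= ~df[x]
# x comes before y in ORDER, so if x qualifies, y must too ("x implies y").
for x, y in combinations(ORDER, 2):
    ix &= ~df[x] | df[y]
print(f'Options remaining: {sum(ix)}')

# For each group winner's slot, count how often each third-place opponent appears.
index = ['vA', 'vB', 'vD', 'vE', 'vG', 'vI', 'vK', 'vL']
counters = [Counter(df.loc[ix, x]) for x in index]
dff = pd.DataFrame(counters, index=index).fillna(0).astype(int)
print(dff.replace(0, ""))
```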
The `ix` variable is a Boolean mask: a Series of True/False values, one per row of `df`. For each character of IN, the corresponding column must be True. For each character of OUT, the corresponding column must be False. Finally, the undecided teams in ORDER are sorted by current ranking, lowest first, and it cannot be that a lower-ranked team gets in while a higher-ranked team does not. In Boolean logic terms, this is “x implies y”, which is the same as “(not x) or y.”
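Because the mask only reads the twelve True/False columns, the number of remaining options can be double-checked without the Wikipedia data at all. This sketch applies the same IN/OUT/ORDER logic to plain Python sets over every way of choosing eight of the twelve groups; `allowed` and `remaining` are names I made up for illustration.

```python
from itertools import combinations

GROUPS = 'ABCDEFGHIJKL'
IN, OUT, ORDER = 'FBEID', 'H', 'CAG'


def allowed(qualified):
    """Apply the same IN/OUT/ORDER filter as the mask to one set of qualifying groups."""
    if not all(g in qualified for g in IN):
        return False
    if any(g in qualified for g in OUT):
        return False
    # x implies y: if the lower-ranked x qualifies, the higher-ranked y must too.
    return all(x not in qualified or y in qualified for x, y in combinations(ORDER, 2))


remaining = [set(q) for q in combinations(GROUPS, 8) if allowed(set(q))]
print(len(remaining))  # 8

# How many remaining scenarios each undecided group qualifies in.
undecided = [g for g in GROUPS if g not in IN + OUT]
print({g: sum(g in q for q in remaining) for g in undecided})
# {'A': 4, 'C': 1, 'G': 7, 'J': 4, 'K': 4, 'L': 4}
```

The count of 8 agrees with the "Options remaining" line, and Group C qualifies in only one of those scenarios.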
`sum(ix)` gives the number of selected rows. Finally, I tallied the matchups into a new table whose rows are the vX labels, whose columns are the 3X labels, and whose values are the number of remaining scenarios in which that matchup occurs:
```text
Options remaining: 8
3E 3C 3J 3G 3B 3D 3I 3A 3F 3L 3K
vA 7 1
vB 1 7
vD 8
vE 8
vG 2 2 4
vI 8
vK 1 3 4
vL 1 3 4
```
You can see from this that one of the most disruptive remaining scenarios would be 3C playing the Group A winner (vA), but there is only one scenario left in which Scotland advances at all, and it requires every single match today to go their way: Croatia losing by 3 or more, Congo failing to win, and some kind of lopsided win between Austria and Algeria.
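For what it's worth, the Counter-then-DataFrame tally in the notebook code can also be done with pandas' `melt` and `crosstab`. This is a sketch on hypothetical toy data (`toy` is made up, not real scenarios). One difference: `crosstab` sorts the 3X columns alphabetically, while the Counter version orders them by first appearance.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy data: three scenarios, two group winners' slots.
toy = pd.DataFrame({
    'vA': ['3E', '3E', '3C'],
    'vB': ['3G', '3J', '3G'],
})
index = ['vA', 'vB']

# Counter-based table, as in the notebook code.
counters = [Counter(toy[x]) for x in index]
dff = pd.DataFrame(counters, index=index).fillna(0).astype(int)

# Same counts via melt + crosstab: one row per (slot, opponent) pair, then tally.
long = toy[index].melt(var_name='slot', value_name='opponent')
xt = pd.crosstab(long['slot'], long['opponent'])

print(xt)
```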