In this new blog, I’m going to investigate whether statistics and machine learning can do a better job than bookmakers at predicting the outcome of football matches. Betting isn’t my main aim here, but if I do find an edge I certainly won’t be afraid to test it with some real money!
To begin with, I’m going to look at predicting not who wins the match, but how many goals are scored in total. But before I start, I need a way of measuring how well I’m doing. What does it mean to ‘predict’ the outcome of a match, and how can you measure how good those predictions are?
Probability and Likelihood
Obviously you don’t know in advance what the outcome of a football match is going to be, so any prediction has to be expressed in terms of probabilities. For example, I might predict that:
- In this Man Utd v Man City game, there’s a 45% probability of 3 or more goals (and a 55% probability of 2 or fewer)
- In this Tottenham v Fulham game, there’s a 60% probability of 3 or more goals (and a 40% probability of 2 or fewer)
Having made those predictions, you can measure how well they turned out by calculating the likelihood of the actual outcomes, i.e. the probability that your predictions assigned to what actually happened. Good predictions will (over the long run) give higher likelihood values than poor predictions.
For example, suppose that the Man Utd v Man City and Tottenham v Fulham games both produced 3 or more goals. Treating the two matches as independent, your predictions gave this combined outcome a probability of 45% * 60% = 27%, so that is the total likelihood.
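To make the arithmetic concrete, here is a minimal Python sketch of the total likelihood calculation. The function name `total_likelihood` and the list layout are my own choices for illustration, and the calculation assumes the matches are independent of each other.

```python
from math import prod

# Predicted probability of "3 or more goals" for each match (figures from the text).
predictions = [0.45, 0.60]
# What actually happened: True means the match produced 3 or more goals.
outcomes = [True, True]

def total_likelihood(predictions, outcomes):
    """Multiply together the probability assigned to each actual outcome.

    If a match had 3 or more goals we use the predicted probability p;
    otherwise we use 1 - p, the probability given to "2 or fewer".
    Assumes the matches are independent.
    """
    return prod(p if happened else 1 - p
                for p, happened in zip(predictions, outcomes))

print(round(total_likelihood(predictions, outcomes), 4))  # 0.27
```

If the Man Utd v Man City game had instead finished with 2 or fewer goals, the same function would use 55% for that match and the total would be 55% * 60% = 33%.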
Of course, the more matches you’re predicting, the lower the total likelihood will be, because you keep multiplying by numbers smaller than 1. What we really want (for comparison purposes) is the average likelihood per match, which is the geometric mean of the individual likelihoods. A likelihood of 27% over 2 matches is equivalent to 51.96% per match (the square root of 27%, i.e. 51.96% * 51.96% = 27%), so that’s how good these predictions were. That is just slightly better than the naive strategy of estimating 50% for each outcome, which (if there are only two possible outcomes) is guaranteed to give a 50% likelihood per match.
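As a rough sketch, the per-match figure can be computed as a geometric mean. The helper name `likelihood_per_match` is hypothetical, and I’ve chosen to average the logarithms rather than take the n-th root of the product directly, since a product over hundreds of matches gets extremely small. It assumes no actual outcome was given a probability of exactly 0.

```python
from math import exp, log

def likelihood_per_match(probs_of_actual_outcomes):
    """Geometric mean of the probabilities assigned to what actually happened.

    Takes the mean of the log-probabilities and exponentiates it, which is
    the same as the n-th root of the product. Every probability must be
    greater than 0, otherwise log() raises an error.
    """
    n = len(probs_of_actual_outcomes)
    return exp(sum(log(p) for p in probs_of_actual_outcomes) / n)

# The two-match example from the text: both games had 3 or more goals.
print(round(likelihood_per_match([0.45, 0.60]), 4))  # 0.5196

# The naive 50/50 strategy scores 50% per match whatever happens.
print(round(likelihood_per_match([0.5, 0.5]), 4))    # 0.5
```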