<html>
<p>In my previous <a href="https://steemit.com/python/@stevencurrie/soccer-predictions-using-python-part-1">article</a> we scraped some results data into a .CSV file. Now we can see if we can use it to make some predictions.</p>
<p>First, we'll add a couple of new imports, most importantly the NumPy library, which will provide our Poisson distribution.</p>
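<p>As a quick illustration of what NumPy gives us, the sketch below draws simulated goal counts for a team expected to score 1.5 goals (an arbitrary value chosen only for the example) and compares the share of 0-goal games with the exact Poisson probability e^(-λ). It uses NumPy's <em>default_rng</em> generator with a fixed seed so the results are repeatable; the main code in this post uses the older <em>np.random.poisson</em> function, which samples from the same distribution.</p>

```python
import math

import numpy as np

# Arbitrary expected-goals value (lambda), chosen only for illustration
expected_goals = 1.5
simulations = 100_000

# Seeded generator so the results are repeatable
rng = np.random.default_rng(42)

# Each entry is the number of goals scored in one simulated game
goals = rng.poisson(expected_goals, simulations)

# The simulated mean should be close to the expected-goals value
simulated_mean = goals.mean()

# Share of simulated games with zero goals vs the exact Poisson value e^(-lambda)
simulated_zero_share = np.mean(goals == 0)
exact_zero_share = math.exp(-expected_goals)

print(f"mean goals: {simulated_mean:.3f}")
print(f"P(0 goals) simulated: {simulated_zero_share:.3f}, exact: {exact_zero_share:.3f}")
```

<p>With 100,000 simulations the two zero-goal figures should agree to roughly two decimal places, which is why a large number of simulated games is used per fixture.</p>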
<p>I've left out the <em>scrapeseason()</em> function here to keep the post a bit shorter.</p>
<pre><code>import pandas as pd
from bs4 import BeautifulSoup as bs
from selenium import webdriver
import datetime
from os import path
import numpy as np


def scrapeseason(country, comp, season):
    ...
</code></pre>
<pre><code>def poissonpredict(df, gamedate):
    # set the number of simulations to run for each game
    simulatedgames = 100000

    # only use games before the date we want to predict
    historical = df.loc[df["date"] &lt; str(gamedate)]

    # make sure we only use games that have valid scores
    historical = historical.loc[historical["homeScore"] &gt; -1]

    # games to predict
    topredict = df.loc[df["date"] == str(gamedate)]

    # get average home and away scores for the entire competition
    homeAvg = historical["homeScore"].mean()
    awayAvg = historical["awayScore"].mean()

    # loop through the games we want to predict
    for i in topredict.index:
        ht = topredict.at[i, "homeTeam"]
        at = topredict.at[i, "awayTeam"]

        # get average goals scored and conceded by the home team in its home games
        homeTeamHomeAvgFor = historical.loc[historical["homeTeam"] == ht, "homeScore"].mean()
        homeTeamHomeAvgAgainst = historical.loc[historical["homeTeam"] == ht, "awayScore"].mean()

        # divide averages for team by averages for competition to get attack and defence strengths
        homeTeamAttackStrength = homeTeamHomeAvgFor / homeAvg
        homeTeamDefenceStrength = homeTeamHomeAvgAgainst / awayAvg

        # repeat for the away team, using its away games
        awayTeamAwayAvgFor = historical.loc[historical["awayTeam"] == at, "awayScore"].mean()
        awayTeamAwayAvgAgainst = historical.loc[historical["awayTeam"] == at, "homeScore"].mean()
        awayTeamAttackStrength = awayTeamAwayAvgFor / awayAvg
        awayTeamDefenceStrength = awayTeamAwayAvgAgainst / homeAvg

        # calculate expected goals using attack strength * defence strength * average
        homeTeamExpectedGoals = homeTeamAttackStrength * awayTeamDefenceStrength * homeAvg
        awayTeamExpectedGoals = awayTeamAttackStrength * homeTeamDefenceStrength * awayAvg

        # skip the game if either team has no previous home/away games to learn from
        if np.isnan(homeTeamExpectedGoals) or np.isnan(awayTeamExpectedGoals):
            continue

        # use numpy's poisson distribution to simulate 100000 games between the two teams
        homeTeamPoisson = np.random.poisson(homeTeamExpectedGoals, simulatedgames)
        awayTeamPoisson = np.random.poisson(awayTeamExpectedGoals, simulatedgames)

        # we can now infer some predictions from our simulated games,
        # using numpy to count the results and converting to percentage probability
        homeTeamWins = np.sum(homeTeamPoisson &gt; awayTeamPoisson) / simulatedgames * 100
        draws = np.sum(homeTeamPoisson == awayTeamPoisson) / simulatedgames * 100
        awayTeamWins = np.sum(homeTeamPoisson &lt; awayTeamPoisson) / simulatedgames * 100

        # store our prediction in the dataframe
        df.loc[i, "homeWinProbability"] = homeTeamWins
        df.loc[i, "draws"] = draws
        df.loc[i, "awayTeamWins"] = awayTeamWins

    return df


if not path.isfile("data.csv"):
    # set which country and competition we want to use
    # others to try: "Scotland" &amp; "Premiership" or "Europe" &amp; "UEFA Champions League"
    country = "England"
    competition = "Premier League"
    lastseason = 2016
    thisseason = 2017

    lastseasondata = scrapeseason(country, competition, lastseason)
    thisseasondata = scrapeseason(country, competition, thisseason)

    # combine our data into one frame
    data = pd.concat([lastseasondata, thisseasondata])
    data.reset_index(inplace=True, drop=True)

    # save to file so we don't need to scrape multiple times
    data.to_csv("data.csv")
else:
    # load our csv
    data = pd.read_csv("data.csv", index_col=0, parse_dates=True)

gamedate = datetime.date.today()
data = poissonpredict(data, gamedate)

data.to_csv("data.csv")
</code></pre>
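<p>To make the attack and defence strength calculation inside <em>poissonpredict()</em> more concrete, here is a small, self-contained sketch that applies the same formulas to a made-up four-game history between two hypothetical teams, "A" and "B". The team names and scores are invented purely for the example.</p>

```python
import pandas as pd

# Made-up results for two hypothetical teams, "A" and "B"
historical = pd.DataFrame(
    {
        "homeTeam": ["A", "B", "A", "B"],
        "awayTeam": ["B", "A", "B", "A"],
        "homeScore": [2, 1, 1, 2],
        "awayScore": [1, 1, 0, 1],
    }
)

# Competition-wide average goals for home and away sides
homeAvg = historical["homeScore"].mean()
awayAvg = historical["awayScore"].mean()

ht, at = "A", "B"  # fixture to predict: A (home) v B (away)

# Home team: goals scored and conceded in its home games
homeGames = historical[historical["homeTeam"] == ht]
homeTeamAttackStrength = homeGames["homeScore"].mean() / homeAvg
homeTeamDefenceStrength = homeGames["awayScore"].mean() / awayAvg

# Away team: goals scored and conceded in its away games
awayGames = historical[historical["awayTeam"] == at]
awayTeamAttackStrength = awayGames["awayScore"].mean() / awayAvg
awayTeamDefenceStrength = awayGames["homeScore"].mean() / homeAvg

# Expected goals = attack strength * opponent's defence strength * competition average
homeTeamExpectedGoals = homeTeamAttackStrength * awayTeamDefenceStrength * homeAvg
awayTeamExpectedGoals = awayTeamAttackStrength * homeTeamDefenceStrength * awayAvg

print(f"{ht} expected goals: {homeTeamExpectedGoals:.2f}")
print(f"{at} expected goals: {awayTeamExpectedGoals:.2f}")
```

<p>With this data the competition averages are 1.5 home goals and 0.75 away goals, so A comes out at 1.50 expected goals and B at 0.33. Those two numbers are what would then be passed to the Poisson simulation.</p>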
<p>As you can see, I've added a check so that if our data already exists in data.csv we load it, rather than scraping it again. I should also reiterate that I'm not an expert programmer, so whilst my code may be inelegant, I think it's pretty straightforward.</p>
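<p>The "load it if it exists, otherwise build and save it" check can also be wrapped up as a small reusable helper. This is only a sketch: <em>load_or_build()</em> and <em>build_example()</em> are hypothetical names, and <em>build</em> stands for any function that returns a DataFrame (for example, one that calls <em>scrapeseason()</em> for both seasons and concatenates the results).</p>

```python
import os
import tempfile
from os import path

import pandas as pd


def load_or_build(csv_path, build):
    """Return cached data from csv_path, calling build() only if the file is missing.

    Hypothetical helper: build is any zero-argument function returning a DataFrame.
    """
    if path.isfile(csv_path):
        # Cached copy exists: read it back, using the first column as the index
        return pd.read_csv(csv_path, index_col=0)
    data = build()
    # Save the freshly built data so the next run can skip the expensive step
    data.to_csv(csv_path)
    return data


# Usage example with a stand-in build function that records how often it runs
calls = []


def build_example():
    calls.append(1)
    return pd.DataFrame({"homeScore": [2, 0], "awayScore": [1, 0]})


csv_file = os.path.join(tempfile.mkdtemp(), "data.csv")
first = load_or_build(csv_file, build_example)   # builds the data and saves it
second = load_or_build(csv_file, build_example)  # reads the saved file instead
print(len(calls))  # prints 1
```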
<p>I've only calculated probabilities for Home Win, Draw and Away Win here, but it should be reasonably easy to add other predictions such as Total Goals, Over/Under and Both Teams to Score.</p>
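<p>As a sketch of how those extra markets could be read from the same simulations, the example below assumes two arrays shaped like <em>homeTeamPoisson</em> and <em>awayTeamPoisson</em> from <em>poissonpredict()</em>, generated here from made-up expected goals of 1.5 and 1.1. The names <em>totalGoals</em>, <em>over25</em>, <em>under25</em> and <em>bothTeamsScore</em> are hypothetical.</p>

```python
import numpy as np

simulatedgames = 100000
rng = np.random.default_rng(1)

# Made-up expected goals for an example fixture
homeTeamPoisson = rng.poisson(1.5, simulatedgames)
awayTeamPoisson = rng.poisson(1.1, simulatedgames)

# Total goals in each simulated game
totalGoals = homeTeamPoisson + awayTeamPoisson

# Average total goals across all simulations
expectedTotalGoals = totalGoals.mean()

# Over/Under 2.5 goals: a game is "over" when 3 or more goals are scored
over25 = np.mean(totalGoals > 2.5) * 100
under25 = 100 - over25

# Both teams to score: each side scores at least once
bothTeamsScore = np.mean((homeTeamPoisson > 0) & (awayTeamPoisson > 0)) * 100

print(f"Expected total goals: {expectedTotalGoals:.2f}")
print(f"Over 2.5 goals: {over25:.1f}%  Under 2.5 goals: {under25:.1f}%")
print(f"Both teams to score: {bothTeamsScore:.1f}%")
```

<p>Each market is just another boolean condition counted over the same simulated games, so adding one costs a single line once the simulations exist.</p>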
<p>So, the predictions for today's games? Here they are, with the predicted result after each fixture.</p>
<ul>
<li>Crystal Palace v Southampton: Away</li>
<li>Huddersfield Town v Leicester City: Home</li>
<li>Liverpool v Burnley: Home</li>
<li>Newcastle United v Stoke City: Home</li>
<li>Tottenham Hotspur v Swansea City: Home</li>
<li>Watford v Manchester City: Away</li>
<li>West Bromwich Albion v West Ham: Home</li>
</ul>
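<p>One way to turn the three probability columns written by <em>poissonpredict()</em> into a single Home/Draw/Away pick per game is to take whichever outcome has the highest probability. This is a sketch under that assumption; the teams ("Team X" and so on) and the probabilities are made up, and in the real data you would first keep only the rows that have predictions, for example with <em>dropna(subset=list(labels))</em>.</p>

```python
import pandas as pd

# Made-up probabilities (in %) in the same columns poissonpredict() writes
predictions = pd.DataFrame(
    {
        "homeTeam": ["Team X", "Team Y"],
        "awayTeam": ["Team Z", "Team W"],
        "homeWinProbability": [52.1, 24.0],
        "draws": [25.3, 27.5],
        "awayTeamWins": [22.6, 48.5],
    }
)

# Map each probability column to the label used in the list of predictions
labels = {"homeWinProbability": "Home", "draws": "Draw", "awayTeamWins": "Away"}

# idxmax(axis=1) gives, for each row, the name of the column holding the largest value
predictions["prediction"] = predictions[list(labels)].idxmax(axis=1).map(labels)

for _, row in predictions.iterrows():
    print(f"{row['homeTeam']} v {row['awayTeam']}: {row['prediction']}")
```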
<p>This was a bit rushed, as I wanted to get some predictions out before today's games started. I'll improve the code and add extra functionality in the next article. Any comments, tips, etc. are welcome. I'm off to the bookies now. :-)</p>
</html>