
Soccer Predictions using Python (part 2) by stevencurrie

<html>
<p>In my previous <a href="https://steemit.com/python/@stevencurrie/soccer-predictions-using-python-part-1">article</a> we scraped some results data to a .CSV file. Now we can see whether we can make some predictions from it.</p>
<p>First, we'll add a couple of new imports, most importantly the NumPy library, which will provide our Poisson distribution.</p>
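<p>As a quick illustration of what NumPy's Poisson sampler gives us, here is a minimal sketch. The rate of 1.5 goals per game and the seed are made-up values for illustration, not figures from the scraped data.</p>

```python
import numpy as np

# Hypothetical average goals per game, chosen only for illustration
expected_goals = 1.5

# Seed the global generator so the run is repeatable
np.random.seed(42)

# Draw 100000 simulated goal counts from a Poisson distribution
goals = np.random.poisson(expected_goals, 100000)

# The sample mean should land close to the rate we asked for
print(goals.mean())

# Share of simulated games with no goals at all, roughly exp(-1.5)
print(np.mean(goals == 0))
```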
<p>I've left out the body of the <em>scrapeseason()</em> function here to keep the post a bit shorter.</p>
<blockquote><strong>import </strong>pandas <strong>as </strong>pd<br>
<strong>from </strong>bs4 <strong>import </strong>BeautifulSoup <strong>as </strong>bs<br>
<strong>from </strong>selenium <strong>import </strong>webdriver<br>
<strong>import </strong>datetime<br>
<strong>from </strong>os <strong>import </strong>path<br>
<strong>import </strong>numpy <strong>as </strong>np<br>
<br>
<strong>def </strong>scrapeseason(country, comp, season):<br>
 &nbsp;&nbsp;&nbsp;...</blockquote>
<blockquote><br>
<strong>def </strong>poissonpredict(df, gamedate):<br>
 &nbsp;&nbsp;&nbsp;<em># set the number of simulations to run on each game</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>simulatedgames = 100000<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># only use games before the date we want to predict</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>historical = df.loc[df[<strong>"date"</strong>] &lt; str(gamedate)]<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># make sure we only use games that have valid scores</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>historical = historical.loc[historical[<strong>"homeScore"</strong>] &gt; -1]<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># games to predict</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>topredict = df.loc[df[<strong>"date"</strong>] == str(gamedate)]<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># get average home and away scores for entire competition</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>homeAvg = historical[<strong>"homeScore"</strong>].mean()<br>
 &nbsp;&nbsp;&nbsp;awayAvg = historical[<strong>"awayScore"</strong>].mean()<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># loop through the games we want to predict</em><br>
<em> &nbsp;&nbsp;&nbsp;</em><strong>for </strong>i <strong>in </strong>topredict.index:<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;ht = topredict.loc[i, <strong>"homeTeam"</strong>]<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;at = topredict.loc[i, <strong>"awayTeam"</strong>]<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># get average goals scored and conceded for home team</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>homeTeamHomeAvgFor = historical.loc[historical[<strong>"homeTeam"</strong>] == ht, <strong>"homeScore"</strong>].mean()<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;homeTeamHomeAvgAgainst = historical.loc[historical[<strong>"homeTeam"</strong>] == ht, <strong>"awayScore"</strong>].mean()<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># divide averages for team by averages for competition to get attack and defence strengths</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>homeTeamAttackStrength = homeTeamHomeAvgFor/homeAvg<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;homeTeamDefenceStrength = homeTeamHomeAvgAgainst/awayAvg<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># repeat for away team</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>awayTeamAwayAvgFor = historical.loc[historical[<strong>"awayTeam"</strong>] == at, <strong>"awayScore"</strong>].mean()<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamAwayAvgAgainst = historical.loc[historical[<strong>"awayTeam"</strong>] == at, <strong>"homeScore"</strong>].mean()<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamAttackStrength = awayTeamAwayAvgFor/awayAvg<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamDefenceStrength = awayTeamAwayAvgAgainst/homeAvg<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># calculate expected goals using attackstrength * defencestrength * average</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>homeTeamExpectedGoals = homeTeamAttackStrength * awayTeamDefenceStrength * homeAvg<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamExpectedGoals = awayTeamAttackStrength * homeTeamDefenceStrength * awayAvg<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># use numpy's poisson distribution to simulate 100000 games between the two teams</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>homeTeamPoisson = np.random.poisson(homeTeamExpectedGoals, simulatedgames)<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamPoisson = np.random.poisson(awayTeamExpectedGoals, simulatedgames)<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># we can now infer some predictions from our simulated games</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# using numpy to count the results and converting to percentage probability</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>homeTeamWins = np.sum(homeTeamPoisson &gt; awayTeamPoisson) / simulatedgames * 100<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;draws = np.sum(homeTeamPoisson == awayTeamPoisson) / simulatedgames * 100<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;awayTeamWins = np.sum(homeTeamPoisson &lt; awayTeamPoisson) / simulatedgames * 100<br>
<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<em># store our prediction into the dataframe</em><br>
<em> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</em>df.loc[i, <strong>"homeWinProbability"</strong>] = homeTeamWins<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;df.loc[i, <strong>"draws"</strong>] = draws<br>
 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;df.loc[i, <strong>"awayTeamWins"</strong>] = awayTeamWins<br>
<br>
 &nbsp;&nbsp;&nbsp;<strong>return </strong>df<br>
<br>
<br>
<strong>if not </strong>path.isfile(<strong>"data.csv"</strong>):<br>
 &nbsp;&nbsp;&nbsp;<em># set which country and competition we want to use</em><br>
<em> &nbsp;&nbsp;&nbsp;# others to try, "Scotland" &amp; "Premiership" or "Europe" &amp; "UEFA Champions League"</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>country = <strong>"England"</strong><br>
<strong> &nbsp;&nbsp;&nbsp;</strong>competition = <strong>"Premier League"</strong><br>
<strong> &nbsp;&nbsp;&nbsp;</strong>lastseason = 2016<br>
 &nbsp;&nbsp;&nbsp;thisseason = 2017<br>
<br>
 &nbsp;&nbsp;&nbsp;lastseasondata = scrapeseason(country, competition, lastseason)<br>
 &nbsp;&nbsp;&nbsp;thisseasondata = scrapeseason(country, competition, thisseason)<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># combine our data to one frame</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>data = pd.concat([lastseasondata, thisseasondata])<br>
 &nbsp;&nbsp;&nbsp;data.reset_index(inplace=<strong>True</strong>, drop=<strong>True</strong>)<br>
<br>
 &nbsp;&nbsp;&nbsp;<em># save to file so we don't need to scrape multiple times</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>data.to_csv(<strong>"data.csv"</strong>)<br>
<strong>else</strong>:<br>
 &nbsp;&nbsp;&nbsp;<em># load our csv</em><br>
<em> &nbsp;&nbsp;&nbsp;</em>data = pd.read_csv(<strong>"data.csv"</strong>, index_col=0, parse_dates=<strong>True</strong>)<br>
<br>
gamedate = datetime.date.today()<br>
data = poissonpredict(data, gamedate)<br>
<br>
data.to_csv(<strong>"data.csv"</strong>)</blockquote>
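<p>To make the strength calculation in <em>poissonpredict()</em> concrete, here is a small worked sketch on a toy results table. The teams and scores are invented for illustration; only the column names match the scraped data.</p>

```python
import pandas as pd

# Invented results for three hypothetical teams (not real data)
historical = pd.DataFrame({
    "homeTeam":  ["A", "B", "A", "C", "B", "C"],
    "awayTeam":  ["B", "A", "C", "A", "C", "B"],
    "homeScore": [2, 1, 3, 0, 1, 2],
    "awayScore": [0, 1, 1, 2, 0, 2],
})

# competition-wide averages
home_avg = historical["homeScore"].mean()   # 9 goals / 6 games = 1.5
away_avg = historical["awayScore"].mean()   # 6 goals / 6 games = 1.0

ht, at = "A", "B"  # fixture to predict: A at home to B

# home team's attack and defence strength, from its home games only
home_for = historical.loc[historical["homeTeam"] == ht, "homeScore"].mean()
home_against = historical.loc[historical["homeTeam"] == ht, "awayScore"].mean()
home_attack = home_for / home_avg
home_defence = home_against / away_avg

# away team's attack and defence strength, from its away games only
away_for = historical.loc[historical["awayTeam"] == at, "awayScore"].mean()
away_against = historical.loc[historical["awayTeam"] == at, "homeScore"].mean()
away_attack = away_for / away_avg
away_defence = away_against / home_avg

# expected goals = attack strength * opposing defence strength * competition average
home_expected = home_attack * away_defence * home_avg
away_expected = away_attack * home_defence * away_avg

print(round(home_expected, 2), round(away_expected, 2))
```

<p>With so few games the numbers are extreme, which is a reminder that the averages need a decent amount of history behind them before the strengths mean much.</p>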
<p>As you can see, I've added in a check to see if our data already exists and load it rather than scraping it again. &nbsp;Also, I have to reiterate that I'm not an expert programmer, so whilst my code may be inelegant, I think it's pretty straightforward.</p>
<p>I've only calculated probabilities for Home Win, Draw and Away Win here but it should be reasonably easy to add other predictions such as Total Goals, Over/Under, Both To Score etc.</p>
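<p>As a rough sketch of how those extra markets could be read off the same simulated games, assuming made-up expected goals for a single fixture (in the real code these would come from inside <em>poissonpredict()</em>):</p>

```python
import numpy as np

simulatedgames = 100000

# Hypothetical expected goals for one fixture, standing in for the
# homeTeamExpectedGoals / awayTeamExpectedGoals values in poissonpredict()
home_expected = 1.6
away_expected = 1.1

# Seeded generator so the sketch is repeatable
rng = np.random.default_rng(0)
home_goals = rng.poisson(home_expected, simulatedgames)
away_goals = rng.poisson(away_expected, simulatedgames)

# Total goals in each simulated game
total_goals = home_goals + away_goals

# Over/Under 2.5 goals, as percentages
over_2_5 = np.mean(total_goals > 2.5) * 100
under_2_5 = np.mean(total_goals < 2.5) * 100

# Both teams to score: each side has at least one goal
both_to_score = np.mean((home_goals > 0) & (away_goals > 0)) * 100

print(over_2_5, under_2_5, both_to_score)
```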
<p>So, the predictions for today's games? Here they are.</p>
<ul>
  <li>Crystal Palace v Southampton: Away</li>
  <li>Huddersfield Town v Leicester City: Home</li>
  <li>Liverpool v Burnley: Home</li>
  <li>Newcastle United v Stoke City: Home</li>
  <li>Tottenham Hotspur v Swansea City: Home</li>
  <li>Watford v Manchester City: Away</li>
  <li>West Bromwich Albion v West Ham: Home</li>
</ul>
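<p>One simple way to turn the three probabilities into a single pick like the ones above is to take the most likely outcome. A minimal sketch, assuming made-up probabilities and hypothetical team names, and reusing the column names that <em>poissonpredict()</em> writes:</p>

```python
import pandas as pd

# Made-up probabilities for two hypothetical fixtures (not real predictions)
predictions = pd.DataFrame({
    "homeTeam": ["Team A", "Team C"],
    "awayTeam": ["Team B", "Team D"],
    "homeWinProbability": [55.0, 20.0],
    "draws": [25.0, 30.0],
    "awayTeamWins": [20.0, 50.0],
})

# Map each probability column to a readable label
labels = {"homeWinProbability": "Home", "draws": "Draw", "awayTeamWins": "Away"}

# idxmax(axis=1) gives, for each row, the name of the column with the largest value
predictions["pick"] = predictions[list(labels)].idxmax(axis=1).map(labels)

print(predictions[["homeTeam", "awayTeam", "pick"]])
```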
<p>This was a bit rushed as I wanted to get some predictions out before today's games started. I'll improve the code and add extra functionality in the next article. Any comments, tips, etc. are welcome. I'm off to the bookies now. :-)</p>
</html>
Posted by stevencurrie on 2017-09-16 11:56:09. Tags: programming, python, soccer, predictions.