
Introducing Understat, a Python package for revolutionary football metrics by amosbastian

· @amosbastian ·
<center>
![understat.png](https://cdn.steemitimages.com/DQmNe1biaFMUVZ67VzggbsqFnERupsMeekEHQvAMmSTngMG/understat.png)
<br>
<sup>
https://github.com/amosbastian/understat
</sup>
</center>

### What is **Understat**?

It's a Python wrapper for the website [Understat](https://understat.com/), which provides revolutionary football metrics for multiple leagues. An example of this is expected goals (xG), the main new metric of this kind, which allows you to evaluate team and player performance. In a low-scoring game such as football, the final score does not really give a clear picture of the teams' performances. This is why more and more sports analysts are turning to advanced models like xG, a statistical measure of the quality of chances created and conceded. Understat's goal was to create the most precise method for evaluating shot quality. They did this by training neural network prediction algorithms on a large dataset (>100,000 shots, with over 10 parameters for each), and they have now made this data available to the public!
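To make the idea concrete, here is a minimal sketch of how per-shot xG values add up to a team's xG for a match. The numbers are made up for illustration. The `Shot` class and the `summarize` helper are hypothetical and not part of the package. Each shot's xG is treated as the model's probability that the shot is scored, and a team's xG is the sum of those probabilities.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical structure for illustration only, not Understat's format.
    team: str
    xg: float      # estimated probability (0 to 1) that this shot is scored
    is_goal: bool  # whether the shot actually went in


def summarize(shots):
    """Return {team: (goals, xG)}, where xG is the sum of per-shot probabilities."""
    summary = {}
    for shot in shots:
        goals, xg = summary.get(shot.team, (0, 0.0))
        summary[shot.team] = (goals + int(shot.is_goal), xg + shot.xg)
    return summary


# Made-up match: team A scores from one poor chance, team B misses several good ones.
shots = [
    Shot("A", 0.05, True),
    Shot("B", 0.45, False),
    Shot("B", 0.30, False),
    Shot("B", 0.60, False),
]
for team, (goals, xg) in summarize(shots).items():
    print(f"{team}: {goals} goals, {xg:.2f} xG")
```

In this made-up example, team A wins 1-0, but team B created far more (and better) chances. The final score hides exactly this kind of difference.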

---

This website has come up before in [one of my previous posts](https://steemit.com/fpl/@amosbastian/adding-new-features-to-my-fpl-bot-for-reddit), as some of its data was used in one of the features of my Reddit bot for /r/FantasyPL. The bot only used part of the data available on the website, so I decided it would be nice to create a Python package that makes all of it available and that can easily be used by others who are also interested in this information.

https://github.com/amosbastian/understat/pull/1

## Getting the data

As I mentioned in the previous post, Understat unfortunately does not have an API. Instead, the data is retrieved by scraping their website for `<script>` tags and using a regular expression to match the data we want. Most of the data looks something like the picture below.

<center>
![](https://cdn.steemitimages.com/DQmQLRP9ezi9ZGoGC5GEaTz2roxhVHcbNw5aPADC9SbQZYB/image.png)
<sup>The data embedded in their website</sup>
</center>

It's pretty consistent across most of the pages, the biggest difference being the variable's name. Because of this, it was easy to create a couple of utility functions that could be reused by most of the package's functions!

<center>
![](https://cdn.steemitimages.com/DQmU1DgGZESBiePVNZrYDhacWvMsRMjNz2Rk5hHNKMihiU8/image.png)
<sup>A couple of the utility / helper functions</sup>
</center>
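For readers who want to try this approach themselves, here is a minimal, standard-library-only sketch of the same idea. It assumes that a page embeds its data as `var <name> = JSON.parse('...')`, with the JSON hex-escaped (e.g. `\x7B` for `{`). The helper names are hypothetical, and this is not the package's code. Unlike the package, which first collects the `<script>` tags, this sketch searches the raw HTML text directly.

```python
import json
import re

# Matches a "\xNN" escape sequence (backslash, x, two hex digits).
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def decode_hex_escapes(raw):
    r"""Turn \xNN escapes back into text.

    Each escape becomes the byte it encodes, and the bytes are then decoded
    as UTF-8 so that multi-byte characters (e.g. accented names) survive.
    Assumes everything in `raw` other than the escapes is plain ASCII.
    """
    as_latin1 = _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)
    return as_latin1.encode("latin-1").decode("utf-8")


def extract_variable(html, name):
    """Return the parsed JSON assigned to `var <name> = JSON.parse('...')`, or None."""
    pattern = re.compile(
        r"var\s+" + re.escape(name) + r"\s*=\s*JSON\.parse\('(.*?)'\)"
    )
    match = pattern.search(html)
    if match is None:
        return None
    return json.loads(decode_hex_escapes(match.group(1)))


# A tiny made-up page in the assumed format.
page = r"<script>var statData = JSON.parse('\x5B\x7B\x22month\x22\x3A\x228\x22\x7D\x5D');</script>"
print(extract_variable(page, "statData"))
```

Since only the variable name changes between pages, one generic helper like this can be reused for `statData` and any other embedded variable.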

## Creating the functions

Once I had a way to actually get the data consistently, it was relatively straightforward to implement the functions. I would go to each page that has information, open Chrome's developer console, and use the following to log all of its `<script>` tags:

```javascript
Array.from(document.getElementsByTagName('script')).forEach(script => console.log(script));
```

and then go through them manually to see whether they contained any useful information. For each piece of useful information I found, I created at least one function in the **Understat** class. For example, their home page has the following chart:

<center>
![](https://cdn.steemitimages.com/DQmX7nWtjT5Nd9W3fiYTcqHhSnd6FXTWhebiKwwbKHwvqMv/image.png)
<sup>Average goals per match, split by month</sup>
</center>

The `<script>` tag for this graph contains a variable called `statData`. This variable is retrieved, matched and parsed by the utility functions shown above, through the `get_stats()` function of the **Understat** class. An example of its usage can be found below.

<center>
![](https://cdn.steemitimages.com/DQmWCxEzxKqSQW1p8uJAREhoqBSuc1SxD88oJL8KrtR9Vat/image.png)
<sup>Usage example</sup>
</center>

This results in the following output (which is essentially all the information you see in the graph).

<center>
![](https://cdn.steemitimages.com/DQmYxNqWwcoyCR41Ar25monF44ZnrxMWcwP93Cdhh9RZnC4/image.png)
<sup>Example output</sup>
</center>

For some reason, not all data on their website is in the same format, and some of it is not very useful as-is, so it sometimes had to be cleaned up first. For example, a player's positional data uses the position as the key and the player's performance as the value. I changed this to return a list of dictionaries instead, with the position simply being another key-value pair inside each dictionary.
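The reshaping described above can be sketched in a few lines. `positions_to_records` is a hypothetical helper, not the package's code. The position labels and stat names in the example are made up for illustration.

```python
def positions_to_records(positional):
    """Convert {position: stats} into a list of dicts with "position" as a key.

    Hypothetical helper illustrating the reshaping. Each stats dict is copied,
    and "position" is written last so it always wins over a clashing key.
    """
    return [{**stats, "position": position} for position, stats in positional.items()]


# Made-up positional data in the "position as key" shape.
raw = {"FW": {"games": 20, "goals": 9}, "Sub": {"games": 5, "goals": 1}}
records = positions_to_records(raw)

# Uniform records are easy to filter, e.g. positions where the player scored more than once.
print([r["position"] for r in records if r["goals"] > 1])
```

A list of uniform dictionaries can be filtered with the same generic key-value matching as any other list-shaped data.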

## Adding options

I didn't want to just return the data and leave the user to go through the trouble of filtering it afterwards. After thinking about it, I came up with a way to pass options dynamically (leaving the responsibility to the user), either as an optional dictionary of specific options or as keyword arguments. For example, if you wanted to get all Premier League players for Manchester United in 2018, you could use the following code:

<center>
![](https://cdn.steemitimages.com/DQma68BjUGbm1Hj6QDRBuRXaWnegZBin6ZV7zEv9uBYC6hV/image.png)
</center>

Basically, the keyword argument `team_title="Manchester United"` is collected by `**kwargs` into the dictionary `{"team_title": "Manchester United"}`, which is the same dictionary you could pass as options. The `filter_data()` function then takes the data and returns *all* dictionaries that contain this key-value pair! It's a pretty nice way to let people decide how to filter the data, without having to handle every possible filter explicitly. Of course, it can be improved: sometimes you will need to pass a more complex dictionary to get the information you want, which can be tedious and difficult for the user. For now, though, it's great imo!
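Here is a minimal sketch of this kind of key-value filtering, assuming the data is a list of flat dictionaries. `filter_records` is a hypothetical re-implementation for illustration, not the package's actual `filter_data()`. It shows how an options dictionary and keyword arguments can be merged into one set of criteria. The player rows are made up.

```python
def filter_records(data, options=None, **kwargs):
    """Keep only the dicts in `data` that match every requested key-value pair.

    `options` and keyword arguments are merged, so
    filter_records(rows, {"team_title": "X"}) and
    filter_records(rows, team_title="X") are equivalent.
    """
    criteria = dict(options or {})
    criteria.update(kwargs)  # keyword arguments win if a key appears in both
    return [
        row for row in data
        if all(key in row and row[key] == value for key, value in criteria.items())
    ]


# Made-up rows; only the "team_title" key comes from the text above.
players = [
    {"player_name": "Player A", "team_title": "Manchester United"},
    {"player_name": "Player B", "team_title": "Liverpool"},
]
print(filter_records(players, team_title="Manchester United"))
```

With no criteria at all, every row is returned. A key that a row does not contain never matches, so a misspelled option simply yields an empty list rather than an error. That is the trade-off of leaving the responsibility to the user.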

## Testing!

I wanted to make sure the output of all functions is exactly how I want it to be, so I also wrote loads of [tests](https://github.com/amosbastian/understat/tree/master/tests). Also, since the website isn't mine and could change at any moment, it's pretty important to know exactly what has changed when it does, and hopefully the tests will help with this.
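One way to make such tests point at changes on the website's side is to check the *shape* of the parsed data rather than exact values. Below is a minimal pytest-style sketch. The expected key names and the fixture are assumptions for illustration, not taken from the package's test suite.

```python
import json

# Hypothetical keys that downstream code relies on; adjust to the real data.
EXPECTED_KEYS = {"player_name", "team_title", "xG"}


def missing_keys(records, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from at least one record."""
    missing = set()
    for record in records:
        missing |= expected - set(record)
    return missing


def test_player_records_keep_expected_shape():
    # In a real suite this JSON would be a saved copy of data scraped from the site.
    fixture = '[{"player_name": "Player A", "team_title": "Team B", "xG": "0.5"}]'
    assert missing_keys(json.loads(fixture)) == set()
```

If a field is renamed or dropped, `missing_keys` reports exactly which one, so a failing test also says what changed.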

### Roadmap

I'll be posting about this package on Reddit and seeing what kind of requests come in. I think it can be really useful even for people who don't play Fantasy Premier League and are simply interested in football in general. I'm hoping that people will come up with some good suggestions, or even decide to contribute. I will also be writing some documentation: since filtering is left to the user, it's pretty important to know how, and by *what*, you can actually filter the data. Look forward to a post about this in the future!

### Usage & installation

The recommended way to install understat is via `pip`.

    pip install understat

To install it directly from GitHub you can do the following:

    git clone https://github.com/amosbastian/understat.git

You can also install a [.tar file](https://github.com/amosbastian/understat/tarball/master)
or [.zip file](https://github.com/amosbastian/understat/zipball/master):

    $ curl -OL https://github.com/amosbastian/understat/tarball/master
    $ curl -OL https://github.com/amosbastian/understat/zipball/master # Windows

Once you have cloned (or downloaded and extracted) the source, you can easily install it using `pip`:

    $ cd understat
    $ pip install .


Import `Understat` and call its functions like so:

    import asyncio
    import json

    import aiohttp

    from understat import Understat


    async def main():
        async with aiohttp.ClientSession() as session:
            understat = Understat(session)
            data = await understat.get_players("epl", 2018, {"team_title": "Manchester United"})
            print(json.dumps(data))


    if __name__ == "__main__":
        asyncio.run(main())

## Contributing

1. Fork the repository on GitHub.
2. Run the tests with `pytest tests/` to confirm they all pass on your system.
   If the tests fail, try to find out why this is happening. If you aren't
   able to do this yourself, don't hesitate to either create an issue on
   GitHub or send an email to [amosbastian@gmail.com](mailto:amosbastian@gmail.com).
3. Either create your feature and then write tests for it, or do this the other
   way around.
4. Run all tests again with `pytest tests/` to confirm that everything
   still passes, including your newly added test(s).
5. Create a pull request for the main repository's `master` branch.

## Documentation

Coming soon!
Posted 2019-03-07 15:06:09
@emrebeyler ·
It's cool to see you use asyncio instead of synchronous requests. The package looks good, with all the code, docstrings and unit tests. I have some minor comments, some of which may be opinionated:

> So instead of using an API, the way the data is retrieved is by scraping their website for `<script>`s, and using a regular expression to match the data we want.

Btw, seeing the name "wrapper", I thought it was an API wrapper. :) Maybe calling it a scraper or something like that would be better.

```python
import re


def find_match(scripts, pattern):
    """Returns the first match found in the given scripts."""

    for script in scripts:
        match = re.search(pattern, script.string)
        if match:
            break

    return match
```

Maybe this could be refactored into something like the following? (This changes the behaviour slightly, though: the current implementation raises an `UnboundLocalError` if `scripts` is empty, while the alternative simply returns `None` whenever there is no match.)

```python
import re


def find_match(scripts, pattern):
    """Returns the first match found in the given scripts."""

    for script in scripts:
        match = re.search(pattern, script.string)
        if match:
            return match

    return None
```

***

Your contribution has been evaluated according to [Utopian policies and guidelines](https://join.utopian.io/guidelines), as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, [click here](https://review.utopian.io/result/3/1-2-2-1-2-1-1-).

---- 
Need help? Chat with us on [Discord](https://discord.gg/uTyJkNm).

[[utopian-moderator]](https://join.utopian.io/)
Posted 2019-03-08 16:02:33
@amosbastian ·
Thanks for the review, Emre! I think you are correct for both cases, so I'll change both for the next version.
Posted 2019-03-08 16:10:09
@utopian-io ·
Thank you for your review, @emrebeyler! Keep up the good work!
Posted 2019-03-11 01:56:51
@steem-plus ·
SteemPlus upvote
Hi, @amosbastian!

You just got a **0.08%** upvote from SteemPlus!
To get higher upvotes, earn more SteemPlus Points (SPP). On your Steemit wallet, check your SPP balance and click on "How to earn SPP?" to find out all the ways to earn.
If you're not using SteemPlus yet, please check our latest posts [here](https://steemit.com/@steem-plus) to see the many ways in which SteemPlus can improve your Steem experience on Steemit and Busy.
Posted 2019-03-08 11:37:36
@steem-ua ·
#### Hi @amosbastian!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
**Feel free to join our [@steem-ua Discord server](https://discord.gg/KpBNYGz)**
Posted 2019-03-08 16:06:51
@utopian-io ·
Hey, @amosbastian!

**Thanks for contributing on Utopian**.
We’re already looking forward to your next contribution!

**Get higher incentives and support Utopian.io!**
 Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via [SteemPlus](https://chrome.google.com/webstore/detail/steemplus/mjbkjgcplmaneajhcbegoffkedeankaj?hl=en) or [Steeditor](https://steeditor.app)).

**Want to chat? Join us on Discord https://discord.gg/h52nFrV.**

<a href='https://steemconnect.com/sign/account-witness-vote?witness=utopian-io&approve=1'>Vote for Utopian Witness!</a>
Posted 2019-03-11 01:59:03