Exploratory Data Analysis on HIVE by macrodrigues

It was about time I played with **Beem**, a Python library that lets developers access the HIVE network without having to deal directly with the underlying blockchain technology.

![Presentation1.png](https://files.peakd.com/file/peakd-hive/macrodrigues/23uRLNvq1PkSEuYTa7yt1f6ApPgVaCDUkutZND45peVBWW9qbuAxcJwaQxv4997HU389N.png)

The following shows some **Exploratory Data Analysis** done using **Jupyter Notebooks**.

# Gathering Data 

Let's start with the necessary imports:

```
from beem import Hive
from beem.discussions import Query, Discussions
from beem.comment import Comment
import pandas as pd
```

Get queries and discussions:

```
q = Query(limit=10, tag="")
d = Discussions()
```
Get the generator containing the posts:
```
# post list for selected query
max_posts = 10000000  # upper bound on posts to fetch (renamed from `iter`, which shadows the built-in)
posts = d.get_discussions('hot', q, limit=max_posts)
```
Above, I used a very large limit so that I would end up with a decently sized Data Frame to work with.

Then I built a list of dictionaries containing only the posts where the **Pending Payout Value** was greater than 20 HBD:

```
amount = 20
data = [
    {
        'ID': post['post_id'],
        'Author': post['author'],
        'Created': post['created'],
        'Title': post['title'],
        'Category': post['category'],
        'Reputation': post['author_reputation'],
        'Votes': len(post['active_votes']),
        'Body Length': post['body_length'],
        'Tags': post['tags'],
        'Metadata': post['json_metadata'],
        'Pending payout value': post['pending_payout_value'].amount,
        'Community': post['community']}
    for post in posts if post['pending_payout_value'].amount > amount]
```
Then I converted the data into a Data Frame and saved it as a .csv file.

```
df = pd.DataFrame.from_dict(data)
df.to_csv('hive_data.csv')
```
Gathering the data took some time because of the large number of iterations.
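Since fetching from the network is slow, the filtering step above can be tried offline first. The following is only a sketch: the `FakeAmount` class and the three posts are made up for illustration. They mimic the `.amount` attribute used above and are not part of Beem.

```python
from dataclasses import dataclass


@dataclass
class FakeAmount:
    """Hypothetical stand-in exposing only the .amount attribute used above."""
    amount: float


# Made-up posts with just the fields needed for this sketch.
fake_posts = [
    {"author": "alice", "title": "Post A", "pending_payout_value": FakeAmount(35.2)},
    {"author": "bob", "title": "Post B", "pending_payout_value": FakeAmount(4.1)},
    {"author": "carol", "title": "Post C", "pending_payout_value": FakeAmount(20.0)},
]

threshold = 20

# Same pattern as above: keep a post only if its payout is strictly above the threshold.
rows = [
    {
        "Author": post["author"],
        "Title": post["title"],
        "Pending payout value": post["pending_payout_value"].amount,
    }
    for post in fake_posts
    if post["pending_payout_value"].amount > threshold
]

print(rows)
```

Note that the comparison is strict, so a post sitting at exactly the threshold is left out.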

# Analysing the Data

More imports:

```
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
```
Convert the .csv back into a Data Frame and sort by **Pending Payout Value**.
```
df = pd.read_csv('hive_data.csv')
df.drop([df.columns[0]], axis = 1, inplace = True)
df = df.sort_values(by=df.columns[-2], ascending=False)
```

### Categories that generate the most value per post

The following results don't show which categories generate the most value overall, only those whose posts had more than 20 HBD pending. In addition, I'm working with a "small" dataset compared to the full history of the blockchain.

I grouped by **Category**, summed the numeric columns, and took the 20 best-performing categories.

```
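# sum only the numeric columns, then rank categories by total pending payout
df_cat = df.groupby(by=['Category'])\
    .sum(numeric_only=True)\
    .sort_values(by='Pending payout value', ascending=False)
df_cat_best = df_cat.head(20)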
```

A bar plot displaying the results:


![image.png](https://files.peakd.com/file/peakd-hive/macrodrigues/23xpWCjeuBJMjUWRWy7DwBJ2bcMKBa8ukzCrYfAL1NV8M84HJ4gLfufu9yzXix8xHbn57.png)

Any surprises?

**Leo Finance** and **Pinmapple** clearly stand out compared to the other categories.
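One thing to keep in mind when reading a ranking like this: a sum rewards categories that simply have many posts in the sample. The sketch below uses made-up numbers (the category names and payouts are hypothetical, not taken from the dataset) to show how ranking by total and ranking by mean per post can disagree.

```python
from collections import defaultdict

# Made-up (category, payout) pairs, for illustration only.
rows = [
    ("cat-a", 25.0), ("cat-a", 22.0), ("cat-a", 21.0),
    ("cat-b", 60.0),
    ("cat-c", 30.0), ("cat-c", 28.0),
]

totals = defaultdict(float)
counts = defaultdict(int)
for category, payout in rows:
    totals[category] += payout
    counts[category] += 1

# Mean payout per post in each category.
means = {category: totals[category] / counts[category] for category in totals}

by_total = sorted(totals, key=totals.get, reverse=True)
by_mean = sorted(means, key=means.get, reverse=True)

print("ranked by total:", by_total)
print("ranked by mean: ", by_mean)
```

In this toy data the category with the most posts wins on total but comes last on mean, so the two rankings answer different questions.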

### Correlations

Do the features correlate? Number of votes with Pending Payout Value? Reputation?

```
import ast

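# .copy() so the new columns are added to an independent Data Frame, not a view of df
df_corr = df[['Reputation', 'Votes', 'Body Length', 'Tags', 'Pending payout value']].copy()
# the tags were stored as strings in the .csv, so parse them back into lists
df_corr['Tags'] = df_corr['Tags'].apply(ast.literal_eval)
df_corr['Number of Tags'] = df_corr['Tags'].apply(len)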
```

Above, I used the `ast` library for its `ast.literal_eval()` function. When a list is saved in a .csv cell, it is stored as a string, and this function lets me convert it back into a list. I also created another feature called "Number of Tags".
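To see the round trip in isolation, here is a minimal sketch. It uses the standard library's `csv` module instead of pandas so it runs without any extra install, and the post title and tags are made up.

```python
import ast
import csv
import io

tags = ["hive", "python", "stem"]

# Write one row whose second cell is a Python list.
buffer = io.StringIO()
csv.writer(buffer).writerow(["some-post", tags])

# Read it back: the list comes back as its string representation.
buffer.seek(0)
title, raw_tags = next(csv.reader(buffer))
print(type(raw_tags).__name__, raw_tags)

# literal_eval parses the string back into a real list without executing code.
parsed = ast.literal_eval(raw_tags)
print(type(parsed).__name__, parsed)
```

`ast.literal_eval()` only accepts Python literals (lists, strings, numbers and so on), which makes it a safer choice than `eval()` for data read from a file.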

Plotting a heatmap:

```
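# drop the list-valued column first: correlations are only defined for numeric columns
sns.heatmap(df_corr.drop(columns='Tags').corr(), cmap="YlGnBu");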
```


![image.png](https://files.peakd.com/file/peakd-hive/macrodrigues/23ynojgz9669hbaFH7Qox7k7ZueA2jBWt2PPdMFgm4kpf3HygBo8DCJNnLWjWnQS53Crc.png)

From the heatmap above we can see that there isn't any strong correlation between the features. The number of votes is slightly correlated with the payout value, but reputation, number of tags and body length do not seem to affect the amount of rewards.
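For reference, `DataFrame.corr()` computes the Pearson correlation coefficient by default. The sketch below spells that formula out in plain Python. The numbers are made up for illustration and are not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, for illustration only.
votes = [100, 150, 200, 250, 300]
payout = [21.0, 24.0, 22.0, 30.0, 27.0]
body_length = [900, 300, 1200, 600, 1500]

print(round(pearson(votes, payout), 2))
print(round(pearson(body_length, payout), 2))
```

A value close to 1 or -1 means a strong linear relationship, while a value close to 0 means almost none. Each cell of the heatmap is one such coefficient.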


## What about the tags?

Given the column containing the lists of tags, it was interesting to see which tags contribute the most value. Obviously this is not a proper analysis: to know which tags have the strongest influence, each tag would have to appear equally often in the sample, which is not the case.

```
df_temp_tags = df_corr.explode('Tags')\
    .drop(['Number of Tags', 'Reputation', 'Body Length'], axis=1)
# every exploded row represents one use of a tag
df_temp_tags['Tag Counts'] = 1
df_tags = df_temp_tags.groupby('Tags')\
    .sum().sort_values(by='Pending payout value', ascending=False)
df_tags
```

Above, I "exploded" the lists of tags so that each tag gets its own row, and then I counted them.
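What "exploding" does can be sketched in plain Python. The posts below are made up for illustration. Each post's payout is repeated once per tag, so a tag's total depends heavily on how often the tag is used.

```python
from collections import defaultdict

# Made-up posts: (list of tags, payout). Illustration only.
posts = [
    (["hive", "python"], 30.0),
    (["hive", "photography"], 25.0),
    (["hive"], 22.0),
    (["python", "stem"], 40.0),
]

# "Explode": one (tag, payout) row per tag of each post.
exploded = [(tag, payout) for tags, payout in posts for tag in tags]

tag_counts = defaultdict(int)
tag_payout = defaultdict(float)
for tag, payout in exploded:
    tag_counts[tag] += 1
    tag_payout[tag] += payout

# Show tags ranked by total payout, with the mean per use alongside.
for tag in sorted(tag_payout, key=tag_payout.get, reverse=True):
    mean = tag_payout[tag] / tag_counts[tag]
    print(f"{tag:12} count={tag_counts[tag]} total={tag_payout[tag]:.1f} mean={mean:.1f}")
```

In this toy example the most frequent tag has the largest total but not the largest mean payout per use.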

With the new features, I did a new heatmap:

```
sns.heatmap(df_tags.corr(), cmap="YlGnBu");
```

![image.png](https://files.peakd.com/file/peakd-hive/macrodrigues/245Hz4eS6hoE13RxzL17XLEcNZfZNDeCx24PX81w9H8BeVuDQqPE5CK4HxU6Hue7h9JPM.png)

As expected, "Tag Counts" is correlated with the Pending Payout Value. Once again, this doesn't mean that the tags with the highest counts are the most effective; it may simply mean that most posts use the same tags. Still, let's look at a bar plot of the best-performing tags in this analysis:

![image.png](https://files.peakd.com/file/peakd-hive/macrodrigues/23zbkxjQ2kmjcyVGNxXdQMay94XUgRXGPR1DiVtztEoW4swxpRkdmV16gcpHYshj4LS7T.png)

The "neoxian" tag seems to be the best performing of all. I might use it for this post, along with "proofofbrain" 😁

Expect more content on Exploratory Data Analysis using Beem. I'm not done with this library yet. See you soon!

Posted by @macrodrigues on 2022-08-30 in hive-163521. Tags: neoxian, proofofbrain, hive, stem, python, development.