## Motivation

This is a follow-up to my [previous post](https://utopian.io/utopian-io/@calebjohn/analysis-of-steem-voting-behavior) analysing voting behavior on the Steem blockchain. Here I'll demonstrate a few methods for separating posts with questionable voting patterns from posts that appear more organic, and then dig into the voters on the less organic posts to see whether any patterns tell us the extent of bot use in top posts.

In this post we'll take a look at the number of votes arriving per second on each post and the crossover between voters in these voting spikes.

## Background and Tools

For background please refer to my [previous post](https://utopian.io/utopian-io/@calebjohn/analysis-of-steem-voting-behavior), particularly the Background, Imports, Data and Helper Function sections. These explain how I got the data and the basic functions that I use to process it.

Note: for this analysis I'll use the same data (January 2018).

## Process

For clarity I decided not to use any Python magic for finding maximum values, so I wrote this simple looping function. It finds the maximum element of the first list in a 2xN list and returns it together with the element at the same index in the second list.

```python
def max_both(lst):
    # lst[0] holds the values to maximise, lst[1] the paired values
    mx = lst[0][0]
    other_max = lst[1][0]
    for i, element in enumerate(lst[0]):
        # `>=` means that ties resolve to the last maximum in the list
        if element >= mx:
            mx = element
            other_max = lst[1][i]
    return mx, other_max
```

The code below uses the `relative_times` function (which takes all votes on a post and returns their time relative to the post creation) from my [previous post](https://utopian.io/utopian-io/@calebjohn/analysis-of-steem-voting-behavior). A histogram is then generated from these values with a bin size of 1 second. This tells us how many times a post was voted for in each 1-second period after it was created. All we really care about here is the maximum number of votes that happened in any 1-second period.
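To make the binning concrete, here is a minimal sketch on made-up vote times. It is not the exact code used for the analysis: `peak_votes_per_second` and `example_times` are hypothetical names introduced for illustration, and the sketch assumes non-negative times, uses explicit 1-second bin edges, and relies on `np.argmax`, which returns the first peak if there is a tie.

```python
import numpy as np


def peak_votes_per_second(relative_seconds):
    """Return (peak_count, bin_start) using bins that are exactly 1 second wide.

    Hypothetical helper for illustration; assumes all times are >= 0.
    """
    times = np.asarray(relative_seconds, dtype=float)
    # Explicit edges 0, 1, 2, ... so every bin covers exactly one second.
    edges = np.arange(0, np.floor(times.max()) + 2)
    counts, edges = np.histogram(times, bins=edges)
    peak = int(np.argmax(counts))  # index of the first bin with the maximum count
    return int(counts[peak]), float(edges[peak])


# Made-up vote times, in seconds after the post was created.
example_times = [3.0, 3.2, 3.9, 10.5, 42.0, 42.1]
print(peak_votes_per_second(example_times))  # (3, 3.0)
```

Three of the six made-up votes fall between 3 and 4 seconds, so the peak is 3 votes per second and the bin starts at 3.0.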
The idea is that if we see a post where the maximum number of votes per second is unreasonably high, we know that either an army of bots hit the post or a curator with a reasonably large trail voted for it. We'll also keep a list of when these peaks happened so we can perform some analysis later.

```python
import numpy as np
import matplotlib.pyplot as plt

# top_posts and relative_times come from the previous post

# Peak votes per second for each post
vps = []
# The lower edge of the time bin in which each peak occurred
time_slot = []
for post in top_posts:
    relatives = relative_times(post, scale=1)
    hist = np.histogram(relatives, bins=int(max(relatives)))
    vmax, bmax = max_both(hist)
    vps.append(vmax)
    time_slot.append(bmax)
print("The maximum number of votes per second is: {0}".format(max(vps)))
plt.plot(vps)
plt.ylabel("Votes per Second")
plt.xlabel("Post #")
```

The maximum number of votes per second is: 66

More than 60 upvotes in 1 second? Unlikely without bots or curation trails. While anything above 1 vote per second feels a bit suspicious to me, we'll take 10 as the cutoff and assume that more than 10 upvotes per second implies the use of bots.

```python
bot_posts = [top_posts[i] for i in range(len(vps)) if vps[i] > 10] # 77 posts
```

Below we have some fairly simple code to compare the total number of votes in a time period (January 2018) to the number of unique voters in the same period. We do this using a set, which in Python is guaranteed not to contain duplicate entries. We add each voter's name to this set, and after going through all the votes the length of the set is the number of unique voters. Note: this isn't the most "pythonic" way to do this, but it's clearer with 2 for loops.
```python
votes = 0
unique_voters = set()
for post in top_posts:
    for vote in post['active_votes']:
        unique_voters.add(vote['voter'])
        votes += 1
print("Votes: {0}".format(votes))
print("Unique Voters: {0}".format(len(unique_voters)))
print("Unique %: {0}".format(100 * len(unique_voters) / votes))
```

<table>
<tr> <td>Votes</td> <td>230,936</td> </tr>
<tr> <td>Unique Voters</td> <td>49,271</td> </tr>
<tr> <td>Uniqueness</td> <td>21.3%</td> </tr>
</table>

So for the top 500 posts in January, we have ~230,000 votes given out by ~50,000 voters, which is ~21% unique. We'll use this as our baseline for comparison.

Next we do the same thing, but only for posts that we've classified as "using bots" (i.e. replace `top_posts` with `bot_posts` in the above code).

<table>
<tr> <td>Votes</td> <td>63,102</td> </tr>
<tr> <td>Unique Voters</td> <td>26,217</td> </tr>
<tr> <td>Uniqueness</td> <td>41.5%</td> </tr>
</table>

Over these posts we see ~63,000 votes from ~26,000 unique voters, which gives a uniqueness of ~42%! This means that posts that receive a big dump of votes at once tend to have less overlap in voters!

For interest's sake, let's change the cutoff for suspicious posts to 20 and 30 votes per second. We do this by creating 2 (poorly named) new variables. Again we'll replace `top_posts` in the above code with `bot_posts2` and `bot_posts3`.

```python
bot_posts2 = [top_posts[i] for i in range(len(vps)) if vps[i] > 20] # only 38 posts
bot_posts3 = [top_posts[i] for i in range(len(vps)) if vps[i] > 30] # only 8 posts!
```

##### 20 vote cutoff

<table>
<tr> <td>Votes</td> <td>37,021</td> </tr>
<tr> <td>Unique Voters</td> <td>19,942</td> </tr>
<tr> <td>Uniqueness</td> <td>53.9%</td> </tr>
</table>

##### 30 vote cutoff

<table>
<tr> <td>Votes</td> <td>11,248</td> </tr>
<tr> <td>Unique Voters</td> <td>8,374</td> </tr>
<tr> <td>Uniqueness</td> <td>74.4%</td> </tr>
</table>

Weird! Our uniqueness continues to increase from ~21% to ~42% to ~54% and finally to ~74%. Keep in mind that a smaller set of posts also leaves fewer chances for the same voter to show up twice, so part of this increase may simply come from the shrinking sample.
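The counting code above gets re-run with a different list of posts for each cutoff, so here is a minimal sketch of wrapping it in a helper. The name `uniqueness` and the toy `example_posts` data are hypothetical and introduced only for illustration; the only thing assumed about the real data is the `active_votes` / `voter` layout used in the code above.

```python
def uniqueness(vote_lists):
    """Return (total_votes, unique_voters, percent_unique).

    vote_lists is any iterable of lists of vote dicts, each with a 'voter' key.
    Hypothetical helper, not part of the original analysis code.
    """
    votes = 0
    unique_voters = set()
    for vote_list in vote_lists:
        for vote in vote_list:
            unique_voters.add(vote['voter'])
            votes += 1
    # Guard against an empty selection to avoid dividing by zero.
    percent = 100 * len(unique_voters) / votes if votes else 0.0
    return votes, len(unique_voters), percent


# Made-up miniature data set with the same shape as the posts above.
example_posts = [
    {'active_votes': [{'voter': 'alice'}, {'voter': 'bob'}]},
    {'active_votes': [{'voter': 'alice'}, {'voter': 'carol'}]},
]
print(uniqueness(post['active_votes'] for post in example_posts))  # (4, 3, 75.0)
```

With the real data this would be called as, for example, `uniqueness(post['active_votes'] for post in bot_posts)`. Because the helper takes plain lists of votes rather than posts, it should work for any selection of votes, not only whole posts.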
If some of these top posters are using bot armies, it's starting to look a lot like they're all using independent ones. It also seems likely that those massive spikes are coming from huge curation trails. Let's try looking at the bots that are actually involved in those spikes; maybe there will be some crossover?

The following code runs through every post and creates a list containing all the votes that happened in a specific time bin (the bin with the maximum number of votes). The logic here is that maybe these posts are gaining a large number of unique votes by first bumping their visibility using bot votes or other methods. If the Python here is a little confusing, just ignore it and know that we've extracted the voters involved in the spikes outlined above.

```python
# Need to cull the time_slots the same way we did the histogram peaks
times = [time_slot[i] for i in range(len(vps)) if vps[i] > 10]
spike_votes = []
for post, bn in zip(bot_posts, times):
    spike_votes.append([])
    relatives = relative_times(post, scale=1)
    # Pairing sorted times with votes assumes that post['active_votes']
    # is already in chronological order
    for r, voter in zip(sorted(relatives), post['active_votes']):
        # Keep only the votes that fall inside the peak bin
        if bn <= r <= bn + 1:
            spike_votes[-1].append(voter)
```

Now we can use the same counting code as above to generate the uniqueness of just the votes in these spikes, for the posts above the first threshold of 10 votes per second. We loop over the vote lists in `spike_votes` instead of each post's `active_votes` in the code a bit higher up, and we get our new numbers: 1,738 votes with 1,401 unique voters, a uniqueness of ~81%.

<table>
<tr> <td>Votes</td> <td>1,738</td> </tr>
<tr> <td>Unique Voters</td> <td>1,401</td> </tr>
<tr> <td>Uniqueness</td> <td>80.6%</td> </tr>
</table>

## Conclusion

I'm not sure what we can conclude from this. I started this analysis expecting to see a large amount of crossover between the top posts (or at least the ones with massive spikes), but instead it appears that the spike votes are the most unique of any portion of the votes.
In hindsight this actually makes sense: if these posts are using bots to boost their vote counts (not total rewards), then they would need an army of smaller bots, each of which can probably only vote ~10 times per day. Of course, a less pessimistic (and probably more accurate) interpretation of these results is that the spikes are the result of curation trails.

If you have any other interpretations of this data, or just any thoughts, please leave a comment below.

## Addendum

As a sort of sanity check I ran the uniqueness check for all posts that had a maximum of < 10 votes per second. As you can see, this is fairly in line with what you might expect: it's not quite as low as the ~21% uniqueness we get when including all posts, but it's fairly close to it. The code looks similar to the way we generate `bot_posts` but instead only takes posts with `< 10` votes per second max.

```python
not_bot_posts = [top_posts[i] for i in range(len(vps)) if vps[i] < 10]
```

<table>
<tr> <td>Votes</td> <td>163,473</td> </tr>
<tr> <td>Unique Voters</td> <td>40,310</td> </tr>
<tr> <td>Uniqueness</td> <td>24.7%</td> </tr>
</table>

<br /><hr/><em>Posted on <a href="https://utopian.io/utopian-io/@calebjohn/analysis-of-voting-spikes-on-the-steem-blockchain">Utopian.io - Rewarding Open Source Contributors</a></em><hr/>
author | calebjohn | ||||||
---|---|---|---|---|---|---|---|
permlink | analysis-of-voting-spikes-on-the-steem-blockchain | ||||||
category | utopian-io | ||||||
json_metadata | "{"community":"utopian","app":"utopian/1.0.0","format":"markdown","repository":{"id":54517947,"name":"steem","full_name":"steemit/steem","html_url":"https://github.com/steemit/steem","fork":false,"owner":{"login":"steemit"}},"pullRequests":[],"platform":"github","type":"analysis","tags":["utopian-io","utopian-io","steem","analysis"],"users":["calebjohn"],"links":["https://utopian.io/utopian-io/@calebjohn/analysis-of-steem-voting-behavior","https://cdn.utopian.io/posts/4de3d55a1e33768135276c863534671d7935article2_9_2.png"],"image":["https://cdn.utopian.io/posts/4de3d55a1e33768135276c863534671d7935article2_9_2.png"],"moderator":{"account":"eastmael","time":"2018-04-06T07:16:49.482Z","reviewed":true,"pending":false,"flagged":false},"questions":[{"question":"Is the project description formal?","answers":[{"value":"Yes it's straight to the point","selected":true,"score":10},{"value":"Need more description ","selected":false,"score":5},{"value":"Not too descriptive","selected":false,"score":0}],"selected":0},{"question":"Is the language / grammar correct?","answers":[{"value":"Yes","selected":false,"score":20},{"value":"A few mistakes","selected":true,"score":10},{"value":"It's pretty bad","selected":false,"score":0}],"selected":1},{"question":"Was the template followed?","answers":[{"value":"Yes","selected":true,"score":10},{"value":"Partially","selected":false,"score":5},{"value":"No","selected":false,"score":0}],"selected":0},{"question":"Were the reasons for creating the analysis explained enough?","answers":[{"value":"Yes, it was explained well why the analysis was created","selected":true,"score":10},{"value":"No, it is not","selected":false,"score":0}],"selected":0},{"question":"Is it a recurring analysis?","answers":[{"value":"No, it is a new analysis","selected":false,"score":20},{"value":"Yes but the data are well processed and presented.","selected":true,"score":12},{"value":"Yes, it is a recurring analysis with different time period","selected":false,"score":4}],"selected":1},{"question":"Where the tools and query scripts used included in the contribution?","answers":[{"value":"Yes, both tools and scripts were included","selected":true,"score":10},{"value":"No, the query script was not included in the contribution.","selected":false,"score":5},{"value":"No","selected":false,"score":0}],"selected":0}],"score":84}" | ||||||
created | 2018-04-05 22:08:30 | ||||||
last_update | 2018-04-06 07:16:48 | ||||||
depth | 0 | ||||||
children | 6 | ||||||
last_payout | 2018-04-12 22:08:30 | ||||||
cashout_time | 1969-12-31 23:59:59 | ||||||
total_payout_value | 48.586 HBD | ||||||
curator_payout_value | 21.259 HBD | ||||||
pending_payout_value | 0.000 HBD | ||||||
promoted | 0.000 HBD | ||||||
body_length | 9,374 | ||||||
author_reputation | 637,772,892,322 | ||||||
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" | ||||||
beneficiaries |
| ||||||
max_accepted_payout | 1,000,000.000 HBD | ||||||
percent_hbd | 10,000 | ||||||
post_id | 48,557,528 | ||||||
net_rshares | 28,585,327,539,494 | ||||||
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
remlaps | 0 | 8,085,901,826 | 65% | ||
remlaps1 | 0 | 12,706,086,613 | 22% | ||
remlaps2 | 0 | 3,297,758,148 | 100% | ||
lisa.palmer | 0 | 1,159,154,115 | 22% | ||
cub2 | 0 | 74,309,548 | 100% | ||
astronomyizfun | 0 | 2,808,279,171 | 65% | ||
mys | 0 | 30,783,733,934 | 35% | ||
mkt | 0 | 11,505,989,415 | 30.13% | ||
crokkon | 0 | 30,687,375,589 | 100% | ||
loshcat | 0 | 2,529,311,897 | 100% | ||
utopian-io | 0 | 28,439,258,185,792 | 18.08% | ||
devart | 0 | 1,351,706,021 | 100% | ||
greenorange | 0 | 609,471,115 | 100% | ||
yorkchinese | 0 | 3,523,958,228 | 45% | ||
maczak6603 | 0 | 11,490,609,437 | 5% | ||
princewrites | 0 | 858,922,398 | 100% | ||
josephace135 | 0 | 13,519,870,258 | 100% | ||
humanduck | 0 | 205,003,980 | 100% | ||
marketstack | 0 | 4,982,902,611 | 0.85% | ||
steemnova | 0 | 1,833,910,638 | 35% | ||
mrcalxy | 0 | 52,065,371 | 100% | ||
clayjohn | 0 | 3,651,160,886 | 100% | ||
tusuharus | 0 | 351,872,503 | 100% |
.
author | crokkon |
---|---|
permlink | re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180406t071953756z |
category | utopian-io |
json_metadata | "{"app": ""}" |
created | 2018-04-06 07:19:54 |
last_update | 2022-09-18 12:06:15 |
depth | 1 |
children | 1 |
last_payout | 2018-04-13 07:19:54 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 1 |
author_reputation | 81,214,366,861,104 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,616,928 |
net_rshares | 3,936,726,542 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
clayjohn | 0 | 3,936,726,542 | 100% |
That's a very good point! I didn't realize that the timestamp was block time, not creation time. To confirm that, I checked against the posts I have saved and regenerated the graph I have shown above using 3 second bins. I would attach it here, but it is predictably identical to the one above! I still maintain that ~60 votes in three seconds (~20 votes in one) is moderately suspicious (although certainly less so). That's a good suggestion! I think for my next post, I may look into clustering voters to see this kind of grouping. I'd also like to look into voters based on SBD value they've added to the post. Thanks for reading!
author | calebjohn |
---|---|
permlink | re-crokkon-re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180407t050909070z |
category | utopian-io |
json_metadata | {"tags":["utopian-io"],"community":"utopian","app":"utopian/1.0.0"} |
created | 2018-04-07 05:09:09 |
last_update | 2018-04-07 05:09:09 |
depth | 2 |
children | 0 |
last_payout | 2018-04-14 05:09:09 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 638 |
author_reputation | 637,772,892,322 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,768,344 |
net_rshares | 4,967,826,541 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
clayjohn | 0 | 4,967,826,541 | 100% |
Thank you for the contribution. It has been approved.

From the data, I can tell that there are votes per second. Is this for a single post only? Or 600+ votes per second but on different posts? Without looking into the details, what I can infer is that users just want to get a big number of votes to go into trending, perhaps? But these votes were made by a unique number of users? So I guess I also don't know what to conclude from it.

My impression here is that you're somehow looking for the bad guy/s. What I'd like to suggest is to analyze optimistically: look for the good guys and highlight them.

You can contact us on [Discord](https://discord.gg/uTyJkNm).

**[[utopian-moderator]](https://utopian.io/moderators)**
author | eastmael |
---|---|
permlink | re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180406t073001203z |
category | utopian-io |
json_metadata | {"tags":["utopian-io"],"community":"utopian","app":"utopian/1.0.0"} |
created | 2018-04-06 07:30:00 |
last_update | 2018-04-06 07:30:00 |
depth | 1 |
children | 2 |
last_payout | 2018-04-13 07:30:00 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.952 HBD |
curator_payout_value | 0.316 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 779 |
author_reputation | 78,967,407,130,763 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,618,076 |
net_rshares | 410,115,082,315 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
utopian.tip | 0 | 410,115,082,315 | 39.29% |
This is actually for a selection of 500 posts from January, but the numbers quoted (maximum of 66 votes per second) are for individual posts. It is true though, I think it would be very unfair to attempt to draw any firm conclusions from the data above. That's an interesting angle! Highlighting strong curators could certainly be beneficial!
author | calebjohn |
---|---|
permlink | re-eastmael-re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180407t052846004z |
category | utopian-io |
json_metadata | {"tags":["utopian-io"],"community":"utopian","app":"utopian/1.0.0"} |
created | 2018-04-07 05:28:45 |
last_update | 2018-04-07 05:28:45 |
depth | 2 |
children | 0 |
last_payout | 2018-04-14 05:28:45 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 344 |
author_reputation | 637,772,892,322 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,770,381 |
net_rshares | 4,866,442,326 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
clayjohn | 0 | 4,866,442,326 | 100% |
Hey @eastmael, I just gave you a tip for your hard work on moderation. Upvote this comment to support the utopian moderators and increase your future rewards!
author | utopian.tip |
---|---|
permlink | re-re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180406t073001203z-20180406t132923 |
category | utopian-io |
json_metadata | "" |
created | 2018-04-06 13:29:24 |
last_update | 2018-04-06 13:29:24 |
depth | 2 |
children | 0 |
last_payout | 2018-04-13 13:29:24 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 158 |
author_reputation | 238,310,597,885 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,663,526 |
net_rshares | 0 |
### Hey @calebjohn I am @utopian-io. I have just upvoted you!

#### Achievements

- You have less than 500 followers. Just gave you a gift to help you succeed!
- Seems like you contribute quite often. AMAZING!

#### Community-Driven Witness!

I am the first and only Steem Community-Driven Witness. <a href="https://discord.gg/zTrEMqB">Participate on Discord</a>. Let's GROW TOGETHER!

- <a href="https://v2.steemconnect.com/sign/account-witness-vote?witness=utopian-io&approve=1">Vote for my Witness With SteemConnect</a>
- <a href="https://v2.steemconnect.com/sign/account-witness-proxy?proxy=utopian-io&approve=1">Proxy vote to Utopian Witness with SteemConnect</a>
- Or vote/proxy on <a href="https://steemit.com/~witnesses">Steemit Witnesses</a>

**Up-vote this comment to grow my power and help Open Source contributions like this one. Want to chat? Join me on Discord https://discord.gg/Pc8HG9x**
author | utopian-io |
---|---|
permlink | re-calebjohn-analysis-of-voting-spikes-on-the-steem-blockchain-20180406t121151460z |
category | utopian-io |
json_metadata | {"tags":["utopian-io"],"community":"utopian","app":"utopian/1.0.0"} |
created | 2018-04-06 12:11:51 |
last_update | 2018-04-06 12:11:51 |
depth | 1 |
children | 0 |
last_payout | 2018-04-13 12:11:51 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 1,085 |
author_reputation | 152,955,367,999,756 |
root_title | "Analysis of Voting "Spikes" on the Steem Blockchain" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 48,652,340 |
net_rshares | 0 |