Steem Sincerity - Spam Filtering API and App Extensions by andybets

· @andybets · (edited)
$110.72
<center>
![](https://steemitimages.com/DQmZejyqAnYoWWMw7iiyuNUdrUGr1ckbpTZkSpuhZ9KHF3d/image.png)
</center>

## Steem Sincerity

This project will attempt to help the community address the spam problem by aiming to introduce the following:
1) A fast API which provides detailed classification information about any Steem account.
2) Chrome extension(s) for popular Steem web apps such as Condenser, which make this information easily available in these front-end interfaces, and allow the sorting and filtering of comments based on personalised criteria such as spam scores.
3) Modifications to front-ends and (hopefully) apps to support this API directly.

## Approach

A few months ago, I created an [experimental account classifier](http://steemreports.com/account-classifier/). It uses machine learning and a small training set to find patterns that identify comments as spam (or as coming from bots). I would like to adapt it by enlarging the training set of accounts, making the feature set richer, and making it fast enough to scale to the hundreds of thousands of Steem accounts.

I'm aware that hiding spam does not directly stop it from bloating the blockchain or being used for self-voting, both of which are significant concerns. It should, however, assist with content discovery and, by reducing the visibility of spam, somewhat undermine the economic incentive to produce it. It may also reduce the number of spam comments that are upvoted and increase the number that are downvoted, by providing instant insight into each account commenting on a post.

## What's Available Now?

There is a minimally functional experimental endpoint at https://multi.tube/s/api/

This allows you to make HTTP GET calls like this:

<pre><code>https://multi.tube/s/api/get-classification-scores/account1,account2</code></pre>

...which return JSON responses like this:

<pre><code>{
    "account1": {
        "classification_bot_score": 0.285714285714286,
        "classification_human_score": 0.714285714285714,
        "classification_spammer_score": 0.0
    },
    "account2": {
        "classification_bot_score": 0.571428571428571,
        "classification_human_score": 0.428571428571429,
        "classification_spammer_score": 0.0
    }
}</code></pre>


Up to 100 accounts can be queried at once, and the results may be easily used to modify/suppress comment rendering. If a requested account has not been classified, it will not be returned in the response.
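A minimal client-side sketch of the call described above, assuming the endpoint behaves exactly as documented in this post. The helper names (`build_urls`, `parse_scores`) and the constants are illustrative and not part of the API. The batch size of 100 and the "unclassified accounts are omitted" behaviour come from the description above.

```python
import json
from typing import Dict, Iterable, List, Optional

# Endpoint path as given in the post; everything else here is a hypothetical helper.
BASE_URL = "https://multi.tube/s/api/get-classification-scores/"
MAX_ACCOUNTS_PER_CALL = 100  # limit stated in the post

Scores = Dict[str, float]


def build_urls(accounts: Iterable[str]) -> List[str]:
    """Split account names into batches of at most 100 and build one URL per batch."""
    names = list(dict.fromkeys(accounts))  # de-duplicate while keeping order
    return [
        BASE_URL + ",".join(names[i:i + MAX_ACCOUNTS_PER_CALL])
        for i in range(0, len(names), MAX_ACCOUNTS_PER_CALL)
    ]


def parse_scores(body: str, requested: Iterable[str]) -> Dict[str, Optional[Scores]]:
    """Map every requested account to its scores, or to None if it was not returned."""
    data = json.loads(body)
    return {name: data.get(name) for name in requested}
```

Each URL from `build_urls` can then be fetched with any HTTP client (for example `urllib.request.urlopen(url).read().decode()`), and the response text passed to `parse_scores`. Treating a missing account as `None` rather than as an error keeps unclassified accounts visible by default.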

The API still uses the same spam scoring formula as the Steem Reports account classifier. That formula needs to be improved and retrained to increase the accuracy of the classification scores, but I think it is helpful in its current state and gives the other parts of the project something to develop against.

When using this API to exclude spam, follower information could also be used to augment the data. For example, because no spam filter is perfect, somebody may want accounts they already follow to bypass the spam filter, so they never miss something they would want to see.
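One way the follow-list exception could look, as a sketch rather than the project's implementation. The function name `should_hide`, the `followed` set and the default threshold of 0.5 are assumptions for illustration; only the `classification_spammer_score` field name comes from the response format shown earlier.

```python
from typing import Dict, Optional, Set

Scores = Dict[str, float]


def should_hide(
    author: str,
    scores: Dict[str, Optional[Scores]],
    followed: Set[str],
    spam_threshold: float = 0.5,  # hypothetical cut-off, to be tuned per user
) -> bool:
    """Decide whether a comment by `author` should be hidden or marked as spam."""
    # Accounts the reader already follows always bypass the filter.
    if author in followed:
        return False
    account_scores = scores.get(author)
    # Unclassified accounts are absent from the API response: show them.
    if account_scores is None:
        return False
    return account_scores.get("classification_spammer_score", 0.0) >= spam_threshold
```

A front-end could call this once per comment author after a single batched API request, and let each user adjust `spam_threshold` to make the filter stricter or more lenient.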

## Training the Classifier Software

To improve the accuracy of the classifier I will need to collect three sets of account names:
1) Spammers - generating significant spam, and little content of value
2) Human content creators - generating no (or very little) spam and valuable content
3) Bots

I will announce a training date soon, and collect recent examples of each of these types of accounts. They need to be recent as the database will store a maximum of 14 days of comments.

Please let me know if you would be able to help by providing accounts in these categories when the time comes, so I can notify you.

---

I'm using my multi.tube domain/server in the short-term, but if the project gets enough support I'll give it its own.

Again, please let me know if you have any questions or would like to help with the project.
author: andybets
category: steemdev
tags: steemdev, steem, steemit, spam, steem-sincerity
created: 2018-03-14 11:52:15
last_update: 2018-03-20 07:07:33
@bitgeek ·
comment
Congratulations @andybets, this post is the fourth most rewarded post (based on pending payouts) in the last 12 hours written by a User account holder (accounts that hold between 0.1 and 1.0 Mega Vests). The total number of posts by User account holders during this period was 3118 and the total pending payments to posts in this category was $6762.85. To see the full list of highest paid posts across all account categories, [click here](www.steemit.com/steemit/@bitgeek/payout-stats-report-for-14th-march-2018--part-i).

If you do not wish to receive these messages in future, please reply stop to this comment.
created: 2018-03-14 19:57:48 · depth: 1
@bobinson ·
$0.02
Interesting project. It looks a lot like what I tried with https://steemit.com/hello/@thefreebird/init-1

All the best.

PS: A query with an invalid account generates an internal server error. You may want to catch that exception :)
created: 2018-03-14 20:30:54 · depth: 1
@andybets ·
Yeah, thanks! I'm not always the most diligent when it comes to error handling.
created: 2018-03-14 21:10:48 · depth: 2
@bobinson ·
Happens to the best of us :)
created: 2018-03-14 21:30:09 · depth: 3
@cardboard ·
$0.03
That's a neat idea :) tipuvote! 0.1
created: 2018-03-14 13:00:24 · depth: 1
@duplibot ·
$0.02
I feel like we could be best friends!

I can definitely help with training and would like to see what other ways we could work together. I have a dataset of identified spammers as well as a whitelist of legitimate content creators, both of which continue to grow daily.

You can find me on steem.chat or catch me on Discord (duplibot#1884)
created: 2018-03-15 18:51:06 · depth: 1
@andybets ·
Great! I'll be in touch soon.
created: 2018-03-16 21:34:09 · depth: 2
@fronttowardenemy ·
$0.02
I really like the idea of this. My hope is that spamming the Steem blockchain will be more effort than it's worth, and the spammers will move on to other platforms. 

I checked my account and it says I'm human! So far so good!
created: 2018-03-14 17:21:42 · depth: 1
@gmzct ·
goodpost
created: 2018-03-20 11:17:51 · depth: 1
@official-hord ·
$0.02
I really love the idea, but I want to ask how you intend to check for spam among thousands of messages from different users, where a message may be relevant to the post and yet has been used over and over again on the blockchain.
This might be a little difficult, but I wish you luck.
💪
created: 2018-03-14 21:04:18 · depth: 1
@andybets · (edited)
Initially this won't check for individual spam messages, but will evaluate accounts. So any comments from 'spammer' accounts would be hidden (or marked). It uses some machine learning approaches which are actually easier to apply than they are to explain - though it's not perfect.
created: 2018-03-14 21:17:42 · last_update: 2018-03-15 05:40:24 · depth: 2
@official-hord ·
Cool
created: 2018-03-16 22:39:24 · depth: 3
@showoff ·
GREAT! I wondered for a long time if Steemit would make it through all this spam. Your project gives me hope! Great work!
created: 2018-05-13 08:18:03 · depth: 1
@tipu ·
<center><p><strong>Hi @andybets! You have received 0.25 SBD @tipU upvote https://i.imgur.com/JFq6JWX.png from @cardboard!</strong></p><p><strong><a href="https://steemit.com/steemit/@tipu/upvotes-with-2-5-x-profit-tipu-upvote-service-guide" rel="noopener">@tipU - upvotes with 2.5 x profit</a> + <a href="https://steemit.com/steemit/@tipu/just-a-reminder-tipu-pays-out-100-to-steem-power-delegators-0-fees" rel="noopener">daily payouts to investors :)</a></strong></p><hr>https://i.imgur.com/kVF5RiI.gif<br></center>
created: 2018-03-14 13:01:18 · depth: 1