How to Use AI to Find Articles with Ruby by inertia

**Note: This is pure magic and highly experimental.**  In a nutshell, we're going to look at the trending page and try to predict which new posts will reach trending.  To do this, we're going to use ID3.  According to Wikipedia:

> In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains.

*[ID3 algorithm](https://en.wikipedia.org/wiki/ID3_algorithm)*

In Ruby, we can use the ID3 algorithm through the `ai4r` gem.

Ok, it's not really magic.  So, how does it work?  I have ID3 look at some specific attributes of the top 100 trending posts.  Specifically:

`author_reputation percent_steem_dollars promoted category net_votes`
  
Based on these attributes, I have it predict the `total_pending_payout_value` of a new post.  If the `total_pending_payout_value` can be predicted, we display the difference between the prediction and the current pending payout.

As always, we use [Radiator](https://steemit.com/steem/@inertia/radiator-steem-ruby-api-client) with `bundler`.  You can get `bundler` with this command:

```bash
$ gem install bundler
```

I've tested it on various versions of Ruby.  The oldest one I got it to work on was:

`ruby 2.0.0p645 (2015-04-13 revision 50299) [x86_64-darwin14.4.0]`

First, make a project folder:

```bash
$ mkdir radiator
$ cd radiator
```

Create a file named `Gemfile` containing:

```ruby
source 'https://rubygems.org'
gem 'radiator', github: 'inertia186/radiator'
gem 'ai4r' # Adds general machine learning capabilities.
```

Then run the command:

```bash
$ bundle install
```

Create a file named `ai-scan.rb` containing:

```ruby
require 'rubygems'
require 'bundler/setup'

Bundler.require

# Convert a raw reputation (e.g. "346568901399561") into the familiar
# display score, where 25 is a brand-new account. Negative reputations
# keep their sign instead of being treated as positive.
def to_rep(raw)
  raw = raw.to_i
  level = Math.log10(raw.abs) # -Infinity for 0, clamped to 0 below
  level = [level - 9, 0].max
  level = -level if raw < 0
  ((level * 9) + 25).to_i
end

# "16.543 SBD" => 16 (whole units only, so ID3 sees a small set of values)
def base_value(raw)
  raw.split(' ').first.to_i
end

# "16.543 SBD" => "SBD"
def symbol_value(raw)
  raw.split(' ').last
end

# Turn one post into a row of values, in the same order as `labels`.
def to_data_item(comment, labels)
  labels.map do |label|
    case label
    when 'author_reputation' then to_rep(comment[label])
    when 'promoted', 'total_pending_payout_value' then base_value(comment[label])
    else comment[label]
    end
  end
end

api = Radiator::Api.new

# Every label but the last is an input; the last one is what ID3 predicts.
data_labels = %w(
  author_reputation percent_steem_dollars promoted category net_votes
  total_pending_payout_value
)

options = { limit: 100 }
options[:tag] = ARGV.first if ARGV.any? # optional tag, e.g. photography

# Train ID3 on the current top 100 trending posts.
trending_comments = api.get_discussions_by_trending(options).result
data_items = trending_comments.map { |comment| to_data_item(comment, data_labels) }
data_set = Ai4r::Data::DataSet.new(data_labels: data_labels, data_items: data_items)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

# Predict payouts for the newest posts that aren't already trending.
new_comments = api.get_discussions_by_created(options).result - trending_comments

predictions = new_comments.map do |comment|
  next unless comment.mode == 'first_payout' # only posts still awaiting their first payout

  # id3.eval can raise, e.g. when the tree has no branch for one of this
  # post's values (a category never seen in the trending data); skip those.
  prediction = (id3.eval(to_data_item(comment, data_labels)) rescue nil)
  next if prediction.nil?

  {
    difference: prediction - base_value(comment.total_pending_payout_value),
    symbol: symbol_value(comment.total_pending_payout_value),
    url: "https://steemit.com#{comment.url}"
  }
end.compact

if predictions.any?
  puts "Predicting the following payouts will rise by:"
  predictions.sort_by { |pr| pr[:difference] }.each do |pr|
    puts "#{pr[:difference]} #{pr[:symbol]}: #{pr[:url]}"
  end
else
  puts "Nothing to predict."
end
```
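The least obvious helper in the script is `to_rep`, which turns the raw `author_reputation` returned by the API into the familiar display score. For a non-negative raw value $r$ it computes the following, worked here for a raw reputation of 346,568,901,399,561:

$$
\text{rep}(r) = \left\lfloor 9 \cdot \max\left(\log_{10} r - 9,\ 0\right) + 25 \right\rfloor
$$

$$
\text{rep}(346{,}568{,}901{,}399{,}561) = \left\lfloor 9 \cdot (14.5398 - 9) + 25 \right\rfloor = \lfloor 74.858 \rfloor = 74
$$

Any raw value at or below $10^9$, including a brand-new account at 0, maps to 25.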

Then run it:

```bash
$ ruby ai-scan.rb
```

The expected output will be something like this:

```
Predicting the following payouts will rise by:
0 SBD: https://steemit.com/history/@steemizen/today-in-history-uss-arkansas
0 SBD: https://steemit.com/steem/@ozchartart/usdsteem-btc-daily-poloniex-bittrex-technical-analysis-market-report-update-162-jan-14-2017
10 SBD: https://steemit.com/travel/@writingamigo/traveler-s-observations-the-origins-of-habits-how-environement-forces-us-to-believe-that-it-is-our-fault
13 SBD: https://steemit.com/fiction/@johnjgeddes/tempest-and-tea-rediscovering-the-magic-within-part-1-of-2
15 SBD: https://steemit.com/travel/@exploretraveler/photo-of-the-day-skagway-alaska
17 SBD: https://steemit.com/news/@contentjunkie/spacex-launches-first-rocket-since-explosion
17 SBD: https://steemit.com/food/@anti-sophist/bold-lamb-loin-chops-and-basil-potatoes-2017114t195031380z
17 SBD: https://steemit.com/pizzagate/@gizmosia/the-video-the-world-must-watch-chilling-info-re-child-trafficking-posted-today
17 SBD: https://steemit.com/minecraft/@thedonutguy7/how-to-download-a-minecraft-map-for-windows
17 SBD: https://steemit.com/fly/@altcointrader77/flycoin-in-the-hands-of-a-trusted-few
17 SBD: https://steemit.com/fiction/@internutter/challenge-01476-d015-historical-hysterical-first
17 SBD: https://steemit.com/animal/@favorit/nature-that-surrounds-us-in-the-animal-world-black-stallion-23
18 SBD: https://steemit.com/film/@movie-online/confidential-secret-market-1974-romance-history
18 SBD: https://steemit.com/life/@lukestokes/day-6-update-the-wim-hof-method
18 SBD: https://steemit.com/kr/@leesunmoo/6r1hns
19 SBD: https://steemit.com/challenge30/@franks/challenge30-deep-space-mining-unobtainium
```

You can also pass a tag:

```bash
$ ruby ai-scan.rb photography
```

The expected output will be something like this:

```
Predicting the following payouts will rise by:
0 SBD: https://steemit.com/travel/@koskl/visiting-cusco-peru
0 SBD: https://steemit.com/nature/@zaskia/beautiful-flower
0 SBD: https://steemit.com/photography/@distantsignal/shooting-milkshake-web-series-on-vintage-russian-lenses
0 SBD: https://steemit.com/photography/@chrissysworld/the-sky-burns-the-angels-flee-der-himmel-brennt-die-engel-fliehn-english-deutsch
0 SBD: https://steemit.com/photography/@klava/white-truffle
0 SBD: https://steemit.com/photography/@rynow/sunken-fish-trailer
0 SBD: https://steemit.com/food/@lonilush/traditional-balkan-cheese-pie-burek-original-recipe-with-pictures
0 SBD: https://steemit.com/nature/@riostarr/mushrooms-on-dead-wood
1 SBD: https://steemit.com/photography/@richar/life-and-death-on-wall-street
1 SBD: https://steemit.com/photography/@xntryk1/swapmeet-finds-640
5 SBD: https://steemit.com/photography/@jasonrussell/jacks-fork-river-10-pictures
5 SBD: https://steemit.com/photography/@kalemandra/reflections
17 SBD: https://steemit.com/photography/@briansss/check-it-out-my-photo-album-of-my-trip-through-venezuela
17 SBD: https://steemit.com/food/@alizee/pecal-tubers-vegetables-papaya-flower
```

Either way, you can use these results as voting suggestions: each number is how much ID3 expects that post's payout to rise, based on what the current trending posts have in common.

Here's a rough explanation of what's going on under the hood.  We take the trending posts and extract certain fields as inputs to ID3 (payouts truncated to whole SBD).  The inputs look like this:

| `author_reputation` | `percent_steem_dollars` | `promoted` | `category` | `net_votes` | `total_pending_payout_value` |
|-:|-:|-:|-|-:|-:|
| `52` | `10000` | `0` | `romance` | `146` | `16` |
| `58` | `10000` | `0` | `story` | `160` | `16` |
| `67` | `0` | `0` | `science` | `162` | `16` |
| `58` | `10000` | `0` | `travel` | `178` | `16` |
| `60` | `10000` | `0` | `gaming` | `166` | `16` |
| `54` | `10000` | `0` | `fiction` | `141` | `15` |
| `54` | `10000` | `0` | `food` | `163` | `15` |
| `53` | `10000` | `0` | `art` | `167` | `15` |
| `67` | `0` | `0` | `japan` | `108` | `15` |
| `61` | `10000` | `0` | `poker` | `21` | `15` |
| `59` | `10000` | `0` | `til` | `158` | `15` |
| `63` | `10000` | `0` | `music` | `165` | `15` |
| `60` | `10000` | `0` | `art` | `160` | `15` |
| `59` | `10000` | `0` | `aceh` | `155` | `15` |
| `59` | `10000` | `0` | `writing` | `147` | `15` |
| `55` | `10000` | `0` | `life` | `160` | `15` |
| `51` | `10000` | `0` | `painting` | `148` | `15` |
| `57` | `0` | `1` | `life` | `130` | `15` |
| `59` | `10000` | `0` | `travel` | `163` | `15` |

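ID3 takes the above inputs and builds a decision tree from them: at each step it splits the posts on whichever attribute best separates their `total_pending_payout_value`, then repeats on each branch.  Each new post is then run down that tree, and the leaf it lands on is the predicted final `total_pending_payout_value`.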

For instance, it might notice that authors with a reputation of `59`, posting in `til`, tend to have a `total_pending_payout_value` of `15`.  So if a new post matches, it'll make that prediction.

It can also learn patterns that only hold on one branch of the tree.  For example, it might find that `percent_steem_dollars` and `promoted` matter, but only when the `category` is `science`.  It's that flexible.
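To make the tree-building concrete, here is a minimal from-scratch ID3 sketch in plain Ruby, trained on the 19 rows in the table above. It is not the `ai4r` implementation: `entropy`, `split_entropy`, `build_tree`, `classify`, `LABELS` and `ROWS` are hypothetical names written for this example. Like classic ID3, it treats every distinct value, including each number, as its own category, and it picks the split with the highest information gain (the lowest entropy left over).

```ruby
# Minimal from-scratch ID3 sketch (not ai4r's implementation).
# Each row is an array; the last column is the class to predict.
LABELS = %w(author_reputation percent_steem_dollars promoted category net_votes)

# The 19 rows from the table above.
ROWS = [
  [52, 10000, 0, 'romance', 146, 16],
  [58, 10000, 0, 'story', 160, 16],
  [67, 0, 0, 'science', 162, 16],
  [58, 10000, 0, 'travel', 178, 16],
  [60, 10000, 0, 'gaming', 166, 16],
  [54, 10000, 0, 'fiction', 141, 15],
  [54, 10000, 0, 'food', 163, 15],
  [53, 10000, 0, 'art', 167, 15],
  [67, 0, 0, 'japan', 108, 15],
  [61, 10000, 0, 'poker', 21, 15],
  [59, 10000, 0, 'til', 158, 15],
  [63, 10000, 0, 'music', 165, 15],
  [60, 10000, 0, 'art', 160, 15],
  [59, 10000, 0, 'aceh', 155, 15],
  [59, 10000, 0, 'writing', 147, 15],
  [55, 10000, 0, 'life', 160, 15],
  [51, 10000, 0, 'painting', 148, 15],
  [57, 0, 1, 'life', 130, 15],
  [59, 10000, 0, 'travel', 163, 15]
].freeze

# Shannon entropy (in bits) of a list of class values.
def entropy(classes)
  total = classes.size.to_f
  classes.group_by { |c| c }.values.inject(0.0) do |sum, group|
    prob = group.size / total
    sum - prob * Math.log2(prob)
  end
end

# Weighted entropy left after splitting rows on the attribute at `index`.
# Lower remaining entropy means higher information gain.
def split_entropy(rows, index)
  rows.group_by { |row| row[index] }.values.inject(0.0) do |sum, subset|
    sum + subset.size.to_f / rows.size * entropy(subset.map(&:last))
  end
end

# A leaf is a class value; an inner node is { attribute:, branches: }.
def build_tree(rows, attributes)
  classes = rows.map(&:last)
  return classes.first if classes.uniq.size == 1
  if attributes.empty?
    # Nothing left to split on: fall back to the most common class.
    return classes.group_by { |c| c }.max_by { |_, group| group.size }.first
  end

  best = attributes.min_by { |index| split_entropy(rows, index) }
  remaining = attributes - [best]
  branches = {}
  rows.group_by { |row| row[best] }.each do |value, subset|
    branches[value] = build_tree(subset, remaining)
  end
  { attribute: best, branches: branches }
end

# Follow the branches for `item`; nil means a value never seen in training.
def classify(tree, item)
  node = tree
  node = node[:branches][item[node[:attribute]]] while node.is_a?(Hash)
  node
end

tree = build_tree(ROWS, (0...LABELS.size).to_a)
puts LABELS[tree[:attribute]]                     # prints category
p classify(tree, [59, 10000, 0, 'til', 158])      # prints 15
p classify(tree, [59, 10000, 0, 'radiator', 158]) # prints nil
```

On these rows `category` wins the first split, because most categories appear only once. Every category branch except `travel` is already a leaf; `travel` goes on to split on `author_reputation`, an attribute the root never tests, which is the branch-by-branch flexibility described above. A post whose value has no branch (an unseen category, or a `net_votes` count that never appeared) gets no prediction at all, which is consistent with the script's `rescue nil` skipping posts ID3 can't evaluate.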

As an analogy, it's a little bit like weather prediction: "In this area, on this day, for the last 100 years, when the temperature is `x` and the humidity is `y`, it rains `z` percent of the time."

You will notice that I *specifically* exclude the author name from the prediction inputs.  If you want to include it, you can add it yourself by modifying `data_labels` in the script and adding `author` to the beginning.
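Following that suggestion, the modified list in `ai-scan.rb` would look like this. `author` goes at the front because the last label, `total_pending_payout_value`, is the one ID3 predicts, and the script's catch-all `else` branch already passes plain fields such as `author` through unchanged.

```ruby
# Same labels as ai-scan.rb, with `author` added at the front as an extra input.
# total_pending_payout_value must stay last: it is the value being predicted.
data_labels = %w(
  author author_reputation percent_steem_dollars promoted category net_votes
  total_pending_payout_value
)
```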

While including `author` might help ID3 make better predictions, I'm personally not interested in correlating the author name.  We already have enough of those kinds of tools (albeit without ID3).  I want ID3 to be indifferent to the author and make its prediction from more subtle inputs, which is what it's designed to do.

![ruby](http://www.steemimg.com/images/2016/08/24/1024px-Ruby_logo.svgdcc20.png)

See my previous Ruby How To posts in: [#radiator](https://steemit.com/created/radiator) [#ruby](https://steemit.com/created/ruby)

Posted by @inertia on 2017-01-15 in #radiator (tags: radiator, ruby, steem, howto, machinelearning).

@abit · 2017-01-15
Interesting.

@cardboard · 2017-01-15
Cool post. Did you measure correlation between predicted payout and the real one?

@inertia · 2017-01-15
I'm still looking at it.  When I originally published this post, my script said I would earn $17.  Then, 5 minutes later, it couldn't make any more predictions about it.

The other samples in this post seem to correlate a little better than chance, on cursory analysis.  I'll do a more in-depth post later.

@mightyenvz · 2017-01-17
Very helpful post! Interesting too.