Getting started with SteemData by furion

![](https://steemitimages.com/0x0/https://i.imgur.com/uAu5ST4.jpg)

### Getting Started
To use SteemData from your favorite language, just install the appropriate MongoDB library. You can find one for all major languages like [JavaScript](https://www.npmjs.com/package/mongodb), [Python](http://api.mongodb.com/python/current/installation.html), [Go](http://labix.org/mgo) and [others](https://docs.mongodb.com/manual/applications/drivers/).

You can connect to the public SteemData server via the following URI:
```
mongodb://steemit:steemit@mongo1.steemdata.com:27017/SteemData
```
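Some drivers and GUI tools ask for the host, port, credentials and database separately rather than as one URI. Here is a minimal sketch that splits the URI above using only Python's standard library; the variable names are my own, only the URI itself comes from this post.

```python
from urllib.parse import urlsplit

# Public SteemData URI, copied from above
STEEMDATA_URI = "mongodb://steemit:steemit@mongo1.steemdata.com:27017/SteemData"

parts = urlsplit(STEEMDATA_URI)

# Collect the individual connection settings
connection = {
    "username": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "database": parts.path.lstrip("/"),
}
print(connection)
```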

This tutorial uses **Python**, for which you can install a neat helper library that includes PyMongo and a few extra niceties.

```
pip install -U steemdata
```

**Quick example:**
```
> from steemdata import SteemData

> s = SteemData()

> s.info()
mongodb://steemit:steemit@mongo1.steemdata.com:27017/SteemData

> s.Accounts.find_one({'name':'furion'})['balances']
{'SAVINGS_SBD': 100.0,
 'SAVINGS_STEEM': 0.0,
 'SBD': 6.453,
 'STEEM': 66.157,
 'VESTS': 86491944.341744}
```

### RoboMongo
I highly recommend [RoboMongo](https://robomongo.org/) as a cross-platform GUI utility for playing around with SteemData.

https://vimeo.com/205691651

*You can find sample queries from the video [here](https://gist.github.com/Netherdrake/a844ebf771c96929bee8ddb446d1cfa6)*.


### Collections

#### Accounts
Accounts contains Steem Accounts and their:
- account info / profile
- balances
- vesting routes
- open conversion requests
- voting history on posts
- a list of followers and followings
- witness votes
- curation stats

#### Posts
Here you can find all top-level posts, with full-text search support for content bodies.

#### Operations
Operations contains all the events that happened on the blockchain so far.
You can query for operations in individual blocks, or by time, operation type (comment, transfer, vote...) or arbitrary properties. See the Digging Deeper section for examples.
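As a rough sketch of what a query by operation type and time could look like, the helper below builds a plain MongoDB filter document. The field names `type` and `timestamp` are taken from the list of indexed fields later in this post; that timestamps are stored as `datetime` values is my assumption, and `operations_filter` is a hypothetical helper, not part of the `steemdata` library.

```python
from datetime import datetime, timedelta


def operations_filter(op_type, start, days=1):
    """Build a filter for one operation type inside a time window."""
    end = start + timedelta(days=days)
    return {
        "type": op_type,
        # assumption: `timestamp` holds datetime values
        "timestamp": {"$gte": start, "$lt": end},
    }


# All transfers in the first week of February 2017
query = operations_filter("transfer", datetime(2017, 2, 1), days=7)
print(query)
```

The resulting dict can be passed straight to a collection, for instance `s.Operations.find(query)`.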

#### AccountOperations
Same as Operations, but with account ownership attached for easy querying.

#### PriceHistory
Snapshots of Bitcoin, STEEM, SBD and USD implied prices.

---

You can access the collections easily via the SteemData helper:
```
> s = SteemData()
> [print(x) for x in s.__dict__.keys()]
db
Operations
AccountOperations
PriceHistory
Posts
Accounts
```

We can see a few properties starting with upper-case letters. These give us easy access to the main SteemData collections. Alternatively, you can reach a collection via the `db` property.

```
s = SteemData()

# these two do the same thing
s.Accounts
s.db['Accounts']
```

### Querying
If you're new to MongoDB, I highly recommend [this querying guide](https://docs.mongodb.com/manual/tutorial/query-documents/).

I will only point out a few gotchas specific to SteemData.

#### Using Indexes
For best performance on your queries, make sure you're using indexed fields whenever possible. You can check out which fields are indexed by using `index_information()`:
```
s = SteemData()
indexes = list(s.Operations.index_information())
print(indexes)
```

As you will find out, most commonly queried fields are indexed, like `account`/`name`, `type`, `timestamp`, `identifier`, `permlink`, `author`, `memo` to name a few.
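PyMongo's `index_information()` returns a dict that maps each index name to its details, with the indexed fields listed under `key` as `(field, direction)` pairs. A small sketch for flattening that into a set of field names follows; `indexed_fields` is a hypothetical helper and the sample dict is made up to show the shape, it is not real SteemData output.

```python
def indexed_fields(index_info):
    """Return the set of field names covered by any index.

    `index_info` has the shape returned by PyMongo's index_information():
    index name -> {"key": [(field, direction), ...], ...}
    """
    fields = set()
    for spec in index_info.values():
        for field, _direction in spec["key"]:
            fields.add(field)
    return fields


# Hypothetical sample, shaped like index_information() output
sample = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "type_1_timestamp_-1": {"v": 2, "key": [("type", 1), ("timestamp", -1)]},
}
print(sorted(indexed_fields(sample)))
```

With a live connection you would call it as `indexed_fields(s.Operations.index_information())`.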


#### Using Projection
Projection makes queries a lot faster and saves bandwidth, because the server returns only the fields you ask for.

For example, if you're only interested in someone's followers, you can use `projection` to get only that field.

```
s.Accounts.find_one({'name': steemit_username},
                    projection={'followers': 1, '_id': 0})
```

This is similar to `SELECT followers FROM accounts` vs `SELECT * FROM accounts` in SQL.
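To make the effect concrete, here is a toy, pure-Python imitation of an inclusion projection on a single document. It is only an illustration of the result you get back, not how MongoDB or PyMongo implement it, and it handles top-level fields only. The sample document is made up.

```python
def project(document, projection):
    """Imitate an inclusion projection on top-level fields."""
    wanted = {field for field, flag in projection.items() if flag}
    # Like MongoDB, keep `_id` unless it is switched off with `'_id': 0`
    if projection.get("_id", 1):
        wanted.add("_id")
    return {k: v for k, v in document.items() if k in wanted}


# Made-up account document
account = {"_id": "abc", "name": "furion", "sp": 1000.0,
           "followers": ["alice", "bob"]}

print(project(account, {"followers": 1, "_id": 0}))
# {'followers': ['alice', 'bob']}
```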

#### Using Limits
By default, a query returns every matching document. This makes queries run longer and wastes bandwidth, especially if you only need *some* results at a time (e.g. the top 100).

This is where limits come in. For example, if we need the top 100 accounts by Steem Power:
```
q = s.Accounts.find({},
                    projection={'sp': 1, 'name': 1, '_id': 0},
                    sort=[('sp', -1)],
                    limit=100)
print(list(q))
```


#### Pagination
Following the above example, we can get the *next* 100 accounts (100-200) by using the `skip` argument:

```
q = s.Accounts.find({},
                    projection={'sp': 1, 'name': 1, '_id': 0},
                    sort=[('sp', -1)],
                    limit=100,
                    skip=100)
```
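If you page through results a lot, the `limit`/`skip` arithmetic can be wrapped in a tiny helper. This is just a sketch; `page_args` is a hypothetical name and not part of `steemdata` or PyMongo.

```python
def page_args(page, per_page=100):
    """Translate a 1-based page number into limit/skip keyword arguments."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return {"limit": per_page, "skip": (page - 1) * per_page}


print(page_args(1))  # {'limit': 100, 'skip': 0}
print(page_args(2))  # {'limit': 100, 'skip': 100}
```

It plugs into the query above as `s.Accounts.find({}, sort=[('sp', -1)], **page_args(2))`. Keep in mind that large `skip` values get slower, since the server still has to walk past the skipped documents.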

#### Syntax Sugar
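If you'd like, you can also chain cursor methods instead of passing `sort`, `limit` and `skip` as arguments. PyMongo cursors have no `projection()` method, so the projection still goes into `find()`. For example:
```
s.Accounts.find({}, projection=...).sort(...).limit(100).skip(100)
```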


### Example
Let's wrap up with a practical example. The [followers page on steemit](https://steemit.com/@furion/followers) is pretty bland: it only shows usernames. What if we could spice it up by displaying each user's *profile picture, Steem Power, reputation, and their own follower statistics*? How would we obtain this data? Here is the function that powers [an API endpoint that does just that](https://api0.steemdata.com/busy.org/furion/with_metadata/followers).

```
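from contextlib import suppress
from itertools import repeat

# `mongo`, `ParseError` and `NotFound` come from the surrounding web app
# (not shown in this post).


def busy_account_following(account_name, following):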
    """
    Fetch users followers or followings and their metadata.
    Returned list is ordered by follow time (newest followers first). \n
    Usage: `GET /busy/<string:account_name>/with_metadata/<string:following>`\n
    `following` must be 'following' or 'followers'.\n
    """
    if following not in ['following', 'followers']:
        raise ParseError(detail='Please specify following or followers.')

    acc = mongo.db['Accounts'].find_one({'name': account_name}, {following: 1, '_id': 0})
    if not acc:
        raise NotFound(detail='Could not find STEEM account %s' % account_name)

    # if follower list is empty
    if not acc[following]:
        return []

    allowed_fields = {
        '_id': 0, 'name': 1, 'sp': 1, 'rep': 1, 'profile.profile_image': 1,
        'followers_count': 1, 'following_count': 1, 'post_count': 1,
    }
    accounts_w_meta = list(mongo.db['Accounts'].find({'name': {'$in': acc[following]}}, allowed_fields))

    # return in LIFO order (last to follow is listed first)
    accounts_ordered = list(repeat('', len(acc[following])))
    for a in accounts_w_meta:
        with suppress(ValueError):
            accounts_ordered[acc[following].index(a.get('name', None))] = a
    return [x for x in accounts_ordered if x][::-1]
```
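The ordering step at the end of the function is the least obvious part, so here it is in isolation. This sketch is an alternative I find easier to read, not the code that runs on the API: it looks names up in a dict instead of calling `list.index()` for every account, and it assumes the follow list contains no duplicate names. `order_newest_first` and the sample data are hypothetical.

```python
def order_newest_first(follow_list, accounts_w_meta):
    """Order account documents by their position in follow_list, newest first.

    follow_list is ordered oldest-to-newest; accounts missing from
    accounts_w_meta are skipped.
    """
    by_name = {a["name"]: a for a in accounts_w_meta if "name" in a}
    return [by_name[n] for n in reversed(follow_list) if n in by_name]


# Made-up data: 'bob' has no metadata document, so he is dropped
follow_list = ["alice", "bob", "carol"]
accounts_w_meta = [{"name": "carol", "sp": 3}, {"name": "alice", "sp": 1}]

print(order_newest_first(follow_list, accounts_w_meta))
# [{'name': 'carol', 'sp': 3}, {'name': 'alice', 'sp': 1}]
```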

### Digging Deeper
If you'd like to learn how [SteemData Charts](https://steemdata.com/charts) work behind the scenes, feel free to download and run [this IPython Notebook](https://github.com/SteemData/steemdata-charts/blob/master/Charts.ipynb).
It should give you some ideas of what SteemData can be used for, and a quick way to start playing with the code and writing your own.

![](http://i.imgur.com/qw65eQD.png)

Posted by @furion on 2017-02-28 14:53:09, last updated 2017-03-01 07:56:21. Tags: steemdata, steem, steemit, steemdev.

@brianjuice · 2017-07-19 05:31:00
Hey @furion, is there a way to query the mongoDB using SQL?  I'm not familiar with the mongoDB shell or python, etc...

@eroche · 2017-03-08 09:15:27
This is great stuff. Thanks @furion

@hr1 · 2017-03-01 15:57:09
I am not sure I understand it correctly.. Is SteemData a db interface to steem blockchain? Can it read data from and write to locally running steem node?

@ripplerm · 2017-03-01 16:27:33 · in reply to @hr1
if i understand correctly, it's just an independent database, where the data on blockchain is continuously being copied into it.

@furion · 2017-03-01 16:38:33 · in reply to @ripplerm
that is correct

@idikuci · 2018-03-28 11:34:30
Hey, 

I try to not be annoying but I'm not that smart so I need to ask questions.

Why is it that some "active_votes.rshares" are stored as strings and some as integers? any way you could fix that, it really messes with the results when I try get the most upvoted post etc.

@jamesc · 2017-02-28 15:19:45
Exciting stuff.  Thank you for all your work on this.  An ER diagram would be very helpful.  I would like to see the internal progress of the tables and the foreign keys.

@furion · 2017-02-28 17:33:33 (edited) · in reply to @jamesc
Right now the structure is completely flat (as mongo is Document based db, it is very flexible in structure and nesting). I will be adding links between collections in future.

So basically, there are these collections, without relationships between them (yet):
```
Operations
AccountOperations
PriceHistory
Posts
Accounts
```

@team101 · 2017-02-28 15:25:00
Thanks for all your work.

@transisto · 2017-03-01 06:42:54
I'm trying to kickstart the use of the #steemdev tag. Your post would fit well in that category.

@furion · 2017-03-01 07:57:03 · in reply to @transisto
awesome, thank you. I've added the tag.

@troyvandeventer · 2017-02-28 23:51:48 (edited)
Well considering that I haven't done any real programming since the day's of Q'Basic and DOS....I kind of follow this.  Lol I just might need to brush up my skillset to really understand it.