 [Pixabay](https://pixabay.com/en/analytics-chart-data-graph-1841554/) #Introduction I was very excited to see a analysis post concerning draining reward pool. After looking through the data, I've come to the conclusion that the analysis is flawed. It may be true, but the aggregation of the data is incorrect. This can lead to a suspect conclusion. I was not necessarily interested in the draining of the reward pool, but in the exploration of the distribution of rewards. Please see below image. # Initial The data set seemed to be incomplete in the bin < $1.00, so I fudged (this normally would be a big no-no, but I was doing some exploring).* # Exploration A. This was very interesting. I was expecting a linear or power law distribution. Instead I see somewhat bell curve distribution. There is a slight bifurcation in the distribution. B. The bifurcation of the posts is caused by the $5-$10 bin. I expected to see a power law rising from left to right. C and D. After further exploring I've come to the conclusion the the binning variability is skewing the distributions.* # Conclusion Because of not having access to the source data, I could not create fixed-width bins. I would have to say that any conclusions from this data must be suspect and **not used for any further analysis**. *These are some gotchas when trying to reach a conclusion or when using data to make decisions. Link: [Number of bins and width](https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width) >There is no "best" number of bins, and different bin sizes can reveal different features of the data. Grouping data is at least as old as Graunt's work in the 17th century, but no systematic guidelines were given[10] until Sturges's work in 1926.[11] >Using wider bins where the density is low reduces noise due to sampling randomness; using narrower bins where the density is high (so the signal drowns the noise) gives greater precision to the density estimation. 
Thus varying the bin-width within a histogram can be beneficial. Nonetheless, equal-width bins are widely used. >Some theoreticians have attempted to determine an optimal number of bins, but these methods generally make strong assumptions about the shape of the distribution. Depending on the actual data distribution and the goals of the analysis, different bin widths may be appropriate, so experimentation is usually needed to determine an appropriate width. There are, however, various useful guidelines and rules of thumb.[12] Link: **[Potential fitting biases resulting from grouping data into variable width bins](http://www.sciencedirect.com/science/article/pii/S0370269314004183)** Image:  @gutzofter is crazy. Crazy like a fox.
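
The binning gotcha described in the Exploration and Conclusion sections can be illustrated with a minimal sketch. This is not the analysis performed for this post: the bin edges, the synthetic payout values, and the function names `bin_counts` and `bin_densities` are all assumptions made up for the example. It shows that perfectly uniform data looks like it rises from left to right when counted in variable-width bins, and that dividing each count by its bin width removes that artifact.

```python
# Hypothetical bin edges in dollars, with unequal widths (1, 4, 5, 40, 50).
# These are illustrative only and are not the bins from the original data set.
edges = [0, 1, 5, 10, 50, 100]


def bin_counts(values, edges):
    """Count how many values fall into each half-open bin [lo, hi)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def bin_densities(counts, edges):
    """Normalize each count by (total count * bin width).

    The result is a density per dollar, so bars from bins of
    different widths can be compared directly.
    """
    total = sum(counts)
    return [
        c / (total * (edges[i + 1] - edges[i]))
        for i, c in enumerate(counts)
    ]


# Synthetic, evenly spread payouts: $0.00, $0.01, ..., $99.99.
values = [i / 100 for i in range(10000)]

counts = bin_counts(values, edges)
densities = bin_densities(counts, edges)

print("raw counts:", counts)      # wider bins collect more values
print("densities: ", densities)   # equal once bin width is divided out
```

With raw counts, the widest bins dominate simply because they are wide. The densities come out equal for this uniform sample, which is why a histogram built on variable-width bins should plot density rather than count before any shape (bell curve, power law, bifurcation) is read off it.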
author | gutzofter |
---|---|
permlink | data-analysis-gotchas |
category | data |
json_metadata | {"tags":["data","analysis","gotchas","steemit","introduction"],"users":["gutzofter"],"image":["https://i.supload.com/BJyNwsJdx.jpg","https://i.supload.com/BJO_Io1ux.png"],"links":["https://pixabay.com/en/analytics-chart-data-graph-1841554/","https://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width","http://www.sciencedirect.com/science/article/pii/S0370269314004183"],"app":"steemit/0.1","format":"markdown"} |
created | 2017-02-01 19:33:51 |
last_update | 2017-02-01 19:33:51 |
depth | 0 |
children | 4 |
last_payout | 2017-03-04 19:52:48 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 2,791 |
author_reputation | 7,621,537,677,018 |
root_title | "Data Analysis Gotchas" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 0 |
post_id | 2,398,797 |
net_rshares | 386,931,630,608 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
fyrstikken | 0 | 56,291,729,426 | 1% | ||
thebatchman | 0 | 1,119,253,294 | 3% | ||
thebatchman1 | 0 | 69,193,169 | 3% | ||
sergey44 | 0 | 480,880,325 | 100% | ||
steemradio | 0 | 662,443,104 | 100% | ||
nextgen622 | 0 | 78,701,220,144 | 100% | ||
matrixdweller | 0 | 534,848,039 | 1% | ||
daveks | 0 | 106,000,395,751 | 80% | ||
l0k1 | 0 | 10,220,115,694 | 10% | ||
majes | 0 | 20,416,858,102 | 100% | ||
digit | 0 | 3,428,545,669 | 100% | ||
ianstrat | 0 | 4,788,893,106 | 100% | ||
jdbry | 0 | 2,380,329,301 | 100% | ||
idnit | 0 | 2,534,203,981 | 100% | ||
sqube | 0 | 5,909,660,518 | 3% | ||
mrfoot | 0 | 1,397,240,057 | 100% | ||
gutzofter | 0 | 38,514,906,897 | 100% | ||
mistere | 0 | 501,951,279 | 100% | ||
idnit1 | 0 | 1,651,651,681 | 100% | ||
cherished | 0 | 401,393,364 | 100% | ||
the-architect | 0 | 401,415,609 | 100% | ||
tamersameeh | 0 | 368,466,637 | 100% | ||
idnit0 | 0 | 854,432,074 | 100% | ||
witidnit10 | 0 | 422,313,485 | 100% | ||
smileyface | 0 | 328,069,243 | 100% | ||
angels | 0 | 392,001,456 | 100% | ||
springtime | 0 | 328,068,793 | 100% | ||
delightful | 0 | 336,455,597 | 100% | ||
charm | 0 | 328,076,804 | 100% | ||
simplest | 0 | 319,561,018 | 100% | ||
idnit2 | 0 | 781,354,236 | 100% | ||
idnit3 | 0 | 630,433,100 | 100% | ||
idnit4 | 0 | 616,068,079 | 100% | ||
idnit5 | 0 | 609,976,386 | 100% | ||
idnit6 | 0 | 623,273,976 | 100% | ||
idnit7 | 0 | 617,786,142 | 100% | ||
idnit8 | 0 | 609,500,517 | 100% | ||
idnit9 | 0 | 1,583,878,665 | 100% | ||
victorious | 0 | 319,510,302 | 100% | ||
legends | 0 | 383,334,397 | 100% | ||
barvon | 0 | 2,031,784,679 | 100% | ||
kostaslou | 0 | 1,704,444,605 | 100% | ||
robertneleson | 0 | 660,517,041 | 100% | ||
dunia | 0 | 35,066,236,453 | 25% | ||
benjiparler | 0 | 244,131,954 | 100% | ||
salvador7424 | 0 | 364,826,459 | 100% |
Your post is a bit hard for me to understand.
author | abit |
---|---|
permlink | re-gutzofter-data-analysis-gotchas-20170205t201328614z |
category | data |
json_metadata | {"tags":["data"],"app":"steemit/0.1"} |
created | 2017-02-05 20:13:36 |
last_update | 2017-02-05 20:13:36 |
depth | 1 |
children | 3 |
last_payout | 2017-03-04 19:52:48 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 45 |
author_reputation | 141,171,499,037,785 |
root_title | "Data Analysis Gotchas" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 2,431,090 |
net_rshares | 44,623,579,432 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
gutzofter | 0 | 44,623,579,432 | 100% |
How so? Is it because it is disorganized?
author | gutzofter |
---|---|
permlink | re-abit-re-gutzofter-data-analysis-gotchas-20170205t203118919z |
category | data |
json_metadata | {"tags":["data"],"app":"steemit/0.1"} |
created | 2017-02-05 20:31:21 |
last_update | 2017-02-05 20:31:21 |
depth | 2 |
children | 2 |
last_payout | 2017-03-04 19:52:48 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 41 |
author_reputation | 7,621,537,677,018 |
root_title | "Data Analysis Gotchas" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 2,431,221 |
net_rshares | 0 |
Yeah.. not well organized, maybe.
author | abit |
---|---|
permlink | re-gutzofter-re-abit-re-gutzofter-data-analysis-gotchas-20170206t215452088z |
category | data |
json_metadata | {"tags":["data"],"app":"steemit/0.1"} |
created | 2017-02-06 21:55:06 |
last_update | 2017-02-06 21:55:06 |
depth | 3 |
children | 1 |
last_payout | 2017-03-04 19:52:48 |
cashout_time | 1969-12-31 23:59:59 |
total_payout_value | 0.000 HBD |
curator_payout_value | 0.000 HBD |
pending_payout_value | 0.000 HBD |
promoted | 0.000 HBD |
body_length | 33 |
author_reputation | 141,171,499,037,785 |
root_title | "Data Analysis Gotchas" |
beneficiaries | [] |
max_accepted_payout | 1,000,000.000 HBD |
percent_hbd | 10,000 |
post_id | 2,440,228 |
net_rshares | 47,187,603,504 |
author_curate_reward | "" |
voter | weight | wgt% | rshares | pct | time |
---|---|---|---|---|---|
gutzofter | 0 | 47,187,603,504 | 100% |