RE: BOINC User XML data serialization comparison by barton26


Viewing a response to: @cm-steem/boinc-user-xml-data-serialization-comparison

gridcoin · @barton26 · Jul 19 '18 (edited)

$0.89

ProtoBuffers seems almost too good to be true.  Are there any significant downsides to using ProtoBuffers for serialization?  Does it require a lot of CPU to serialize/deserialize?  Does it require special software?

👍 cm-steem, jringo, parejan, sodom, theissen, diogogomes

`author`	barton26
`permlink`	re-cm-steem-boinc-user-xml-data-serialization-comparison-20180719t014531949z
`category`	gridcoin
`json_metadata`	{"tags":["gridcoin"],"app":"steemit/0.1"}
`created`	2018-07-19 01:45:30
`last_update`	2018-07-19 01:45:45
`depth`	1
`children`	4
`last_payout`	2018-07-26 01:45:30
`cashout_time`	1969-12-31 23:59:59
`total_payout_value`	0.675 HBD
`curator_payout_value`	0.216 HBD
`pending_payout_value`	0.000 HBD
`promoted`	0.000 HBD
`body_length`	215
`author_reputation`	3,089,378,353,442
`root_title`	"BOINC User XML data serialization comparison"
`beneficiaries`	`[]`
`max_accepted_payout`	1,000,000.000 HBD
`percent_hbd`	10,000
`post_id`	65,181,484
`net_rshares`	425,580,873,962
`author_curate_reward`	""


voter	rshares	pct
cm-steem	384,171,389,290	65%
sodom	2,564,266,270	100%
diogogomes	466,696,090	85%
jringo	26,605,157,823	100%
theissen	1,971,827,616	100%
parejan	9,801,536,873	100%

@cm-mobile · Jul 19 '18

$0.61

For a fairer comparison I should time how long it took to write to disk.

The only real downside of protocol buffers is that they're slightly confusing to work with at first, but now that we have an established proto file it's easily replicated.

It doesn't need much CPU to serialize/deserialize; however, I don't have the stats to back that up.

As for special software, you only need the protobuf3 package; there should be implementations for other languages (C++, for example) for interacting with the files.
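A rough sketch of how the write-to-disk timing could be gathered, using only the Python standard library. The helper name `time_write` and the stand-in payloads are hypothetical; real measurements would use the actual serialized XML, JSON, or Protobuf bytes, and results will vary with disk and OS caching.

```python
import os
import tempfile
import time


def time_write(payload: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) to write payload to a temp file."""
    best = float("inf")
    for _ in range(repeats):
        fd, path = tempfile.mkstemp()
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # push the data to disk, not just the OS cache
            best = min(best, time.perf_counter() - start)
        finally:
            os.remove(path)
    return best


# Stand-in payloads (assumptions): a text record vs. the same CPID as raw bytes.
text_payload = b"<user><cpid>5a094d7d93f6d6370e78a2ac8c008407</cpid></user>" * 10_000
binary_payload = bytes.fromhex("5a094d7d93f6d6370e78a2ac8c008407") * 10_000

text_seconds = time_write(text_payload)
binary_seconds = time_write(binary_payload)
print(f"text:   {len(text_payload)} bytes in {text_seconds:.6f}s")
print(f"binary: {len(binary_payload)} bytes in {binary_seconds:.6f}s")
```

Taking the best of several runs reduces noise from other disk activity, though it still says nothing about serialization CPU cost, which would need to be timed separately.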

👍 gentlebot, jringo, barton26

`author`	cm-mobile
`permlink`	re-barton26-re-cm-steem-boinc-user-xml-data-serialization-comparison-20180719t080313096z
`category`	gridcoin
`json_metadata`	{"tags":["gridcoin"],"app":"steemit/0.1"}
`created`	2018-07-19 08:03:15
`last_update`	2018-07-19 08:03:15
`depth`	2
`children`	0
`last_payout`	2018-07-26 08:03:15
`cashout_time`	1969-12-31 23:59:59
`total_payout_value`	0.457 HBD
`curator_payout_value`	0.151 HBD
`pending_payout_value`	0.000 HBD
`promoted`	0.000 HBD
`body_length`	497
`author_reputation`	64,075,241,881
`root_title`	"BOINC User XML data serialization comparison"
`beneficiaries`	`[]`
`max_accepted_payout`	1,000,000.000 HBD
`percent_hbd`	10,000
`post_id`	65,212,477
`net_rshares`	289,645,032,465
`author_curate_reward`	""


voter	rshares	pct
barton26	10,563,391,272	100%
jringo	25,088,801,677	100%
gentlebot	253,992,839,516	15%

@ravonn · Jul 19 '18

$1.28

(it's Protobuf, @cm-steem :))

The only downside I can think of is that it's binary so it's more difficult to read off the air. I use Protobuf at work to publish data from a microcontroller to an Android app and a web service. Doing that in text with lexical interpretations would be a nightmare.

To further compress the binary serialization, we could use a 16-byte binary representation of the CPIDs instead of their hexadecimal form. I suspect that's where a lot of the storage goes.

👍 cm-steem, jringo, barton26

`author`	ravonn
`permlink`	re-barton26-re-cm-steem-boinc-user-xml-data-serialization-comparison-20180719t152952434z
`category`	gridcoin
`json_metadata`	{"tags":["gridcoin"],"users":["cm-steem"],"app":"steemit/0.1"}
`created`	2018-07-19 15:29:51
`last_update`	2018-07-19 15:29:51
`depth`	2
`children`	2
`last_payout`	2018-07-26 15:29:51
`cashout_time`	1969-12-31 23:59:59
`total_payout_value`	1.030 HBD
`curator_payout_value`	0.254 HBD
`pending_payout_value`	0.000 HBD
`promoted`	0.000 HBD
`body_length`	488
`author_reputation`	1,551,172,951,761
`root_title`	"BOINC User XML data serialization comparison"
`beneficiaries`	`[]`
`max_accepted_payout`	1,000,000.000 HBD
`percent_hbd`	10,000
`post_id`	65,256,519
`net_rshares`	610,018,911,012
`author_curate_reward`	""


voter	rshares	pct
cm-steem	573,158,927,571	100%
barton26	10,806,227,853	100%
jringo	26,053,755,588	100%

@cm-steem · Jul 19 '18

$0.08

> The only downside I can think of is that it's binary so it's more difficult to read off the air.

Do you think that's possible via [FlatBuffers](https://google.github.io/flatbuffers/) or [gRPC](https://grpc.io/)?

> To further compress the binary serialization, we could use a 16-byte binary representation of the CPIDs instead of their hexadecimal form. I suspect that's where a lot of the storage goes.

Do you have more details on how this can be done in Python? Do you mean [compressing the string](https://github.com/CordySmith/PySmaz) or just converting the CPID from a string to binary?

The files would be far smaller if the CPID were omitted and userId used instead, perhaps with a separate userId:CPID index for quick lookup.
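A minimal sketch of that idea in plain Python. The record layout and the names `user_id`, `total_credit`, `cpid_index`, and `cpid_for` are made up for illustration; the second CPID is a dummy value.

```python
# Hypothetical records, as they might look after parsing a project's user file.
users = [
    {"user_id": 1, "cpid": "5a094d7d93f6d6370e78a2ac8c008407", "total_credit": 1200.5},
    {"user_id": 2, "cpid": "00112233445566778899aabbccddeeff", "total_credit": 87.0},
]

# Separate index: user_id -> 16-byte CPID, stored apart from the stats.
cpid_index = {u["user_id"]: bytes.fromhex(u["cpid"]) for u in users}

# Slim records that drop the CPID and keep only user_id as the key.
slim_users = [
    {"user_id": u["user_id"], "total_credit": u["total_credit"]} for u in users
]


def cpid_for(user_id: int) -> str:
    """Look up a user's CPID (as a hex string) from the separate index."""
    return cpid_index[user_id].hex()


print(cpid_for(1))  # 5a094d7d93f6d6370e78a2ac8c008407
```

The index only pays off if it is written less often than the per-user stats, or shared between several files; otherwise the CPID bytes have simply moved to another file.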

👍 jringo, barton26

`author`	cm-steem
`permlink`	re-ravonn-re-barton26-re-cm-steem-boinc-user-xml-data-serialization-comparison-20180719t155339281z
`category`	gridcoin
`json_metadata`	{"tags":["gridcoin"],"links":["https://google.github.io/flatbuffers/","https://grpc.io/","https://github.com/CordySmith/PySmaz"],"app":"steemit/0.1"}
`created`	2018-07-19 15:53:36
`last_update`	2018-07-19 15:53:36
`depth`	3
`children`	1
`last_payout`	2018-07-26 15:53:36
`cashout_time`	1969-12-31 23:59:59
`total_payout_value`	0.058 HBD
`curator_payout_value`	0.017 HBD
`pending_payout_value`	0.000 HBD
`promoted`	0.000 HBD
`body_length`	754
`author_reputation`	58,522,774,254,119
`root_title`	"BOINC User XML data serialization comparison"
`beneficiaries`	`[]`
`max_accepted_payout`	1,000,000.000 HBD
`percent_hbd`	10,000
`post_id`	65,259,080
`net_rshares`	36,203,595,184
`author_curate_reward`	""


voter	weight	wgt%	rshares	pct	time
barton26	0 B		10,563,391,272	100%
jringo	0 B		25,640,203,912	100%

@ravonn · Jul 23 '18 (edited)

> Do you think that's possible via FlatBuffers or gRPC?

Never heard of those :)

> Do you have more details on how this can be done in Python? Do you mean compressing the string or just converting the CPID from a string to binary?

Sure. Change `User.cpid` from `string` to `bytes` and assign using hex conversion:

```
>>> cpid = '5a094d7d93f6d6370e78a2ac8c008407'
>>> len(cpid)
32
>>> # Python 3 spelling; in Python 2 this was cpid.decode('hex')
>>> bytes.fromhex(cpid)
b'Z\tM}\x93\xf6\xd67\x0ex\xa2\xac\x8c\x00\x84\x07'
>>> len(bytes.fromhex(cpid))
16
```

It does make the field more tedious to use, but each CPID shrinks from 32 bytes to 16, so there should be a significant reduction in size.
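To take some of the tedium out, a pair of small helpers could convert in both directions. This is only a sketch for Python 3; the function names `cpid_to_bytes` and `cpid_to_hex` are hypothetical and not part of any Protobuf API.

```python
def cpid_to_bytes(cpid_hex: str) -> bytes:
    """Convert a 32-character hex CPID into its 16-byte binary form."""
    raw = bytes.fromhex(cpid_hex)  # raises ValueError on non-hex input
    if len(raw) != 16:
        raise ValueError("a CPID must decode to exactly 16 bytes")
    return raw


def cpid_to_hex(cpid_bytes: bytes) -> str:
    """Convert a 16-byte binary CPID back into lowercase hex for display."""
    if len(cpid_bytes) != 16:
        raise ValueError("a binary CPID must be exactly 16 bytes")
    return cpid_bytes.hex()


packed = cpid_to_bytes("5a094d7d93f6d6370e78a2ac8c008407")
print(len(packed))          # 16
print(cpid_to_hex(packed))  # 5a094d7d93f6d6370e78a2ac8c008407
```

The packed value is what would be assigned to the `bytes` field, and `cpid_to_hex` is only needed when showing the CPID to a person or comparing against hex strings from other sources.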

👍 cm-mobile, sergino, lordofreward

`author`	ravonn
`permlink`	re-cm-steem-re-ravonn-re-barton26-re-cm-steem-boinc-user-xml-data-serialization-comparison-20180723t040122292z
`category`	gridcoin
`json_metadata`	{"tags":["gridcoin"],"app":"steemit/0.1"}
`created`	2018-07-23 04:01:21
`last_update`	2018-07-23 09:19:06
`depth`	4
`children`	0
`last_payout`	2018-07-30 04:01:21
`cashout_time`	1969-12-31 23:59:59
`total_payout_value`	0.000 HBD
`curator_payout_value`	0.000 HBD
`pending_payout_value`	0.000 HBD
`promoted`	0.000 HBD
`body_length`	580
`author_reputation`	1,551,172,951,761
`root_title`	"BOINC User XML data serialization comparison"
`beneficiaries`	`[]`
`max_accepted_payout`	1,000,000.000 HBD
`percent_hbd`	10,000
`post_id`	65,650,630
`net_rshares`	2,033,259,686
`author_curate_reward`	""


voter	rshares	pct
cm-mobile	1,455,801,300	100%
sergino	370,993,020	1.5%
lordofreward	206,465,366	0.75%