RE: BOINC User XML data serialization comparison by barton26

Viewing a response to: @cm-steem/boinc-user-xml-data-serialization-comparison

@barton26 · (edited)
ProtoBuffers seems almost too good to be true.  Are there any significant downsides to using ProtoBuffers for serialization?  Does it require a lot of CPU to serialize/deserialize?  Does it require special software?

Posted 2018-07-19 01:45:30, last updated 2018-07-19 01:45:45

@cm-mobile ·
For a fairer comparison I should time how long it took to write to disk.

The only real downside of protocol buffers is that they're slightly confusing to work with at first, but now that we've got an established proto file it's easily replicated.

It doesn't need much CPU to serialize/deserialize; however, I don't have the stats to back that up.

In terms of special software, you just need the protobuf3 software package. There should be alternative language implementations for interacting with the files, in C++ for example.
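
As a rough idea of how the write-to-disk timing could be done, here is a minimal Python 3 sketch using only the standard library. The `time_write` helper and the two stand-in payloads are hypothetical, made up for illustration; the real XML text and the real serialized protobuf bytes would go in their place.

```python
import os
import tempfile
import time


def time_write(payload: bytes, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds to write payload to disk."""
    best = float("inf")
    for _ in range(repeats):
        fd, path = tempfile.mkstemp()
        try:
            start = time.perf_counter()
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
                handle.flush()
                os.fsync(handle.fileno())  # push the data out of OS buffers
            best = min(best, time.perf_counter() - start)
        finally:
            os.remove(path)
    return best


# Stand-in payloads (assumptions): replace with the real serialized data.
xml_payload = b"<user><id>1</id></user>" * 10_000
binary_payload = b"\x08\x01" * 10_000

xml_seconds = time_write(xml_payload)
binary_seconds = time_write(binary_payload)
print(f"xml: {xml_seconds:.6f}s, binary: {binary_seconds:.6f}s")
```

Taking the best of a few repeats reduces noise from other disk activity, but the numbers will still depend heavily on the machine and filesystem.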

Posted 2018-07-19 08:03:15

@ravonn ·
(it's Protobuf, @cm-steem :))

The only downside I can think of is that it's binary so it's more difficult to read off the air. I use Protobuf at work to publish data from a microcontroller to an Android app and a web service. Doing that in text with lexical interpretations would be a nightmare.

To further compress the binary serialization we could use a 16-byte binary representation of the CPIDs instead of using their hexadecimal form. I suspect that's where a lot of the storage goes.

Posted 2018-07-19 15:29:51

@cm-steem ·
> The only downside I can think of is that it's binary so it's more difficult to read off the air.

Do you think that's possible via [flat buffers](https://google.github.io/flatbuffers/) or [grpc](https://grpc.io/)?

> To further compress the binary serialization we could use a 16-byte binary representation of the CPIDs instead of using their hexadecimal form. I suspect that's where a lot of the storage goes.

Do you have more details on how this can be done in Python? Do you mean [compressing the string](https://github.com/CordySmith/PySmaz) or just converting the CPID from a string to binary?

The files would be far smaller if the CPID was omitted, relying on userId instead & perhaps constructing a separate index for userId:CPID for quick lookup.
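
A rough sketch of what that separate index might look like in Python 3, assuming plain in-memory dicts. The record list, the second CPID, and the helper names are hypothetical, chosen only for illustration:

```python
# Hypothetical (userId, CPID) pairs; real values would come from the project data.
users = [
    (1, "5a094d7d93f6d6370e78a2ac8c008407"),
    (2, "00112233445566778899aabbccddeeff"),
]

# Side index stored apart from the per-user records, which then only carry userId.
cpid_by_user_id = {user_id: cpid for user_id, cpid in users}
user_id_by_cpid = {cpid: user_id for user_id, cpid in users}


def lookup_cpid(user_id: int) -> str:
    """Return the CPID for a userId (raises KeyError if unknown)."""
    return cpid_by_user_id[user_id]


def lookup_user_id(cpid: str) -> int:
    """Return the userId for a CPID (raises KeyError if unknown)."""
    return user_id_by_cpid[cpid]


print(lookup_cpid(1))
print(lookup_user_id("00112233445566778899aabbccddeeff"))
```

The index itself would still need to be stored somewhere, so the saving only applies to the per-user files, not to the total.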

Posted 2018-07-19 15:53:36

@ravonn · (edited)
> Do you think that's possible via flat buffers or grpc?

Never heard of those :)

> Do you have more details on how this can be done in Python? Do you mean compressing the string or just converting the CPID from a string to binary?

Sure. Change `User.cpid` from `string` to `bytes` and assign using hex conversion:

```
>>> cpid = '5a094d7d93f6d6370e78a2ac8c008407'
>>> len(cpid)
32
>>> bytes.fromhex(cpid)  # Python 3; in Python 2 this was cpid.decode('hex')
b'Z\tM}\x93\xf6\xd67\x0ex\xa2\xac\x8c\x00\x84\x07'
>>> len(bytes.fromhex(cpid))
16
```

It does make it more tedious to use but there should be a significant reduction in size.
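
To take some of the tedium out, a pair of small helpers could wrap the conversion in both directions. This is only a sketch assuming Python 3; `cpid_to_bytes` and `cpid_to_hex` are made-up names, not part of any existing library:

```python
def cpid_to_bytes(cpid_hex: str) -> bytes:
    """Convert a 32-character hex CPID into its 16-byte binary form."""
    raw = bytes.fromhex(cpid_hex)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw


def cpid_to_hex(raw: bytes) -> str:
    """Convert a 16-byte CPID back into its lowercase hex string."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw.hex()


cpid = "5a094d7d93f6d6370e78a2ac8c008407"
packed = cpid_to_bytes(cpid)
print(len(cpid), len(packed))
print(cpid_to_hex(packed) == cpid)
```

Code that reads the file would call `cpid_to_hex` whenever it needs the familiar string form back, for example for display or lookups against other data sources.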

Posted 2018-07-23 04:01:21, last updated 2018-07-23 09:19:06