
A Python script to synchronise copies of a log (like the blockchain) by l0k1

<div class="pull-left"><h1>I</h1></div> <br />have been busy this morning working out a way to reduce the disk activity needed to maintain a periodically updated copy of the Steem blockchain <code>block_log</code> file, so that I can offer a BitTorrent download of the current version.

To explain the use case: if I make a snapshot of the `block_log`, I can generate a new `.torrent` file for the update and seed it. I would make a daily backup of this file and save on disk activity by rotating the directory, keeping 7 copies with torrents created over 7 days, so a user can be sure they have 7 days to download the whole thing.
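To make the snapshot-to-torrent step more concrete, here is a minimal sketch of building single-file torrent metainfo in Python. It assumes the BitTorrent v1 metainfo layout (a bencoded dictionary holding SHA-1 piece hashes). The function names, the 1 MiB piece length and the tracker URL are placeholders of mine, and in practice an existing torrent creation tool would normally do this job.

```python
import hashlib
import os


def bencode(value):
    """Encode ints, strings, bytes, lists and dicts in bencoding."""
    if isinstance(value, int):
        return b'i' + str(value).encode() + b'e'
    if isinstance(value, str):
        value = value.encode('utf-8')
    if isinstance(value, bytes):
        return str(len(value)).encode() + b':' + value
    if isinstance(value, list):
        return b'l' + b''.join(bencode(v) for v in value) + b'e'
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order
        items = [(k.encode('utf-8') if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b'd' + b''.join(bencode(k) + bencode(v) for k, v in items) + b'e'
    raise TypeError('cannot bencode ' + type(value).__name__)


def piece_hashes(path, piece_length):
    """Concatenate the 20-byte SHA-1 digest of every piece of the file."""
    digests = []
    with open(path, 'rb') as f:
        while True:
            piece = f.read(piece_length)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).digest())
    return b''.join(digests)


def make_metainfo(path, announce, piece_length=1024 * 1024):
    """Return the bytes of a single-file .torrent for `path`."""
    info = {
        'name': os.path.basename(path),
        'length': os.path.getsize(path),
        'piece length': piece_length,
        'pieces': piece_hashes(path, piece_length),
    }
    return bencode({'announce': announce, 'info': info})
```

Writing the result of `make_metainfo('block_log', 'http://tracker.example/announce')` to a file opened in `'wb'` mode would give the daily `.torrent`; the tracker address here is purely hypothetical.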

If that time elapses, they can remove the torrent (without deleting the files) and re-add the latest version. Their BitTorrent client will then update the files and, like my update script, download and rewrite only the data that has changed.
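A small sketch of why an append-only file suits this approach: if both versions are hashed in fixed-size pieces, only the final partial piece and the newly added pieces differ. The 1 MiB piece size and the function names are illustrative assumptions, not a description of what any particular BitTorrent client does internally.

```python
import hashlib


def piece_digests(path, piece_length=1024 * 1024):
    """Return the SHA-1 hex digest of each fixed-size piece of the file."""
    digests = []
    with open(path, 'rb') as f:
        while True:
            piece = f.read(piece_length)
            if not piece:
                break
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests


def changed_pieces(old_path, new_path, piece_length=1024 * 1024):
    """Indexes of pieces in new_path that are missing from or differ in old_path."""
    old = piece_digests(old_path, piece_length)
    new = piece_digests(new_path, piece_length)
    return [i for i, digest in enumerate(new)
            if i >= len(old) or old[i] != digest]
```

For a log that only ever grows, the returned indexes should all sit at the end of the file, which is the data a client would need to fetch.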

There is no simple existing solution, especially across networks, for minimising disk activity when dealing with very large files. Later on I will write a script that uses the RPC to query and assemble all blocks for 1 day or 1 week (whichever setting seems most sensible) and produces a folder with these parts, plus another script that joins them back together and lets you transmit or regenerate the whole file.
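The planned splitting and joining could look roughly like the sketch below. It cuts the file by a fixed byte size rather than by blocks fetched over the RPC, which is a simplification of mine; the part naming scheme and the function names are likewise hypothetical.

```python
import os
import shutil


def split_file(path, parts_dir, part_size):
    """Cut `path` into numbered parts of at most `part_size` bytes each."""
    os.makedirs(parts_dir, exist_ok=True)
    names = []
    with open(path, 'rb') as source:
        index = 0
        while True:
            # Each part is held in memory once, so keep part_size modest
            data = source.read(part_size)
            if not data:
                break
            name = os.path.join(parts_dir, 'part.%06d' % index)
            with open(name, 'wb') as part:
                part.write(data)
            names.append(name)
            index += 1
    return names


def join_parts(part_names, dest_path):
    """Concatenate the parts, in the order given, into dest_path."""
    with open(dest_path, 'wb') as dest:
        for name in part_names:
            with open(name, 'rb') as part:
                shutil.copyfileobj(part, dest)
```

Because earlier parts never change in an append-only log, only the last part and any new ones would need to be regenerated on each run.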

In fact, about the only widely available application these days that lets you synchronise very large files without rewriting the unchanged parts is BitTorrent.

So as part of my Witness Services, I will be providing 7 daily-updated torrents of the `block_log` that rotate, so there is always a copy that you can spend as long as a week downloading before you need to download a newer `.torrent` file, remove the torrent and re-add it, leaving the existing file as it is. The scripts below show how this progressive updating will be applied, and I will be configuring Nginx to serve the 7 `.torrent` files for the last 7 days so you can easily find them.
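One way to pick which of the 7 copies to overwrite each day is to take the day number modulo 7, as sketched below. The directory naming (`block_log.0` to `block_log.6`) and the function names are assumptions for illustration only.

```python
import datetime


def rotation_slot(day, copies=7):
    """Map a date to a slot 0..copies-1; the same slot recurs every `copies` days."""
    return day.toordinal() % copies


def snapshot_dir(day, base='block_log', copies=7):
    """Name of the directory that holds the snapshot taken on `day`."""
    return '%s.%d' % (base, rotation_slot(day, copies))
```

Calling `snapshot_dir(datetime.date.today())` each day would then cycle through the 7 directories, so the oldest copy is always the one that gets brought up to date.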

## The scripts

First, for testing purposes, here is a script that takes a file and makes a copy truncated by an arbitrary number of bytes, which you can use to demonstrate the reassembly process run by the second script:

### truncate.py

    #!/usr/bin/python3
    # This script makes a copy of a file minus an arbitrary number of bytes at the end
    import os
    import argparse

    parser = argparse.ArgumentParser ( description = 'Copies a file except for some number of bytes at the end' )
    parser.add_argument ( 'source', metavar = 'SOURCE', type = str, help = 'The file you want to copy sans data at the end' )
    parser.add_argument ( 'dest', metavar = 'DEST', type = str, help = 'The name of the new truncated copy' )
    parser.add_argument ( 'trunc', metavar = 'TRUNC', type = int, help = 'The amount in bytes you want to truncate from the original file' )

    args = parser.parse_args ( )

    print ( 'Creating a truncated copy of ' + args.source + ' to file ' + args.dest + ' with ' + str ( args.trunc ) + ' bytes truncated' )

    statinfo = os.stat ( args.source )
    print ( "Source file is " + str ( statinfo.st_size ) + ' bytes in size' )
    # Never let the payload go negative: read ( -n ) would copy the whole file
    payloadsize = max ( statinfo.st_size - args.trunc, 0 )
    print ( "Copying " + str ( payloadsize ) + " of " + args.source )

    # Copy in 1 MiB chunks so a multi-gigabyte file is never held in memory;
    # the with block closes both files even if the copy fails
    with open ( args.source, 'rb' ) as source, open ( args.dest, 'wb' ) as dest:
        remaining = payloadsize
        while remaining > 0:
            chunk = source.read ( min ( remaining, 1024 * 1024 ) )
            if not chunk:
                break
            dest.write ( chunk )
            remaining -= len ( chunk )

The next script takes the 'current' and 'old' (up-to-date and out-of-date) copies of a file and appends the extra data from 'current' to the 'old' version, bringing it up to date:

### update-append.py

    #!/usr/bin/python3
    # This script copies the additional data in a log (added but never changed)
    # from a current version to the end of an out of date version, making
    # both files the same
    import os
    import argparse

    parser = argparse.ArgumentParser ( description = 'Updates a log file with new content appended to it' )
    parser.add_argument ( 'current', metavar = 'CURRENT', type = str, help = 'The file with new additional content' )
    parser.add_argument ( 'old', metavar = 'OLD', type = str, help = 'The file you want to append new content to' )

    args = parser.parse_args ( )

    print ( 'Copying new data from ' + args.current + ' to file ' + args.old )

    currentinfo = os.stat ( args.current )
    oldinfo = os.stat ( args.old )
    print ( 'current is ' + str ( currentinfo.st_size ) + ' bytes' )
    print ( 'old is ' + str ( oldinfo.st_size ) + ' bytes' )
    seekstart = oldinfo.st_size
    # Nothing to copy if old is already as long as current ( or longer )
    copysize = max ( currentinfo.st_size - oldinfo.st_size, 0 )
    print ( 'Copying from byte ' + str ( seekstart ) + ' of ' + args.current + ' and appending to file ' + args.old + ' a total of ' + str ( copysize ) + ' bytes' )

    # Append in 1 MiB chunks so the new data is never held in memory all at once;
    # the with block closes both files even if the copy fails
    with open ( args.current, 'rb' ) as current, open ( args.old, 'ab' ) as old:
        current.seek ( seekstart )
        remaining = copysize
        while remaining > 0:
            chunk = current.read ( min ( remaining, 1024 * 1024 ) )
            if not chunk:
                break
            old.write ( chunk )
            remaining -= len ( chunk )
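The append only gives a correct result if the old file really is a prefix of the current one. A hedged sketch of a pre-flight check follows. Comparing only the last 1 MiB before the old file's end is my own shortcut to keep the check cheap, and `tail_matches` is a hypothetical helper, not part of the scripts above.

```python
import os


def tail_matches(current_path, old_path, window=1024 * 1024):
    """Check that the last `window` bytes of old also appear at the same offset in current."""
    old_size = os.path.getsize(old_path)
    if old_size > os.path.getsize(current_path):
        return False  # old is longer, so it cannot be a prefix of current
    start = max(old_size - window, 0)
    length = old_size - start
    with open(current_path, 'rb') as current, open(old_path, 'rb') as old:
        current.seek(start)
        old.seek(start)
        return current.read(length) == old.read(length)
```

A full checksum comparison after the append remains the stronger test; this only catches the obvious case of two unrelated or diverged files before any bytes are written.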

This is the result you get running a test sequence on these two scripts with an arbitrary file:

    loki@vaioe:~$ ./truncate.py inception.tar.xz inception.tar.xz.1 10000
    Creating a truncated copy of inception.tar.xz to file inception.tar.xz.1 with 10000 bytes truncated
    Source file is 43601920 bytes in size
    Coping 43591920 of inception.tar.xz
    loki@vaioe:~$ ls -l inception.tar.xz*
    -rw-rw-r-- 1 loki loki 43601920 feb 24 08:03 inception.tar.xz
    -rw-rw-r-- 1 loki loki 43591920 mrt  6 09:56 inception.tar.xz.1
    loki@vaioe:~$ sha1sum inception.tar.xz*
    d69f893b56ff6bfcc20146be9c01765f2eac6de9  inception.tar.xz
    a3ff6ce812c666bbf54bc2ff520288272144a9ca  inception.tar.xz.1
    loki@vaioe:~$ ./update-append.py inception.tar.xz inception.tar.xz.1
    Copying new data from inception.tar.xz to file inception.tar.xz.1
    current is 43601920 bytes
    old is 43591920 bytes
    Copying from byte 43591920 of inception.tar.xz and appending to file inception.tar.xz.1 a total of 10000 bytes
    loki@vaioe:~$ sha1sum inception.tar.xz*
    d69f893b56ff6bfcc20146be9c01765f2eac6de9  inception.tar.xz
    d69f893b56ff6bfcc20146be9c01765f2eac6de9  inception.tar.xz.1
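The same `sha1sum` comparison can be done from Python, which is handy if the check should live inside the update script itself. This is a minimal sketch using `hashlib`, reading in chunks so large files are not loaded into memory; the function names are mine.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read chunk by chunk."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # iter() keeps calling f.read until it returns b'' at end of file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files have the same SHA-1 digest."""
    return sha1_of_file(path_a) == sha1_of_file(path_b)
```

After an append, `files_identical('current', 'old')` returning `True` corresponds to the two matching checksum lines in the session above.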

## Very Happy Update:

The script works perfectly on differently updated copies of the Steem `block_log`. Below is a test I performed on (a copy of) the current live version on the RPC and the one in the torrent from this post: <a href="https://steemit.com/steem/@l0k1/torrents-of-a-relatively-recent-rpc-node-s-data-directory-and-already-compiled-steemd-to-use-with-the-data-directory">here!</a>. Next I will look into how to do this across a network connection. I think it is enough to alter the script to write to `stdout` instead of a file (as raw bytes, since a plain `print` works on text and would mangle binary data), so you can stream it through a pipe over ssh with a single command. A modified version could also do a partial HTTP or FTP transfer with the same result.
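A sketch of the `stdout` variant: it writes everything after a given byte offset to a binary stream. Passing `sys.stdout.buffer` as the stream sends raw bytes to standard output. The function name, the offset argument and the ssh invocation in the note below are assumptions of mine, not a tested deployment.

```python
import sys


def stream_tail(path, offset, out, chunk_size=1024 * 1024):
    """Write the bytes of `path` from `offset` to the end into binary stream `out`."""
    sent = 0
    with open(path, 'rb') as source:
        source.seek(offset)
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            sent += len(chunk)
    return sent
```

Wrapped in a small command-line script that calls `stream_tail(path, offset, sys.stdout.buffer)`, the receiving side could in principle run something like `ssh host 'python3 stream_tail.py block_log 8159354102' >> block_log`, where the offset is the size of the out-of-date local copy.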

### The main thing is that this works on the blockchain file, so there is a way to rapidly sync it when one copy is out of date.

    loki@projectinception:~/test$ update-append.py current old
    Copying new data from current to file old
    current is 8194130879 bytes
    old is 8159354102 bytes
    Copying from byte 8159354102 of current and appending to file old a total of 34776777 bytes
    loki@projectinception:~/test$ sha1sum *
    311f613137c9300a56ebcce343f08330cb6dffed  current
    311f613137c9300a56ebcce343f08330cb6dffed  old
    loki@projectinception:~/test$ 

<div class="pull-right">😎</div><br /><hr />

<center><code>We can't code here! This is Whale country!</code></center>
<div class="pull-left"><a href="https://steemit.com/@l0k1"><img src="http://s20.postimg.org/igf27v79p/signature_new_small.png" /></a></div>

<h2>Vote #1 <code>l0k1</code></h2><sub>Go to <a href="https://steemit.com/~witnesses">steemit.com/~witnesses</a> to cast your vote by typing <code>l0k1</code> into the text entry at the bottom of the leaderboard.</sub>

<sub>(note, my username is spelled <code>El Zero Kay One</code> or <code>Lima Zero Kilo One</code>, all lower case)</sub>

</div>
Posted by l0k1 in sysadmin (tags: sysadmin, backup, python). Created 2017-03-06 09:01:03, last updated 2017-03-06 13:20:00.
@ardina · 2017-03-06 09:53:54
Nice job, thanks
@tincho · 2017-03-06 09:05:42
Great work! Thank you!!!