How the Hive Fork Manager manages Hive micro-forks, by mickiewicz

hive-139531 · @mickiewicz · Nov 21 '21

![hfm_rewind_title.png](https://images.hive.blog/DQmdCyzcVauDmaQVRDMYVuiYJZj7c4oXjNwqUG3w9CiM2mb/hfm_rewind_title.png)

There are situations in which the Hive blockchain has to choose which of two valid but different blocks, together with their descendants, will be included in the chain and which will be abandoned. The consensus algorithm resolves such a micro-fork: one chain of blocks is abandoned and is no longer visible in the blockchain. But what about the applications that have already built their state from the abandoned blocks? Applications must solve this non-trivial problem on their own. Fortunately, HAF introduces a common, generic solution for applications based on a Postgres database: it automatically restores the SQL tables to their state from before the micro-fork occurred.

>If you don't know what the Hive Application Framework (HAF) is, please read ["What is HAF"](https://hive.blog/hive-139531/@mickiewicz/what-is-haf)

## How do we rewind changes in SQL tables that were made based on abandoned blocks?
The answer is simple: remember each change and revert it if it is based on an abandoned block. Let's walk through an example.
We have a protocol (described by JSONs in CUSTOM JSON transactions) which has three instructions:
1. **insert** a number with a given value to a table
2. **increment** numbers with a given value by one
3. **remove** numbers with a given value

Our application has a simple table NUMBERS with two columns: a unique id and a number (create the table with SQL: `CREATE TABLE numbers(id SERIAL UNIQUE, number INTEGER);`)

Now we receive blocks one by one, with our protocol commands encoded in CUSTOM JSON transactions. The table below shows the row affected by each command:

| block number | command | affected row (id, number) |
| -----------: | :-------- | :---------------------- |
| 100 | INSERT | (1, 33) |
| 101 | INCREMENT | (1, 34) |
| 102 | REMOVE | (1, 34) |

### How does the Hive Fork Manager remember changes in tables?
Each table that is under the control of the Hive Fork Manager has its shadow, a shadow table. A shadow table contains information about the changes made to each row of the original table. Only three operations are needed to cover any possible action on the content of an SQL table: we can ***create*** a new row, ***update*** an existing row, or ***delete*** a row. This means we don't have to bother with the commands specified by a protocol (insert, increment, and remove in our example); every case can always be represented as one of the three basic edit operations. For the example, we get the following content of the shadow table:

| Change id | block number | command | old value |
| --------: | -----------: | :-----: | :-------- |
| 1 | 100 | CREATE | doesn't matter |
| 2 | 101 | UPDATE | (1, 33) |
| 3 | 102 | DELETE | (1, 34) |

OK, but how do we match the saved changes with a row in the original table? In our example it is quite easy, since each row has its unique id, but this is not the case for every table. After a series of experiments, the decision was made to require every controlled table to inherit from a special table, the context base table, which each application gets when it registers in the Hive Fork Manager. The context base table introduces a new column, **hive_rowid**, in each table that inherits from it, and the value of this column is saved together with the changes in the shadow table. Moreover, it is important to save the number of the block in which the change occurred, so every shadow table has a column with the block number. For our example, the shadow table looks as below:

| Change id | hive_rowid | block number | command | old value |
| --------: | ---------: | -----------: | :-----: | :-------- |
| 1 | 1 | 100 | CREATE | doesn't matter |
| 2 | 1 | 101 | UPDATE | (1, 33) |
| 3 | 1 | 102 | DELETE | (1, 34) |
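
The bookkeeping described above can be modelled in a few lines of Python. This is only a minimal in-memory sketch, not HFM's implementation: the `TrackedTable` and `Change` names are hypothetical, the table is a plain dictionary keyed by `hive_rowid`, and the block number is passed in by hand.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    change_id: int
    hive_rowid: int
    block_num: int
    command: str                # "CREATE", "UPDATE" or "DELETE"
    old_value: Optional[tuple]  # row content before the change; None for CREATE


class TrackedTable:
    """Hypothetical in-memory stand-in for a table controlled by HFM."""

    def __init__(self):
        self.rows = {}    # hive_rowid -> (id, number)
        self.shadow = []  # Change entries, oldest first
        self._next_rowid = 1

    def _remember(self, rowid, block_num, command, old_value):
        change_id = len(self.shadow) + 1
        self.shadow.append(Change(change_id, rowid, block_num, command, old_value))

    def create(self, block_num, row):
        rowid = self._next_rowid
        self._next_rowid += 1
        self.rows[rowid] = row
        self._remember(rowid, block_num, "CREATE", None)
        return rowid

    def update(self, block_num, rowid, new_row):
        self._remember(rowid, block_num, "UPDATE", self.rows[rowid])
        self.rows[rowid] = new_row

    def delete(self, block_num, rowid):
        self._remember(rowid, block_num, "DELETE", self.rows.pop(rowid))


table = TrackedTable()
rid = table.create(100, (1, 33))  # protocol command: insert
table.update(101, rid, (1, 34))   # protocol command: increment
table.delete(102, rid)            # protocol command: remove

for change in table.shadow:
    print(change)
```

Whatever the protocol command was, the shadow only ever sees one of the three basic operations together with the previous content of the row.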

### OK, we have a shadow table, and now what?
If the Hive node notifies us about abandoned blocks, we can execute the opposite operation for each operation saved in the shadow table, in the reverse order of their occurrence.
For our example, let's imagine that the Hive node informs the Hive Fork Manager that blocks 100, 101, and 102 have been abandoned. In that case we execute the operations in the order below:
```
-- 1. opposite of the last saved operation (DELETE): re-insert the row with id 1
INSERT INTO numbers(hive_rowid, id, number) VALUES (1, 1, 34);
-- 2. opposite of the UPDATE: restore the old values in the row
UPDATE numbers SET id = 1, number = 33 WHERE hive_rowid = 1;
-- 3. opposite of the saved CREATE: remove the row
DELETE FROM numbers WHERE hive_rowid = 1;
```
**Tada!** All changes to the table **numbers** that were made based on the abandoned blocks 100, 101, and 102 are reverted! You can check the implementation [here](https://gitlab.syncad.com/hive/haf/-/blob/develop/src/hive_fork_manager/context_rewind/back_from_fork.sql); the function `hive.back_from_fork_one_table` is the entry point to the algorithm.
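
The same reverse-order replay can be sketched in Python. This is an illustration under simplifying assumptions, not the code behind `hive.back_from_fork_one_table`: the function name `rewind_table` is hypothetical, the table is a dictionary keyed by `hive_rowid`, and the shadow is a list of tuples ordered from the oldest change to the newest.

```python
def rewind_table(rows, shadow, first_abandoned_block):
    """Undo, newest first, every change recorded for an abandoned block.

    rows   : dict mapping hive_rowid -> (id, number), the table content
    shadow : list of (hive_rowid, block_num, command, old_value), oldest first
    """
    while shadow and shadow[-1][1] >= first_abandoned_block:
        hive_rowid, _block_num, command, old_value = shadow.pop()
        if command == "CREATE":
            del rows[hive_rowid]          # opposite of creating a row
        elif command in ("UPDATE", "DELETE"):
            rows[hive_rowid] = old_value  # restore or re-insert the old row
        else:
            raise ValueError(f"unknown command: {command}")
    return rows


# State after blocks 100, 101 and 102: the row was created, updated and deleted.
rows = {}
shadow = [
    (1, 100, "CREATE", None),
    (1, 101, "UPDATE", (1, 33)),
    (1, 102, "DELETE", (1, 34)),
]

# Only block 102 is abandoned: the deleted row comes back.
partial = rewind_table(dict(rows), list(shadow), 102)
print(partial)

# Blocks 100..102 are abandoned: the table returns to its initial, empty state.
full = rewind_table(dict(rows), list(shadow), 100)
print(full)
```

Because the changes are popped from the end of the list, each undo step sees the table in exactly the state the original operation left behind.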

### Hmm, it is possible to do this much faster!
Instead of reverting each change one by one, it is better to go straight back to the row state from before the first abandoned block. In our example, that simply means removing the row with id 1 (only one operation instead of three)! This algorithm is indeed very fast, and it was even implemented in the Hive Fork Manager, but because of some constraints in the Postgres implementation it was replaced with a slower version in [this commit](https://gitlab.syncad.com/hive/haf/-/commit/2e48fadd337c0667e3bce503c04e7a6a16fb33cb).
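
The idea behind the faster variant can be sketched as follows. This is a hypothetical Python model of the idea only, not the code that was removed from HFM: for each row it looks at the earliest change recorded in the abandoned blocks and applies a single operation. The example assumes that only blocks 100 and 101 have been processed so far.

```python
def fast_rewind(rows, shadow, first_abandoned_block):
    """Jump each row straight back to its state before the first abandoned block.

    rows   : dict mapping hive_rowid -> (id, number)
    shadow : list of (hive_rowid, block_num, command, old_value), oldest first
    Returns the restored rows and the shadow entries that are still valid.
    """
    earliest = {}  # hive_rowid -> first change recorded in an abandoned block
    for hive_rowid, block_num, command, old_value in shadow:
        if block_num >= first_abandoned_block and hive_rowid not in earliest:
            earliest[hive_rowid] = (command, old_value)

    for hive_rowid, (command, old_value) in earliest.items():
        if command == "CREATE":
            rows.pop(hive_rowid, None)    # the row did not exist before the fork
        else:
            rows[hive_rowid] = old_value  # one write restores the pre-fork row

    kept = [change for change in shadow if change[1] < first_abandoned_block]
    return rows, kept


rows = {1: (1, 34)}  # table content after blocks 100 and 101
shadow = [
    (1, 100, "CREATE", None),
    (1, 101, "UPDATE", (1, 33)),
]
restored, remaining = fast_rewind(rows, shadow, 100)
print(restored, remaining)
```

Each row is touched once, no matter how many times it changed in the abandoned blocks, which is where the speed-up comes from.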

#### Why do we use a slower version of the rewind algorithm?
Because of constraints. SQL constraints, for example UNIQUE, can be applied to SQL tables. Suppose we make a small modification to the table definition from our example:
`CREATE TABLE numbers(id SERIAL UNIQUE, number INTEGER UNIQUE);`
Now the numbers cannot repeat in the table. We can imagine situations in which restoring the rows to an earlier state temporarily violates the UNIQUE constraint during the rewind process, so the whole process fails even though the constraint would be satisfied at its end:

| block number | command | table content after the command (id, number) |
| -----------: | :-------- | :---------------------- |
| 100 | INSERT | (1, 33) |
| 101 | INCREMENT | (1, 34) |
| 102 | INSERT | (1, 34), (2, 33) |

We get the following shadow table:

| Change id | hive_rowid | block number | command | old value |
| --------: | ---------: | -----------: | :-----: | :-------- |
| 1 | 1 | 100 | CREATE | doesn't matter, but it is (1, 33) |
| 2 | 1 | 101 | UPDATE | (1, 33) |
| 3 | 2 | 102 | CREATE | doesn't matter, but it is (2, 33) |

Let's imagine that blocks 101 and 102 become abandoned and we run the fastest rewind algorithm. The row with hive_rowid = 1 is reverted first, which means updating its value back to (1, 33). This violates the UNIQUE constraint on the number column, because rows 1 and 2 now both hold the value 33, so the SQL command fails (there is a [test](https://gitlab.syncad.com/hive/haf/-/blob/develop/tests/integration/functional/hive_fork_manager/context_rewind/back_from_fork_constraint_unique_test.sql) that checks whether HFM works correctly in such a situation). The slower algorithm does not have this problem: it undoes the changes in reverse order, so row 2 is deleted before row 1 is updated back to 33. The fastest algorithm sidestepped the constraint checks by using the Postgres feature ['SET CONSTRAINTS'](https://www.postgresql.org/docs/current/sql-set-constraints.html), which postpones the evaluation of deferrable constraints until the transaction commits instead of checking them immediately when a table is modified. We abandoned the fastest rewind algorithm because a constraint must be declared deferrable to be evaluated at commit time. This is inconvenient, and it prevents such a constraint from being used to resolve an 'ON CONFLICT' clause, as described [here in the database documentation](https://www.postgresql.org/docs/12/sql-createtable.html):
>    Note that deferrable constraints cannot be used as conflict arbitrators in an INSERT statement that includes an ON CONFLICT DO UPDATE clause.
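
The failure is easy to reproduce. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Postgres (an assumption made only to keep the example self-contained; SQLite, like a non-deferrable Postgres constraint, checks UNIQUE immediately). The `hive_rowid` column is declared by hand here instead of being inherited from a context table.

```python
import sqlite3


def table_after_block_102():
    """Build the table state after blocks 100, 101 and 102 have been processed."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE numbers("
        "hive_rowid INTEGER PRIMARY KEY, id INTEGER UNIQUE, number INTEGER UNIQUE)"
    )
    db.execute("INSERT INTO numbers VALUES (1, 1, 34)")
    db.execute("INSERT INTO numbers VALUES (2, 2, 33)")
    return db


# Row-by-row jump to the old state: row 1 is restored while row 2 still exists.
db = table_after_block_102()
try:
    db.execute("UPDATE numbers SET number = 33 WHERE hive_rowid = 1")
    fast_ok = True
except sqlite3.IntegrityError as error:
    fast_ok = False
    print("direct restore failed:", error)

# Reverse-order replay: undo block 102 first, then block 101.
db = table_after_block_102()
db.execute("DELETE FROM numbers WHERE hive_rowid = 2")             # undo the insert
db.execute("UPDATE numbers SET number = 33 WHERE hive_rowid = 1")  # undo the increment
slow_rows = db.execute("SELECT id, number FROM numbers").fetchall()
print("reverse-order replay result:", slow_rows)
```

The final state is valid in both cases; only the intermediate state of the direct restore breaks the constraint.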

#### The fastest rewind algorithm was really fast
I have checked my notes to remind myself how much faster the fastest algorithm was than the current one. Here are the results of the measurements for 10k rows (three runs each, with the mean in brackets):

| Test | THE FASTEST [ms] | CURRENT [ms] | CURRENT / THE FASTEST |
| :-------------------------- | :-------------------- | :------------------------ | :--- |
| Back from insert 10k rows | 23, 24, 23 **[23.3]** | 124, 124, 125 **[124.3]** | 5.33 |
| Back from delete 10k rows | 36, 35, 36 **[35.7]** | 174, 169, 170 **[171]** | 4.79 |
| Back from update 10k rows | 48, 48, 48 **[48]** | 238, 245, 239 **[240.7]** | 5.01 |
| Back from truncate 10k rows | 32, 31, 32 **[31.7]** | 166, 173, 166 **[168.3]** | 5.32 |

It means that the current algorithm is about five times slower than the fastest one! But we decided that the current speed is good enough for the Hive Fork Manager to rewind changes efficiently. If you are interested in what was actually tested, here are the performance tests for rewinding [insert](https://gitlab.syncad.com/hive/haf/-/blob/develop/tests/integration/functional/hive_fork_manager/context_rewind/performance_insert_rows_one_by_one_test.sql), [delete](https://gitlab.syncad.com/hive/haf/-/blob/develop/tests/integration/functional/hive_fork_manager/context_rewind/performance_back_from_delete_rows_test.sql), [update](https://gitlab.syncad.com/hive/haf/-/blob/develop/tests/integration/functional/hive_fork_manager/context_rewind/performance_back_from_update_rows_test.sql) and [truncate](https://gitlab.syncad.com/hive/haf/-/blob/develop/tests/integration/functional/hive_fork_manager/context_rewind/performance_truncate_rows_test.sql) of rows.
The Hive Fork Manager still suffers from the requirement for deferrable constraints, but only for FOREIGN KEY constraints. Only one table can be rewound at a time, and this may temporarily violate constraints that are set between two tables. Narrowing the requirement to FOREIGN KEY constraints only is less problematic for applications than demanding that all constraints be deferrable.
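
For readers who want to check the arithmetic, the means and the slowdown ratios can be recomputed from the raw measurements listed in the table. The snippet is a small Python sketch; the dictionary layout and the `summarize` helper are just illustrative choices.

```python
from statistics import mean

# Raw measurements in milliseconds, three runs per test and algorithm
measurements = {
    "insert":   {"fastest": [23, 24, 23], "current": [124, 124, 125]},
    "delete":   {"fastest": [36, 35, 36], "current": [174, 169, 170]},
    "update":   {"fastest": [48, 48, 48], "current": [238, 245, 239]},
    "truncate": {"fastest": [32, 31, 32], "current": [166, 173, 166]},
}


def summarize(samples):
    """Return (mean of fastest, mean of current, current / fastest)."""
    fastest = mean(samples["fastest"])
    current = mean(samples["current"])
    return round(fastest, 1), round(current, 1), round(current / fastest, 2)


summary = {name: summarize(samples) for name, samples in measurements.items()}
for name, (fastest, current, ratio) in summary.items():
    print(f"{name:8s} fastest={fastest} ms  current={current} ms  ratio={ratio}")
```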

## How is the shadow table filled?
[Table triggers](https://www.postgresql.org/docs/current/sql-createtrigger.html) are used to fill shadow tables. If a table needs to rewind its content in the case of a micro-fork, it must be registered in the Hive Fork Manager. During registration, the table gets its shadow table and a set of triggers is enabled on it. The triggers react to modifications of the table content, and each change fires a procedure that adds a new row to the shadow table. You can check what exactly happens during table registration by looking at the function [hive.register_table](https://gitlab.syncad.com/hive/haf/-/blob/develop/src/hive_fork_manager/context_rewind/register_table.sql). Applications don't execute the registration function directly; instead, they make their tables inherit from their context's base table, which adds the `hive_rowid` column and automatically runs the `hive.register_table` function.
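
To get a feeling for how triggers can fill a shadow table, here is a small self-contained sketch. It uses Python's `sqlite3` module and SQLite trigger syntax rather than Postgres, and the `shadow_numbers` table and the trigger names are hypothetical; what `hive.register_table` really creates is in the linked source. The block number column is left out, because this post does not cover how HFM learns the current block.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE numbers(hive_rowid INTEGER PRIMARY KEY, id INTEGER, number INTEGER);

-- Hypothetical shadow table: one row per change made to the original table
CREATE TABLE shadow_numbers(
    change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    hive_rowid INTEGER,
    command    TEXT,
    old_id     INTEGER,
    old_number INTEGER
);

CREATE TRIGGER numbers_on_insert AFTER INSERT ON numbers
BEGIN
    INSERT INTO shadow_numbers(hive_rowid, command)
    VALUES (NEW.hive_rowid, 'CREATE');
END;

CREATE TRIGGER numbers_on_update AFTER UPDATE ON numbers
BEGIN
    INSERT INTO shadow_numbers(hive_rowid, command, old_id, old_number)
    VALUES (OLD.hive_rowid, 'UPDATE', OLD.id, OLD.number);
END;

CREATE TRIGGER numbers_on_delete AFTER DELETE ON numbers
BEGIN
    INSERT INTO shadow_numbers(hive_rowid, command, old_id, old_number)
    VALUES (OLD.hive_rowid, 'DELETE', OLD.id, OLD.number);
END;
""")

# The application only touches its own table; the triggers do the bookkeeping.
db.execute("INSERT INTO numbers(id, number) VALUES (1, 33)")            # insert
db.execute("UPDATE numbers SET number = number + 1 WHERE number = 33")  # increment
db.execute("DELETE FROM numbers WHERE number = 34")                     # remove

shadow = db.execute(
    "SELECT change_id, hive_rowid, command, old_id, old_number "
    "FROM shadow_numbers ORDER BY change_id"
).fetchall()
for row in shadow:
    print(row)
```

The application code never mentions the shadow table, which is the point of the design: registration is enough to make a table rewindable.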

### Triggers are very slow
Triggers add significant overhead to operations on the tables. The overhead doesn't matter when an application works in live sync, where a new block arrives every three seconds, which is plenty of time. But when the blockchain is replayed and a large number of irreversible blocks have to be processed one after another as fast as possible, the trigger overhead is not acceptable. The Hive Fork Manager tells the application the number of irreversible blocks to process, and the application may temporarily take its tables out of HFM's care with the `hive.app_context_detach` function. The triggers are removed and the application can process the irreversible blocks much faster. After it finishes processing the irreversible blocks, the application puts its tables back under HFM control with the `hive.app_context_attach` function.

## Will a shadow table grow forever with each change to its original table?
Each shadow table is pruned when the Hive node informs the Hive Fork Manager that a new block has become irreversible. All information saved in shadow tables for irreversible blocks is removed, so the shadow tables contain only rows for blocks that are near the HEAD BLOCK.
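
In terms of a simple model, the clean-up amounts to dropping every shadow row whose block can no longer be abandoned. The sketch below is a hypothetical Python illustration (the `prune_shadow` name and the dictionary layout are assumptions), not HFM code.

```python
def prune_shadow(shadow, irreversible_block):
    """Keep only the changes recorded for blocks that are still reversible."""
    return [change for change in shadow if change["block_num"] > irreversible_block]


shadow = [
    {"hive_rowid": 1, "block_num": 100, "command": "CREATE", "old_value": None},
    {"hive_rowid": 1, "block_num": 101, "command": "UPDATE", "old_value": (1, 33)},
    {"hive_rowid": 2, "block_num": 102, "command": "CREATE", "old_value": None},
]

# The node reports that block 101 is now irreversible.
shadow = prune_shadow(shadow, 101)
print(shadow)
```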

## That's all for today
I hope this post helps application developers understand why the HAF API looks the way it does, and how the performance of an application may be affected by the internals of the Hive Fork Manager. I have not explained how the HFM knows in which block a change saved in a shadow table occurred; I will explain this in the next post, about how the HFM passes blocks to applications for further processing.

If you want to start writing your first HAF application, please look at my post about the Hive Fork Manager [documentation](https://hive.blog/hive-139531/@mickiewicz/when-you-want-to-write-a-haf-application-then-first-read-the-hive-fork-manager-documentation).
