5.3.2 Random Sampling
The DA block contains a header and body. The header is relatively small and can be downloaded and checked directly, whereas the body is much bigger, which is the one needed to be sampled randomly to verify data availability.When a block is produced, it is sliced into fragments by size, and then fragments are generated by applying the 2-dimension RS (Reed-Solomon) code. Then the Merkle trees are created for each row and column of each fragment, so there are Merkle trees.
Figure 5: Block slicing and encoding of data availability layer
These Merkle roots are finally formed into one Merkle tree and the root of the tree is used as the root of the whole block, and we combine the root and other metadata into the header. Then the header and original body are broadcast through the P2P network. When other DA nodes receive the block, they will repeat the 2-dimension RS encoding the same way above and calculate the root, then accept it if it is the same as the one in the header.
The validators of the settlement layer receive the header from the DA block proposer and connect to at least one of the DA nodes. These validators randomly download s of the fragments and their Merkle paths to the block root. If all of those fragments can be acquired successfully, the sampling validator can confirm the availability of the DA block at a very high possibility.
Next, we will calculate the probability that an unavailable DA block is recognized as available under such a mechanism of random sampling. As stated above, the DA block is encoded into RS code of fragments, and any k fragments of each row or column can recover that row or column, so the adversary needs to withhold at least fragments to make the whole block unavailable.
Assuming that one validator randomly samples fragments from a DA block, the probability of sampling at least one unavailable fragment performed by a single validator is shown below if the DA block has the minimal unavailable pieces.
That is also the minimal probability that one can correctly recognize an unavailable DA block. And if we have N active validators in the committee of the settlement layer and at most f of them are malicious, which is less than N/3. Also, we need to collect N - f votes to confirm the DA block. Therefore, at least f+1 validators of the N - f honest ones are required to find out the unavailability to not confirm the invalid DA block. So the unavailable DA block will be recognized by the network at the probability below:
In NERO, we have N = 21, f = 6, so the probabilities can be calculated under different k and S below
We can see that when and any unavailable DA block will be revealed at the probability of almost 100%, and the sampling validator only needs to download 0.09% of the original data in such condition.