
Classifying blockchain tech

29/3/2023

Summary

Few online communities are as passionate and loud as those in crypto. Discussions, when civilized, revolve around scalability (i.e. transaction fees) and security.

Hidden behind these high-level properties are some fundamental underlying properties so central that they can be found in any blockchain. And since they exist everywhere, you can use them to categorize blockchains.

These properties are:

  1. Data availability guarantees
  2. Compute or execution guarantees

[Figure: Data and execution layer comparison]

Data guarantees

Data availability is about storing and retrieving blockchain data. Having guarantees over data means resilience to nodes going offline. Some systems have all nodes store all data (e.g. Bitcoin and Ethereum) while others do not (e.g. Filecoin and Arweave).

The bottlenecks for data availability are storage costs, since transactions normally store data forever, and the network bandwidth available to nodes.
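As a rough illustration of why permanent storage is the limiting factor, here is a back-of-envelope sketch in Python; the throughput and transaction size are assumed numbers, not measurements of any particular chain.

```python
# Back-of-envelope sketch of why storage cost is a data availability
# bottleneck. All numbers are illustrative assumptions, not measurements.

TX_PER_SECOND = 15        # assumed average throughput
BYTES_PER_TX = 250        # assumed average transaction size in bytes

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
growth_per_year_gb = TX_PER_SECOND * BYTES_PER_TX * SECONDS_PER_YEAR / 1e9

# Because transactions are stored forever, every full node pays this
# growth in disk space (and bandwidth) every single year.
print(f"Chain growth: ~{growth_per_year_gb:.0f} GB/year")
```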

Compute guarantees

The compute or execution bottleneck is about processing transactions and ensuring the state of the blockchain is valid. Having guarantees over execution means resilience to invalid transactions being processed. Some systems have all nodes process all transactions (e.g. Bitcoin and Ethereum again) while others do not (e.g. Near and Polkadot).

The bottleneck for execution is hard drive read speed, i.e. the time the disk needs to load smart contract data into RAM before processing transactions. For example, Ethereum today is around 870GB, and since most consumer computers only have a few GBs of RAM, the bulk of the blockchain data is stored on the hard drive, one of the slowest parts of a computer. Some estimate that TPS could increase by ~100x if the bottleneck was the CPU instead.
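To make the "~100x" intuition concrete, here is a minimal sketch; the latencies and reads per transaction are assumptions chosen only to illustrate the order of magnitude, and real figures vary with hardware and caching.

```python
# Rough sketch of how disk latency caps TPS when state lives on disk.
# The latencies and reads-per-transaction below are illustrative assumptions.

DISK_READ_S = 100e-6   # assumed ~100 µs per random state read from an SSD
RAM_READ_S = 1e-6      # assumed ~1 µs per in-memory state read (incl. trie overhead)
READS_PER_TX = 10      # assumed storage slots touched per transaction

tps_disk_bound = 1 / (READS_PER_TX * DISK_READ_S)   # ~1,000 TPS
tps_ram_bound = 1 / (READS_PER_TX * RAM_READ_S)     # ~100,000 TPS

print(f"Disk-bound ceiling: ~{tps_disk_bound:,.0f} TPS")
print(f"RAM-bound ceiling:  ~{tps_ram_bound:,.0f} TPS "
      f"({tps_ram_bound / tps_disk_bound:.0f}x)")
```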

Classifications

1. Part of consensus

When Bitcoin was released, data availability was not a topic of discussion. That's because data can be expected to be stored by nodes that are part of consensus. If nodes omit any blockchain data they might not be able to compute the blockchain state and risk losing their consensus rewards.

[Figure: Data availability as part of consensus]

For example, in Bitcoin, if a node discards an older transaction it won't be able to verify future transactions spending the BTC of the discarded transaction and will lose mining rewards. In Ethereum, if a node omits data from a smart contract it won't be able to compute future state root changes in that contract and hence won't be able to participate in consensus or earn staking rewards.
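A tiny sketch of that dependency, using a simplified UTXO-style lookup; the identifiers and structure are made up for illustration and are not real Bitcoin code.

```python
# Minimal, hypothetical sketch of why a consensus node cannot discard old
# data: validating a new spend requires looking up the output it references.

utxo_set = {
    ("old_tx_id", 0): {"amount": 5, "owner": "alice"},  # output from an old block
}

def validate_spend(tx_id: str, index: int, spender: str) -> bool:
    prev_output = utxo_set.get((tx_id, index))
    if prev_output is None:
        # The referenced data was pruned: the node cannot tell a valid spend
        # from an invalid one, so it drops out of consensus (and its rewards).
        raise LookupError("referenced output missing")
    return prev_output["owner"] == spender

print(validate_spend("old_tx_id", 0, "alice"))  # True, while the data is kept
```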

2. Storage deals

A completely different approach is Filecoin deals. To be clear, this is different from Filecoin consensus, which follows a conventional leader-based protocol. Filecoin deals are essentially a marketplace of storage providers: each provider has a reputation based on its history, and data is usually stored with multiple providers for better resilience.

[Figure: Data availability via storage deals]

On one hand this method is riskier, since data is replicated less; on the other hand it's much cheaper and hence more scalable. A few projects on the Filecoin network are experimenting with it and have created different blockchain models based on storage providers and deals.
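A hedged sketch of how such a deal marketplace could be modeled; the provider fields and the selection rule are assumptions for illustration, not Filecoin's actual deal flow.

```python
# Hypothetical sketch of a storage-deal marketplace, loosely based on the
# ideas above; names, fields and selection rule are illustrative only.

from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    price_per_gb: float   # what the provider charges
    reputation: float     # 0..1, based on past deal history

def make_deal(providers: list[Provider], replicas: int, min_reputation: float) -> list[Provider]:
    """Pick the cheapest providers above a reputation floor and replicate
    the data across several of them for resilience."""
    trusted = [p for p in providers if p.reputation >= min_reputation]
    trusted.sort(key=lambda p: p.price_per_gb)
    return trusted[:replicas]

market = [
    Provider("p1", 0.002, 0.99),
    Provider("p2", 0.001, 0.65),
    Provider("p3", 0.003, 0.97),
]
print([p.name for p in make_deal(market, replicas=2, min_reputation=0.9)])
# ['p1', 'p3'] -> the cheaper p2 is skipped because of its low reputation
```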

3. Slashable DAS

If there's agreement on the blockchain data, you can run a node to provide an execution environment. That means data can be safeguarded with "cheaper" methods while keeping the same security guarantees.

One such method is data availability sampling (DAS), where a blockchain stores data and does not care about execution. The data is guaranteed to be available through random sampling, i.e. by continuously requesting random pieces, so nodes cannot omit any piece without risking their rewards.

[Figure: Data availability sampling with slashing]

This is part of Ethereum's scalability roadmap with EIP-4844 and the purpose of Celestia, EigenLayer's DA layer and Polygon's Avail. All these systems implement stake slashing for nodes that do not behave honestly (i.e. fail to provide the data proofs).
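A toy sketch of the sampling-plus-slashing idea; the chunk counts, challenge size and slashing rule are assumptions for illustration and do not correspond to any of the protocols above.

```python
# Toy sketch of data availability sampling with slashing.

import random

CHUNKS = 256    # the data is split into chunks (erasure coding omitted here)
SAMPLES = 20    # random chunks requested per challenge round

def challenge(stored_chunks: set[int], stake: int) -> int:
    """Ask the node for random chunks; slash part of its stake if any is missing."""
    requested = random.sample(range(CHUNKS), SAMPLES)
    if all(i in stored_chunks for i in requested):
        return stake          # every sample answered: stake untouched
    return stake // 2         # failed availability proof: stake is slashed

honest = set(range(CHUNKS))       # stores everything
lazy = set(range(CHUNKS - 64))    # silently dropped a quarter of the data

print(challenge(honest, 1000))   # 1000
print(challenge(lazy, 1000))     # almost certainly 500: omissions are caught
```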

4. Optimistic DAS

A similar method is optimistic data availability sampling. It still employs succinct proofs to ensure nodes store the data but does not enforce slashing for nodes that fail to provide the proofs.

[Figure: Data availability sampling, optimistic]

For example, Arweave uses a block recall system where block producers are required to provide a DAS proof of a random older block. If they fail to include this proof, the node is not eligible to receive the block reward.
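A minimal sketch of that incentive, with the proof reduced to a hash over the recalled block; the heights, data format and reward rule are assumptions for illustration, not Arweave's actual proof format.

```python
# Hedged sketch of a block-recall style check: the producer must show it still
# holds a randomly chosen old block, otherwise it simply earns no reward
# (no stake is slashed).

import hashlib
import random

def recall_proof(stored_blocks: dict[int, bytes], height: int) -> bytes | None:
    data = stored_blocks.get(height)
    return hashlib.sha256(data).digest() if data is not None else None

def block_reward(stored_blocks: dict[int, bytes], chain_height: int, base_reward: int) -> int:
    recall_height = random.randrange(chain_height)        # random older block
    proof = recall_proof(stored_blocks, recall_height)
    return base_reward if proof is not None else 0        # no proof -> no reward

full_node = {h: bytes([h % 256]) * 32 for h in range(100)}
pruned_node = {h: blk for h, blk in full_node.items() if h >= 50}

print(block_reward(full_node, 100, 10))    # always 10
print(block_reward(pruned_node, 100, 10))  # 10 or 0, roughly 50/50 with half the history gone
```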

More options

1. Sharding

Sharding is a method to split data or execution into "committees" that only process part of the blockchain. Near Protocol is focusing on sharding its mainnet, which is expected to split the network into 4 shards and hence improve its scalability by 4x.

[Figure: Data availability with sharding]

Another approach to sharding is employed by the Internet Computer, where the network is split into "committees" of 13 nodes, making a trade-off between scalability and data resilience guarantees.
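To illustrate the basic mechanics, here is a sketch of assigning accounts to shards by hashing; the assignment rule and shard count are assumptions, not how Near or the Internet Computer actually partition their networks.

```python
# Simple sketch of splitting state across shards by hashing the account id;
# the shard count and the assignment rule are illustrative assumptions.

import hashlib

NUM_SHARDS = 4   # e.g. the 4-way split mentioned above

def shard_of(account: str) -> int:
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Each committee of nodes only stores and executes its own shard,
# so the per-node load drops roughly by a factor of NUM_SHARDS.
for account in ["alice.near", "bob.near", "carol.near"]:
    print(account, "-> shard", shard_of(account))
```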

2. Expiry

State expiry is a way to store non-permanent data on a blockchain. Storing data temporarily means that the blockchain size grows at a slower pace, and hence nodes will eventually be able to store more data.

[Figure: Data availability and state expiry]

This method is not widely used today but it's still part of Ethereum's roadmap and is expected to arrive in the "Splurge" phase.
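A toy sketch of the idea; the epoch length and the "drop if untouched" rule are assumptions for illustration and not the exact mechanism proposed for Ethereum.

```python
# Toy sketch of state expiry: entries untouched for a full "epoch" are dropped
# from the active state (in real proposals they can later be revived with a proof).

EPOCH_BLOCKS = 100_000

state = {
    "contract_a": {"value": 42, "last_access": 950_000},
    "contract_b": {"value": 7,  "last_access": 120_000},   # long untouched
}

def expire(state: dict, current_block: int) -> dict:
    """Keep only entries accessed within the last epoch."""
    return {key: entry for key, entry in state.items()
            if current_block - entry["last_access"] < EPOCH_BLOCKS}

print(list(expire(state, current_block=1_000_000)))   # ['contract_a']
```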

Summary

At the centre of any blockchain system is a data layer responsible for storing the blockchain state and staying resilient when nodes go offline. In this post, I've gone through the three common data availability approaches used today: data availability sampling, data as part of consensus and storage deals. I also briefly mentioned other ideas such as sharding and state expiry and their trade-offs. In a future post I'll talk about execution and the varieties that are often seen in the wild.

