Few online communities are as passionate and loud as those in crypto. Discussions, when civilized, revolve around scalability (i.e. transaction fees) and security.
Hidden behind these high-level properties are some fundamental underlying properties that are so central they can be found in any blockchain. And since they exist everywhere, you can use them to categorize blockchains.
These properties are data availability and execution.
Data availability is about storing and retrieving blockchain data. Having guarantees over data availability means resilience to nodes going offline. Some systems have every node store all data (e.g. Bitcoin and Ethereum) while others do not (e.g. Filecoin and Arweave).
The bottlenecks for data availability are storage costs, since transactions normally store data forever, and the network bandwidth available to the nodes that serve that data.
Compute, or execution, is about processing transactions and ensuring the state of the blockchain is valid. Having guarantees over execution means resilience to processing invalid transactions. Some systems have every node process all transactions (e.g. Bitcoin and Ethereum again) while others do not (e.g. Near and Polkadot).
The bottleneck for execution is hard drive read speed, i.e. the time the disk needs to load smart contract data into RAM before the transactions can be processed. For example, Ethereum today is around 870GB, and since most consumer computers only have a few GB of RAM, the blockchain data is stored on the hard drive, which is one of the slowest parts of a computer. Some estimate that TPS could increase by ~100x if the bottleneck were the CPU instead.
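To make that intuition concrete, here is a rough back-of-the-envelope sketch in Python. The reads-per-transaction and latency numbers are illustrative assumptions, not measurements, but they show why moving state from disk to RAM changes the throughput ceiling by orders of magnitude:

```python
# Back-of-the-envelope sketch of why state reads bound execution throughput.
# All numbers below are illustrative assumptions, not measurements.

READS_PER_TX = 10          # assumed random state lookups per transaction
SSD_READ_LATENCY = 100e-6  # ~100 microseconds per random read from an SSD
RAM_READ_LATENCY = 100e-9  # ~100 nanoseconds per random read from RAM

def max_tps(read_latency: float, reads_per_tx: int = READS_PER_TX) -> float:
    """Upper bound on transactions per second if state reads were the only cost."""
    return 1.0 / (reads_per_tx * read_latency)

print(f"disk-bound TPS ~ {max_tps(SSD_READ_LATENCY):,.0f}")
print(f"RAM-bound TPS  ~ {max_tps(RAM_READ_LATENCY):,.0f}")
```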
When Bitcoin was released, data availability was not a topic of discussion. That's because data can be expected to be stored by the nodes that are part of consensus. If nodes omit any blockchain data, they might not be able to compute the blockchain state and risk losing their consensus rewards.
For example, in Bitcoin, if a node discards an older transaction it won't be able to verify future transactions that spend the BTC of the discarded transaction, and will lose mining rewards. In Ethereum, if a node omits data from a smart contract it won't be able to compute future state root changes in that contract, and hence won't be able to participate in consensus and earn staking rewards.
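A toy illustration of the Bitcoin case: the sketch below models a UTXO set as a plain dictionary (nothing like Bitcoin's real data structures) and shows that a node which discarded an old output simply cannot validate a spend of it.

```python
# Toy model of a UTXO lookup: a node that pruned an old output cannot
# validate a later transaction that spends it.

utxo_set = {
    ("tx_abc", 0): {"amount": 5, "owner": "alice"},
    # ("tx_old", 1) was discarded by this node to save space
}

def validate_spend(txid: str, index: int, spender: str) -> bool:
    output = utxo_set.get((txid, index))
    if output is None:
        # The node cannot tell a pruned output from a non-existent one,
        # so it cannot validate the block containing this spend.
        raise LookupError(f"missing output {txid}:{index}, cannot validate block")
    return output["owner"] == spender

print(validate_spend("tx_abc", 0, "alice"))  # works: data is available

try:
    validate_spend("tx_old", 1, "bob")       # the referenced output was discarded
except LookupError as err:
    print(err)
```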
A completely different approach is Filecoin deals. To be clear, this is different from Filecoin consensus, which follows a conventional pBFT leader-based consensus. Filecoin deals are essentially a marketplace of storage providers: each provider has a reputation based on its history, and data is usually stored with multiple operators for better resilience.
On one hand this method is riskier, since data is replicated less, but on the other hand it's much cheaper and hence more scalable. A few projects on the Filecoin network are looking to experiment with it and have created different blockchain models based on storage providers and deals.
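Here is a hypothetical sketch of how such a deal-based layer might choose providers. The fields, reputation floor, and replication factor are made up for illustration and are not Filecoin's actual deal-making API:

```python
# Hypothetical provider-selection sketch: replicate a piece of data across the
# cheapest providers above a reputation threshold. Illustrative only.

providers = [
    {"id": "provider-a", "reputation": 0.98, "price_per_gib": 0.002},
    {"id": "provider-b", "reputation": 0.95, "price_per_gib": 0.001},
    {"id": "provider-c", "reputation": 0.80, "price_per_gib": 0.0005},
]

def make_deals(data_id: str, replicas: int = 2, min_reputation: float = 0.9):
    """Store the data with the cheapest providers above a reputation floor."""
    eligible = [p for p in providers if p["reputation"] >= min_reputation]
    eligible.sort(key=lambda p: p["price_per_gib"])
    return [{"data": data_id, "provider": p["id"]} for p in eligible[:replicas]]

print(make_deals("some-data-cid"))
```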
If there's agreement on the blockchain data, anyone can run a node to provide an execution environment. That means data can be safeguarded with "cheaper" methods while keeping the same security guarantees.
One such method is data availability sampling (DAS), where a blockchain stores data and does not care about execution. The data is guaranteed to be available through random sampling, or in other words by continuously requesting random pieces, so nodes cannot omit any of the data without losing rewards.
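A minimal sketch of the sampling idea, assuming a simple chunked block. Real DAS systems erasure-code the block so that a dishonest node must withhold a large fraction of the chunks to hide anything, and they use Merkle or KZG proofs to verify each chunk; this toy version skips both:

```python
import random

# Toy data availability sampling: ask for a few random chunks; a node that
# withheld a large fraction of the block will almost certainly fail a sample.

def sample_availability(request_chunk, num_chunks: int, samples: int = 30) -> bool:
    """Return True only if every randomly sampled chunk is served."""
    for _ in range(samples):
        index = random.randrange(num_chunks)
        if request_chunk(index) is None:
            return False
    return True

# Honest node stores all 64 chunks; dishonest node withheld half of them.
honest = {i: f"chunk-{i}" for i in range(64)}
dishonest = {i: f"chunk-{i}" for i in range(64) if i % 2 == 0}

print(sample_availability(honest.get, num_chunks=64))     # True
print(sample_availability(dishonest.get, num_chunks=64))  # almost certainly False
```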
This is part of Ethereum's scalability roadmap with EIP-4844, and it is the purpose of Celestia, EigenLayer's DA layer and Polygon's Avail. All these systems implement stake slashing for nodes that do not behave honestly (i.e. fail to provide the data proofs).
A similar method is optimistic data availability sampling. This method still employs succinct proofs to ensure nodes store the data, but does not enforce slashing for nodes that fail to provide the proofs.
For example, Arweave uses a block recall system where block producers are required to provide a proof of access to a random older block. If they fail to include this proof, the node is not eligible to receive the block rewards.
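The sketch below illustrates the recall incentive under simplified assumptions: a producer is paid only if it can still reproduce a randomly chosen older block. This is an illustration of the idea, not Arweave's actual proof-of-access scheme:

```python
import hashlib
import random

# Toy "block recall": eligibility for the reward requires reproducing a
# randomly selected older block that the producer was expected to keep.

chain = [f"block-{i}-data".encode() for i in range(100)]        # toy block history
block_hashes = [hashlib.sha256(b).hexdigest() for b in chain]   # known commitments

def eligible_for_reward(local_storage: dict, recall_index: int) -> bool:
    """Producer is eligible only if it still holds the recalled block."""
    block = local_storage.get(recall_index)
    if block is None:
        return False  # the producer pruned that block and cannot prove access
    return hashlib.sha256(block).hexdigest() == block_hashes[recall_index]

recall_index = random.randrange(len(chain))
full_node = dict(enumerate(chain))                              # stores everything
pruned_node = {i: b for i, b in enumerate(chain) if i >= 90}    # kept only recent blocks

print(eligible_for_reward(full_node, recall_index))    # True
print(eligible_for_reward(pruned_node, recall_index))  # False ~90% of the time
```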
Data sharding is a method to split data or execution across "committees" that each only process part of the blockchain. Near Protocol is focusing on sharding its mainnet, which is expected to split the network into 4 shards and hence improve its scalability by roughly 4x.
Another sharding approach is employed by the Internet Computer, where the network is split into "committees" of 13 nodes, trading data resilience guarantees for scalability.
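A simple way to picture sharding is routing each account to one of a fixed number of shards by hashing its address, as in the sketch below. The shard count and routing rule are illustrative; real protocols use more elaborate assignment and cross-shard messaging:

```python
import hashlib

# Toy shard assignment: each account is deterministically mapped to one of
# NUM_SHARDS committees, so a shard's nodes only store and execute the
# transactions touching their accounts.

NUM_SHARDS = 4

def shard_for(account: str) -> int:
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

for account in ["alice.near", "bob.near", "carol.near", "dave.near"]:
    print(account, "-> shard", shard_for(account))
```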
State expiry is a way to store non-permanent data in a blockchain. Storing data only temporarily means that the blockchain's size grows at a slower pace, and hence nodes will be able to keep up with storage for longer.
This method is not widely used today, but it is still part of Ethereum's roadmap and is expected to arrive in the "Purge" part.
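A toy version of the idea, assuming a simple epoch-based rule: state entries that haven't been touched within an expiry window are dropped from active state. Ethereum's proposed designs differ in the details and allow expired state to be revived with a proof:

```python
# Toy state expiry: keep only entries touched within the last EXPIRY_EPOCHS
# epochs. Epoch-based bookkeeping here is illustrative only.

EXPIRY_EPOCHS = 2

state = {
    "contract-a": {"value": 1, "last_touched_epoch": 0},
    "contract-b": {"value": 2, "last_touched_epoch": 3},
}

def expire_state(state: dict, current_epoch: int) -> dict:
    """Drop entries that have not been touched recently enough."""
    return {
        key: entry
        for key, entry in state.items()
        if current_epoch - entry["last_touched_epoch"] <= EXPIRY_EPOCHS
    }

print(expire_state(state, current_epoch=4))  # only contract-b survives
```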
At the centre of any blockchain system is a data layer that is responsible for storing the blockchain state and staying resilient when nodes go offline. In this post, I've gone through the 3 common data availability systems used today: data availability sampling, data as part of consensus, and storage deals. I also briefly mentioned other ideas such as sharding and state expiry and their trade-offs. In a future post I'll talk about execution and the varieties that are often seen in the wild.