[ADR - 42] State-sync (#3769)

* init adr 042: state sync * link to blockchain reactor * brapse proposal * formatting * response to feedback * Update docs/architecture/adr-042-state-sync.md Co-Authored-By: Aditya <adityasripal@gmail.com> * Update security models and more * clarify compression * typo * Update docs/architecture/adr-042-state-sync.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
2025-06-10 12:01:18 +00:00 · 2019-08-06 10:25:24 +02:00
parent e179787d40
commit d70135ec71
2 changed files with 239 additions and 0 deletions
--- a/docs/architecture/adr-042-state-sync.md
+++ b/docs/architecture/adr-042-state-sync.md
@@ -0,0 +1,239 @@
+# ADR 042: State Sync Design
+
+## Changelog
+
+* 2019-06-27: Init by EB
+* 2019-07-04: Follow up by brapse
+
+## Context
+StateSync is a feature which would allow a new node to receive a
+snapshot of the application state without downloading blocks or going
+through consensus. Once downloaded, the node could switch to FastSync
+and eventually participate in consensus. The goal of StateSync is to
+facilitate setting up a new node as quickly as possible.
+
+## Considerations
+Because Tendermint doesn't know anything about the application state,
+StateSync will broker messages between nodes and through
+the ABCI to an opaque application. The implementation will have multiple
+touch points on both the Tendermint code base and the ABCI application.
+
+* A StateSync reactor to facilitate peer communication - Tendermint
+* A set of ABCI messages to transmit application state to the reactor - Tendermint
+* A set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
+* A storage format with validation and performance considerations - ABCI application
+
+### Implementation Properties
+Beyond the approach, any implementation of StateSync can be evaluated
+across different criteria:
+
+* Speed: Expected throughput of producing and consuming snapshots
+* Safety: Cost of pushing invalid snapshots to a node
+* Liveness: Cost of preventing a node from receiving/constructing a snapshot
+* Effort: How much effort does an implementation require
+
+### Implementation Questions
+* What is the format of a snapshot
+    * Complete snapshot
+    * Ordered IAVL key ranges
+    * Individually compressed chunks which can be validated
+* How is data validated
+    * Trust a peer with its data blindly
+    * Trust a majority of peers
+    * Use light client validation to validate each chunk against the
+      merkle tree root produced by consensus
+* What are the performance characteristics
+    * Random vs sequential reads
+    * How parallelizable is the scheduling algorithm
+
+### Proposals
+Broadly speaking there are two approaches to this problem which have had
+varying degrees of discussion and progress. These approaches can be
+summarized as:
+
+**Lazy:** Where snapshots are produced dynamically at request time. This
+solution would use the existing data structure.
+
+**Eager:** Where snapshots are produced periodically and served from disk at
+request time. This solution would create an auxiliary data structure
+optimized for batch read/writes.
+
+Additionally the proposals tend to vary on how they provide safety
+properties.
+
+**LightClient** Where a client can acquire the merkle root from the block
+headers synchronized from a trusted validator set. Subsets of the application state,
+called chunks, can therefore be validated on receipt to ensure each chunk
+is part of the merkle root.
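The per-chunk check described above can be sketched as follows. This is a minimal illustration over a plain binary Merkle tree, not the IAVL proof format; `ProofStep`, `VerifyChunk`, and the hashing scheme are assumptions made for the example, and the trusted root stands in for the value a light client would read from a verified block header.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashLeaf and hashInner use distinct prefixes so that a leaf can never be
// mistaken for an inner node (domain separation). Hypothetical scheme.
func hashLeaf(chunk []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, chunk...))
	return h[:]
}

func hashInner(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// ProofStep is one sibling hash on the path from a chunk to the root.
type ProofStep struct {
	Sibling []byte
	Left    bool // true if the sibling is the left child
}

// VerifyChunk recomputes the root from a chunk and its proof and compares
// it with the root taken from a trusted block header.
func VerifyChunk(trustedRoot, chunk []byte, proof []ProofStep) bool {
	node := hashLeaf(chunk)
	for _, step := range proof {
		if step.Left {
			node = hashInner(step.Sibling, node)
		} else {
			node = hashInner(node, step.Sibling)
		}
	}
	return bytes.Equal(node, trustedRoot)
}

func main() {
	// Four chunks form a two-level tree.
	chunks := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	l0, l1 := hashLeaf(chunks[0]), hashLeaf(chunks[1])
	l2, l3 := hashLeaf(chunks[2]), hashLeaf(chunks[3])
	root := hashInner(hashInner(l0, l1), hashInner(l2, l3))

	// Proof for chunk 2: sibling l3 on the right, then the left subtree.
	proof := []ProofStep{
		{Sibling: l3, Left: false},
		{Sibling: hashInner(l0, l1), Left: true},
	}
	_ = l2

	fmt.Println("valid chunk accepted:", VerifyChunk(root, chunks[2], proof))
	fmt.Println("tampered chunk accepted:", VerifyChunk(root, []byte("tampered"), proof))
}
```

The point of the sketch is that a single bad chunk is rejected as soon as it arrives, without waiting for the rest of the snapshot.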
+
+**Majority of Peers** Where manifests of chunks along with checksums are
+downloaded and compared against versions provided by a majority of
+peers.
+
+#### Lazy StateSync
+An [initial specification](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) was published by Alexis Sellier.
+In this design, the state has a given `size` of primitive elements (like
+keys or nodes), each element is assigned a number from 0 to `size-1`,
+and chunks consist of a range of such elements. Ackratos raised
+[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
+about this design, somewhat specific to the IAVL tree, and mainly concerning
+performance of random reads and of iterating through the tree to determine element numbers
+(i.e. elements aren't indexed by the element number).
+
+An alternative design was suggested by Jae Kwon in
+[#3639](https://github.com/tendermint/tendermint/issues/3639) where chunking
+happens lazily and in a dynamic way: nodes request key ranges from their peers,
+and peers respond with some subset of the
+requested range and with notes on how to request the rest in parallel from other
+peers. Unlike chunk numbers, keys can be verified directly. And if some keys in the
+range are omitted, proofs for the range will fail to verify.
+This way a node can start by requesting the entire tree from one peer,
+and that peer can respond with say the first few keys, and the ranges to request
+from other peers.
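A rough sketch of this request/response shape, under stated assumptions: `KeyRange`, `RangeResponse`, and `ServeRange` are hypothetical names, keys are plain sorted strings, and range proofs are left out entirely. The serving peer answers with the first few keys and splits the remainder into sub-ranges that the requester can fetch from other peers in parallel.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyRange is a half-open range [Start, End) over sorted keys.
// An empty End means "no upper bound".
type KeyRange struct {
	Start, End string
}

// RangeResponse carries the keys served directly plus the remaining
// sub-ranges the requester should fetch, possibly from other peers.
type RangeResponse struct {
	Keys      []string
	Remaining []KeyRange
}

// ServeRange returns at most limit keys from the start of the range and
// splits the unserved remainder into up to parts sub-ranges.
func ServeRange(sortedKeys []string, r KeyRange, limit, parts int) RangeResponse {
	if limit < 0 {
		limit = 0
	}
	if parts < 1 {
		parts = 1
	}
	lo := sort.SearchStrings(sortedKeys, r.Start)
	hi := len(sortedKeys)
	if r.End != "" {
		hi = sort.SearchStrings(sortedKeys, r.End)
	}
	if lo > hi {
		return RangeResponse{} // empty or inverted range
	}
	inRange := sortedKeys[lo:hi]
	if len(inRange) <= limit {
		return RangeResponse{Keys: inRange}
	}
	resp := RangeResponse{Keys: inRange[:limit]}
	rest := inRange[limit:]
	step := (len(rest) + parts - 1) / parts // ceiling division
	for i := 0; i < len(rest); i += step {
		end := r.End
		if i+step < len(rest) {
			end = rest[i+step]
		}
		resp.Remaining = append(resp.Remaining, KeyRange{Start: rest[i], End: end})
	}
	return resp
}

func main() {
	keys := []string{"a", "b", "c", "d", "e", "f", "g", "h"}
	// Request the entire tree; the peer serves two keys and delegates the rest.
	resp := ServeRange(keys, KeyRange{}, 2, 2)
	fmt.Println("served:", resp.Keys)
	for _, r := range resp.Remaining {
		fmt.Printf("request [%q, %q) from another peer\n", r.Start, r.End)
	}
}
```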
+
+Additionally, per-chunk validation tends to come more naturally to the
+Lazy approach since it tends to use the existing structure of the tree
+(i.e. keys or nodes) rather than state-sync specific chunks. Such a
+design for Tendermint was originally tracked in
+[#828](https://github.com/tendermint/tendermint/issues/828).
+
+#### Eager StateSync
+Parity implements
+["Warp Sync"](https://wiki.parity.io/Warp-Sync-Snapshot-Format.html) to rapidly
+download both blocks and state snapshots from peers. Data is carved into ~4MB
+chunks and snappy compressed. Hashes of snappy compressed chunks are stored in a
+manifest file which co-ordinates the state-sync. Obtaining a correct manifest
+file seems to require an honest majority of peers. This means you may not find
+out the state is incorrect until you download the whole thing and compare it
+with a verified block header. 
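To make the manifest idea concrete, here is a minimal sketch assuming a flat byte slice as the state. The names `Manifest`, `BuildSnapshot`, and `CheckChunk` are hypothetical, and gzip from the Go standard library stands in for snappy, which Warp Sync actually uses.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
)

// Manifest lists the hash of every compressed chunk, in order.
type Manifest struct {
	ChunkHashes [][32]byte
}

// compress gzips a chunk. gzip is only a stand-in for snappy here.
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// BuildSnapshot carves state into chunkSize pieces, compresses each one,
// and records the hash of the compressed bytes in the manifest.
func BuildSnapshot(state []byte, chunkSize int) (Manifest, [][]byte, error) {
	if chunkSize <= 0 {
		return Manifest{}, nil, fmt.Errorf("invalid chunk size %d", chunkSize)
	}
	var m Manifest
	var chunks [][]byte
	for off := 0; off < len(state); off += chunkSize {
		end := off + chunkSize
		if end > len(state) {
			end = len(state)
		}
		c, err := compress(state[off:end])
		if err != nil {
			return Manifest{}, nil, err
		}
		chunks = append(chunks, c)
		m.ChunkHashes = append(m.ChunkHashes, sha256.Sum256(c))
	}
	return m, chunks, nil
}

// CheckChunk reports whether a received chunk matches manifest entry i.
func (m Manifest) CheckChunk(i int, compressed []byte) bool {
	return i >= 0 && i < len(m.ChunkHashes) &&
		sha256.Sum256(compressed) == m.ChunkHashes[i]
}

func main() {
	m, chunks, err := BuildSnapshot([]byte("0123456789"), 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("chunks:", len(chunks))
	fmt.Println("chunk 0 matches entry 0:", m.CheckChunk(0, chunks[0]))
	fmt.Println("chunk 1 matches entry 0:", m.CheckChunk(0, chunks[1]))
}
```

Note that the check only proves a chunk matches the manifest. Whether the manifest itself is correct is a separate question, which is where the honest majority assumption comes in.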
+
+A similar solution was implemented by Binance in
+[#3594](https://github.com/tendermint/tendermint/pull/3594)
+based on their initial implementation in
+[PR #3243](https://github.com/tendermint/tendermint/pull/3243)
+and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
+Note this still requires the honest majority peer assumption.
+
+As an eager protocol, warp-sync can efficiently compress larger, more
+predictable chunks once per snapshot and service many new peers. By
+comparison lazy chunkers would have to compress each chunk at request
+time.
+
+### Analysis of Lazy vs Eager
+Lazy and Eager have more in common than they differ. Both require
+reactors on the Tendermint side, a set of ABCI messages and a method for
+serializing/deserializing snapshots facilitated by a SnapshotFormat.
+
+The biggest difference between Lazy and Eager proposals is in the
+read/write patterns necessitated by serving a snapshot chunk.
+Specifically, Lazy State Sync performs random reads to the underlying data
+structure while Eager can optimize for sequential reads.
+
+This distinction between approaches was demonstrated by Binance's
+[ackratos](https://github.com/ackratos) in their implementation of [Lazy
+State sync](https://github.com/tendermint/tendermint/pull/3243), the
+[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
+of the performance, and the follow up implementation of [Warp
+Sync](http://github.com/tendermint/tendermint/pull/3594).
+
+#### Comparing Security Models
+There are several different security models which have been
+discussed/proposed in the past but generally fall into two categories.
+
+Light client validation: In which the node receiving data is expected to
+first perform a light client sync and have all the necessary block
+headers. Given a trusted block header (trusted in the sense that it comes
+from a validator set subject to [weak
+subjectivity](https://github.com/tendermint/tendermint/pull/3795)), the node
+can compare any subset of keys, called a chunk, against the merkle root.
+The advantage of light client validation is that the block headers are
+signed by validators which have something to lose for malicious
+behaviour. If a validator were to provide an invalid proof, they can be
+slashed.
+
+Majority of peer validation: A manifest file containing a list of chunks
+along with checksums of each chunk is downloaded from a
+trusted source. That source can be a community resource similar to
+[sum.golang.org](https://sum.golang.org), or the manifest can be downloaded
+from the majority of peers. One disadvantage of the majority of peer security
+model is the vulnerability to eclipse attacks, in which a malicious user looks
+to saturate a target node's peer list and produce a manufactured picture of
+majority.
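The majority rule and its weakness can be illustrated with a small sketch. `MajorityManifest` is a hypothetical helper, peers and manifest hashes are plain strings, and the strict majority threshold is an assumption for the example.

```go
package main

import "fmt"

// MajorityManifest returns the manifest hash reported by more than half
// of the peers, or false when no hash reaches a strict majority.
// reports maps a peer ID to the manifest hash that peer advertised.
func MajorityManifest(reports map[string]string) (string, bool) {
	counts := make(map[string]int)
	for _, hash := range reports {
		counts[hash]++
	}
	// At most one hash can exceed half of the reports, so map iteration
	// order does not affect the result.
	for hash, n := range counts {
		if 2*n > len(reports) {
			return hash, true
		}
	}
	return "", false
}

func main() {
	honest := map[string]string{"p1": "good", "p2": "good", "p3": "good", "p4": "evil"}
	// In an eclipse attack the adversary controls most of the peer list.
	eclipsed := map[string]string{"p1": "evil", "p2": "evil", "p3": "evil", "p4": "good"}

	fmt.Println(MajorityManifest(honest))
	fmt.Println(MajorityManifest(eclipsed))
}
```

The function cannot tell the two peer sets apart: it returns whatever the majority reports, which is exactly what an attacker who fills the peer list exploits.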
+
+A third option would be to include snapshot related data in the
+block header. This could include the manifest with related checksums and be
+secured through consensus. One challenge of this approach is to
+ensure that creating snapshots does not put undue burden on block
+proposers by synchronizing snapshot creation and block creation. One
+approach to minimizing the burden is for snapshots for height
+`H` to be included in block `H+n`, where `n` is some number of blocks later,
+giving the block proposer enough time to complete the snapshot
+asynchronously.
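The `H` to `H+n` relationship can be written down as a small helper. This is only a sketch: `ManifestHeightFor` is a hypothetical name, and the fixed snapshot `interval` is an assumption, since the text says only that snapshots are produced periodically.

```go
package main

import "fmt"

// ManifestHeightFor reports which snapshot height, if any, the block at
// blockHeight should carry a manifest for. Snapshots are assumed to be
// taken every interval blocks and published delay blocks later (the n
// in H+n).
func ManifestHeightFor(blockHeight, interval, delay int64) (int64, bool) {
	if interval <= 0 || delay < 0 {
		return 0, false
	}
	snapshotHeight := blockHeight - delay
	if snapshotHeight <= 0 || snapshotHeight%interval != 0 {
		return 0, false
	}
	return snapshotHeight, true
}

func main() {
	for _, h := range []int64{100, 105, 106, 205} {
		if s, ok := ManifestHeightFor(h, 100, 5); ok {
			fmt.Printf("block %d carries the manifest for snapshot %d\n", h, s)
		} else {
			fmt.Printf("block %d carries no manifest\n", h)
		}
	}
}
```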
+
+## Proposal: Eager StateSync With Per Chunk Light Client Validation
+The conclusion after some consideration of the advantages/disadvantages of
+eager/lazy and different security models is to produce a state sync
+which eagerly produces snapshots and uses light client validation. This
+approach has the performance advantages of pre-computing efficient
+snapshots which can be streamed to new nodes on demand using sequential IO.
+Secondly, by using light client validation we can validate each chunk on
+receipt and avoid the potential eclipse attack of majority of peer based
+security.
+
+### Implementation
+Tendermint is responsible for downloading and verifying chunks of
+AppState from peers. The ABCI application is responsible for taking
+AppStateChunk objects from TM and constructing a valid state tree whose
+root corresponds with the AppHash of the syncing block. In particular we
+will need to implement:
+
+* Build a new StateSync reactor which brokers message transmission between the peers
+  and the ABCI application
+* A set of ABCI Messages
+* Design SnapshotFormat as an interface which can:
+    * validate chunks
+    * read/write chunks from file
+    * read/write chunks to/from application state store
+    * convert manifests into chunkRequest ABCI messages
+* Implement SnapshotFormat for cosmos-hub with concrete implementation for:
+    * read/write chunks in a way which can be:
+        * parallelized across peers
+        * validated on receipt
+    * read/write to/from IAVL+ tree
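One possible shape for the SnapshotFormat interface listed above is sketched below. The ADR does not define method names or types, so `Chunk`, `Manifest`, `ChunkRequest`, and every method signature are assumptions. The toy `rawFormat` validates against a manifest checksum purely as a placeholder for light client validation, and reading/writing the application state store is left out for brevity.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// Hypothetical types; the ADR does not define their fields.
type Chunk struct {
	Index int
	Data  []byte
}

type Manifest struct {
	ChunkHashes [][32]byte
}

type ChunkRequest struct {
	Index int
}

// SnapshotFormat covers chunk validation, chunk file IO, and turning a
// manifest into chunk requests.
type SnapshotFormat interface {
	ValidateChunk(m Manifest, c Chunk) error
	WriteChunk(w io.Writer, c Chunk) error
	ReadChunk(r io.Reader, index int) (Chunk, error)
	ChunkRequests(m Manifest) []ChunkRequest
}

// NewManifest hashes the given chunks in the order supplied.
func NewManifest(chunks ...Chunk) Manifest {
	var m Manifest
	for _, c := range chunks {
		m.ChunkHashes = append(m.ChunkHashes, sha256.Sum256(c.Data))
	}
	return m
}

// rawFormat is a toy implementation that stores chunk bytes unmodified.
type rawFormat struct{}

func (rawFormat) ValidateChunk(m Manifest, c Chunk) error {
	if c.Index < 0 || c.Index >= len(m.ChunkHashes) {
		return errors.New("chunk index out of range")
	}
	if sha256.Sum256(c.Data) != m.ChunkHashes[c.Index] {
		return errors.New("chunk hash mismatch")
	}
	return nil
}

func (rawFormat) WriteChunk(w io.Writer, c Chunk) error {
	_, err := w.Write(c.Data)
	return err
}

func (rawFormat) ReadChunk(r io.Reader, index int) (Chunk, error) {
	data, err := io.ReadAll(r)
	return Chunk{Index: index, Data: data}, err
}

func (rawFormat) ChunkRequests(m Manifest) []ChunkRequest {
	reqs := make([]ChunkRequest, len(m.ChunkHashes))
	for i := range reqs {
		reqs[i] = ChunkRequest{Index: i}
	}
	return reqs
}

func main() {
	var f SnapshotFormat = rawFormat{}
	c := Chunk{Index: 0, Data: []byte("state bytes")}
	m := NewManifest(c)

	var file bytes.Buffer // stands in for a chunk file on disk
	if err := f.WriteChunk(&file, c); err != nil {
		panic(err)
	}
	got, err := f.ReadChunk(&file, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip valid:", f.ValidateChunk(m, got) == nil)
	fmt.Println("requests:", len(f.ChunkRequests(m)))
}
```

Keeping the format behind an interface is what would let an application such as cosmos-hub supply its own IAVL-aware implementation without changes to the reactor.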
+
+![StateSync Architecture Diagram](img/state-sync.png)
+
+## Implementation Path
+* Create StateSync reactor based on [#3753](https://github.com/tendermint/tendermint/pull/3753)
+* Design SnapshotFormat with an eye towards cosmos-hub implementation
+* ABCI message to send/receive SnapshotFormat
+* IAVL+ changes to support SnapshotFormat
+* Deliver Warp sync (no chunk validation)
+* Light client implementation for weak subjectivity
+* Deliver StateSync with chunk validation
+
+## Status
+
+Proposed
+
+## Consequences
+
+### Neutral
+
+### Positive
+* Safe & performant state sync design substantiated with real world implementation experience
+* General interfaces allowing application specific innovation
+* Parallelizable implementation trajectory with reasonable engineering effort
+
+### Negative
+* Static Scheduling lacks opportunity for real time chunk availability optimizations
+
+## References
+* [sync: Sync current state without full replay for Applications](https://github.com/tendermint/tendermint/issues/828) - original issue
+* [tendermint state sync proposal](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) - Cloudhead proposal
+* [tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
+* [proposal 2 implementation](https://github.com/tendermint/tendermint/pull/3243) - ackratos implementation
+* [WIP General/Lazy State-Sync pseudo-spec](https://github.com/tendermint/tendermint/issues/3639) - Jae Proposal
+* [Warp Sync Implementation](https://github.com/tendermint/tendermint/pull/3594) - ackratos
+* [Chunk Proposal](https://github.com/tendermint/tendermint/pull/3799) - Bucky proposed
+
+
--- a/docs/architecture/img/state-sync.png
+++ b/docs/architecture/img/state-sync.png