mirror of
https://github.com/fluencelabs/tendermint
synced 2025-05-04 11:02:13 +00:00
* init adr 042: state sync * link to blockchain reactor * brapse proposal * formatting * response to feedback * Update docs/architecture/adr-042-state-sync.md Co-Authored-By: Aditya <adityasripal@gmail.com> * Update security models and more * clarify compression * typo * Update docs/architecture/adr-042-state-sync.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
240 lines
11 KiB
Markdown
240 lines
11 KiB
Markdown
# ADR 042: State Sync Design
|
|
|
|
## Changelog
|
|
|
|
2019-06-27: Init by EB
|
|
2019-07-04: Follow up by brapse
|
|
|
|
## Context
|
|
StateSync is a feature which would allow a new node to receive a
|
|
snapshot of the application state without downloading blocks or going
|
|
through consensus. Once downloaded, the node could switch to FastSync
|
|
and eventually participate in consensus. The goal of StateSync is to
|
|
facilitate setting up a new node as quickly as possible.
|
|
|
|
## Considerations
|
|
Because Tendermint doesn't know anything about the application state,
|
|
StateSync will broker messages between nodes and through
|
|
the ABCI to an opaque applicaton. The implementation will have multiple
|
|
touch points on both the tendermint code base and ABCI application.
|
|
|
|
* A StateSync reactor to facilitate peer communication - Tendermint
|
|
* A Set of ABCI messages to transmit application state to the reactor - Tendermint
|
|
* A Set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
|
|
* A Storage format with validation and performance considerations - ABCI application
|
|
|
|
### Implementation Properties
|
|
Beyond the approach, any implementation of StateSync can be evaluated
|
|
across different criteria:
|
|
|
|
* Speed: Expected throughput of producing and consuming snapshots
|
|
* Safety: Cost of pushing invalid snapshots to a node
|
|
* Liveness: Cost of preventing a node from receiving/constructing a snapshot
|
|
* Effort: How much effort does an implementation require
|
|
|
|
### Implementation Question
|
|
* What is the format of a snapshot
|
|
* Complete snapshot
|
|
* Ordered IAVL key ranges
|
|
* Compressed individually chunks which can be validated
|
|
* How is data validated
|
|
* Trust a peer with it's data blindly
|
|
* Trust a majority of peers
|
|
* Use light client validation to validate each chunk against consensus
|
|
produced merkle tree root
|
|
* What are the performance characteristics
|
|
* Random vs sequential reads
|
|
* How parallelizeable is the scheduling algorithm
|
|
|
|
### Proposals
|
|
Broadly speaking there are two approaches to this problem which have had
|
|
varying degrees of discussion and progress. These approach can be
|
|
summarized as:
|
|
|
|
**Lazy:** Where snapshots are produced dynamically at request time. This
|
|
solution would use the existing data structure.
|
|
**Eager:** Where snapshots are produced periodically and served from disk at
|
|
request time. This solution would create an auxiliary data structure
|
|
optimized for batch read/writes.
|
|
|
|
Additionally the propsosals tend to vary on how they provide safety
|
|
properties.
|
|
|
|
**LightClient** Where a client can aquire the merkle root from the block
|
|
headers synchronized from a trusted validator set. Subsets of the application state,
|
|
called chunks can therefore be validated on receipt to ensure each chunk
|
|
is part of the merkle root.
|
|
|
|
**Majority of Peers** Where manifests of chunks along with checksums are
|
|
downloaded and compared against versions provided by a majority of
|
|
peers.
|
|
|
|
#### Lazy StateSync
|
|
An [initial specification](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) was published by Alexis Sellier.
|
|
In this design, the state has a given `size` of primitive elements (like
|
|
keys or nodes), each element is assigned a number from 0 to `size-1`,
|
|
and chunks consists of a range of such elements. Ackratos raised
|
|
[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
|
|
about this design, somewhat specific to the IAVL tree, and mainly concerning
|
|
performance of random reads and of iterating through the tree to determine element numbers
|
|
(ie. elements aren't indexed by the element number).
|
|
|
|
An alternative design was suggested by Jae Kwon in
|
|
[#3639](https://github.com/tendermint/tendermint/issues/3639) where chunking
|
|
happens lazily and in a dynamic way: nodes request key ranges from their peers,
|
|
and peers respond with some subset of the
|
|
requested range and with notes on how to request the rest in parallel from other
|
|
peers. Unlike chunk numbers, keys can be verified directly. And if some keys in the
|
|
range are ommitted, proofs for the range will fail to verify.
|
|
This way a node can start by requesting the entire tree from one peer,
|
|
and that peer can respond with say the first few keys, and the ranges to request
|
|
from other peers.
|
|
|
|
Additionally, per chunk validation tends to come more naturally to the
|
|
Lazy approach since it tends to use the existing structure of the tree
|
|
(ie. keys or nodes) rather than state-sync specific chunks. Such a
|
|
design for tendermint was originally tracked in
|
|
[#828](https://github.com/tendermint/tendermint/issues/828).
|
|
|
|
#### Eager StateSync
|
|
Warp Sync as implemented in Parity
|
|
["Warp Sync"](https://wiki.parity.io/Warp-Sync-Snapshot-Format.html) to rapidly
|
|
download both blocks and state snapshots from peers. Data is carved into ~4MB
|
|
chunks and snappy compressed. Hashes of snappy compressed chunks are stored in a
|
|
manifest file which co-ordinates the state-sync. Obtaining a correct manifest
|
|
file seems to require an honest majority of peers. This means you may not find
|
|
out the state is incorrect until you download the whole thing and compare it
|
|
with a verified block header.
|
|
|
|
A similar solution was implemented by Binance in
|
|
[#3594](https://github.com/tendermint/tendermint/pull/3594)
|
|
based on their initial implementation in
|
|
[PR #3243](https://github.com/tendermint/tendermint/pull/3243)
|
|
and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
|
|
Note this still requires the honest majority peer assumption.
|
|
|
|
As an eager protocol, warp-sync can efficiently compress larger, more
|
|
predicatable chunks once per snapshot and service many new peers. By
|
|
comparison lazy chunkers would have to compress each chunk at request
|
|
time.
|
|
|
|
### Analysis of Lazy vs Eager
|
|
Lazy vs Eager have more in common than they differ. They all require
|
|
reactors on the tendermint side, a set of ABCI messages and a method for
|
|
serializing/deserializing snapshots facilitated by a SnapshotFormat.
|
|
|
|
The biggest difference between Lazy and Eager proposals is in the
|
|
read/write patterns necessitated by serving a snapshot chunk.
|
|
Specifically, Lazy State Sync performs random reads to the underlying data
|
|
structure while Eager can optimize for sequential reads.
|
|
|
|
This distinctin between approaches was demonstrated by Binance's
|
|
[ackratos](https://github.com/ackratos) in their implementation of [Lazy
|
|
State sync](https://github.com/tendermint/tendermint/pull/3243), The
|
|
[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
|
|
of the performance, and follow up implementation of [Warp
|
|
Sync](http://github.com/tendermint/tendermint/pull/3594).
|
|
|
|
#### Compairing Security Models
|
|
There are several different security models which have been
|
|
discussed/proposed in the past but generally fall into two categories.
|
|
|
|
Light client validation: In which the node receiving data is expected to
|
|
first perform a light client sync and have all the nessesary block
|
|
headers. Within the trusted block header (trusted in terms of from a
|
|
validator set subject to [weak
|
|
subjectivity](https://github.com/tendermint/tendermint/pull/3795)) and
|
|
can compare any subset of keys called a chunk against the merkle root.
|
|
The advantage of light client validation is that the block headers are
|
|
signed by validators which have something to lose for malicious
|
|
behaviour. If a validator were to provide an invalid proof, they can be
|
|
slashed.
|
|
|
|
Majority of peer validation: A manifest file containing a list of chunks
|
|
along with checksums of each chunk is downloaded from a
|
|
trusted source. That source can be a community resource similar to
|
|
[sum.golang.org](https://sum.golang.org) or downloaded from the majority
|
|
of peers. One disadantage of the majority of peer security model is the
|
|
vuliberability to eclipse attacks in which a malicious users looks to
|
|
saturate a target node's peer list and produce a manufactured picture of
|
|
majority.
|
|
|
|
A third option would be to include snapshot related data in the
|
|
block header. This could include the manifest with related checksums and be
|
|
secured through consensus. One challenge of this approach is to
|
|
ensure that creating snapshots does not put undo burden on block
|
|
propsers by synchronizing snapshot creation and block creation. One
|
|
approach to minimizing the burden is for snapshots for height
|
|
`H` to be included in block `H+n` where `n` is some `n` block away,
|
|
giving the block propser enough time to complete the snapshot
|
|
asynchronousy.
|
|
|
|
## Proposal: Eager StateSync With Per Chunk Light Client Validation
|
|
The conclusion after some concideration of the advantages/disadvances of
|
|
eager/lazy and different security models is to produce a state sync
|
|
which eagerly produces snapshots and uses light client validation. This
|
|
approach has the performance advantages of pre-computing efficient
|
|
snapshots which can streamed to new nodes on demand using sequential IO.
|
|
Secondly, by using light client validation we cna validate each chunk on
|
|
receipt and avoid the potential eclipse attack of majority of peer based
|
|
security.
|
|
|
|
### Implementation
|
|
Tendermint is responsible for downloading and verifying chunks of
|
|
AppState from peers. ABCI Application is responsible for taking
|
|
AppStateChunk objects from TM and constructing a valid state tree whose
|
|
root corresponds with the AppHash of syncing block. In particular we
|
|
will need implement:
|
|
|
|
* Build new StateSync reactor brokers message transmission between the peers
|
|
and the ABCI application
|
|
* A set of ABCI Messages
|
|
* Design SnapshotFormat as an interface which can:
|
|
* validate chunks
|
|
* read/write chunks from file
|
|
* read/write chunks to/from application state store
|
|
* convert manifests into chunkRequest ABCI messages
|
|
* Implement SnapshotFormat for cosmos-hub with concrete implementation for:
|
|
* read/write chunks in a way which can be:
|
|
* parallelized across peers
|
|
* validated on receipt
|
|
* read/write to/from IAVL+ tree
|
|
|
|

|
|
|
|
## Implementation Path
|
|
* Create StateSync reactor based on [#3753](https://github.com/tendermint/tendermint/pull/3753)
|
|
* Design SnapshotFormat with an eye towards cosmos-hub implementation
|
|
* ABCI message to send/receive SnapshotFormat
|
|
* IAVL+ changes to support SnapshotFormat
|
|
* Deliver Warp sync (no chunk validation)
|
|
* light client implementation for weak subjectivity
|
|
* Deliver StateSync with chunk validation
|
|
|
|
## Status
|
|
|
|
Proposed
|
|
|
|
## Concequences
|
|
|
|
### Neutral
|
|
|
|
### Positive
|
|
* Safe & performant state sync design substantiated with real world implementation experience
|
|
* General interfaces allowing application specific innovation
|
|
* Parallizable implementation trajectory with reasonable engineering effort
|
|
|
|
### Negative
|
|
* Static Scheduling lacks opportunity for real time chunk availability optimizations
|
|
|
|
## References
|
|
[sync: Sync current state without full replay for Applications](https://github.com/tendermint/tendermint/issues/828) - original issue
|
|
[tendermint state sync proposal](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) - Cloudhead proposal
|
|
[tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
|
|
[proposal 2 implementation](https://github.com/tendermint/tendermint/pull/3243) - ackratos implementation
|
|
[WIP General/Lazy State-Sync pseudo-spec](https://github.com/tendermint/tendermint/issues/3639) - Jae Proposal
|
|
[Warp Sync Implementation](https://github.com/tendermint/tendermint/pull/3594) - ackratos
|
|
[Chunk Proposal](https://github.com/tendermint/tendermint/pull/3799) - Bucky proposed
|
|
|
|
|