[ADR - 42] State-sync (#3769)
* init adr 042: state sync
* link to blockchain reactor
* brapse proposal
* formatting
* response to feedback
* Update docs/architecture/adr-042-state-sync.md
  Co-Authored-By: Aditya <adityasripal@gmail.com>
* Update security models and more
* clarify compression
* typo
* Update docs/architecture/adr-042-state-sync.md
  Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
docs/architecture/adr-042-state-sync.md
@@ -0,0 +1,239 @@
# ADR 042: State Sync Design

## Changelog

2019-06-27: Init by EB
2019-07-04: Follow up by brapse

## Context

StateSync is a feature which would allow a new node to receive a
snapshot of the application state without downloading blocks or going
through consensus. Once downloaded, the node could switch to FastSync
and eventually participate in consensus. The goal of StateSync is to
facilitate setting up a new node as quickly as possible.

## Considerations

Because Tendermint doesn't know anything about the application state,
StateSync will broker messages between nodes and through
the ABCI to an opaque application. The implementation will have multiple
touch points on both the Tendermint code base and the ABCI application:

* A StateSync reactor to facilitate peer communication - Tendermint
* A set of ABCI messages to transmit application state to the reactor - Tendermint
* A set of MultiStore APIs for exposing snapshot data to the ABCI - ABCI application
* A storage format with validation and performance considerations - ABCI application

### Implementation Properties

Beyond the approach, any implementation of StateSync can be evaluated
across different criteria:

* Speed: Expected throughput of producing and consuming snapshots
* Safety: Cost of pushing invalid snapshots to a node
* Liveness: Cost of preventing a node from receiving/constructing a snapshot
* Effort: How much effort does an implementation require

### Implementation Question

* What is the format of a snapshot
    * Complete snapshot
    * Ordered IAVL key ranges
    * Individually compressed chunks which can be validated
* How is data validated
    * Trust a peer with its data blindly
    * Trust a majority of peers
    * Use light client validation to validate each chunk against the
      merkle tree root produced by consensus
* What are the performance characteristics
    * Random vs sequential reads
    * How parallelizable is the scheduling algorithm

### Proposals

Broadly speaking there are two approaches to this problem which have had
varying degrees of discussion and progress. These approaches can be
summarized as:

**Lazy:** Where snapshots are produced dynamically at request time. This
solution would use the existing data structure.

**Eager:** Where snapshots are produced periodically and served from disk at
request time. This solution would create an auxiliary data structure
optimized for batch read/writes.

Additionally the proposals tend to vary on how they provide safety
properties.

**LightClient:** Where a client can acquire the merkle root from the block
headers synchronized from a trusted validator set. Subsets of the application state,
called chunks, can therefore be validated on receipt to ensure each chunk
is part of the merkle root.

**Majority of Peers:** Where manifests of chunks along with checksums are
downloaded and compared against versions provided by a majority of
peers.

#### Lazy StateSync

An [initial specification](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) was published by Alexis Sellier.
In this design, the state has a given `size` of primitive elements (like
keys or nodes), each element is assigned a number from 0 to `size-1`,
and chunks consist of a range of such elements. Ackratos raised
[some concerns](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit)
about this design, somewhat specific to the IAVL tree, and mainly concerning
performance of random reads and of iterating through the tree to determine element numbers
(i.e. elements aren't indexed by the element number).
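
The element-numbering scheme can be illustrated with a small Go sketch. It assumes fixed-size chunks; the function name `chunkRange` and the `chunkSize` parameter are hypothetical and are not taken from the specification.

```go
package main

import "fmt"

// chunkRange returns the half-open element range [start, end) covered by
// chunk number idx when `size` elements are split into chunks of at most
// chunkSize elements. ok is false when idx is past the last chunk.
// (Hypothetical helper; overflow of idx*chunkSize is not handled.)
func chunkRange(size, chunkSize, idx uint64) (start, end uint64, ok bool) {
	if chunkSize == 0 || idx*chunkSize >= size {
		return 0, 0, false
	}
	start = idx * chunkSize
	end = start + chunkSize
	if end > size {
		end = size // the last chunk may be shorter
	}
	return start, end, true
}

func main() {
	// Walk every chunk of a 10-element state split into chunks of 4.
	for i := uint64(0); ; i++ {
		start, end, ok := chunkRange(10, 4, i)
		if !ok {
			break
		}
		fmt.Printf("chunk %d covers elements [%d, %d)\n", i, start, end)
	}
}
```

Mapping an element number back to a key or node is the expensive step raised in the concerns above, since the tree is not indexed by element number.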

An alternative design was suggested by Jae Kwon in
[#3639](https://github.com/tendermint/tendermint/issues/3639) where chunking
happens lazily and in a dynamic way: nodes request key ranges from their peers,
and peers respond with some subset of the
requested range and with notes on how to request the rest in parallel from other
peers. Unlike chunk numbers, keys can be verified directly. And if some keys in the
range are omitted, proofs for the range will fail to verify.
This way a node can start by requesting the entire tree from one peer,
and that peer can respond with, say, the first few keys and the ranges to request
from other peers.
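
A rough Go sketch of this request/response shape follows. All names (`KeyRange`, `RangeResponse`, `serveRange`, `limit`, `parts`) are hypothetical, the state is modelled as a sorted slice of keys rather than an IAVL tree, and range proofs are left out entirely.

```go
package main

import (
	"fmt"
	"sort"
)

// KeyRange is a half-open key range [Start, End). An empty End means
// "up to and including the last key". (Illustrative type.)
type KeyRange struct{ Start, End string }

// RangeResponse carries a prefix of the requested keys plus the ranges
// the requester should fetch from other peers. (Illustrative type.)
type RangeResponse struct {
	Keys      []string
	Remaining []KeyRange
}

// serveRange answers a request with at most `limit` keys and splits the
// rest of the requested range into at most `parts` follow-up ranges.
// `sorted` must be in ascending order; limit is assumed non-negative.
func serveRange(sorted []string, req KeyRange, limit, parts int) RangeResponse {
	lo := sort.SearchStrings(sorted, req.Start)
	hi := len(sorted)
	if req.End != "" {
		hi = sort.SearchStrings(sorted, req.End)
	}
	if lo > hi {
		lo = hi // inverted range: nothing to serve
	}
	inRange := sorted[lo:hi]
	if len(inRange) <= limit {
		return RangeResponse{Keys: inRange}
	}
	resp := RangeResponse{Keys: inRange[:limit]}
	rest := inRange[limit:]
	if parts < 1 {
		parts = 1
	}
	step := (len(rest) + parts - 1) / parts // ceiling division
	for i := 0; i < len(rest); i += step {
		end := req.End
		if i+step < len(rest) {
			end = rest[i+step]
		}
		resp.Remaining = append(resp.Remaining, KeyRange{Start: rest[i], End: end})
	}
	return resp
}

func main() {
	keys := []string{"a", "b", "c", "d", "e", "f", "g"}
	// Ask one peer for the entire tree; it returns two keys and two ranges.
	first := serveRange(keys, KeyRange{}, 2, 2)
	fmt.Println(first.Keys, first.Remaining)
	// Each remaining range can then be requested from a different peer.
	for _, r := range first.Remaining {
		fmt.Println(serveRange(keys, r, 10, 2).Keys)
	}
}
```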

Additionally, per chunk validation tends to come more naturally to the
Lazy approach since it tends to use the existing structure of the tree
(i.e. keys or nodes) rather than state-sync specific chunks. Such a
design for Tendermint was originally tracked in
[#828](https://github.com/tendermint/tendermint/issues/828).

#### Eager StateSync

Parity implemented
["Warp Sync"](https://wiki.parity.io/Warp-Sync-Snapshot-Format.html) to rapidly
download both blocks and state snapshots from peers. Data is carved into ~4MB
chunks and snappy compressed. Hashes of snappy compressed chunks are stored in a
manifest file which co-ordinates the state-sync. Obtaining a correct manifest
file seems to require an honest majority of peers. This means you may not find
out the state is incorrect until you download the whole thing and compare it
with a verified block header.
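
The manifest idea can be sketched in Go as below. This is only an illustration under stated assumptions: SHA-256 stands in for whatever hash a real format uses, compression is omitted (Warp Sync hashes the snappy-compressed bytes), and `Manifest`, `splitChunks`, `buildManifest` and `verifyChunk` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Manifest lists the expected hash of every chunk in a snapshot.
// (Illustrative type, not the Warp Sync manifest format.)
type Manifest struct {
	ChunkHashes [][32]byte
}

// splitChunks carves data into pieces of at most chunkSize bytes.
func splitChunks(data []byte, chunkSize int) [][]byte {
	if chunkSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// buildManifest is run once by the node producing the snapshot.
func buildManifest(chunks [][]byte) Manifest {
	m := Manifest{ChunkHashes: make([][32]byte, len(chunks))}
	for i, c := range chunks {
		m.ChunkHashes[i] = sha256.Sum256(c)
	}
	return m
}

// verifyChunk is run by the syncing node for each downloaded chunk.
// It only proves the chunk matches the manifest; whether the manifest
// itself is correct is a separate question.
func verifyChunk(m Manifest, idx int, chunk []byte) bool {
	if idx < 0 || idx >= len(m.ChunkHashes) {
		return false
	}
	return sha256.Sum256(chunk) == m.ChunkHashes[idx]
}

func main() {
	chunks := splitChunks([]byte("abcdefghij"), 4)
	m := buildManifest(chunks)
	fmt.Println(len(chunks), verifyChunk(m, 1, []byte("efgh")), verifyChunk(m, 1, []byte("efgX")))
}
```

The sketch makes the limitation noted above visible: every check is relative to the manifest, so a wrong manifest is only detected once the rebuilt state is compared with a verified block header.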

A similar solution was implemented by Binance in
[#3594](https://github.com/tendermint/tendermint/pull/3594)
based on their initial implementation in
[PR #3243](https://github.com/tendermint/tendermint/pull/3243)
and [some learnings](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit).
Note this still requires the honest majority peer assumption.

As an eager protocol, warp-sync can efficiently compress larger, more
predictable chunks once per snapshot and service many new peers. By
comparison lazy chunkers would have to compress each chunk at request
time.

### Analysis of Lazy vs Eager

Lazy and Eager have more in common than they differ. Both require
reactors on the Tendermint side, a set of ABCI messages and a method for
serializing/deserializing snapshots facilitated by a SnapshotFormat.

The biggest difference between Lazy and Eager proposals is in the
read/write patterns necessitated by serving a snapshot chunk.
Specifically, Lazy State Sync performs random reads to the underlying data
structure while Eager can optimize for sequential reads.

This distinction between approaches was demonstrated by Binance's
[ackratos](https://github.com/ackratos) in their implementation of [Lazy
State sync](https://github.com/tendermint/tendermint/pull/3243), the
[analysis](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/)
of the performance, and the follow-up implementation of [Warp
Sync](http://github.com/tendermint/tendermint/pull/3594).

#### Comparing Security Models

There are several different security models which have been
discussed/proposed in the past but generally fall into two categories.

Light client validation: In which the node receiving data is expected to
first perform a light client sync and have all the necessary block
headers. Given a trusted block header (trusted in the sense that it comes
from a validator set subject to [weak
subjectivity](https://github.com/tendermint/tendermint/pull/3795)), the node
can compare any subset of keys, called a chunk, against the merkle root.
The advantage of light client validation is that the block headers are
signed by validators which have something to lose for malicious
behaviour. If a validator were to provide an invalid proof, they can be
slashed.
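
Per-chunk validation against a trusted root can be sketched in Go with a plain binary merkle tree. This is an assumption-laden illustration: it is neither the IAVL proof format nor Tendermint's merkle package, it has no leaf/inner domain separation, and `ProofStep`, `rootAndProof` and `verifyChunk` are hypothetical names. In the light client model the root would come from a verified block header.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ProofStep is one sibling hash on the path from a leaf to the root.
type ProofStep struct {
	Sibling [32]byte
	Left    bool // true when the sibling sits to the left of the path
}

func leafHash(chunk []byte) [32]byte { return sha256.Sum256(chunk) }

func hashPair(l, r [32]byte) [32]byte {
	return sha256.Sum256(append(l[:], r[:]...))
}

// rootAndProof builds a binary tree over the leaves and returns its root
// together with the proof for leaf idx. An unpaired node at the end of a
// level is promoted unchanged. idx is assumed to be a valid leaf index.
func rootAndProof(leaves [][32]byte, idx int) ([32]byte, []ProofStep) {
	if len(leaves) == 0 {
		return [32]byte{}, nil
	}
	var proof []ProofStep
	level := leaves
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i])
				continue
			}
			if i == idx {
				proof = append(proof, ProofStep{Sibling: level[i+1], Left: false})
			} else if i+1 == idx {
				proof = append(proof, ProofStep{Sibling: level[i], Left: true})
			}
			next = append(next, hashPair(level[i], level[i+1]))
		}
		idx /= 2
		level = next
	}
	return level[0], proof
}

// verifyChunk recomputes the root from a chunk and its proof and compares
// it with the trusted root, so a chunk can be checked on receipt.
func verifyChunk(root [32]byte, chunk []byte, proof []ProofStep) bool {
	h := leafHash(chunk)
	for _, s := range proof {
		if s.Left {
			h = hashPair(s.Sibling, h)
		} else {
			h = hashPair(h, s.Sibling)
		}
	}
	return h == root
}

func main() {
	chunks := [][]byte{[]byte("c0"), []byte("c1"), []byte("c2")}
	leaves := make([][32]byte, len(chunks))
	for i, c := range chunks {
		leaves[i] = leafHash(c)
	}
	root, proof := rootAndProof(leaves, 1)
	fmt.Println(verifyChunk(root, chunks[1], proof), verifyChunk(root, []byte("forged"), proof))
}
```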

Majority of peer validation: A manifest file containing a list of chunks
along with checksums of each chunk is downloaded from a
trusted source. That source can be a community resource similar to
[sum.golang.org](https://sum.golang.org) or downloaded from the majority
of peers. One disadvantage of the majority of peer security model is the
vulnerability to eclipse attacks in which a malicious user looks to
saturate a target node's peer list and produce a manufactured picture of
majority.
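
A minimal Go sketch of the majority rule, assuming each peer reports a single manifest digest as a string; the function name `majorityManifest` and the peer identifiers are hypothetical.

```go
package main

import "fmt"

// majorityManifest returns the manifest digest reported by a strict
// majority of peers, if there is one. reports maps a peer ID to the
// digest that peer advertised. (Hypothetical helper.)
func majorityManifest(reports map[string]string) (string, bool) {
	counts := make(map[string]int)
	for _, digest := range reports {
		counts[digest]++
	}
	for digest, n := range counts {
		if 2*n > len(reports) {
			return digest, true // at most one digest can satisfy this
		}
	}
	return "", false
}

func main() {
	honest := map[string]string{"peer1": "aa", "peer2": "aa", "peer3": "bb"}
	fmt.Println(majorityManifest(honest))

	// Eclipse scenario: the attacker fills the peer list, so the forged
	// digest "ee" wins even though only one honest peer disagrees.
	eclipsed := map[string]string{"evil1": "ee", "evil2": "ee", "evil3": "ee", "peer1": "aa"}
	fmt.Println(majorityManifest(eclipsed))
}
```

The second call shows why this model is only as strong as the peer list: the rule counts peers, not stake.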

A third option would be to include snapshot related data in the
block header. This could include the manifest with related checksums and be
secured through consensus. One challenge of this approach is to
ensure that creating snapshots does not put undue burden on block
proposers by synchronizing snapshot creation and block creation. One
approach to minimizing the burden is for snapshots for height
`H` to be included in block `H+n`, where `n` is some number of blocks later,
giving the block proposer enough time to complete the snapshot
asynchronously.
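
The `H+n` offset amounts to simple height arithmetic, sketched here in Go. The function name `snapshotHeightFor` is hypothetical, and heights are assumed to start at 1.

```go
package main

import "fmt"

// snapshotHeightFor returns the snapshot height H whose manifest a block
// at blockHeight would carry, given a lag of n blocks (blockHeight = H+n).
// ok is false when no snapshot can exist yet for that block.
func snapshotHeightFor(blockHeight, n int64) (h int64, ok bool) {
	if n < 0 || blockHeight-n < 1 {
		return 0, false
	}
	return blockHeight - n, true
}

func main() {
	// With n = 3, block 10 would carry the manifest for height 7.
	fmt.Println(snapshotHeightFor(10, 3))
	// Block 2 is too early to carry any snapshot.
	fmt.Println(snapshotHeightFor(2, 3))
}
```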

## Proposal: Eager StateSync With Per Chunk Light Client Validation

The conclusion after some consideration of the advantages/disadvantages of
eager/lazy and different security models is to produce a state sync
which eagerly produces snapshots and uses light client validation. This
approach has the performance advantages of pre-computing efficient
snapshots which can be streamed to new nodes on demand using sequential IO.
Secondly, by using light client validation we can validate each chunk on
receipt and avoid the potential eclipse attack of majority of peer based
security.

### Implementation

Tendermint is responsible for downloading and verifying chunks of
AppState from peers. The ABCI application is responsible for taking
AppStateChunk objects from TM and constructing a valid state tree whose
root corresponds with the AppHash of the syncing block. In particular we
will need to implement:

* Build a new StateSync reactor which brokers message transmission between the peers
  and the ABCI application
* A set of ABCI Messages
* Design SnapshotFormat as an interface which can:
    * validate chunks
    * read/write chunks from file
    * read/write chunks to/from application state store
    * convert manifests into chunkRequest ABCI messages
* Implement SnapshotFormat for cosmos-hub with concrete implementation for:
    * read/write chunks in a way which can be:
        * parallelized across peers
        * validated on receipt
    * read/write to/from IAVL+ tree
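
One possible shape for such an interface is sketched below in Go. Every type and method name here is hypothetical; the ADR does not define them. The state-store methods are left out because they depend on the store (for example the IAVL+ tree), and `rawFormat` is a toy implementation that validates chunks against SHA-256 hashes in a manifest rather than with light client proofs.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
	"io"
)

// Illustrative data types; none of these are defined by the ADR.
type Chunk struct {
	Index int
	Data  []byte
}

type Manifest struct {
	Height      int64
	ChunkHashes [][32]byte
}

type ChunkRequest struct {
	Height int64
	Index  int
}

// SnapshotFormat is a hypothetical rendering of the capabilities listed above.
type SnapshotFormat interface {
	ValidateChunk(m Manifest, c Chunk) error
	WriteChunk(w io.Writer, c Chunk) error
	ReadChunk(r io.Reader, index int) (Chunk, error)
	ChunkRequests(m Manifest) []ChunkRequest
}

// rawFormat stores chunk bytes verbatim.
type rawFormat struct{}

func (rawFormat) ValidateChunk(m Manifest, c Chunk) error {
	if c.Index < 0 || c.Index >= len(m.ChunkHashes) {
		return errors.New("chunk index out of range")
	}
	if sha256.Sum256(c.Data) != m.ChunkHashes[c.Index] {
		return errors.New("chunk hash mismatch")
	}
	return nil
}

func (rawFormat) WriteChunk(w io.Writer, c Chunk) error {
	_, err := w.Write(c.Data)
	return err
}

func (rawFormat) ReadChunk(r io.Reader, index int) (Chunk, error) {
	data, err := io.ReadAll(r)
	if err != nil {
		return Chunk{}, err
	}
	return Chunk{Index: index, Data: data}, nil
}

// ChunkRequests turns a manifest into one request per chunk, which a
// reactor could then spread across peers.
func (rawFormat) ChunkRequests(m Manifest) []ChunkRequest {
	reqs := make([]ChunkRequest, 0, len(m.ChunkHashes))
	for i := range m.ChunkHashes {
		reqs = append(reqs, ChunkRequest{Height: m.Height, Index: i})
	}
	return reqs
}

// newManifest is a helper for the example below.
func newManifest(height int64, chunks [][]byte) Manifest {
	m := Manifest{Height: height, ChunkHashes: make([][32]byte, len(chunks))}
	for i, c := range chunks {
		m.ChunkHashes[i] = sha256.Sum256(c)
	}
	return m
}

func main() {
	var f SnapshotFormat = rawFormat{}
	m := newManifest(100, [][]byte{[]byte("chunk-0"), []byte("chunk-1")})

	var file bytes.Buffer // stands in for a chunk file on disk
	if err := f.WriteChunk(&file, Chunk{Index: 1, Data: []byte("chunk-1")}); err != nil {
		panic(err)
	}
	c, err := f.ReadChunk(&file, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(f.ChunkRequests(m)), f.ValidateChunk(m, c))
}
```

Keeping validation behind the interface would let an application swap the hash check used here for per-chunk light client proofs without changing the reactor.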

![StateSync Architecture Diagram](img/state-sync.png)

## Implementation Path

* Create StateSync reactor based on [#3753](https://github.com/tendermint/tendermint/pull/3753)
* Design SnapshotFormat with an eye towards cosmos-hub implementation
* ABCI message to send/receive SnapshotFormat
* IAVL+ changes to support SnapshotFormat
* Deliver Warp sync (no chunk validation)
* Light client implementation for weak subjectivity
* Deliver StateSync with chunk validation

## Status

Proposed

## Consequences

### Neutral

### Positive

* Safe & performant state sync design substantiated with real world implementation experience
* General interfaces allowing application specific innovation
* Parallelizable implementation trajectory with reasonable engineering effort

### Negative

* Static scheduling lacks opportunity for real time chunk availability optimizations

## References

* [sync: Sync current state without full replay for Applications](https://github.com/tendermint/tendermint/issues/828) - original issue
* [tendermint state sync proposal](https://docs.google.com/document/d/15MFsQtNA0MGBv7F096FFWRDzQ1vR6_dics5Y49vF8JU/edit?ts=5a0f3629) - Cloudhead proposal
* [tendermint state sync proposal 2](https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit) - ackratos proposal
* [proposal 2 implementation](https://github.com/tendermint/tendermint/pull/3243) - ackratos implementation
* [WIP General/Lazy State-Sync pseudo-spec](https://github.com/tendermint/tendermint/issues/3639) - Jae proposal
* [Warp Sync Implementation](https://github.com/tendermint/tendermint/pull/3594) - ackratos
* [Chunk Proposal](https://github.com/tendermint/tendermint/pull/3799) - Bucky proposal

docs/architecture/img/state-sync.png
Binary file not shown (99 KiB).