1
0
mirror of https://github.com/fluencelabs/tendermint synced 2025-07-26 01:31:56 +00:00
Files
.circleci
.github
DOCKER
benchmarks
blockchain
cmd
config
consensus
docs
_static
_templates
architecture
assets
examples
images
spec
specification
new-spec
block-structure.rst
byzantine-consensus-algorithm.rst
configuration.rst
corruption.rst
fast-sync.rst
genesis.rst
light-client-protocol.rst
merkle.rst
rpc.rst
secure-p2p.rst
validators.rst
wire-protocol.rst
.python-version
Makefile
README.md
abci-cli.rst
app-architecture.rst
app-development.rst
conf.py
deploy-testnets.rst
determinism.rst
ecosystem.rst
getting-started.rst
how-to-read-logs.rst
index.rst
install.rst
introduction.rst
requirements.txt
specification.rst
terraform-and-ansible.rst
transactional-semantics.rst
using-tendermint.rst
evidence
lite
mempool
networks
node
p2p
proxy
rpc
scripts
state
test
types
version
.editorconfig
.gitignore
CHANGELOG.md
CODE_OF_CONDUCT.md
CONTRIBUTING.md
Gopkg.lock
Gopkg.toml
LICENSE
Makefile
README.md
ROADMAP.md
SECURITY.md
Vagrantfile
appveyor.yml
codecov.yml
docker-compose.yml
tendermint/docs/specification/corruption.rst

71 lines
2.2 KiB
ReStructuredText
Raw Normal View History

Corruption
==========
Important step
--------------
Make sure you have a backup of the Tendermint data directory.
Possible causes
---------------
Remember that most corruption is caused by hardware issues:
- RAID controllers with faulty / worn out battery backup, and an unexpected power loss
- Hard disk drives with write-back cache enabled, and an unexpected power loss
- Cheap SSDs with insufficient power-loss protection, and an unexpected power-loss
- Defective RAM
- Defective or overheating CPU(s)
Other causes can be:
- Database systems configured with fsync=off and an OS crash or power loss
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
- Tendermint bugs
- Operating system bugs
- Admin error
- directly modifying Tendermint data-directory contents
(Source: https://wiki.postgresql.org/wiki/Corruption)
WAL Corruption
--------------
If consensus WAL is corrupted at the lastest height and you are trying to start
Tendermint, replay will fail with panic.
Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:
1) Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
2) Try to repair the WAL file manually:
1. Create a backup of the corrupted WAL file:
.. code:: bash
cp "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal_backup
2. Use ./scripts/wal2json to create a human-readable version
.. code:: bash
./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal
3. Search for a "CORRUPTED MESSAGE" line.
4. By looking at the previous message and the message after the corrupted one
and looking at the logs, try to rebuild the message. If the consequent
messages are marked as corrupted too (this may happen if length header
got corrupted or some writes did not make it to the WAL ~ truncation),
then remove all the lines starting from the corrupted one and restart
Tendermint.
.. code:: bash
$EDITOR /tmp/corrupted_wal
5. After editing, convert this file back into binary form by running:
.. code:: bash
./scripts/json2wal/json2wal /tmp/corrupted_wal > "$TMHOME/data/cs.wal/wal"