Corruption
==========

Important step
--------------

Make sure you have a backup of the Tendermint data directory.

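
Every recovery path described here modifies or removes files, so a copy taken while the node is stopped is the safety net. A minimal sketch, assuming ``TMHOME`` points at the node's home directory; ``backup_tm_data`` is a hypothetical helper and not part of Tendermint:

.. code:: bash

    # Sketch: archive the Tendermint data directory before any recovery attempt.
    # Stop the node first so files do not change while they are being copied.

    # backup_tm_data SRC_DIR DEST_ARCHIVE  (hypothetical helper)
    backup_tm_data() {
        local src="$1" dest="$2"
        if [ ! -d "$src" ]; then
            echo "data directory not found: $src" >&2
            return 1
        fi
        # -C keeps archive paths relative to the parent of the data directory.
        tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
    }

    # Example call (the archive path is only an illustration):
    #   backup_tm_data "$TMHOME/data" /tmp/tendermint-data-backup.tar.gz

Keeping the archive on a different disk from the data directory is safer, given that most corruption is hardware-related.
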
Possible causes
---------------

Remember that most corruption is caused by hardware issues:

- RAID controllers with faulty or worn-out battery backup, and an unexpected power loss
- Hard disk drives with write-back cache enabled, and an unexpected power loss
- Cheap SSDs with insufficient power-loss protection, and an unexpected power loss
- Defective RAM
- Defective or overheating CPU(s)

Other causes can be:

- Database systems configured with fsync=off and an OS crash or power loss
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
- Tendermint bugs
- Operating system bugs
- Admin error, such as directly modifying the contents of the Tendermint data directory

(Source: https://wiki.postgresql.org/wiki/Corruption)

WAL Corruption
--------------

If the consensus WAL is corrupted at the latest height and you try to start
Tendermint, replay will fail with a panic.

Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:

1) Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
2) Try to repair the WAL file manually:

   1. Create a backup of the corrupted WAL file:

      .. code:: bash

          cp "$TMHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup

   2. Use ``./scripts/wal2json`` to create a human-readable version:

      .. code:: bash

          ./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal

   3. Search for a "CORRUPTED MESSAGE" line.
   4. Using the messages immediately before and after the corrupted one,
      together with the logs, try to rebuild the message. If the subsequent
      messages are marked as corrupted too (this can happen if the length
      header was corrupted or some writes never made it to the WAL, i.e.
      truncation), remove all the lines starting from the corrupted one and
      restart Tendermint.

      .. code:: bash

          $EDITOR /tmp/corrupted_wal

   5. After editing, convert this file back into binary form by running:

      .. code:: bash

          ./scripts/json2wal/json2wal /tmp/corrupted_wal > "$TMHOME/data/cs.wal/wal"
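
The fallback in the manual repair, removing every line from the first corrupted one onward, can be sketched as a small shell helper. This assumes the human-readable dump marks damaged entries with a line containing the text ``CORRUPTED MESSAGE``, as described above; ``truncate_at_corruption`` is a hypothetical name and not part of the Tendermint scripts:

.. code:: bash

    # Sketch: cut a wal2json dump at the first corrupted entry.

    # truncate_at_corruption DUMP_FILE OUT_FILE  (hypothetical helper)
    # Prints the line number of the first corrupted entry.
    truncate_at_corruption() {
        local dump="$1" out="$2" first
        # Line number of the first line containing the marker, if any.
        first="$(awk '/CORRUPTED MESSAGE/ { print NR; exit }' "$dump")"
        if [ -z "$first" ]; then
            echo "no corrupted message found in $dump" >&2
            return 1
        fi
        # Keep only the lines that come before the first corrupted one.
        awk -v n="$first" 'NR < n' "$dump" > "$out"
        echo "$first"
    }

    # Example call (the output path is only an illustration):
    #   truncate_at_corruption /tmp/corrupted_wal /tmp/truncated_wal

Writing to a separate output file leaves the original dump untouched, so the corrupted entry can still be inspected and compared against the logs before anything is converted back to binary form.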