From 83c6f2864df90d118c9626ed5d5f1e9ae9b68538 Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Wed, 23 May 2018 10:29:44 +0400 Subject: [PATCH] document the consensus WAL Refs #1494 --- docs/running-in-production.rst | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index d9bbe33b..2e435f83 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -18,3 +18,33 @@ Consensus WAL ------------- Consensus module writes every message to the WAL (write-ahead log). + +It also issues fsync syscall through `File#Sync +__` for messages signed by this node (to +prevent double signing). + +Under the hood, it uses `autofile.Group +__`, which +rotates files when those get too big (> 10MB). + +The total maximum size is 1GB. We only need the latest block and the block before it, +but if the former is dragging on across many rounds, we want all those rounds. + +Replay +~~~~~~ + +Consensus module will replay all the messages of the last height written to WAL +before a crash (if such occurs). + +The private validator may try to sign messages during replay because it runs +somewhat autonomously and does not know about replay process. + +For example, if we got all the way to precommit in the WAL and then crash, +after we replay the proposal message, the private validator will try to sign a +prevote. But it will fail. That's ok because we’ll see the prevote later in the +WAL. Then it will go to precommit, and that time it will work because the +private validator contains the ``LastSignBytes`` and then we’ll replay the +precommit from the WAL. + +Make sure to read about `WAL corruption +<./specification/corruption.html#wal-corruption>__` and recovery strategies.