blockchain: Reorg reactor (#3561)
* go routines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design
added block requests under peer
moved the request trigger in the reactor poolRoutine, triggered now by a ticker
in general moved everything required for making block requests smarter in the poolRoutine
added a simple map of heights to keep track of what will need to be requested next
added a few more tests
* send errors to FSM in a different channel than blocks
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.
While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvments
* suggestion to changelog entry
2019-07-23 10:58:52 +02:00

package v0

import (
	"errors"
	"fmt"
	"reflect"
	"time"

	amino "github.com/tendermint/go-amino"

	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/p2p"
	sm "github.com/tendermint/tendermint/state"
	"github.com/tendermint/tendermint/store"
	"github.com/tendermint/tendermint/types"
)

const (
	// BlockchainChannel is a channel for blocks and status updates (`BlockStore` height)
	BlockchainChannel = byte(0x40)

	trySyncIntervalMS = 10

	// stop syncing when last block's time is
	// within this much of the system time.
	// stopSyncingDurationMinutes = 10

	// ask for best height every 10s
	statusUpdateIntervalSeconds = 10
	// check if we should switch to consensus reactor
	switchToConsensusIntervalSeconds = 1

	// NOTE: keep up to date with bcBlockResponseMessage
	bcBlockResponseMessagePrefixSize   = 4
	bcBlockResponseMessageFieldKeySize = 1
	maxMsgSize                         = types.MaxBlockSizeBytes +
		bcBlockResponseMessagePrefixSize +
		bcBlockResponseMessageFieldKeySize
)

type consensusReactor interface {
	// for when we switch from blockchain reactor and fast sync to
	// the consensus machine
	SwitchToConsensus(sm.State, int)
}

type peerError struct {
	err    error
	peerID p2p.ID
}

func (e peerError) Error() string {
	return fmt.Sprintf("error with peer %v: %s", e.peerID, e.err.Error())
}

// BlockchainReactor handles long-term catchup syncing.
type BlockchainReactor struct {
	p2p.BaseReactor

	// immutable
	initialState sm.State

	blockExec *sm.BlockExecutor
	store     *store.BlockStore
	pool      *BlockPool
	fastSync  bool

	requestsCh <-chan BlockRequest
	errorsCh   <-chan peerError
}

// NewBlockchainReactor returns a new reactor instance.
func NewBlockchainReactor(state sm.State, blockExec *sm.BlockExecutor, store *store.BlockStore,
	fastSync bool) *BlockchainReactor {

	if state.LastBlockHeight != store.Height() {
		panic(fmt.Sprintf("state (%v) and store (%v) height mismatch", state.LastBlockHeight,
			store.Height()))
	}

	requestsCh := make(chan BlockRequest, maxTotalRequesters)

	const capacity = 1000                      // must be bigger than peers count
	errorsCh := make(chan peerError, capacity) // so we don't block in #Receive#pool.AddBlock

	pool := NewBlockPool(
		store.Height()+1,
		requestsCh,
		errorsCh,
	)

	bcR := &BlockchainReactor{
		initialState: state,
		blockExec:    blockExec,
		store:        store,
		pool:         pool,
		fastSync:     fastSync,
		requestsCh:   requestsCh,
		errorsCh:     errorsCh,
	}
	bcR.BaseReactor = *p2p.NewBaseReactor("BlockchainReactor", bcR)
	return bcR
}

// SetLogger implements cmn.Service by setting the logger on reactor and pool.
func (bcR *BlockchainReactor) SetLogger(l log.Logger) {
	bcR.BaseService.Logger = l
	bcR.pool.Logger = l
}

// OnStart implements cmn.Service.
func (bcR *BlockchainReactor) OnStart() error {
	if bcR.fastSync {
		err := bcR.pool.Start()
		if err != nil {
			return err
		}
		go bcR.poolRoutine()
	}
	return nil
}

// OnStop implements cmn.Service.
func (bcR *BlockchainReactor) OnStop() {
	bcR.pool.Stop()
}

// GetChannels implements Reactor
func (bcR *BlockchainReactor) GetChannels() []*p2p.ChannelDescriptor {
	return []*p2p.ChannelDescriptor{
		{
			ID:                  BlockchainChannel,
			Priority:            10,
			SendQueueCapacity:   1000,
			RecvBufferCapacity:  50 * 4096,
			RecvMessageCapacity: maxMsgSize,
		},
	}
}

// AddPeer implements Reactor by sending our state to peer.
func (bcR *BlockchainReactor) AddPeer(peer p2p.Peer) {
	msgBytes := cdc.MustMarshalBinaryBare(&bcStatusResponseMessage{bcR.store.Height()})
	peer.Send(BlockchainChannel, msgBytes)
	// it's OK if send fails. will try later in poolRoutine

	// peer is added to the pool once we receive the first
	// bcStatusResponseMessage from the peer and call pool.SetPeerHeight
}

// RemovePeer implements Reactor by removing peer from the pool.
func (bcR *BlockchainReactor) RemovePeer(peer p2p.Peer, reason interface{}) {
	bcR.pool.RemovePeer(peer.ID())
}

// respondToPeer loads a block and sends it to the requesting peer,
// if we have it. Otherwise, we'll respond saying we don't have it.
// According to the Tendermint spec, if all nodes are honest,
// no node should request a block that does not exist.
func (bcR *BlockchainReactor) respondToPeer(msg *bcBlockRequestMessage,
	src p2p.Peer) (queued bool) {

	block := bcR.store.LoadBlock(msg.Height)
	if block != nil {
		msgBytes := cdc.MustMarshalBinaryBare(&bcBlockResponseMessage{Block: block})
		return src.TrySend(BlockchainChannel, msgBytes)
	}

	bcR.Logger.Info("Peer asking for a block we don't have", "src", src, "height", msg.Height)

	msgBytes := cdc.MustMarshalBinaryBare(&bcNoBlockResponseMessage{Height: msg.Height})
	return src.TrySend(BlockchainChannel, msgBytes)
}

// Receive implements Reactor by handling 4 types of messages (see below).
func (bcR *BlockchainReactor) Receive(chID byte, src p2p.Peer, msgBytes []byte) {
	msg, err := decodeMsg(msgBytes)
	if err != nil {
		bcR.Logger.Error("Error decoding message", "src", src, "chId", chID, "msg", msg, "err", err, "bytes", msgBytes)
		bcR.Switch.StopPeerForError(src, err)
		return
	}

	if err = msg.ValidateBasic(); err != nil {
		bcR.Logger.Error("Peer sent us invalid msg", "peer", src, "msg", msg, "err", err)
		bcR.Switch.StopPeerForError(src, err)
		return
	}

	bcR.Logger.Debug("Receive", "src", src, "chID", chID, "msg", msg)

	switch msg := msg.(type) {
	case *bcBlockRequestMessage:
		bcR.respondToPeer(msg, src)
	case *bcBlockResponseMessage:
		bcR.pool.AddBlock(src.ID(), msg.Block, len(msgBytes))
	case *bcStatusRequestMessage:
		// Send peer our state.
		msgBytes := cdc.MustMarshalBinaryBare(&bcStatusResponseMessage{bcR.store.Height()})
		src.TrySend(BlockchainChannel, msgBytes)
	case *bcStatusResponseMessage:
		// Got a peer status. Unverified.
		bcR.pool.SetPeerHeight(src.ID(), msg.Height)
	default:
		bcR.Logger.Error(fmt.Sprintf("Unknown message type %v", reflect.TypeOf(msg)))
	}
}
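
// Receive above follows a validate-then-dispatch shape: reject a message that
// fails ValidateBasic, then switch on its concrete type. The standalone sketch
// below reproduces only that shape. Everything in it is an assumption made for
// illustration: the message interface stands in for BlockchainMessage, the two
// message types and the handle function are hypothetical, and no decoding,
// logging, or peer handling is modelled.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a stand-in for BlockchainMessage.
type message interface {
	ValidateBasic() error
}

// blockRequest and statusResponse are hypothetical message types.
type blockRequest struct{ Height int64 }
type statusResponse struct{ Height int64 }

func (m *blockRequest) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("negative height")
	}
	return nil
}

func (m *statusResponse) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("negative height")
	}
	return nil
}

// handle validates first and only then dispatches on the concrete type.
func handle(msg message) (string, error) {
	if err := msg.ValidateBasic(); err != nil {
		return "", fmt.Errorf("invalid message: %w", err)
	}
	switch m := msg.(type) {
	case *blockRequest:
		return fmt.Sprintf("serve block %d", m.Height), nil
	case *statusResponse:
		return fmt.Sprintf("peer is at height %d", m.Height), nil
	default:
		return "", fmt.Errorf("unknown message type %T", msg)
	}
}

func main() {
	out, err := handle(&blockRequest{Height: 5})
	fmt.Println(out, err)

	out, err = handle(&statusResponse{Height: -1})
	fmt.Println(out, err)
}
```

// Validating before the type switch keeps each case free of input checks; the
// reactor additionally stops the peer when validation fails.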

// Handle messages from the poolReactor telling the reactor what to do.
// NOTE: Don't sleep in the FOR_LOOP or otherwise slow it down!
func (bcR *BlockchainReactor) poolRoutine() {

	trySyncTicker := time.NewTicker(trySyncIntervalMS * time.Millisecond)
	statusUpdateTicker := time.NewTicker(statusUpdateIntervalSeconds * time.Second)
	switchToConsensusTicker := time.NewTicker(switchToConsensusIntervalSeconds * time.Second)

	blocksSynced := 0

	chainID := bcR.initialState.ChainID
	state := bcR.initialState

	lastHundred := time.Now()
	lastRate := 0.0

	didProcessCh := make(chan struct{}, 1)

	go func() {
		for {
			select {
			case <-bcR.Quit():
				return
			case <-bcR.pool.Quit():
				return
			case request := <-bcR.requestsCh:
				peer := bcR.Switch.Peers().Get(request.PeerID)
				if peer == nil {
					continue
				}
				msgBytes := cdc.MustMarshalBinaryBare(&bcBlockRequestMessage{request.Height})
				queued := peer.TrySend(BlockchainChannel, msgBytes)
				if !queued {
					bcR.Logger.Debug("Send queue is full, drop block request", "peer", peer.ID(), "height", request.Height)
				}
			case err := <-bcR.errorsCh:
				peer := bcR.Switch.Peers().Get(err.peerID)
				if peer != nil {
					bcR.Switch.StopPeerForError(peer, err)
				}

			case <-statusUpdateTicker.C:
				// ask for status updates
				go bcR.BroadcastStatusRequest() // nolint: errcheck

			}
		}
	}()

FOR_LOOP:
	for {
		select {
		case <-switchToConsensusTicker.C:
			height, numPending, lenRequesters := bcR.pool.GetStatus()
			outbound, inbound, _ := bcR.Switch.NumPeers()
			bcR.Logger.Debug("Consensus ticker", "numPending", numPending, "total", lenRequesters,
				"outbound", outbound, "inbound", inbound)
			if bcR.pool.IsCaughtUp() {
				bcR.Logger.Info("Time to switch to consensus reactor!", "height", height)
				bcR.pool.Stop()
				conR, ok := bcR.Switch.Reactor("CONSENSUS").(consensusReactor)
				if ok {
					conR.SwitchToConsensus(state, blocksSynced)
				}
				// else {
				// should only happen during testing
				// }

				break FOR_LOOP
			}

		case <-trySyncTicker.C: // chan time
			select {
			case didProcessCh <- struct{}{}:
			default:
			}

		case <-didProcessCh:
			// NOTE: It is a subtle mistake to process more than a single block
			// at a time (e.g. 10) here, because we only TrySend 1 request per
			// loop. The ratio mismatch can result in starving of blocks, a
			// sudden burst of requests and responses, and repeat.
			// Consequently, it is better to split these routines rather than
			// coupling them as it's written here. TODO uncouple from request
			// routine.

			// See if there are any blocks to sync.
			first, second := bcR.pool.PeekTwoBlocks()
			//bcR.Logger.Info("TrySync peeked", "first", first, "second", second)
			if first == nil || second == nil {
				// We need both to sync the first block.
				continue FOR_LOOP
			} else {
				// Try again quickly next loop.
				didProcessCh <- struct{}{}
			}

			firstParts := first.MakePartSet(types.BlockPartSizeBytes)
			firstPartsHeader := firstParts.Header()
			firstID := types.BlockID{Hash: first.Hash(), PartsHeader: firstPartsHeader}
			// Finally, verify the first block using the second's commit
			// NOTE: we can probably make this more efficient, but note that calling
			// first.Hash() doesn't verify the tx contents, so MakePartSet() is
			// currently necessary.
			err := state.Validators.VerifyCommit(
				chainID, firstID, first.Height, second.LastCommit)
			if err != nil {
				bcR.Logger.Error("Error in validation", "err", err)
				peerID := bcR.pool.RedoRequest(first.Height)
				peer := bcR.Switch.Peers().Get(peerID)
				if peer != nil {
					// NOTE: we've already removed the peer's request, but we
					// still need to clean up the rest.
					bcR.Switch.StopPeerForError(peer, fmt.Errorf("BlockchainReactor validation error: %v", err))
				}
				peerID2 := bcR.pool.RedoRequest(second.Height)
				peer2 := bcR.Switch.Peers().Get(peerID2)
				if peer2 != nil && peer2 != peer {
					// NOTE: we've already removed the peer's request, but we
					// still need to clean up the rest.
					bcR.Switch.StopPeerForError(peer2, fmt.Errorf("BlockchainReactor validation error: %v", err))
				}
				continue FOR_LOOP
			} else {
				bcR.pool.PopRequest()

				// TODO: batch saves so we don't persist to disk every block
				bcR.store.SaveBlock(first, firstParts, second.LastCommit)

				// TODO: same thing for app - but we would need a way to
				// get the hash without persisting the state
				var err error
				state, err = bcR.blockExec.ApplyBlock(state, firstID, first)
				if err != nil {
					// TODO This is bad, are we zombie?
					panic(fmt.Sprintf("Failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
				}
				blocksSynced++

				if blocksSynced%100 == 0 {
					lastRate = 0.9*lastRate + 0.1*(100/time.Since(lastHundred).Seconds())
					bcR.Logger.Info("Fast Sync Rate", "height", bcR.pool.height,
						"max_peer_height", bcR.pool.MaxPeerHeight(), "blocks/s", lastRate)
					lastHundred = time.Now()
				}
			}
			continue FOR_LOOP

		case <-bcR.Quit():
			break FOR_LOOP
		}
	}
}

// BroadcastStatusRequest broadcasts `BlockStore` height.
func (bcR *BlockchainReactor) BroadcastStatusRequest() error {
	msgBytes := cdc.MustMarshalBinaryBare(&bcStatusRequestMessage{bcR.store.Height()})
	bcR.Switch.Broadcast(BlockchainChannel, msgBytes)
	return nil
}

//-----------------------------------------------------------------------------
// Messages

// BlockchainMessage is a generic message for this reactor.
type BlockchainMessage interface {
	ValidateBasic() error
}

// RegisterBlockchainMessages registers the fast sync messages for amino encoding.
func RegisterBlockchainMessages(cdc *amino.Codec) {
	cdc.RegisterInterface((*BlockchainMessage)(nil), nil)
	cdc.RegisterConcrete(&bcBlockRequestMessage{}, "tendermint/blockchain/BlockRequest", nil)
	cdc.RegisterConcrete(&bcBlockResponseMessage{}, "tendermint/blockchain/BlockResponse", nil)
	cdc.RegisterConcrete(&bcNoBlockResponseMessage{}, "tendermint/blockchain/NoBlockResponse", nil)
	cdc.RegisterConcrete(&bcStatusResponseMessage{}, "tendermint/blockchain/StatusResponse", nil)
	cdc.RegisterConcrete(&bcStatusRequestMessage{}, "tendermint/blockchain/StatusRequest", nil)
}

func decodeMsg(bz []byte) (msg BlockchainMessage, err error) {
	if len(bz) > maxMsgSize {
		return msg, fmt.Errorf("Msg exceeds max size (%d > %d)", len(bz), maxMsgSize)
	}
	err = cdc.UnmarshalBinaryBare(bz, &msg)
	return
}

//-------------------------------------

type bcBlockRequestMessage struct {
	Height int64
}

// ValidateBasic performs basic validation.
func (m *bcBlockRequestMessage) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("Negative Height")
	}
	return nil
}

func (m *bcBlockRequestMessage) String() string {
	return fmt.Sprintf("[bcBlockRequestMessage %v]", m.Height)
}

type bcNoBlockResponseMessage struct {
	Height int64
}

// ValidateBasic performs basic validation.
func (m *bcNoBlockResponseMessage) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("Negative Height")
	}
	return nil
}

func (m *bcNoBlockResponseMessage) String() string {
	return fmt.Sprintf("[bcNoBlockResponseMessage %d]", m.Height)
}

//-------------------------------------

type bcBlockResponseMessage struct {
	Block *types.Block
}

// ValidateBasic performs basic validation.
func (m *bcBlockResponseMessage) ValidateBasic() error {
	return m.Block.ValidateBasic()
}

func (m *bcBlockResponseMessage) String() string {
	return fmt.Sprintf("[bcBlockResponseMessage %v]", m.Block.Height)
}

//-------------------------------------

type bcStatusRequestMessage struct {
	Height int64
}

// ValidateBasic performs basic validation.
func (m *bcStatusRequestMessage) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("Negative Height")
	}
	return nil
}

func (m *bcStatusRequestMessage) String() string {
	return fmt.Sprintf("[bcStatusRequestMessage %v]", m.Height)
}

//-------------------------------------

type bcStatusResponseMessage struct {
	Height int64
}

// ValidateBasic performs basic validation.
func (m *bcStatusResponseMessage) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("Negative Height")
	}
	return nil
}

func (m *bcStatusResponseMessage) String() string {
	return fmt.Sprintf("[bcStatusResponseMessage %v]", m.Height)
}
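
The size guard in decodeMsg and the ValidateBasic methods above suggest a receive path that rejects oversized payloads, decodes, and then validates before acting on a message. The following is only a rough sketch of that flow under stated assumptions: encoding/json stands in for the amino codec so the example runs with the standard library alone, and blockRequest, decodeAndValidate, and the maxMsgSize value are hypothetical names and numbers for illustration, not the reactor's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// maxMsgSize is an assumed limit for this sketch only; the real constant
// is defined elsewhere in the package and may differ.
const maxMsgSize = 1024

// blockRequest is a hypothetical stand-in for bcBlockRequestMessage.
type blockRequest struct {
	Height int64 `json:"height"`
}

// ValidateBasic mirrors the height check used by the message types above.
func (m *blockRequest) ValidateBasic() error {
	if m.Height < 0 {
		return errors.New("negative height")
	}
	return nil
}

// decodeAndValidate rejects oversized payloads before decoding, then runs
// ValidateBasic so callers only ever see well-formed messages.
// JSON is used here in place of amino (an assumption of this sketch).
func decodeAndValidate(bz []byte) (*blockRequest, error) {
	if len(bz) > maxMsgSize {
		return nil, fmt.Errorf("msg exceeds max size (%d > %d)", len(bz), maxMsgSize)
	}
	msg := new(blockRequest)
	if err := json.Unmarshal(bz, msg); err != nil {
		return nil, fmt.Errorf("decoding message: %w", err)
	}
	if err := msg.ValidateBasic(); err != nil {
		return nil, fmt.Errorf("invalid message: %w", err)
	}
	return msg, nil
}

func main() {
	for _, raw := range []string{`{"height":5}`, `{"height":-1}`} {
		msg, err := decodeAndValidate([]byte(raw))
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted height", msg.Height)
	}
}
```

Checking the length before decoding keeps an oversized payload from ever reaching the decoder, and validating immediately afterwards means later code does not have to re-check for negative heights.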