package consensus

import (
	"context"
	"fmt"
	"sync"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	cmn "github.com/tendermint/tendermint/libs/common"
	"github.com/tendermint/tendermint/p2p"
	sm "github.com/tendermint/tendermint/state"
	"github.com/tendermint/tendermint/types"
)

//----------------------------------------------
// byzantine failures

// 4 validators. 1 is byzantine. The other three are partitioned into A (1 val) and B (2 vals).
// byzantine validator sends conflicting proposals into A and B,
// and prevotes/precommits on both of them.
// B sees a commit, A doesn't.
// Byzantine validator refuses to prevote.
// Heal partition and ensure A sees the commit
func TestByzantine(t *testing.T) {
	N := 4
	logger := consensusLogger().With("test", "byzantine")
	css, cleanup := randConsensusNet(N, "consensus_byzantine_test", newMockTickerFunc(false), newCounter)
	defer cleanup()

	// give the byzantine validator a normal ticker
	ticker := NewTimeoutTicker()
	ticker.SetLogger(css[0].Logger)
	css[0].SetTimeoutTicker(ticker)

	switches := make([]*p2p.Switch, N)
	p2pLogger := logger.With("module", "p2p")
	for i := 0; i < N; i++ {
		switches[i] = p2p.MakeSwitch(
			config.P2P,
			i,
			"foo", "1.0.0",
			func(i int, sw *p2p.Switch) *p2p.Switch {
				return sw
			})
		switches[i].SetLogger(p2pLogger.With("validator", i))
	}

	blocksSubs := make([]types.Subscription, N)
|
new pubsub package
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max
capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow to subscribe to all transactions! or a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track message for which we panicked
also, set a smaller part size for all test cases
2017-06-26 19:00:30 +04:00
|
|
|
reactors := make([]p2p.Reactor, N)
|
2016-11-23 18:20:46 -05:00
|
|
|
for i := 0; i < N; i++ {
|
2018-03-06 13:40:09 +04:00
|
|
|
// make first val byzantine
|
2016-11-23 18:20:46 -05:00
|
|
|
if i == 0 {
|
2018-04-05 07:05:45 -07:00
|
|
|
// NOTE: Now, test validators are MockPV, which by default doesn't
|
|
|
|
// do any safety checks.
|
|
|
|
css[i].privValidator.(*types.MockPV).DisableChecks()
|
2017-12-01 19:04:53 -06:00
|
|
|
css[i].decideProposal = func(j int) func(int64, int) {
|
|
|
|
return func(height int64, round int) {
|
2017-05-02 11:53:32 +04:00
|
|
|
byzantineDecideProposalFunc(t, height, round, css[j], switches[j])
|
2016-11-23 18:20:46 -05:00
|
|
|
}
|
|
|
|
}(i)
|
2017-12-01 19:04:53 -06:00
|
|
|
css[i].doPrevote = func(height int64, round int) {}
|
2016-11-23 18:20:46 -05:00
|
|
|
}
|
|
|
|
|
2018-06-01 20:22:46 -04:00
|
|
|
eventBus := css[i].eventBus
|
new pubsub package
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max
capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow to subscribe to all transactions! or a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track message for which we panicked
also, set a smaller part size for all test cases
2017-06-26 19:00:30 +04:00
|
|
|
eventBus.SetLogger(logger.With("module", "events", "validator", i))
|
|
|
|
|
2019-02-23 08:11:27 +04:00
|
|
|
var err error
|
|
|
|
blocksSubs[i], err = eventBus.Subscribe(context.Background(), testSubscriber, types.EventQueryNewBlock)
|
new pubsub package
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max
capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow to subscribe to all transactions! or a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track message for which we panicked
also, set a smaller part size for all test cases
2017-06-26 19:00:30 +04:00
|
|
|
require.NoError(t, err)
|
2016-11-23 18:20:46 -05:00
|
|
|
|
2019-05-02 05:15:53 +08:00
|
|
|
conR := NewConsensusReactor(css[i], true) // so we don't start the consensus states
|
2017-05-14 21:44:01 +02:00
|
|
|
conR.SetLogger(logger.With("validator", i))
|
new pubsub package
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max
capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow to subscribe to all transactions! or a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track message for which we panicked
also, set a smaller part size for all test cases
2017-06-26 19:00:30 +04:00
|
|
|
conR.SetEventBus(eventBus)
|
2016-11-23 18:20:46 -05:00
|
|
|
|
2019-02-06 18:23:25 +04:00
|
|
|
var conRI p2p.Reactor = conR
|
2017-05-29 23:11:40 -04:00
|
|
|
|
2018-03-06 13:40:09 +04:00
|
|
|
// make first val byzantine
|
2016-11-23 18:20:46 -05:00
|
|
|
if i == 0 {
|
|
|
|
conRI = NewByzantineReactor(conR)
|
|
|
|
}
|
2018-03-06 13:40:09 +04:00
|
|
|
|
2016-11-23 18:20:46 -05:00
|
|
|
reactors[i] = conRI
|
2019-05-02 05:15:53 +08:00
|
|
|
sm.SaveState(css[i].blockExec.DB(), css[i].state) //for save height 1's validators info
|
2016-11-23 18:20:46 -05:00
|
|
|
}

	defer func() {
		for _, r := range reactors {
			if rr, ok := r.(*ByzantineReactor); ok {
				rr.reactor.Switch.Stop()
			} else {
				r.(*ConsensusReactor).Switch.Stop()
			}
		}
	}()

	p2p.MakeConnectedSwitches(config.P2P, N, func(i int, s *p2p.Switch) *p2p.Switch {
		// ignore new switch s, we already made ours
		switches[i].AddReactor("CONSENSUS", reactors[i])
		return switches[i]
	}, func(sws []*p2p.Switch, i, j int) {
		// the network starts partitioned with globally active adversary
		if i != 0 {
			return
		}
		p2p.Connect2Switches(sws, i, j)
	})

	// start the non-byz state machines.
	// note these must be started before the byz
	for i := 1; i < N; i++ {
		cr := reactors[i].(*ConsensusReactor)
		cr.SwitchToConsensus(cr.conS.GetState(), 0)
	}

	// start the byzantine state machine
	byzR := reactors[0].(*ByzantineReactor)
	s := byzR.reactor.conS.GetState()
	byzR.reactor.SwitchToConsensus(s, 0)

	// byz proposer sends one block to peers[0]
	// and the other block to peers[1] and peers[2].
	// note peers and switches order don't match.
	peers := switches[0].Peers().List()

	// partition A
	ind0 := getSwitchIndex(switches, peers[0])

	// partition B
	ind1 := getSwitchIndex(switches, peers[1])
	ind2 := getSwitchIndex(switches, peers[2])
	p2p.Connect2Switches(switches, ind1, ind2)

	// wait for someone in the big partition (B) to make a block
	<-blocksSubs[ind2].Out()

	t.Log("A block has been committed. Healing partition")
	p2p.Connect2Switches(switches, ind0, ind1)
	p2p.Connect2Switches(switches, ind0, ind2)

	// wait till everyone makes the first new block
	// (one of them already has)
	wg := new(sync.WaitGroup)
	wg.Add(2)
	for i := 1; i < N-1; i++ {
		go func(j int) {
			<-blocksSubs[j].Out()
			wg.Done()
		}(i)
	}

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	tick := time.NewTicker(time.Second * 10)
	select {
	case <-done:
	case <-tick.C:
		for i, reactor := range reactors {
			t.Log(fmt.Sprintf("Consensus Reactor %v", i))
			t.Log(fmt.Sprintf("%v", reactor))
		}
		t.Fatalf("Timed out waiting for all validators to commit first block")
	}
}

//-------------------------------
// byzantine consensus functions

func byzantineDecideProposalFunc(t *testing.T, height int64, round int, cs *ConsensusState, sw *p2p.Switch) {
	// byzantine user should create two proposals and try to split the vote.
	// Avoid sending on internalMsgQueue and running consensus state.

	// Create a new proposal block from state/txs from the mempool.
	block1, blockParts1 := cs.createProposalBlock()
	polRound, propBlockID := cs.ValidRound, types.BlockID{Hash: block1.Hash(), PartsHeader: blockParts1.Header()}
	proposal1 := types.NewProposal(height, round, polRound, propBlockID)
	if err := cs.privValidator.SignProposal(cs.state.ChainID, proposal1); err != nil {
		t.Error(err)
	}

	// Create a new proposal block from state/txs from the mempool.
	block2, blockParts2 := cs.createProposalBlock()
	polRound, propBlockID = cs.ValidRound, types.BlockID{Hash: block2.Hash(), PartsHeader: blockParts2.Header()}
	proposal2 := types.NewProposal(height, round, polRound, propBlockID)
	if err := cs.privValidator.SignProposal(cs.state.ChainID, proposal2); err != nil {
		t.Error(err)
	}

	block1Hash := block1.Hash()
	block2Hash := block2.Hash()

	// broadcast conflicting proposals/block parts to peers
	peers := sw.Peers().List()
	t.Logf("Byzantine: broadcasting conflicting proposals to %d peers", len(peers))
	for i, peer := range peers {
		if i < len(peers)/2 {
			go sendProposalAndParts(height, round, cs, peer, proposal1, block1Hash, blockParts1)
		} else {
			go sendProposalAndParts(height, round, cs, peer, proposal2, block2Hash, blockParts2)
		}
	}
}

func sendProposalAndParts(height int64, round int, cs *ConsensusState, peer p2p.Peer, proposal *types.Proposal, blockHash []byte, parts *types.PartSet) {
	// proposal
	msg := &ProposalMessage{Proposal: proposal}
	peer.Send(DataChannel, cdc.MustMarshalBinaryBare(msg))

	// parts
	for i := 0; i < parts.Total(); i++ {
		part := parts.GetPart(i)
		msg := &BlockPartMessage{
			Height: height, // This tells peer that this part applies to us.
			Round:  round,  // This tells peer that this part applies to us.
			Part:   part,
		}
		peer.Send(DataChannel, cdc.MustMarshalBinaryBare(msg))
	}

	// votes
	cs.mtx.Lock()
	prevote, _ := cs.signVote(types.PrevoteType, blockHash, parts.Header())
	precommit, _ := cs.signVote(types.PrecommitType, blockHash, parts.Header())
	cs.mtx.Unlock()

	peer.Send(VoteChannel, cdc.MustMarshalBinaryBare(&VoteMessage{prevote}))
	peer.Send(VoteChannel, cdc.MustMarshalBinaryBare(&VoteMessage{precommit}))
}

//----------------------------------------
// byzantine consensus reactor

type ByzantineReactor struct {
	cmn.Service
	reactor *ConsensusReactor
}

func NewByzantineReactor(conR *ConsensusReactor) *ByzantineReactor {
	return &ByzantineReactor{
		Service: conR,
		reactor: conR,
	}
}

func (br *ByzantineReactor) SetSwitch(s *p2p.Switch)               { br.reactor.SetSwitch(s) }
func (br *ByzantineReactor) GetChannels() []*p2p.ChannelDescriptor { return br.reactor.GetChannels() }
func (br *ByzantineReactor) AddPeer(peer p2p.Peer) {
	if !br.reactor.IsRunning() {
		return
	}

	// Create peerState for peer
	peerState := NewPeerState(peer).SetLogger(br.reactor.Logger)
	peer.Set(types.PeerStateKey, peerState)

	// Send our state to peer.
	// If we're fast_syncing, broadcast a RoundStepMessage later upon SwitchToConsensus().
	if !br.reactor.fastSync {
		br.reactor.sendNewRoundStepMessage(peer)
	}
}
func (br *ByzantineReactor) RemovePeer(peer p2p.Peer, reason interface{}) {
	br.reactor.RemovePeer(peer, reason)
}
func (br *ByzantineReactor) Receive(chID byte, peer p2p.Peer, msgBytes []byte) {
	br.reactor.Receive(chID, peer, msgBytes)
}
func (br *ByzantineReactor) InitPeer(peer p2p.Peer) p2p.Peer { return peer }