129 Commits

Author SHA1 Message Date
Anton Kaliaev
a076b48202
libs/db: boltdb: use slice instead of sync.Map (#3633)
for storing batch keys and values in boltDBBatch.

NOTE: batch does not have to be safe for concurrent access. Delete may
be slow, but given it should not be used often, it's ok.

Fixes #3631
2019-05-07 14:06:20 +04:00
Anton Kaliaev
a7358bc69f
libs/db: conditional compilation (#3628)
* libs/db: conditional compilation

For cleveldb: go build -tags cleveldb
For boltdb: go build -tags boltdb

Fixes #3611

* document db_backend param better

* remove deprecated LevelDBBackend

* update changelog

* add missing lines

* add new line

* fix TestRemoteDB

* add a line about boltdb tag

* Revert "remove deprecated LevelDBBackend"

This reverts commit 1aa85453f76605e0c4d967601bbe26240e668d51.

* make PR non breaking

* change DEPRECATED label format

https://stackoverflow.com/a/36360323/820520
2019-05-07 12:33:47 +04:00
Leo
d0d9ef16f7 libs/db: fix boltdb batching
`[]byte` is unhashable and needs to be casted to a string.

See https://github.com/tendermint/tendermint/pull/3610#discussion_r280995538

Ref https://github.com/tendermint/tendermint/issues/3626
2019-05-06 20:10:39 +00:00
Anton Kaliaev
60b833403c
libs/db: close boltDBIterator (#3627)
Refs https://github.com/tendermint/tendermint/pull/3610#discussion_r281201274

If we do not close, other txs will be stuck forever.
2019-05-06 21:37:41 +04:00
Anton Kaliaev
debf8f70c9
libs/db: bbolt (etcd's fork of bolt) (#3610)
Description:

    add boltdb implement for db interface.

    The boltdb is safe and high performence embedded KV database. It is based on B+tree and Mmap.
    Now it is maintained by etcd-io org. https://github.com/etcd-io/bbolt
    It is much better than LSM tree (levelDB) when RANDOM and SCAN read.

Replaced #3604
Refs #1835

Commits:

* add boltdb for storage

* update boltdb in gopkg.toml

* update bbolt in Gopkg.lock

* dep update  Gopkg.lock

* add boltdb'bucket when create Boltdb

* some modifies

* address my own comments

* fix most of the Iterator tests for boltdb

* bolt does not support concurrent writes while iterating

* add WARNING word

* note that ReadOnly option is not supported

* fixes after Ismail's review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* panic if View returns error

plus specify bucket in the top comment
2019-05-06 13:42:21 +04:00
JamesRay
2c26d95ab9 cs/replay: execCommitBlock should not read from state.lastValidators (#3067)
* execCommitBlock should not read from state.lastValidators

* fix height 1

* fix blockchain/reactor_test

* fix consensus/mempool_test

* fix consensus/reactor_test

* fix consensus/replay_test

* add CHANGELOG

* fix consensus/reactor_test

* fix consensus/replay_test

* add a test for replay validators change

* fix mem_pool test

* fix byzantine test

* remove a redundant code

* reduce validator change blocks to 6

* fix

* return peer0 config

* seperate testName

* seperate testName 1

* seperate testName 2

* seperate app db path

* seperate app db path 1

* add a lock before startNet

* move the lock to reactor_test

* simulate just once

* try to find problem

* handshake only saveState when app version changed

* update gometalinter to 3.0.0 (#3233)

in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165

also

    code is simplified by running gofmt -s .
    remove unused vars
    enable linters we're currently passing
    remove deprecated linters
(cherry picked from commit d47094550315c094512a242445e0dde24b5a03f5)

* gofmt code

* goimport code

* change the bool name to testValidatorsChange

* adjust receive kvstore.ProtocolVersion

* adjust receive kvstore.ProtocolVersion 1

* adjust receive kvstore.ProtocolVersion 3

* fix merge execution.go

* fix merge develop

* fix merge develop 1

* fix run cleanupFunc

* adjust code according to reviewers' opinion

* modify the func name match the convention

* simplify simulate a chain containing some validator change txs 1

* test CI error

* Merge remote-tracking branch 'upstream/develop' into fixReplay 1

* fix pubsub_test

* subscribeUnbuffered vote channel
2019-05-01 17:15:53 -04:00
Thane Thomson
70592cc4d8 libs/common: remove deprecated PanicXXX functions (#3595)
* Remove deprecated PanicXXX functions from codebase

As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.

* add a changelog entry
2019-04-26 14:23:43 +04:00
kevlubkcm
f2aa1bf50e bandaid for non-deterministic clist test (#3575)
* add a deterministic timeout

Co-Authored-By: kevlubkcm <36485490+kevlubkcm@users.noreply.github.com>
2019-04-17 18:14:01 +02:00
kevlubkcm
621c0e629d docs: fix typo in clist readme (#3574) 2019-04-17 19:09:17 +04:00
hucc
5b8888b01b common: CMap: slight optimization in Keys() and Values(). (#3567) 2019-04-16 12:04:08 +04:00
zjubfd
439312b9c0 blockchain: dismiss request channel delay (#3459)
Fixes #3457

The topic of the issue is that : write a BlockRequest int requestsCh channel will create an timer at the same time that stop the peer 15s later if no block have been received . But pop a BlockRequest from requestsCh and send it out may delay more than 15s later. So that the peer will be stopped for error("send nothing to us").
Extracting requestsCh into its own goroutine can make sure that every BlockRequest been handled timely.

Instead of the requestsCh handling, we should probably pull the didProcessCh handling in a separate go routine since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause sending requests to invalid/ disconnected peers.
2019-04-16 11:54:19 +04:00
zjubfd
2233dd45bd libs: remove useless code in group (#3504)
* lib: remove useless code in group
* update change log
* Update CHANGELOG_PENDING.md

Co-Authored-By: guagualvcha <baifudong@lancai.cn>
2019-03-29 18:47:53 +01:00
tracebundy
660bd4a53e fix comment (#3454) 2019-03-20 08:30:49 -04:00
Anton Kaliaev
7af4b5086a Remove RepeatTimer and refactor Switch#Broadcast (#3429)
* p2p: refactor Switch#Broadcast func

- call wg.Add only once
- do not call peers.List twice!
  * bad for perfomance
  * peers list can change in between calls!

Refs #3306

* p2p: use time.Ticker instead of RepeatTimer

no need in RepeatTimer since we don't Reset them

Refs #3306

* libs/common: remove RepeatTimer (also TimerMaker and Ticker interface)

"ancient code that’s caused no end of trouble" Ethan

I believe there's much simplier way to write a ticker than can be reset
https://medium.com/@arpith/resetting-a-ticker-in-go-63858a2c17ec
2019-03-19 20:10:54 -04:00
Anton Kaliaev
7457133307
grpcdb: close Iterator/ReverseIterator after use (#3424)
Fixes #3402
2019-03-14 15:00:58 +04:00
Anton Kaliaev
d741c7b478
limit number of /subscribe clients and queries per client (#3269)
* limit number of /subscribe clients and queries per client

Add the following config variables (under [rpc] section):
  * max_subscription_clients
  * max_subscriptions_per_client
  * timeout_broadcast_tx_commit

Fixes #2826

new HTTPClient interface for subscriptions

finalize HTTPClient events interface

remove EventSubscriber

fix data race

```
WARNING: DATA RACE
Read at 0x00c000a36060 by goroutine 129:
  github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe.func1()
      /go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:168 +0x1f0

Previous write at 0x00c000a36060 by goroutine 132:
  github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
      /go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:191 +0x4e0
  github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
      /go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
  github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
      /go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Goroutine 129 (running) created at:
  github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
      /go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:164 +0x4b7
  github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
      /go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
  github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
      /go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Goroutine 132 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x659
  github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync()
      /go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:119 +0x186
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
==================
```

lite client works (tested manually)

godoc comments

httpclient: do not close the out channel

use TimeoutBroadcastTxCommit

no timeout for unsubscribe

but 1s Local (5s HTTP) timeout for resubscribe

format code

change Subscribe#out cap to 1

and replace config vars with RPCConfig

TimeoutBroadcastTxCommit can't be greater than rpcserver.WriteTimeout

rpc: Context as first parameter to all functions

reformat code

fixes after my own review

fixes after Ethan's review

add test stubs

fix config.toml

* fixes after manual testing

- rpc: do not recommend to use BroadcastTxCommit because it's slow and wastes
Tendermint resources (pubsub)
- rpc: better error in Subscribe and BroadcastTxCommit
- HTTPClient: do not resubscribe if err = ErrAlreadySubscribed

* fixes after Ismail's review

* Update rpc/grpc/grpc_test.go

Co-Authored-By: melekes <anton.kalyaev@gmail.com>
2019-03-11 22:45:58 +04:00
Yumin Xia
b021f1e505 libs/db: close batch (#3397)
ClevelDB requires closing when WriteBatch is no longer needed, https://godoc.org/github.com/jmhodges/levigo#WriteBatch.Close

Fixes the memory leak in https://github.com/cosmos/cosmos-sdk/issues/3842
2019-03-10 12:46:32 +04:00
Anton Kaliaev
f25d727035
make dupl linter pass (#3385)
Refs #3262
2019-03-07 09:10:34 +04:00
Jack Zampolin
8c9df30e28 libs/db: Add cleveldb.Stats() (#3379)
Fixes: #3378

* Add stats to cleveldb implementation

* update changelog

* remote TODO

also
- sort keys
- preallocate memory

* fix const initializer []string literal is not a constant

* add test
2019-03-05 10:56:46 +04:00
Anton Kaliaev
8a962ffc46
deps: update gogo/protobuf from 1.1.1 to 1.2.1 and golang/protobuf from 1.1.0 to 1.3.0 (#3357)
* deps: update gogo/proto from 1.1.1 to 1.2.1

- verified changes manually
  git diff 636bf030~ ba06b47c --stat -- ':!*.pb.go' ':!test'

* deps: update golang/protobuf from 1.1.0 to 1.3.0

- verified changes manually
  git diff b4deda0~ c823c79 -- ':!*.pb.go' ':!test'
2019-03-04 12:17:38 +04:00
zjubfd
d95894152b fix dirty data in peerset,resolve #3304 (#3359)
* fix dirty data in peerset

startInitPeer before PeerSet add the peer,
once mconnection start and Receive of one
Reactor faild, will try to remove it from PeerSet
while PeerSet still not contain the peer. Fix
this by change the order.

* fix test FilterDuplicate

* fix start/stop race

* fix err
2019-03-02 15:17:37 -05:00
Anton Kaliaev
ec9bff5234
rename WAL#Flush to WAL#FlushAndSync (#3345)
* rename WAL#Flush to WAL#FlushAndSync

- rename auto#Flush to auto#FlushAndSync
- cleanup WAL interface to not leak implementation details!
  * remove Group()
  * add WALReader interface and return it in SearchForEndHeight()
- add interface assertions

Refs #3337

* replace WALReader with io.ReadCloser
2019-02-25 09:11:07 +04:00
Anton Kaliaev
cdf3a74f48 Unclean shutdown on SIGINT / SIGTERM (#3308)
* libs/common: TrapSignal accepts logger as a first parameter

 and does not block anymore
* previously it was dumping "captured ..." msg to os.Stdout
* TrapSignal should not be responsible for blocking thread of execution

Refs #3238

* exit with zero (0) code upon receiving SIGTERM/SIGINT

Refs #3238

* fix formatting in docs/app-dev/abci-cli.md

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix formatting in docs/app-dev/abci-cli.md

Co-Authored-By: melekes <anton.kalyaev@gmail.com>
2019-02-23 10:48:28 -05:00
Anton Kaliaev
2137ecc130 pubsub 2.0 (#3227)
* green pubsub tests :OK:

* get rid of clientToQueryMap

* Subscribe and SubscribeUnbuffered

* start adapting other pkgs to new pubsub

* nope

* rename MsgAndTags to Message

* remove TagMap

it does not bring any additional benefits

* bring back EventSubscriber

* fix test

* fix data race in TestStartNextHeightCorrectly

```
Write at 0x00c0001c7418 by goroutine 796:
  github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
      /go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162

Previous read at 0x00c0001c7418 by goroutine 858:
  github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
  github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
  github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
  github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794

Goroutine 796 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:878 +0x659
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1119 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1117 +0x4ee
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1034 +0x2ee
  main.main()
      _testmain.go:214 +0x332

Goroutine 858 (running) created at:
  github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
      /go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
  github.com/tendermint/tendermint/consensus.startTestRound()
      /go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
  github.com/tendermint/tendermint/consensus.TestStateFullRound1()
      /go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:827 +0x162
```

* fixes after my own review

* fix formatting

* wait 100ms before kicking a subscriber out

+ a test for indexer_service

* fixes after my second review

* no timeout

* add changelog entries

* fix merge conflicts

* fix typos after Thane's review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* reformat code

* rewrite indexer service in the attempt to fix failing test

https://github.com/tendermint/tendermint/pull/3227/#issuecomment-462316527

* Revert "rewrite indexer service in the attempt to fix failing test"

This reverts commit 0d9107a098230de7138abb1c201877c246e89ed1.

* another attempt to fix indexer

* fixes after Ethan's review

* use unbuffered channel when indexing transactions

Refs https://github.com/tendermint/tendermint/pull/3227#discussion_r258786716

* add a comment for EventBus#SubscribeUnbuffered

* format code
2019-02-22 23:11:27 -05:00
Thane Thomson
dff3deb2a9 cs: sync WAL more frequently (#3300)
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.

* Flush WAL every 2s

This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.

* Fix spelling

* Increase timeout to 2mins for slower build environments

* Make WAL sync interval configurable

* Add TODO to replace testChan with more comprehensive testBus

* Remove extraneous debug statement

* Remove testChan in favour of using system time

As per
https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586,
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.

The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".

* Update naming convention and ensure lock for timestamp update

* Add Flush method as part of WAL interface

Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630

* Update CHANGELOG_PENDING

* Remove mutex approach and replace with DI

The dependency injection approach to dealing with testing concerns could
allow similar effects to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.

This allows us to avoid the `time.Sleep()`-based approach previously
employed.

* Update comment on WAL flushing during vote signing

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Simplify flush interval definition

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Expand commentary on WAL disk flushing

Co-Authored-By: thanethomson <connect@thanethomson.com>

* Add broken test to illustrate WAL sync test problem

Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.

* Fix test error messages

* Use WAL group buffer size to check for flush

A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.

* Remove config root dir removal from #3291

* Add godoc for NewWAL mentioning periodic sync
2019-02-20 09:45:18 +04:00
Ismail Khoffi
b089587b42 make gosec linter pass (#3294)
* not related to linter: remove obsolete constants:
 - `Insecure` and `Secure` and type `Security` are not used anywhere

* not related to linter: update example

 - NewInsecure was deleted; change example to NewRemoteDB

* address: Binds to all network interfaces (gosec):

 - bind to localhost instead of 0.0.0.0
 - regenerate test key and cert for this purpose (was valid for ::) and
 otherwise we would see:
 transport: authentication handshake failed: x509: certificate is
 valid for ::, not 127.0.0.1\"

(used https://github.com/google/keytransparency/blob/master/scripts/gen_server_keys.sh
to regenerate certs)

* use sha256 in tests instead of md5; time difference is negligible

* nolint usage of math/rand in test and add comment on its import

 - crypto/rand is slower and we do not need sth more secure in tests

* enable linter in circle-ci

* another nolint math/rand in test

* replace another occurrence of md5

* consistent comment about importing math/rand
2019-02-12 08:54:12 +04:00
Anton Kaliaev
7fd51e6ade
make govet linter pass (#3292)
* make govet linter pass

Refs #3262

* close PipeReader and check for err
2019-02-11 16:31:34 +04:00
Anton Kaliaev
fcebdf6720
Merge pull request #3261 from tendermint/anton/circle
Revert "quick fix for CircleCI (#2279)"
2019-02-07 10:28:22 +04:00
Ethan Buchman
45b70ae031
fix non deterministic test failures and race in privval socket (#3258)
* node: decrease retry conn timeout in test

Should fix #3256

The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.

* p2p: increase handshake timeout in test

This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix #3187

* privval: fix race with ping. closes #3237

Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.

* node: fix test name. DelayedStop -> DelayedStart

* autofile: Wait() method

In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes #2852.
2019-02-06 10:24:43 -05:00
Ethan Buchman
4429826229
cmn: GetFreePort (#3255) 2019-02-06 10:14:03 -05:00
Anton Kaliaev
1386707ceb
rename TestGCRandom to _TestGCRandom 2019-02-06 18:36:54 +04:00
Anton Kaliaev
6941d1bb35
use nolint label instead of commenting 2019-02-06 18:21:32 +04:00
Anton Kaliaev
23314daee4
comment out clist tests w/ depend on runtime.SetFinalizer 2019-02-06 15:38:02 +04:00
Anton Kaliaev
3c8156a55a
preallocating memory when we can 2019-02-06 15:24:54 +04:00
Anton Kaliaev
da33dd04cc
switch to golangci-lint from gometalinter 🚤 2019-02-06 15:00:55 +04:00
Ethan Buchman
39eba4e154
WAL: better errors and new fail point (#3246)
* privval: more info in errors

* wal: change Debug logs to Info

* wal: log and return error on corrupted wal instead of panicing

* fail: Exit right away instead of sending interupt

* consensus: FAIL before handling our own vote

allows to replicate #3089:
- run using `FAIL_TEST_INDEX=0`
- delete some bytes from the end of the WAL
- start normally

Results in logs like:

```
I[2019-02-03|18:12:58.225] Searching for height module=consensus wal=/Users/ethanbuchman/.tendermint/data/cs.wal/wal height=1 min=0 max=0
E[2019-02-03|18:12:58.225] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="failed to read data: EOF"
I[2019-02-03|18:12:58.225] Started node module=main nodeInfo="{ProtocolVersion:{P2P:6 Block:9 App:1} ID_:35e87e93f2e31f305b65a5517fd2102331b56002 ListenAddr:tcp://0.0.0.0:26656 Network:test-chain-J8JvJH Version:0.29.1 Channels:4020212223303800 Moniker:Ethans-MacBook-Pro.local Other:{TxIndex:on RPCAddress:tcp://0.0.0.0:26657}}"
E[2019-02-03|18:12:58.226] Couldn't connect to any seeds module=p2p
I[2019-02-03|18:12:59.229] Timed out module=consensus dur=998.568ms height=1 round=0 step=RoundStepNewHeight
I[2019-02-03|18:12:59.230] enterNewRound(1/0). Current: 1/0/RoundStepNewHeight module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose(1/0). Current: 1/0/RoundStepNewRound module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose: Our turn to propose module=consensus height=1 round=0 proposer=AD278B7767B05D7FBEB76207024C650988FA77D5 privValidator="PrivValidator{AD278B7767B05D7FBEB76207024C650988FA77D5 LH:1, LR:0, LS:2}"
E[2019-02-03|18:12:59.230] enterPropose: Error signing proposal module=consensus height=1 round=0 err="Error signing proposal: Step regression at height 1 round 0. Got 1, last step 2"
I[2019-02-03|18:13:02.233] Timed out module=consensus dur=3s height=1 round=0 step=RoundStepPropose
I[2019-02-03|18:13:02.233] enterPrevote(1/0). Current: 1/0/RoundStepPropose module=consensus
I[2019-02-03|18:13:02.233] enterPrevote: ProposalBlock is nil module=consensus height=1 round=0
E[2019-02-03|18:13:02.234] Error signing vote module=consensus height=1 round=0 vote="Vote{0:AD278B7767B0 1/00/1(Prevote) 000000000000 000000000000 @ 2019-02-04T02:13:02.233897Z}" err="Error signing vote: Conflicting data"
```

Notice the EOF, the step regression, and the conflicting data.

* wal: change errors to be DataCorruptionError

* exit on corrupt WAL

* fix log

* fix new line
2019-02-04 13:00:06 -05:00
Anton Kaliaev
d470945503
update gometalinter to 3.0.0 (#3233)
in the attempt to fix https://circleci.com/gh/tendermint/tendermint/43165

also

    code is simplified by running gofmt -s .
    remove unused vars
    enable linters we're currently passing
    remove deprecated linters
2019-01-30 12:24:26 +04:00
Anton Kaliaev
8985a1fa63
pubsub: fixes after Ethan's review (#3212)
in https://github.com/tendermint/tendermint/pull/3209
2019-01-29 13:16:43 +04:00
Ethan Buchman
9b6b792ce7 pubsub: comments 2019-01-26 14:30:29 +04:00
Anton Kaliaev
5f4d8e031e [log] fix year format (#3125)
Refs #3060
2019-01-14 14:10:13 -05:00
Dev Ojha
df32ea4be5 Make testing logger that doesn't write to stdout (#2997) 2018-12-11 12:17:21 +04:00
Dev Ojha
4571f0fbe8 Enforce validators can only use the correct pubkey type (#2739)
* Enforce validators can only use the correct pubkey type

* adapt to variable renames

* Address comments from #2636

* separate updating and validation logic

* update spec

* Add test case for TestStringSliceEqual, clarify slice copying code

* Address @ebuchman's comments

* Split up testing validator update execution, and its validation
2018-11-28 09:09:27 -05:00
Jae Kwon
416d143bf7 R4R: Swap start/end in ReverseIterator (#2913)
* Swap start/end in ReverseIterator

* update CHANGELOG_PENDING

* fixes from review
2018-11-28 08:49:24 -05:00
Ethan Buchman
47a0669d12
Fix fast sync stack with wrong block #2457 (#2731)
* fix fastsync may stuck by a wrong block

* fixes from updates

* fixes from review

* Align spec with the changes

* fmt
2018-11-26 15:31:11 -05:00
Anton Kaliaev
98e442a8de return back initially allowed level if we encounter allowed key (#2889)
Fixes #2868 where module=main setting overrides all others
2018-11-25 23:34:22 -05:00
Anton Kaliaev
60018d6148 comment out until someone decides to tackle #2285 (#2760)
current code results in panic and we certainly don't want that.
https://github.com/tendermint/tendermint/pull/2286#issuecomment-418281846
2018-11-16 18:17:07 -05:00
Anton Kaliaev
5a6822c8ac abci: localClient improvements & bugfixes & pubsub Unsubscribe issues (#2748)
* use READ lock/unlock in ConsensusState#GetLastHeight

Refs #2721

* do not use defers when there's no need

* fix peer formatting (output its address instead of the pointer)

```
[54310]: E[11-02|11:59:39.851] Connection failed @ sendRoutine              module=p2p peer=0xb78f00 conn=MConn{74.207.236.148:26656} err="pong timeout"
```

https://github.com/tendermint/tendermint/issues/2721#issuecomment-435326581

* panic if peer has no state

https://github.com/tendermint/tendermint/issues/2721#issuecomment-435347165

It's confusing that sometimes we check if peer has a state, but most of
the times we expect it to be there

1. add79700b5/mempool/reactor.go (L138)
2. add79700b5/rpc/core/consensus.go (L196) (edited)

I will change everything to always assume peer has a state and panic
otherwise

that should help identify issues earlier

* abci/localclient: extend lock on app callback

App callback should be protected by lock as well (note this was already
done for InitChainAsync, why not for others???). Otherwise, when we
execute the block, tx might come in and call the callback in the same
time we're updating it in execBlockOnProxyApp => DATA RACE

Fixes #2721

Consensus state is locked

```
goroutine 113333 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00180009c, 0xc0000c7e00)
        /usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc001800090)
        /usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).GetRoundState(0xc001800000, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:218 +0x46
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).queryMaj23Routine(0xc0017def80, 0x11104a0, 0xc0072488f0, 0xc007248
9c0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:735 +0x16d
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).AddPeer
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:172 +0x236
```

because localClient is locked

```
goroutine 1899 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0xc0000cb500)
        /usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
        /usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).SetResponseCallback(0xc0001fb560, 0xc007868540)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:32 +0x33
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).SetResponseCallback(0xc00002f750, 0xc007868540)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:57 +0x40
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x1104e20, 0xc002ca0ba0, 0x11092a0, 0xc00002f750, 0xc0001fe960, 0xc000bfc660, 0x110cfe0, 0xc000090330, 0xc9d12, 0xc000d9d5a0, ...)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:230 +0x1fd
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc002c2a230, 0x7, 0x0, 0xc000eae880, 0x6, 0xc002e52c60, 0x16, 0x1f927, 0xc9d12, 0xc000d9d5a0, ...)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:96 +0x142
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc001800000, 0x1f928)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1339 +0xa3e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc001800000, 0x1f928)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1270 +0x451
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit.func1(0xc001800000, 0x0, 0x1f928)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1218 +0x90
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc001800000, 0x1f928, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1247 +0x6b8
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xc003bc7ad0, 0xc003bc7b10)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1659 +0xbad
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xf1, 0xf1)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1517 +0x59
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc001800000, 0xd98200, 0xc0070dbed0, 0xc000cf4cc0, 0x28)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:660 +0x64b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc001800000, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:617 +0x670
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:311 +0x132
```

tx comes in and CheckTx is executed right when we execute the block

```
goroutine 111044 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0x0)
        /usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
        /usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).CheckTxAsync(0xc0001fb0e0, 0xc002d94500, 0x13f, 0x280, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:85 +0x47
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnMempool).CheckTxAsync(0xc00002f720, 0xc002d94500, 0x13f, 0x280, 0x1)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:114 +0x51
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool.(*Mempool).CheckTx(0xc002d3a320, 0xc002d94500, 0x13f, 0x280, 0xc0072355f0, 0x0, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool/mempool.go:316 +0x17b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core.BroadcastTxSync(0xc002d94500, 0x13f, 0x280, 0x0, 0x0, 0x0)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core/mempool.go:93 +0xb8
reflect.Value.call(0xd85560, 0x10326c0, 0x13, 0xec7b8b, 0x4, 0xc00663f180, 0x1, 0x1, 0xc00663f180, 0xc00663f188, ...)
        /usr/local/go/src/reflect/value.go:447 +0x449
reflect.Value.Call(0xd85560, 0x10326c0, 0x13, 0xc00663f180, 0x1, 0x1, 0x0, 0x0, 0xc005cc9344)
        /usr/local/go/src/reflect/value.go:308 +0xa4
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.makeHTTPHandler.func2(0x1102060, 0xc00663f100, 0xc0082d7900)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/handlers.go:269 +0x188
net/http.HandlerFunc.ServeHTTP(0xc002c81f20, 0x1102060, 0xc00663f100, 0xc0082d7900)
        /usr/local/go/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xc002c81b60, 0x1102060, 0xc00663f100, 0xc0082d7900)
        /usr/local/go/src/net/http/server.go:2361 +0x127
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.maxBytesHandler.ServeHTTP(0x10f8a40, 0xc002c81b60, 0xf4240, 0x1102060, 0xc00663f100, 0xc0082d7900)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:219 +0xcf
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.RecoverAndLogHandler.func1(0x1103220, 0xc00121e620, 0xc0082d7900)
        /root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:192 +0x394
net/http.HandlerFunc.ServeHTTP(0xc002c06ea0, 0x1103220, 0xc00121e620, 0xc0082d7900)
        /usr/local/go/src/net/http/server.go:1964 +0x44
net/http.serverHandler.ServeHTTP(0xc001a1aa90, 0x1103220, 0xc00121e620, 0xc0082d7900)
        /usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00785a3c0, 0x11041a0, 0xc000f844c0)
        /usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:2851 +0x2f5
```

* consensus: use read lock in Receive#VoteMessage

* use defer to unlock mutex because application might panic

* use defer in every method of the localClient

* add a changelog entry

* drain channels before Unsubscribe(All)

Read 55362ed766/libs/pubsub/pubsub.go (L13)
for the detailed explanation of the issue.

We'll need to fix it someday. Make sure to keep an eye on
https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-033-pubsub.md

* retry instead of panic when peer has no state in reactors other than consensus

in /dump_consensus_state RPC endpoint, skip a peer with no state

* rpc/core/mempool: simplify error messages

* rpc/core/mempool: use time.After instead of timer

also, do not log DeliverTx result (to be consistent with other memthods)

* unlock before calling the callback in reqRes#SetCallback
2018-11-13 11:32:51 -05:00
Anton Kaliaev
1944d8534b test AutoFile#Size (happy path) 2018-11-09 16:29:43 +01:00
Anton Kaliaev
13badc1d29 [autofile/group] do not panic when checking size
It's OK if the head will grow a little bit bigger, but we'll avoid
panic.

Refs #2703
2018-11-09 16:29:43 +01:00
Anton Kaliaev
091d2c3e5e openFile creates a file if not exist => ErrNotExist is not possible 2018-11-09 16:29:43 +01:00