7276 Commits

Author SHA1 Message Date
Thane Thomson
7b162f5c54 node: refactor node.NewNode (#3456)
The node.NewNode method is pretty complex at the moment, and in order to address issues like #3156, we need to simplify the interface for partial node instantiation. In some places, we don't need to build up a full node (like in the node.TestCreateProposalBlock test), but the complexity of such partial instantiation needs to be reduced.

This PR aims to eventually make this easier/simpler.

See also this gist https://gist.github.com/thanethomson/56e1640d057a26186e38ad678a1d114c for some background work done when starting to refactor here.

## Commits:

* [WIP] Refactor node.NewNode to simplify

The `node.NewNode` method is pretty complex at the moment, and in order
to address issues like #3156, we need to simplify the interface for
partial node instantiation. In some places, we don't need to build up a
full node (like in the `node.TestCreateProposalBlock` test), but the
complexity of such partial instantiation needs to be reduced.

This PR aims to eventually make this easier/simpler.

* Refactor state loading and genesis doc provider into state package

* Refactor for clarity of return parameters

* Fix incorrect capitalization of error messages

* Simplify extracted functions' names

* Document optionally-prefixed functions

* Refactor optionallyFastSync for clarity of separation of concerns

* Restructure function for early return

* Restructure function for early return

* Remove dependence on deprecated panic functions

* refactor code a bit more

plus, expose PEXReactor on node

* align logger names

* add a changelog entry

* align logger names 2

* add a note about PEXReactor returning nil
2019-04-26 16:05:39 +04:00
Thane Thomson
70592cc4d8 libs/common: remove deprecated PanicXXX functions (#3595)
* Remove deprecated PanicXXX functions from codebase

As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.

* add a changelog entry
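
To illustrate the kind of replacement described above, here is a minimal sketch. The helper `panicSanity` and the functions `checkHeightOld` and `checkHeightNew` are hypothetical names made up for this example, not the actual functions removed by the PR; the only point is that a thin wrapper around `panic` can be swapped for a direct `panic` call.

```go
package main

import "fmt"

// panicSanity is a hypothetical stand-in for a deprecated PanicXXX-style
// helper: it only wraps the built-in panic with a fixed prefix.
func panicSanity(v interface{}) {
	panic(fmt.Sprintf("Panicked on a Sanity Check: %v", v))
}

// checkHeightOld shows the old style, going through the wrapper.
func checkHeightOld(h int64) {
	if h < 0 {
		panicSanity("negative height")
	}
}

// checkHeightNew shows the new style: a plain panic call.
func checkHeightNew(h int64) {
	if h < 0 {
		panic(fmt.Sprintf("negative height: %d", h))
	}
}

// recovered runs f and returns the recovered panic value as a string,
// or the empty string if f did not panic.
func recovered(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	fmt.Println(recovered(func() { checkHeightOld(-1) }))
	fmt.Println(recovered(func() { checkHeightNew(-1) }))
}
```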
2019-04-26 14:23:43 +04:00
Ethan Buchman
90997ab1b5 docs: update contributing.md (#3503)
Minor updates to reflect squash merging and how to prepare releases
2019-04-23 13:34:14 +04:00
Zarko Milosevic
b738add80c cs: fix nondeterministic tests (#3582)
Should fix #3451, #2723 and #3317.

The test TestResetTimeoutPrecommitUponNewHeight is simplified to reduce the risk of timeout failures. Furthermore, the timeout we wait on for TimeoutEvents is increased, and the timeout value is computed more precisely. This should hopefully decrease the chance of non-deterministic test failures.

This assertion is hard to guarantee consistently because it depends on the scheduler. On the other hand, if I am not wrong, the order in which messages are read from the channel respects the order in which they are written. Therefore, the process will receive 2f+1 precommits that are not all for v (one is for nil), so TriggeredTimeoutPrecommit would be set to true, and we don't need to assert it. I know it would be better to still assert it, but I don't know how to do so without a sleep, which is ugly and is causing us nondeterministic failures.
2019-04-23 13:19:16 +04:00
Greg Hill
968e955c46 testnet cmd: add config option (#3559)
Option to explicitly provide the -config in the testnet command, closing #3160.
2019-04-23 12:52:46 +04:00
Anton Kaliaev
ebf815ee57
cs/replay: check appHash for each block (#3579)
* check every block appHash

Fixes #3460

Not really needed, but it would detect if the app changed in a way it
shouldn't have.

* add a changelog entry

* no need to return an error if we panic

* rename methods

* rename methods once again

* add a test

* correct error msg

plus fix a few go-lint warnings

* better panic msg
2019-04-23 12:22:40 +04:00
Ismail Khoffi
8db7e74b87 privval: increase timeout to mitigate non-deterministic test failure (#3580)
This should fix #3576 (ran it many times locally but only time will tell). The test actually only checked for the opcode of the error. From the name of the test we actually want to test if we see a timeout after a pre-defined time.

## Commits:

* increase readWrite timeout as it is also used in the `Accept` of the tcp
listener:

 - before this caused the readWriteTimeout to kick in (rarely) while Accept
 - as a side-effect: remove obsolete time.Sleep: in both listener cases
 the Accept will only return after successfully accepting and the timeout
 that is supposed to be tested here will be triggered because there is a
 read without a write
 - see if we actually run into a timeout error (the whole purpose of
 this test AFAIU)

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

* make local test-vars `const`

Signed-off-by: Ismail Khoffi <Ismail.Khoffi@gmail.com>

## Additional comments:

@xla: Confusing how an accept could take longer than that, but I assume a noisy environment full of little docker whales will be slower than what 50 years of silicon are capable of. The only thing I'd be wary of is that we mask structural issues with the code by just bumping the timeout. If we are sensitive towards that, it warrants investigation, but again this might only be true in the environment our CI runs in.
2019-04-22 12:04:04 +04:00
Ethan Buchman
4253e67c07
Merge pull request #3571 from tendermint/v0.31
V0.31.5
2019-04-19 07:46:11 -04:00
Sean Braithwaite
671c5c9b84 crypto: Proof of Concept for iterative version of SimpleHashFromByteSlices (#2611) (#3530)
(#2611) had suggested that an iterative version of
SimpleHashFromByteSlice would be faster, presumably because
we can envision some overhead accumulating from stack
frames and function calls. Additionally, a recursive algorithm risks
hitting the stack limit and causing a stack overflow should the tree
be too large.

Provided here is an iterative alternative, a simple test to assert
correctness and a benchmark. On the performance side, there appears to
be no overall difference:

```
BenchmarkSimpleHashAlternatives/recursive-4                20000 77677 ns/op
BenchmarkSimpleHashAlternatives/iterative-4                20000 76802 ns/op
```

On the surface it might seem that the additional overhead is due to
the different allocation patterns of the implementations. The recursive
version uses a single `[][]byte` slice which it then re-slices at each level of the tree.
The iterative version reproduces `[][]byte` once within the function and
then rewrites sub-slices of that array at each level of the tree.

Experimenting by modifying the code to simply calculate the
hash and not store the result shows little to no difference in performance.

These preliminary results suggest:
1. The performance of the current implementation is pretty good
2. Go has low overhead for recursive functions
3. The performance of the SimpleHashFromByteSlice routine is dominated
by the actual hashing of data

Although this work is in no way exhaustive, point #3 suggests that
optimizations of this routine would need to take an alternative
approach to make significant improvements on the current performance.

Finally, considering that the recursive implementation is easier to
read, it might not be worthwhile to switch to a less intuitive
implementation for so little benefit.

* re-add slice re-writing
* [crypto] Document SimpleHashFromByteSlicesIterative
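
For readers who want to see the two shapes side by side, here is a self-contained sketch of a recursive and an iterative Merkle root computation. It is not the code from this PR: the function names, the use of SHA-256, the `0x00`/`0x01` leaf and inner prefixes, and the "largest power of two" split point are assumptions chosen for illustration. The recursive version re-slices its input at each level; the iterative version allocates one slice of leaf hashes and rewrites it in place, level by level.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and innerHash use distinct prefixes (an assumption of this sketch).
func leafHash(leaf []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, leaf...))
	return h[:]
}

func innerHash(left, right []byte) []byte {
	data := append([]byte{0x01}, left...)
	data = append(data, right...)
	h := sha256.Sum256(data)
	return h[:]
}

// splitPoint returns the largest power of two strictly less than n (n >= 2).
func splitPoint(n int) int {
	k := 1
	for k*2 < n {
		k *= 2
	}
	return k
}

// hashRecursive re-slices the input at each level of the tree.
func hashRecursive(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return nil
	case 1:
		return leafHash(items[0])
	default:
		k := splitPoint(len(items))
		return innerHash(hashRecursive(items[:k]), hashRecursive(items[k:]))
	}
}

// hashIterative hashes the leaves once, then repeatedly combines adjacent
// pairs in place, carrying an unpaired last node up to the next level.
func hashIterative(items [][]byte) []byte {
	if len(items) == 0 {
		return nil
	}
	level := make([][]byte, len(items))
	for i, item := range items {
		level[i] = leafHash(item)
	}
	for size := len(level); size > 1; {
		wp := 0 // write position in the next level
		for rp := 0; rp < size; rp += 2 {
			if rp+1 < size {
				level[wp] = innerHash(level[rp], level[rp+1])
			} else {
				level[wp] = level[rp]
			}
			wp++
		}
		size = wp
	}
	return level[0]
}

// sameRoot reports whether both versions agree for n generated leaves.
func sameRoot(n int) bool {
	items := make([][]byte, n)
	for i := range items {
		items[i] = []byte(fmt.Sprintf("item-%d", i))
	}
	return bytes.Equal(hashRecursive(items), hashIterative(items))
}

func main() {
	fmt.Println(sameRoot(7))
}
```

Both versions perform the same number of hash invocations, which is consistent with the observation above that the routine's cost is dominated by hashing rather than by call overhead.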
2019-04-18 17:31:36 +02:00
kevlubkcm
f2aa1bf50e bandaid for non-deterministic clist test (#3575)
* add a deterministic timeout

Co-Authored-By: kevlubkcm <36485490+kevlubkcm@users.noreply.github.com>
2019-04-17 18:14:01 +02:00
Thane Thomson
90465f727f rpc: add support for batched requests/responses (#3534)
Continues from #3280 in building support for batched requests/responses in the JSON RPC (as per issue #3213).

* Add JSON RPC batching for client and server

As per #3213, this adds support for [JSON RPC batch requests and
responses](https://www.jsonrpc.org/specification#batch).

* Add additional checks to ensure client responses are the same as results

* Fix case where a notification is sent and no response is expected

* Add test to check that JSON RPC notifications in a batch are left out in responses

* Update CHANGELOG_PENDING.md

* Update PR number now that PR has been created

* Make errors start with lowercase letter

* Refactor batch functionality to be standalone

This refactors the batching functionality to rather act in a standalone
way. In light of supporting concurrent goroutines making use of the same
client, it would make sense to have batching functionality where one
could create a batch of requests per goroutine and send that batch
without interfering with a batch from another goroutine.

* Add examples for simple and batch HTTP client usage

* Check errors from writer and remove nolinter directives

* Make error strings start with lowercase letter

* Refactor examples to make them testable

* Use safer deferred shutdown for example Tendermint test node

* Recompose rpcClient interface from pre-existing interface components

* Rename WaitGroup for brevity

* Replace empty ID string with request ID

* Remove extraneous test case

* Convert first letter of errors.Wrap() messages to lowercase

* Remove extraneous function parameter

* Make variable declaration terse

* Reorder WaitGroup.Done call to help prevent race conditions in the face of failure

* Swap mutex to value representation and remove initialization

* Restore empty JSONRPC string ID in response to prevent nil

* Make JSONRPCBufferedRequest private

* Revert PR hard link in CHANGELOG_PENDING

* Add client ID for JSONRPCClient

This adds code to automatically generate a randomized client ID for the
JSONRPCClient, and adds a check of the IDs in the responses (if one was
set in the requests).

* Extract response ID validation into separate function

* Remove extraneous comments

* Reorder fields to indicate clearly which are protected by the mutex

* Refactor for loop to remove indexing

* Restructure and combine loop

* Flatten conditional block for better readability

* Make multi-variable declaration slightly more readable

* Change for loop style

* Compress error check statements

* Make function description more generic to show that we support different protocols

* Preallocate memory for request and result objects
2019-04-17 19:10:12 +04:00
kevlubkcm
621c0e629d docs: fix typo in clist readme (#3574) 2019-04-17 19:09:17 +04:00
Anton Kaliaev
c0e8fb5085
p2p: (seed mode) limit the number of attempts to connect to a peer (#3573)
* use dialPeer function in a seed mode

Fixes #3532

by storing a number of attempts we've tried to connect in-memory and
removing the address from addrbook when number of attempts > 16
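
A minimal sketch of the attempt-counting idea follows. The `crawler` and `addrBook` types, the `dialPeer` signature, and the `maxAttemptsToDial` name are assumptions for illustration and do not mirror the PEX reactor's real types; only the threshold of 16 comes from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// maxAttemptsToDial is the threshold from the commit message; the name is
// an assumption of this sketch.
const maxAttemptsToDial = 16

var errRefused = errors.New("connection refused")

// addrBook is a toy address book holding known peer addresses.
type addrBook struct {
	addrs map[string]struct{}
}

// crawler keeps the per-address attempt counters in memory only.
type crawler struct {
	book     *addrBook
	attempts map[string]int
}

// dialPeer counts the attempt, drops the address from the book once the
// counter exceeds the threshold, and otherwise calls dial.
func (c *crawler) dialPeer(addr string, dial func(string) error) error {
	c.attempts[addr]++
	if c.attempts[addr] > maxAttemptsToDial {
		delete(c.book.addrs, addr)
		delete(c.attempts, addr)
		return fmt.Errorf("reached max attempts %d to dial %s", maxAttemptsToDial, addr)
	}
	if err := dial(addr); err != nil {
		return err
	}
	delete(c.attempts, addr) // a successful dial resets the counter
	return nil
}

func main() {
	book := &addrBook{addrs: map[string]struct{}{"10.0.0.1:26656": {}}}
	c := &crawler{book: book, attempts: make(map[string]int)}
	alwaysFail := func(string) error { return errRefused }

	for i := 0; i < maxAttemptsToDial+1; i++ {
		_ = c.dialPeer("10.0.0.1:26656", alwaysFail)
	}
	_, stillKnown := book.addrs["10.0.0.1:26656"]
	fmt.Println("still in address book:", stillKnown)
}
```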
2019-04-17 16:44:26 +04:00
Ethan Buchman
d2eab536ac
Merge pull request #3568 from tendermint/anton/release-v0.31.5
v0.31.5 changelog and version updates
v0.31.5
2019-04-16 16:51:18 -04:00
Ismail Khoffi
18bd5b627a Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
2019-04-16 15:13:30 +04:00
Anton Kaliaev
4474a5ec70
bump version 2019-04-16 13:38:54 +04:00
Anton Kaliaev
3cb7013c38
update changelog 2019-04-16 13:38:54 +04:00
hucc
5b8888b01b common: CMap: slight optimization in Keys() and Values(). (#3567) 2019-04-16 12:04:08 +04:00
zjubfd
439312b9c0 blockchain: dismiss request channel delay (#3459)
Fixes #3457

The issue is this: writing a BlockRequest into the requestsCh channel also creates a timer that stops the peer 15s later if no block has been received. But popping a BlockRequest from requestsCh and sending it out may be delayed by more than 15s, so the peer is stopped with error("send nothing to us").
Extracting the requestsCh handling into its own goroutine ensures that every BlockRequest is handled in a timely manner.

Instead of the requestsCh handling, we should probably pull the didProcessCh handling into a separate goroutine, since this is the one "starving" the other channel handlers. I believe the way it is right now, we still have issues with high delays in errorsCh handling that might cause requests to be sent to invalid/disconnected peers.
2019-04-16 11:54:19 +04:00
dongsamb
f1cf10150a gitignore: add .vendor-new (#3566) 2019-04-16 08:49:03 +04:00
Anton Kaliaev
50b87c3445 state: Use last height changed if validator set is empty (#3560)
What happened:

New code was supposed to fall back to last height changed when/if it
failed to find validators at checkpoint height (to make release
non-breaking).

But because we did not check if validator set is empty, the fall back
logic was never executed => resulting in LoadValidators returning an
empty validator set for cases where `lastStoredHeight` is checkpoint
height (i.e. almost all heights if the application does not change
validator set often).

How it was found:

one of our users - @sunboshan reported a bug here
https://github.com/tendermint/tendermint/pull/3537#issuecomment-482711833

* use last height changed if validator set is empty
* add a changelog entry
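
The fallback described above can be sketched as follows. The types and names here (`valInfo`, `store`, `loadValidators`, `checkpointInterval` and its value) are simplified assumptions, not the state package's real API. The key point is the extra emptiness check: an entry found at the checkpoint height may exist and still carry no validator set, in which case the lookup must fall back to the last height at which the set changed.

```go
package main

import "fmt"

// checkpointInterval is an assumed value for this sketch.
const checkpointInterval = 100000

// valInfo is a simplified stored record: either a full validator set, or
// only a pointer to the height where the set last changed.
type valInfo struct {
	validators        []string
	lastHeightChanged int64
}

type store map[int64]valInfo

// lastStoredHeightFor returns the later of the most recent checkpoint
// height and the last height at which the validator set changed.
func lastStoredHeightFor(height, lastHeightChanged int64) int64 {
	checkpoint := height - height%checkpointInterval
	if checkpoint > lastHeightChanged {
		return checkpoint
	}
	return lastHeightChanged
}

// loadValidators resolves the validator set for a height.
func (s store) loadValidators(height int64) ([]string, error) {
	info, ok := s[height]
	if !ok {
		return nil, fmt.Errorf("no validator info at height %d", height)
	}
	if len(info.validators) > 0 {
		return info.validators, nil
	}

	lastStored := lastStoredHeightFor(height, info.lastHeightChanged)
	info2, ok := s[lastStored]
	// The fix: treat an empty set like a missing record and fall back.
	if !ok || len(info2.validators) == 0 {
		info2, ok = s[info.lastHeightChanged]
		if !ok || len(info2.validators) == 0 {
			return nil, fmt.Errorf("no validators at height %d or %d",
				lastStored, info.lastHeightChanged)
		}
	}
	return info2.validators, nil
}

func main() {
	s := store{
		1:      {validators: []string{"val-a"}, lastHeightChanged: 1},
		100000: {lastHeightChanged: 1}, // checkpoint height without a stored set
		100005: {lastHeightChanged: 1},
	}
	vals, err := s.loadValidators(100005)
	fmt.Println(vals, err)
}
```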
2019-04-15 16:53:38 +02:00
Sean Braithwaite
f2119c35de adr: PeerBehaviour updates (#3558)
* [adr] Peer behaviour adr updates
* [docs] fix Behaved function signature
* [adr] typo fix in code example
2019-04-15 16:38:45 +02:00
Ethan Buchman
d35c08724c
Merge pull request #3563 from tendermint/master
Merge master to develop
2019-04-15 08:16:37 -04:00
Ethan Buchman
1c6d9d20e4
Merge pull request #3553 from tendermint/v0.31
V0.31
2019-04-15 08:16:00 -04:00
Ethan Buchman
4695414393
Merge pull request #3548 from tendermint/release/v0.31.4
Release/v0.31.4
v0.31.4
2019-04-12 10:56:03 -04:00
Ismail Khoffi
def5c8cf12 address review comments: (#3550)
- mention ADR in release summary
 - remove [p2p] api changes
 - amend v0.31.3 log to contain note about breaking change
2019-04-12 10:48:34 -04:00
Ismail Khoffi
b6da8880c2 prepare v0.31.4 release:
- prep changelog
 - add missing changelog entries
 - fix minor glitch in existing changelog (v0.31.2)
 - bump versions
2019-04-12 14:24:51 +02:00
Martin Dyring-Andersen
a453628c4e Fix a couple of typos (#3547)
Fix some typos in p2p/transport.go
2019-04-12 13:25:14 +02:00
Sean Braithwaite
4e4224213f adr: Peer Behaviour (#3539)
* [adr] ADR 037: Peer Behaviour initial draft
* Update docs/architecture/adr-037-peer-behaviour.md

Co-Authored-By: brapse <brapse@gmail.com>

* Update docs/architecture/adr-037-peer-behaviour.md
Co-Authored-By: brapse <brapse@gmail.com>

* [docs] adr-037 Better footnote styling
* [ADR] ADR-037 adjust Footnotes for github markdown
* [ADR] ADR-037 fix numbered list
2019-04-12 12:32:00 +02:00
Alexander Simmerl
b5b3b85697
Bring back NodeInfo NetAddress from the dead (#3545)
A prior change to address accidental DNS lookups introduced the
SocketAddr on peer, which was then used to add it to the addressbook.
Which in turn swallowed the self reported port of the peer, which is
important on a reconnect. This change revives the NetAddress on NodeInfo
which the Peer carries, but now returns an error to avoid nil
dereferencing, another issue observed in the past. Additionally we could
potentially address #3532, yet the original problem statement of that
issue stands.

As a drive-by optimisation `MarkAsGood` now takes only a `p2p.ID` which
makes its interface a bit stricter and leaner.
2019-04-12 12:31:02 +02:00
Anton Kaliaev
18d2c45c33
rpc: Fix response time grow over time (#3537)
* rpc: store validator info periodically

* increase ValidatorSetStoreInterval

also

- unexpose it
- add a comment
- refactor code
- add a benchmark, which shows that 100000 results in ~ 100ms to get 100
validators

* make the change non-breaking

* expand comment

* rename valSetStoreInterval to valSetCheckpointInterval

* change the panic msg

* add a test and changelog entry

* update changelog entry

* update changelog entry

* add a link to PR

* fix test

* Update CHANGELOG_PENDING.md

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* update comment

* use MaxInt64 func
2019-04-12 10:46:07 +02:00
Anton Kaliaev
c3df21fe82 add missing changelog entry (#3544)
* add missing changelog entry
2019-04-11 17:59:14 +02:00
Anton Kaliaev
bcec8be035
p2p: do not log err if peer is private (#3474)
* add actionable advice for ErrAddrBookNonRoutable err

Should replace https://github.com/tendermint/tendermint/pull/3463

* reorder checks in addrbook#addAddress so

ErrAddrBookPrivate is returned first

and do not log error in DialPeersAsync if the address is private
because it's not an error
2019-04-11 15:32:16 +02:00
Anton Kaliaev
9a415b0572
docs: abci#Commit: better explain the possible deadlock (#3536) 2019-04-09 18:21:35 +02:00
Anton Kaliaev
40da355234
docs: fix block.Header.Time description (#3529)
It's not proposer local time anymore, but a weighted median

Fixes #3514
2019-04-03 14:56:51 +02:00
Anton Kaliaev
f965a4db15
p2p: seed mode refactoring (#3011)
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing peers or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes #2716)


* addrbook: preallocate memory when saving addrbook to file

* addrbook: remove oldestFirst struct and check for ID

* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check

* addrbook: remove ListOfKnownAddresses

GetSelection is used instead in seed mode.

* addrbook: panic if size is less than 0

* rewrite addrbook#saveToFile to not use a counter

* test AttemptDisconnects func

* move IsDialingOrExistingAddress check into DialPeerWithAddress

* save and cleanup crawl peer data

* get rid of DefaultSeedDisconnectWaitPeriod

* make linter happy

* fix TestPEXReactorSeedMode

* fix comment

* add a changelog entry

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress

* lowercase errors

* do not persist seed data

pros:
- no extra files
- less IO

cons:
- if the node crashes, seed might crawl a peer too soon

* fixes after Ethan's review

* add a changelog entry

* we should only consult Switch about peers

checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!

https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
2019-04-03 11:22:52 +02:00
Ethan Buchman
75ffa2bf1c
Merge pull request #3528 from tendermint/v0.31
Merge v0.31.3 to master
2019-04-02 19:18:57 -04:00
Ethan Buchman
086d6cbe8c
Merge pull request #3527 from tendermint/v0.31
Merge V0.31.3 back to develop
2019-04-02 16:49:44 -04:00
Ethan Buchman
6cc3f4d87c
Merge pull request #3525 from tendermint/release/v0.31.3
Release/v0.31.3
v0.31.3
2019-04-02 16:45:04 -04:00
Ethan Buchman
3cfd9757a7
changelog and version v0.31.3 2019-04-02 09:14:33 -04:00
Ethan Buchman
882622ec10
Fixes tendermint/tendermint#3522
* OriginalAddr -> SocketAddr

OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.

* use SocketAddr

Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.

Instead, call peer.SocketAddr() to return the static address of the
connection.

* remove nodeInfo.NetAddress()

Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.

* fixes from review

* linter

* fixes from review
2019-04-01 19:59:57 -04:00
Ethan Buchman
1ecf814838
Fixes tendermint/tendermint#3439
* make sure we create valid private keys:

 - genPrivKey samples and rejects invalid fieldelems (like libsecp256k1)
 - GenPrivKeySecp256k1 uses `(sha(secret) mod (n − 1)) + 1`
 - fix typo, rename test file: s/secpk256k1/secp256k1/

* Update crypto/secp256k1/secp256k1.go
2019-04-01 19:45:57 -04:00
Greg Szabo
e4a03f249d Release message changelog link fix (#3519) 2019-04-01 14:18:18 -04:00
Ethan Buchman
56d8aa42b3
Merge pull request #3520 from tendermint/v0.31
Merge v0.31.2 release back to develop
2019-04-01 14:17:58 -04:00
Ismail Khoffi
79e9f20578
Merge pull request #3518 from tendermint/prepare-release-v0.31.2
Release v0.31.2
v0.31.2
2019-04-01 17:58:28 +02:00
Ismail Khoffi
ab24925c94 prepare changelog and bump versions to v0.31.2 2019-04-01 17:49:34 +02:00
Greg Szabo
0ae41cc663 Fix for wrong version tag (#3517)
* Fix for wrong version tag (tag on the release branch instead of master)
2019-04-01 17:47:00 +02:00
Ethan Buchman
422d04c8ba Bucky/mempool txsmap (#3512)
* mempool: resCb -> globalCb

* reqResCb takes an externalCb

* failing test for #3509

* txsMap is sync.Map

* update changelog
2019-03-31 13:14:18 +02:00
zjubfd
2233dd45bd libs: remove useless code in group (#3504)
* lib: remove useless code in group
* update change log
* Update CHANGELOG_PENDING.md

Co-Authored-By: guagualvcha <baifudong@lancai.cn>
2019-03-29 18:47:53 +01:00
Greg Szabo
9199f3f613 Release management using CircleCI (#3498)
* Release management using CircleCI

* Changelog updated
2019-03-29 12:57:16 +01:00