223 Commits

Author SHA1 Message Date
Anton Kaliaev
bcf10d5bae p2p: peer state init too late and pex message too soon (#3634)
* fix peer state init to late

Peer does not have a state yet. We set it in AddPeer.
We need an new interface before mconnection is started

* pex message to soon

fix reconnection pex send too fast,
error is caused lastReceivedRequests is still
not deleted when a peer reconnected

* add test case for initpeer

* add prove case

* remove potentially infinite loop

* Update consensus/reactor.go

Co-Authored-By: guagualvcha <baifudong@lancai.cn>

* Update consensus/reactor_test.go

Co-Authored-By: guagualvcha <baifudong@lancai.cn>

* document Reactor interface better

* refactor TestReactorReceiveDoesNotPanicIfAddPeerHasntBeenCalledYet

* fix merge conflicts

* blockchain: remove peer's ID from the pool in InitPeer

Refs #3338

* pex: resetPeersRequestsInfo both upon InitPeer and RemovePeer

* ensure RemovePeer is always called before InitPeer

by removing the peer from the switch last (after we've stopped it and
removed from all reactors)

* add some comments for ConsensusReactor#InitPeer

* fix pex reactor

* format code

* fix spelling

* update changelog

* remove unused methods

* do not clear lastReceivedRequests upon error

only in RemovePeer

* call InitPeer before we start the peer!

* add a comment to InitPeer

* write a test

* use waitUntilSwitchHasAtLeastNPeers func

* bring back timeouts

* Test to ensure Receive panics if InitPeer has not been called
2019-05-27 14:39:58 -04:00
Anton Kaliaev
1e073817de
rpc: /dial_peers: only mark peers as persistent if flag is on (#3620)
## Description

also

    handle errors from DialPeersAsync
    remove nil addr from log msg
    fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
#3593 (review)

Fixes most of the #3617, except #3593 (comment)

## Commits

* rpc: /dial_peers: only mark peers as persistent if flag is on

also

- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

This is a follow-up from
https://github.com/tendermint/tendermint/pull/3593#pullrequestreview-233556909

* remove a call to AddPersistentPeers

TestDialFail will trigger a reconnect
2019-05-07 11:09:06 +04:00
Anton Kaliaev
8711af608f
p2p: make persistent prop independent of conn direction (#3593)
## Description

Previously only outbound peers can be persistent.

Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes #3362

## Commits

* make persistent prop independent of conn direction

Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.

Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes #3362

* fix TestPEXReactorDialPeer test

* add a changelog entry

* update changelog

* add two tests

* reformat code

* test UnsafeDialPeers and UnsafeDialSeeds

* add TestSwitchDialPeersAsync

* spec: update p2p/config spec

* fixes after Ismail's review

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* fix merge conflict

* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode

We don't need it actually.
2019-05-03 17:21:56 +04:00
Thane Thomson
70592cc4d8 libs/common: remove deprecated PanicXXX functions (#3595)
* Remove deprecated PanicXXX functions from codebase

As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.

* add a changelog entry
2019-04-26 14:23:43 +04:00
Alexander Simmerl
b5b3b85697
Bring back NodeInfo NetAddress form the dead (#3545)
A prior change to address accidental DNS lookups introduced the
SocketAddr on peer, which was then used to add it to the addressbook.
Which in turn swallowed the self reported port of the peer, which is
important on a reconnect. This change revives the NetAddress on NodeInfo
which the Peer carries, but now returns an error to avoid nil
dereferencing another issue observed in the past. Additionally we could
potentially address #3532, yet the original problem statemenf of that
issue stands.

As a drive-by optimisation `MarkAsGood` now takes only a `p2p.ID` which
makes it interface a bit stricter and leaner.
2019-04-12 12:31:02 +02:00
Anton Kaliaev
bcec8be035
p2p: do not log err if peer is private (#3474)
* add actionable advice for ErrAddrBookNonRoutable err

Should replace https://github.com/tendermint/tendermint/pull/3463

* reorder checks in addrbook#addAddress so

ErrAddrBookPrivate is returned first

and do not log error in DialPeersAsync if the address is private
because it's not an error
2019-04-11 15:32:16 +02:00
Anton Kaliaev
f965a4db15
p2p: seed mode refactoring (#3011)
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes #2716)


* addrbook: preallocate memory when saving addrbook to file

* addrbook: remove oldestFirst struct and check for ID

* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check

* addrbook: remove ListOfKnownAddresses

GetSelection is used instead in seed mode.

* addrbook: panic if size is less than 0

* rewrite addrbook#saveToFile to not use a counter

* test AttemptDisconnects func

* move IsDialingOrExistingAddress check into DialPeerWithAddress

* save and cleanup crawl peer data

* get rid of DefaultSeedDisconnectWaitPeriod

* make linter happy

* fix TestPEXReactorSeedMode

* fix comment

* add a changelog entry

* Apply suggestions from code review

Co-Authored-By: melekes <anton.kalyaev@gmail.com>

* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress

* lowercase errors

* do not persist seed data

pros:
- no extra files
- less IO

cons:
- if the node crashes, seed might crawl a peer too soon

* fixes after Ethan's review

* add a changelog entry

* we should only consult Switch about peers

checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!

https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
2019-04-03 11:22:52 +02:00
Ethan Buchman
882622ec10
Fixes tendermint/tendermint#3522
* OriginalAddr -> SocketAddr

OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.

* use SocketAddr

Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.

Instead, call peer.SocketAddr() to return the static address of the
connection.

* remove nodeInfo.NetAddress()

Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.

* fixes from review

* linter

* fixes from review
2019-04-01 19:59:57 -04:00
Anton Kaliaev
7af4b5086a Remove RepeatTimer and refactor Switch#Broadcast (#3429)
* p2p: refactor Switch#Broadcast func

- call wg.Add only once
- do not call peers.List twice!
  * bad for perfomance
  * peers list can change in between calls!

Refs #3306

* p2p: use time.Ticker instead of RepeatTimer

no need in RepeatTimer since we don't Reset them

Refs #3306

* libs/common: remove RepeatTimer (also TimerMaker and Ticker interface)

"ancient code that’s caused no end of trouble" Ethan

I believe there's much simplier way to write a ticker than can be reset
https://medium.com/@arpith/resetting-a-ticker-in-go-63858a2c17ec
2019-03-19 20:10:54 -04:00
Anton Kaliaev
100ff08de9
p2p: do not panic when filter times out (#3384)
Fixes #3369
2019-03-11 15:31:53 +04:00
zjubfd
d95894152b fix dirty data in peerset,resolve #3304 (#3359)
* fix dirty data in peerset

startInitPeer before PeerSet add the peer,
once mconnection start and Receive of one
Reactor faild, will try to remove it from PeerSet
while PeerSet still not contain the peer. Fix
this by change the order.

* fix test FilterDuplicate

* fix start/stop race

* fix err
2019-03-02 15:17:37 -05:00
Anton Kaliaev
ffd3bf8448
remove or comment out unused code 2019-02-06 15:16:38 +04:00
Anton Kaliaev
2449bf7300 p2p: file descriptor leaks (#3150)
* close peer's connection to avoid fd leak

Fixes #2967

* rename peer#Addr to RemoteAddr

* fix test

* fixes after Ethan's review

* bring back the check

* changelog entry

* write a test for switch#acceptRoutine

* increase timeouts? :(

* remove extra assertNPeersWithTimeout

* simplify test

* assert number of peers (just to be safe)

* Cleanup in OnStop

* run tests with verbose flag on CircleCI

* spawn a reading routine to prevent connection from closing

* get port from the listener

random port is faster, but often results in

```
panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered]
        panic: listen tcp 127.0.0.1:44068: bind: address already in use

goroutine 79 [running]:
testing.tRunner.func1(0xc0001bd600)
        /usr/local/go/src/testing/testing.go:792 +0x387
panic(0x974d20, 0xc0001b0500)
        /usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...)
        /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28
github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed)
        /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9
github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0)
        /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c
github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600)
        /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58
testing.tRunner(0xc0001bd600, 0xb42038)
        /usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
        /usr/local/go/src/testing/testing.go:878 +0x353
exit status 2
FAIL    github.com/tendermint/tendermint/p2p    0.350s
```
2019-01-22 13:23:18 -05:00
Ethan Buchman
a14fd8eba0
p2p: fix peer count mismatch #2332 (#2969)
* p2p: test case for peer count mismatch #2332

* p2p: fix peer count mismatch #2332

* changelog

* use httptest.Server to scrape Prometheus metrics
2018-12-05 07:32:27 -05:00
Ethan Buchman
1bb7e31d63
p2p: panic on transport error (#2968)
* p2p: panic on transport error

Addresses #2823. Currently, the acceptRoutine exits if the transport returns
an error trying to accept a new connection. Once this happens, the node
can't accept any new connections. So here, we panic instead. While we
could potentially be more intelligent by rerunning the acceptRoutine, the
error may indicate something more fundamental (eg. file desriptor limit)
that requires a restart anyways. We can leave it to process managers to
handle that restart, and notify operators about the panic.

* changelog
2018-12-04 19:16:06 -05:00
Ethan Buchman
6168b404a7
p2p: NewMultiplexTransport takes an MConnConfig (#2869)
* p2p: NewMultiplexTransport takes an MConnConfig

* changelog

* move test func to test file
2018-11-17 03:16:49 -05:00
Mehmet Gurevin
905abf1388 p2p: re-check after sleeps (#2664)
* p2p: re-check after sleeps

* use NodeInfo as an interface

* Revert "use NodeInfo as an interface"

This reverts commit 5f7d055e6c745ac8c8e5a9a7f0bd5ea5bc3d448c.

* Revert "p2p: re-check after sleeps"

This reverts commit 7f41070da070eadd3312efce1cc821aaf3e23771.

* preserve dial to itself

* ignore ensured connections while re-connecting

* re-check after sleep

* keep protocol definition on net addresses

* decrease log level

* Revert "preserve dial to itself"

This reverts commit 0c6e0fc58da78c378c32bb9ded2dd04ad5e754a9.

* correct func comment according to modification

Co-Authored-By: mgurevin <mehmet@gurevin.net>
2018-11-11 08:14:52 -05:00
Jae Kwon
5b19fcf204 p2p: AddressBook requires addresses to have IDs; Do not close conn immediately after sending pex addrs in seed mode (#2797)
* Require addressbook to only store addresses with valid ID

* Do not shut down peer immediately after sending pex addrs in SeedMode

* p2p: fix #2773

* seed mode: use go-routine to sleep before stopping peer
2018-11-11 06:50:25 -05:00
Ethan Buchman
746d137f86
p2p: Restore OriginalAddr (#2668)
* p2p: bring back OriginalAddr

* p2p: set OriginalAddr

* update changelog
2018-10-18 18:26:32 -04:00
Ethan Buchman
0baa7588c2
p2p: NodeInfo is an interface; General cleanup (#2556)
* p2p: NodeInfo is an interface

* (squash) fixes from review

* (squash) more fixes from review

* p2p: remove peerConn.HandshakeTimeout

* p2p: NodeInfo is two interfaces. Remove String()

* fixes from review

* remove test code from peer.RemoteIP()

* p2p: remove peer.OriginalAddr(). See #2618

* use a mockPeer in peer_set_test.go

* p2p: fix testNodeInfo naming

* p2p: remove unused var

* remove testRandNodeInfo

* fix linter

* fix retry dialing self

* fix rpc
2018-10-12 19:25:33 -04:00
goolAdapter
5f88fe0e9b fix p2p switch FlushThrottle value (#2569) 2018-10-08 10:05:12 +04:00
Matthew Slipper
587116dae1 metrics: Add additional metrics to p2p and consensus (#2425)
* Add additional metrics to p2p and consensus
Partially addresses https://github.com/cosmos/cosmos-sdk/issues/2169.
* WIP
* Updates from code review
* Updates from code review
* Add instrumentation namespace to configuration
* Fix test failure
* Updates from code review
* Add quotes
* Add atomic load
* Use storeint64
* Use addInt64 in writePacketMsgTo
2018-09-25 13:14:38 +02:00
Alexander Simmerl
bdd01310a0 p2p: Integrate new Transport
We are swapping the exisiting listener implementation with the newly
introduced Transport and its default implementation MultiplexTransport,
removing a large chunk of old connection setup and handling scattered
over the Peer and Switch code. The Switch requires a Transport now and
handles externally passed Peer filters.
2018-09-18 22:26:43 +02:00
Anton Kaliaev
eb5cf0f0dd ignore existing peers in DialPeersAsync (#2327)
* ignore existing peers in DialPeersAsync

Fixes #2253

* rename HasPeerWithAddress to IsDialingOrExistingAddress

[breaking] remove Switch#IsDialing

* check if addrBook is nil

to be consistent with other usages of addrBook across Switch

* different log messages for 2 use-cases
2018-09-05 11:52:22 -04:00
Anton Kaliaev
6fad8eaf5a [p2p/pex] connect to more than 10 peers (#2169)
* [p2p/pex] connect to more than 10 peers

also, remove DefaultMinNumOutboundPeers because a) I am not sure it's
needed b) it's super confusing

look closely

```
maxPeers := sw.config.MaxNumPeers - DefaultMinNumOutboundPeers
if maxPeers <= sw.peers.Size() {
  sw.Logger.Info("Ignoring inbound connection: already have enough peers", "address", inConn.RemoteAddr().String(), "numPeers", sw.peers.Size(), "max", maxPeers)
```

we print maxPeers = config.MaxPeers - DefaultMinNumOutboundPeers. So we
may not have enough peers even though we say we have enough.

Refs #2130

* update spec

* replace MaxNumPeers with MaxNumInboundPeers/MaxNumOutboundPeers

Refs #2130

* update changelog

* make max rpc conns formula visible to users

* update spec

* docs: note max outbound peers excludes persistent
2018-08-14 18:25:56 -04:00
Ethan Buchman
359898dcac p2p: fix conn leak. part of #2046 2018-07-24 21:53:37 -04:00
Anton Kaliaev
9962e598a0
reconnect to self-reported address if persistent peer is inbound (#2031)
* reconnect to self-reported address if persistent peer is inbound

* add a fixme
2018-07-23 21:15:08 +04:00
Anton Kaliaev
b31ee798bd
preserve original address and dial it instead of self-reported address (#1994)
Refs #1720
2018-07-18 13:23:29 +04:00
Ethan Buchman
d55243f0e6 fix import paths 2018-07-01 22:36:49 -04:00
Anton Kaliaev
61c5791fa3
revert back to Jae's original payload size limit
except now we calculate the max size using the maxPacketMsgSize()
function, which frees developers from having to know amino encoding
details.

plus, 10 additional bytes are added to leave the room for amino upgrades
(both making it more efficient / less efficient)
2018-06-29 12:57:17 +04:00
Liamsi
d2c05bc5b9 Revert "delete everything" (includes everything non-go-crypto)
This reverts commit 96a3502
2018-06-20 17:35:30 -07:00
Liamsi
96a3502126 delete everything 2018-06-20 15:19:08 -07:00
Anton Kaliaev
205d8b8062
fixes after @xla review
- move prometheus metrics into internal packages
- *Option structs
- misc. format changes
2018-06-20 12:40:25 +04:00
Anton Kaliaev
84812145cb
friendly apis for constructors 2018-06-20 12:40:25 +04:00
Anton Kaliaev
19699d644f
p2p metric, make height and totalTxs gauges 2018-06-20 12:38:45 +04:00
Alexander Simmerl
c661a3ec21
Fix race when mutating MConnConfig
Instead of mutating the passed in MConnConfig part of P2PConfig we just
use the default and override the values, the same as before as it was
always the default version. This is yet another good reason to not embed
information and access to config structs in our components and will go
away with the ongoing refactoring in #1325.
2018-06-07 01:09:13 +02:00
Alexander Simmerl
ea896865a7
Collapse PeerConfig into P2PConfig
As both configs are concerned with the p2p packaage and PeerConfig is
only used inside of the package there is no good reason to keep the
couple of fields separate, therefore it is collapsed into the more
general P2PConifg. This is a stepping stone towards a setup where the
components inside of p2p do not have any knowledge about the config.

follow-up to #1325
2018-06-05 02:07:56 +02:00
Alexander Simmerl
3255c076e5
Remove auth_enc config option
As we didn't hear any voices requesting this feature, we removed the
option to disable it and always have peer connection auth encrypted.

closes #1518
follow-up #1325
2018-06-01 21:07:20 +02:00
Ethan Buchman
d454b1b25f SkipDuplicate -> AllowDuplicate; fix p2p test on mac 2018-05-30 21:44:39 -04:00
Alexander Simmerl
5796e879b9
Introduce option to skip duplicate ip check
In some scenarios like tests we want to disable the guard which prevents
peers connecting from the same ip.

Fixes #1632
Closes #1634
2018-05-30 10:40:22 +02:00
Anton Kaliaev
2a0e9f93ce
provide arg to error
BEFORE:

```
E[05-24|11:55:37.229] Dialing failed                               pex=0 addr=022ec801d79025caab3afbbf816d92ff8450d040@127.0.0.2:6593 err="Connect to self: <nil>" attempts=0
```

AFTER:

```
E[05-24|11:55:37.229] Dialing failed                               pex=0 addr=022ec801d79025caab3afbbf816d92ff8450d040@127.0.0.2:6593 err="Connect to self: 022ec801d79025caab3afbbf816d92ff8450d040@127.0.0.2:6593" attempts=0
```
2018-05-25 15:11:32 +04:00
Alexander Simmerl
01fd102dba
Incoporate review feedback 2018-05-23 01:56:03 +02:00
Alexander Simmerl
91b6d3f18c
Do not set address for self error 2018-05-21 18:47:14 +02:00
Alexander Simmerl
77f09f5b5e
Move to ne.IP 2018-05-16 19:21:12 +02:00
Ethan Buchman
1fe41be929
p2p: prevent connections from same ip 2018-05-16 19:21:12 +02:00
Ethan Buchman
68a0b3f95b version bump. add roadmap back. minor fixes 2018-05-15 22:42:29 -04:00
Ethan Buchman
fae94a44a2 p2p/pex: some addrbook fixes
* fix before/after in isBad()
* allow multiple IPs per ID even if ID isOld
2018-04-28 20:09:02 -04:00
Ethan Buchman
c90bf77566 rpc: add n_peers to /net_info 2018-04-28 16:09:18 -04:00
Ethan Buchman
6805ddf1b8 p2p: change some logs from Error to Debug. #1476 2018-04-28 16:00:45 -04:00
Ethan Buchman
2761861b6b p2p: MinNumOutboundPeers. Closes #1501 2018-04-28 15:52:05 -04:00