tendermint

mirror of https://github.com/fluencelabs/tendermint synced 2025-05-29 14:11:21 +00:00


HaoyangLiu 1bb8e02a96 mempool: fix broadcastTxRoutine leak (#3478)

Refs #3306, irisnet@fdbb676

I ran an irishub validator. After the validator node had been running for several days, I dumped the whole goroutine stack and found hundreds of broadcastTxRoutine goroutines, even though fewer than 30 peers were connected. So I believe there must be a broadcastTxRoutine leak.

According to my analysis, the root cause of this issue lies in the following code:

		select {
		case <-next.NextWaitChan():
			// see the start of the for loop for nil check
			next = next.Next()
		case <-peer.Quit():
			return
		case <-memR.Quit():
			return
		}

As we know, if more than one case of a select is ready at the same time, Go picks one of them pseudo-randomly. Suppose that next.NextWaitChan() and peer.Quit() are both ready, and next.NextWaitChan() is chosen. The loop then reaches this code:

		// send memTx
		msg := &TxMessage{Tx: memTx.tx}
		success := peer.Send(MempoolChannel, cdc.MustMarshalBinaryBare(msg))
		if !success {
			time.Sleep(peerCatchupSleepIntervalMS * time.Millisecond)
			continue
		}

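Then next will be non-nil, and because the peer has quit, peer.Send will not succeed. The loop sleeps and continues, skipping the final select. Since peer.Quit() and memR.Quit() are otherwise only checked when next is nil, they are never observed again. As a result, this goroutine is stuck in an infinite loop and is never released.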
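The leak depends on how select behaves when several cases are ready. The Go specification says that a single ready case is chosen by uniform pseudo-random selection. The small program below is an illustration, not code from tendermint; the function name selectBothReady is hypothetical. Both channels are closed, so both cases are always ready, and over many iterations each case should be picked.

```go
package main

import "fmt"

// selectBothReady runs a select over two always-ready channels n times
// and counts how often each case wins.
func selectBothReady(n int) (nextChosen, quitChosen int) {
	next := make(chan struct{})
	quit := make(chan struct{})
	close(next) // receiving from a closed channel never blocks
	close(quit)
	for i := 0; i < n; i++ {
		select {
		case <-next:
			nextChosen++
		case <-quit:
			quitChosen++
		}
	}
	return nextChosen, quitChosen
}

func main() {
	n, q := selectBothReady(10000)
	fmt.Printf("next chosen %d times, quit chosen %d times\n", n, q)
}
```

Because the choice is random, the exact counts differ from run to run. The important point is that the quit case can lose even when the peer has already quit.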

My proposal is to check peer.Quit() and memR.Quit() on every iteration of the loop, regardless of whether next is nil.
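The proposal can be sketched as a simplified Go model. This is not the tendermint source and not necessarily how the merged fix was written. fakePeer, broadcastLoop, reactorQuit and the maxIters bound are all hypothetical. The bound exists only so the leak can be observed without the program hanging. The retry sleep is also omitted. The model starts in the state described above: a transaction is still pending, the peer has already quit, and every Send fails.

```go
package main

import "fmt"

// fakePeer is a hypothetical stand-in for a p2p peer.
type fakePeer struct {
	quit chan struct{}
}

func (p *fakePeer) Quit() <-chan struct{} { return p.quit }

// Send fails once the peer has quit, as in the leak scenario.
func (p *fakePeer) Send(tx string) bool {
	select {
	case <-p.quit:
		return false
	default:
		return true
	}
}

// broadcastLoop is a hypothetical, simplified model of broadcastTxRoutine.
// It reports whether the loop exited on a quit signal within maxIters
// iterations. A result of false means a real goroutine would still be spinning.
func broadcastLoop(peer *fakePeer, reactorQuit <-chan struct{}, next string, checkQuitEveryIteration bool, maxIters int) bool {
	for i := 0; i < maxIters; i++ {
		if checkQuitEveryIteration {
			// Proposed fix: a non-blocking quit check on every iteration.
			select {
			case <-peer.Quit():
				return true
			case <-reactorQuit:
				return true
			default:
			}
		}
		if next == "" {
			// Leaky structure: quit is only observed while there is nothing
			// to send. In this model the select blocks until a quit fires.
			select {
			case <-peer.Quit():
				return true
			case <-reactorQuit:
				return true
			}
		}
		if !peer.Send(next) {
			continue // the real code sleeps here before retrying
		}
		next = "" // sent; the real code would wait for the next tx here
	}
	return false
}

func main() {
	peer := &fakePeer{quit: make(chan struct{})}
	reactorQuit := make(chan struct{})
	close(peer.quit) // the peer disconnects while "tx1" is still pending

	fmt.Println("leaky loop exited:", broadcastLoop(peer, reactorQuit, "tx1", false, 1000))
	fmt.Println("fixed loop exited:", broadcastLoop(peer, reactorQuit, "tx1", true, 1000))
}
```

This program prints `leaky loop exited: false` followed by `fixed loop exited: true`. The check at the top of the loop uses a `default` case, so it never blocks when neither channel is closed. It only adds a cheap poll before each send attempt.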

2019-03-26 09:29:06 +01:00

bench_test.go

mempool no gossip back (#2778)

2019-03-26 09:27:29 +01:00

cache_test.go

mempool no gossip back (#2778)

2019-03-26 09:27:29 +01:00

mempool_test.go

mempool no gossip back (#2778)

2019-03-26 09:27:29 +01:00

mempool.go

mempool no gossip back (#2778)