mirror of
https://github.com/fluencelabs/tendermint
synced 2025-05-29 14:11:21 +00:00
Refs #3306, irisnet@fdbb676 I ran an irishub validator. After the validator node ran several days, I dump the whole goroutine stack. I found that there were hundreds of broadcastTxRoutine. However, the connected peer quantity was less than 30. So I belive that there must be broadcastTxRoutine leakage issue. According to my analysis, I think the root cause of this issue locate in below code: select { case <-next.NextWaitChan(): // see the start of the for loop for nil check next = next.Next() case <-peer.Quit(): return case <-memR.Quit(): return } As we know, if multiple paths are avaliable in the same time, then a random path will be selected. Suppose that next.NextWaitChan() and peer.Quit() are both avaliable, and next.NextWaitChan() is chosen. // send memTx msg := &TxMessage{Tx: memTx.tx} success := peer.Send(MempoolChannel, cdc.MustMarshalBinaryBare(msg)) if !success { time.Sleep(peerCatchupSleepIntervalMS * time.Millisecond) continue } Then next will be non-empty and the peer send operation won't be success. As a result, this go routine will be track into infinite loop and won't be released. My proposal is to check peer.Quit() and memR.Quit() in every loop no matter whether next is nil.