mirror of
https://github.com/fluencelabs/tendermint
synced 2025-05-16 00:21:19 +00:00
106 lines
5.7 KiB
Markdown
106 lines
5.7 KiB
Markdown
|
# ADR 011: Monitoring
|
||
|
|
||
|
## Changelog
|
||
|
|
||
|
08-06-2018: Initial draft
|
||
|
11-06-2018: Reorg after @xla comments
|
||
|
13-06-2018: Clarification about usage of labels
|
||
|
|
||
|
## Context
|
||
|
|
||
|
In order to bring more visibility into Tendermint, we would like it to report
|
||
|
metrics and, maybe later, traces of transactions and RPC queries. See
|
||
|
https://github.com/tendermint/tendermint/issues/986.
|
||
|
|
||
|
A few solutions were considered:
|
||
|
|
||
|
1. [Prometheus](https://prometheus.io)
|
||
|
a) Prometheus API
|
||
|
b) [go-kit metrics package](https://github.com/go-kit/kit/tree/master/metrics) as an interface plus Prometheus
|
||
|
c) [telegraf](https://github.com/influxdata/telegraf)
|
||
|
d) new service, which will listen to events emitted by pubsub and report metrics
|
||
|
5. [OpenCensus](https://opencensus.io/go/index.html)
|
||
|
|
||
|
### 1. Prometheus
|
||
|
|
||
|
Prometheus seems to be the most popular product out there for monitoring. It has
|
||
|
a Go client library, powerful queries, alerts.
|
||
|
|
||
|
**a) Prometheus API**
|
||
|
|
||
|
We can commit to using Prometheus in Tendermint, but I think Tendermint users
|
||
|
should be free to choose whatever monitoring tool they feel will better suit
|
||
|
their needs (if they don't have existing one already). So we should try to
|
||
|
abstract interface enough so people can switch between Prometheus and other
|
||
|
similar tools.
|
||
|
|
||
|
**b) go-kit metrics package as an interface**
|
||
|
|
||
|
metrics package provides a set of uniform interfaces for service
|
||
|
instrumentation and offers adapters to popular metrics packages:
|
||
|
|
||
|
https://godoc.org/github.com/go-kit/kit/metrics#pkg-subdirectories
|
||
|
|
||
|
Comparing to Prometheus API, we're losing customisability and control, but gaining
|
||
|
freedom in choosing any instrument from the above list given we will extract
|
||
|
metrics creation into a separate function (see "providers" in node/node.go).
|
||
|
|
||
|
**c) telegraf**
|
||
|
|
||
|
Unlike already discussed options, telegraf does not require modifying Tendermint
|
||
|
source code. You create something called an input plugin, which polls
|
||
|
Tendermint RPC every second and calculates the metrics itself.
|
||
|
|
||
|
While it may sound good, but some metrics we want to report are not exposed via
|
||
|
RPC or pubsub, therefore can't be accessed externally.
|
||
|
|
||
|
**d) service, listening to pubsub**
|
||
|
|
||
|
Same issue as the above.
|
||
|
|
||
|
### 2. opencensus
|
||
|
|
||
|
opencensus provides both metrics and tracing, which may be important in the
|
||
|
future. It's API looks different from go-kit and Prometheus, but looks like it
|
||
|
covers everything we need.
|
||
|
|
||
|
Unfortunately, OpenCensus go client does not define any
|
||
|
interfaces, so if we want to abstract away metrics we
|
||
|
will need to write interfaces ourselves.
|
||
|
|
||
|
### List of metrics
|
||
|
|
||
|
| | Name | Type | Labels | Description |
|
||
|
| - | --------------------------------------- | ------- | ------------------------- | ----------------------------------------------------------------------------- |
|
||
|
| A | height | Counter | height-X | |
|
||
|
| A | validators | Gauge | height-X | Number of validators who signed |
|
||
|
| A | missing_validators | Gauge | height-X | umber of validators who did not sign |
|
||
|
| A | byzantine_validators | Gauge | height-X | Number of validators who tried to double sign |
|
||
|
| A | block_interval | Timing | | Time between this and last block (Block.Header.Time) |
|
||
|
| | block_time | Timing | | Time to create a block (from creating a proposal to commit) |
|
||
|
| | time_between_blocks | Timing | | Time between committing last block and (receiving proposal creating proposal) |
|
||
|
| A | rounds | Counter | height-X | Number of rounds |
|
||
|
| | prevotes | Counter | height-X height-X-round-Y | |
|
||
|
| | precommits | Counter | height-X height-X-round-Y | |
|
||
|
| | prevotes_total_power | Counter | height-X height-X-round-Y | |
|
||
|
| | precommits_total_power | Counter | height-X height-X-round-Y | |
|
||
|
| A | num_txs | Counter | height-X | |
|
||
|
| | total_txs | Counter | | |
|
||
|
| | block_size | Gauge | height-X | In bytes |
|
||
|
| | peers | Gauge | | Number of peers node's connected to |
|
||
|
| | power | Gauge | | |
|
||
|
|
||
|
`A` - will be implemented in the fist place.
|
||
|
|
||
|
**Proposed solution**
|
||
|
|
||
|
## Status
|
||
|
|
||
|
## Consequences
|
||
|
|
||
|
### Positive
|
||
|
|
||
|
### Negative
|
||
|
|
||
|
### Neutral
|