mirror of
https://github.com/fluencelabs/tendermint
synced 2025-06-11 20:31:20 +00:00
add labels column
This commit is contained in:
@ -1,103 +0,0 @@
|
|||||||
# ADR 010: Monitoring
|
|
||||||
|
|
||||||
## Changelog
|
|
||||||
|
|
||||||
08-06-2018: Initial draft
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
In order to bring more visibility into Tendermint, we would like it to report
|
|
||||||
metrics and, maybe later, traces of transactions and RPC queries. See
|
|
||||||
https://github.com/tendermint/tendermint/issues/986.
|
|
||||||
|
|
||||||
A few solutions were considered:
|
|
||||||
|
|
||||||
1. [Prometheus](https://prometheus.io)
|
|
||||||
a) Prometheus API
|
|
||||||
b) [go-kit metrics package](https://github.com/go-kit/kit/tree/master/metrics) as an interface plus Prometheus
|
|
||||||
c) [telegraf](https://github.com/influxdata/telegraf)
|
|
||||||
d) new service, which will listen to events emitted by pubsub and report metrics
|
|
||||||
5. [OpenCensus](https://opencensus.io/go/index.html)
|
|
||||||
|
|
||||||
### 1. Prometheus
|
|
||||||
|
|
||||||
Prometheus seems to be the most popular product out there for monitoring. It has
|
|
||||||
a Go client library, powerful queries, alerts.
|
|
||||||
|
|
||||||
**a) Prometheus API**
|
|
||||||
|
|
||||||
We can commit to using Prometheus in Tendermint, but I think Tendermint users
|
|
||||||
should be free to choose whatever monitoring tool they feel will better suit
|
|
||||||
their needs (if they don't have existing one already). So we should try to
|
|
||||||
abstract interface enough so people can switch between Prometheus and other
|
|
||||||
similar tools.
|
|
||||||
|
|
||||||
**b) go-kit metrics package as an interface**
|
|
||||||
|
|
||||||
metrics package provides a set of uniform interfaces for service
|
|
||||||
instrumentation and offers adapters to popular metrics packages:
|
|
||||||
|
|
||||||
https://godoc.org/github.com/go-kit/kit/metrics#pkg-subdirectories
|
|
||||||
|
|
||||||
Comparing to Prometheus API, we're losing customisability and control, but gaining
|
|
||||||
freedom in choosing any instrument from the above list given we will extract
|
|
||||||
metrics creation into a separate function (see "providers" in node/node.go).
|
|
||||||
|
|
||||||
**c) telegraf**
|
|
||||||
|
|
||||||
Unlike already discussed options, telegraf does not require modifying Tendermint
|
|
||||||
source code. You create something called an input plugin, which polls
|
|
||||||
Tendermint RPC every second and calculates the metrics itself.
|
|
||||||
|
|
||||||
While it may sound good, but some metrics we want to report are not exposed via
|
|
||||||
RPC or pubsub, therefore can't be accessed externally.
|
|
||||||
|
|
||||||
**d) service, listening to pubsub**
|
|
||||||
|
|
||||||
Same issue as the above.
|
|
||||||
|
|
||||||
### 2. opencensus
|
|
||||||
|
|
||||||
opencensus provides both metrics and tracing, which may be important in the
|
|
||||||
future. It's API looks different from go-kit and Prometheus, but looks like it
|
|
||||||
covers everything we need.
|
|
||||||
|
|
||||||
Unfortunately, OpenCensus go client does not define any
|
|
||||||
interfaces, so if we want to abstract away metrics we
|
|
||||||
will need to write interfaces ourselves.
|
|
||||||
|
|
||||||
### List of metrics
|
|
||||||
|
|
||||||
| | Name | Type | |
|
|
||||||
| - | --------------------------------------- | ------- | ----------------------------------------------------------------------------- |
|
|
||||||
| A | height | Counter | |
|
|
||||||
| A | validators:<height> | Gauge | Number of validators who signed |
|
|
||||||
| A | missing_validators:<height> | Gauge | Number of validators who did not sign |
|
|
||||||
| A | byzantine_validators:<height> | Gauge | Number of validators who tried to double sign |
|
|
||||||
| A | block_interval | Timing | Time between this and last block (Block.Header.Time) |
|
|
||||||
| | block_time | Timing | Time to create a block (from creating a proposal to commit) |
|
|
||||||
| | time_between_blocks | Timing | Time between committing last block and (receiving proposal creating proposal) |
|
|
||||||
| A | rounds:<height> | Counter | Number of rounds |
|
|
||||||
| | prevotes:<height>:<round> | Counter | |
|
|
||||||
| | precommits:<height>:<round> | Counter | |
|
|
||||||
| | prevotes_total_power:<height>:<round> | Counter | |
|
|
||||||
| | precommits_total_power:<height>:<round> | Counter | |
|
|
||||||
| A | num_txs:<height> | Counter | |
|
|
||||||
| | total_txs | Counter | |
|
|
||||||
| | block_size:<height> | Gauge | In bytes |
|
|
||||||
| | peers | Gauge | Number of peers node's connected to |
|
|
||||||
| | power | Gauge | |
|
|
||||||
|
|
||||||
`A` - will be implemented in the fist place.
|
|
||||||
|
|
||||||
**Proposed solution**
|
|
||||||
|
|
||||||
## Status
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
### Positive
|
|
||||||
|
|
||||||
### Negative
|
|
||||||
|
|
||||||
### Neutral
|
|
105
docs/architecture/adr-011-monitoring.md
Normal file
105
docs/architecture/adr-011-monitoring.md
Normal file
@ -0,0 +1,105 @@
|
|||||||
|
# ADR 011: Monitoring
|
||||||
|
|
||||||
|
## Changelog
|
||||||
|
|
||||||
|
08-06-2018: Initial draft
|
||||||
|
11-06-2018: Reorg after @xla comments
|
||||||
|
13-06-2018: Clarification about usage of labels
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
In order to bring more visibility into Tendermint, we would like it to report
|
||||||
|
metrics and, maybe later, traces of transactions and RPC queries. See
|
||||||
|
https://github.com/tendermint/tendermint/issues/986.
|
||||||
|
|
||||||
|
A few solutions were considered:
|
||||||
|
|
||||||
|
1. [Prometheus](https://prometheus.io)
|
||||||
|
a) Prometheus API
|
||||||
|
b) [go-kit metrics package](https://github.com/go-kit/kit/tree/master/metrics) as an interface plus Prometheus
|
||||||
|
c) [telegraf](https://github.com/influxdata/telegraf)
|
||||||
|
d) new service, which will listen to events emitted by pubsub and report metrics
|
||||||
|
5. [OpenCensus](https://opencensus.io/go/index.html)
|
||||||
|
|
||||||
|
### 1. Prometheus
|
||||||
|
|
||||||
|
Prometheus seems to be the most popular product out there for monitoring. It has
|
||||||
|
a Go client library, powerful queries, alerts.
|
||||||
|
|
||||||
|
**a) Prometheus API**
|
||||||
|
|
||||||
|
We can commit to using Prometheus in Tendermint, but I think Tendermint users
|
||||||
|
should be free to choose whatever monitoring tool they feel will better suit
|
||||||
|
their needs (if they don't have existing one already). So we should try to
|
||||||
|
abstract interface enough so people can switch between Prometheus and other
|
||||||
|
similar tools.
|
||||||
|
|
||||||
|
**b) go-kit metrics package as an interface**
|
||||||
|
|
||||||
|
metrics package provides a set of uniform interfaces for service
|
||||||
|
instrumentation and offers adapters to popular metrics packages:
|
||||||
|
|
||||||
|
https://godoc.org/github.com/go-kit/kit/metrics#pkg-subdirectories
|
||||||
|
|
||||||
|
Comparing to Prometheus API, we're losing customisability and control, but gaining
|
||||||
|
freedom in choosing any instrument from the above list given we will extract
|
||||||
|
metrics creation into a separate function (see "providers" in node/node.go).
|
||||||
|
|
||||||
|
**c) telegraf**
|
||||||
|
|
||||||
|
Unlike already discussed options, telegraf does not require modifying Tendermint
|
||||||
|
source code. You create something called an input plugin, which polls
|
||||||
|
Tendermint RPC every second and calculates the metrics itself.
|
||||||
|
|
||||||
|
While it may sound good, but some metrics we want to report are not exposed via
|
||||||
|
RPC or pubsub, therefore can't be accessed externally.
|
||||||
|
|
||||||
|
**d) service, listening to pubsub**
|
||||||
|
|
||||||
|
Same issue as the above.
|
||||||
|
|
||||||
|
### 2. opencensus
|
||||||
|
|
||||||
|
opencensus provides both metrics and tracing, which may be important in the
|
||||||
|
future. It's API looks different from go-kit and Prometheus, but looks like it
|
||||||
|
covers everything we need.
|
||||||
|
|
||||||
|
Unfortunately, OpenCensus go client does not define any
|
||||||
|
interfaces, so if we want to abstract away metrics we
|
||||||
|
will need to write interfaces ourselves.
|
||||||
|
|
||||||
|
### List of metrics
|
||||||
|
|
||||||
|
| | Name | Type | Labels | Description |
|
||||||
|
| - | --------------------------------------- | ------- | ------------------------- | ----------------------------------------------------------------------------- |
|
||||||
|
| A | height | Counter | height-X | |
|
||||||
|
| A | validators | Gauge | height-X | Number of validators who signed |
|
||||||
|
| A | missing_validators | Gauge | height-X | umber of validators who did not sign |
|
||||||
|
| A | byzantine_validators | Gauge | height-X | Number of validators who tried to double sign |
|
||||||
|
| A | block_interval | Timing | | Time between this and last block (Block.Header.Time) |
|
||||||
|
| | block_time | Timing | | Time to create a block (from creating a proposal to commit) |
|
||||||
|
| | time_between_blocks | Timing | | Time between committing last block and (receiving proposal creating proposal) |
|
||||||
|
| A | rounds | Counter | height-X | Number of rounds |
|
||||||
|
| | prevotes | Counter | height-X height-X-round-Y | |
|
||||||
|
| | precommits | Counter | height-X height-X-round-Y | |
|
||||||
|
| | prevotes_total_power | Counter | height-X height-X-round-Y | |
|
||||||
|
| | precommits_total_power | Counter | height-X height-X-round-Y | |
|
||||||
|
| A | num_txs | Counter | height-X | |
|
||||||
|
| | total_txs | Counter | | |
|
||||||
|
| | block_size | Gauge | height-X | In bytes |
|
||||||
|
| | peers | Gauge | | Number of peers node's connected to |
|
||||||
|
| | power | Gauge | | |
|
||||||
|
|
||||||
|
`A` - will be implemented in the fist place.
|
||||||
|
|
||||||
|
**Proposed solution**
|
||||||
|
|
||||||
|
## Status
|
||||||
|
|
||||||
|
## Consequences
|
||||||
|
|
||||||
|
### Positive
|
||||||
|
|
||||||
|
### Negative
|
||||||
|
|
||||||
|
### Neutral
|
Reference in New Issue
Block a user