Overhaul stats: Research for metrics crates

I'm working on this issue:

https://github.com/torrust/torrust-tracker/issues/1403

In the beginning, I only wanted to refactor stats a little bit:

- To make it easier to add more metrics without breaking changes in the data structure.
- To align the in-memory format with the Prometheus format.

It's not complex work, but I might be reinventing the wheel, since there are many crates that handle metrics. After some preliminary research, there seem to be two main approaches:

- Protocol-agnostic instrumentation: [metrics](https://github.com/metrics-rs/metrics)
  - Similar to `log`/`tracing`
- Based on Prometheus: [prometheus](https://docs.rs/prometheus) ([repository](https://github.com/tikv/rust-prometheus))

There is also a crate that allows using both in the same project: https://github.com/instrumentisto/metrics-prometheus-rs


`metrics` can export to TCP and Prometheus:

- [metrics-exporter-tcp](https://docs.rs/metrics-exporter-tcp) - outputs metrics to clients over TCP
- [metrics-exporter-prometheus](https://docs.rs/metrics-exporter-prometheus) - serves a Prometheus scrape endpoint

So in theory it does what I'm trying to implement.

Our requirements are:

- Expose via REST API in JSON format.
- Expose via REST API in Prometheus format.
- Expose via GraphQL API in the future.
- Extendable without breaking changes.
- Zero overhead when disabled.
- Allow merging metrics.
- Support metric metadata (as the `metrics` crate calls it) or labels (the Prometheus name).
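
To make a few of these requirements concrete (labels, merging, and an in-memory layout that maps directly onto the Prometheus format), here is a minimal sketch using only the standard library. The `MetricCollection` type, its methods, and the metric name used in the usage note are hypothetical and exist only for illustration; they are not the tracker's current code nor the API of any crate mentioned here. It handles counters only and skips label-value escaping.

```rust
use std::collections::BTreeMap;

/// Label set for one sample, e.g. {"protocol": "udp"}.
/// A BTreeMap keeps the labels in a stable, sorted order.
type Labels = BTreeMap<String, String>;

/// Hypothetical in-memory store: metric name -> label set -> counter value.
/// New metrics are new map entries, so adding one does not change the struct.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MetricCollection {
    counters: BTreeMap<String, BTreeMap<Labels, u64>>,
}

impl MetricCollection {
    /// Adds `value` to the counter identified by `name` and `labels`.
    pub fn increment(&mut self, name: &str, labels: &[(&str, &str)], value: u64) {
        let labels: Labels = labels
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        *self
            .counters
            .entry(name.to_string())
            .or_default()
            .entry(labels)
            .or_insert(0) += value;
    }

    /// Merges another collection into this one by summing matching counters.
    pub fn merge(&mut self, other: &MetricCollection) {
        for (name, samples) in &other.counters {
            let target = self.counters.entry(name.clone()).or_default();
            for (labels, value) in samples {
                *target.entry(labels.clone()).or_insert(0) += value;
            }
        }
    }

    /// Renders the counters in the Prometheus text exposition format.
    /// Simplification: label values are not escaped.
    pub fn to_prometheus(&self) -> String {
        let mut out = String::new();
        for (name, samples) in &self.counters {
            out.push_str(&format!("# TYPE {name} counter\n"));
            for (labels, value) in samples {
                let rendered: Vec<String> = labels
                    .iter()
                    .map(|(k, v)| format!("{k}=\"{v}\""))
                    .collect();
                if rendered.is_empty() {
                    out.push_str(&format!("{name} {value}\n"));
                } else {
                    out.push_str(&format!("{name}{{{}}} {value}\n", rendered.join(",")));
                }
            }
        }
        out
    }
}
```

For example, incrementing `tracker_requests_total` with `protocol="udp"` in one collection and with `protocol="http"` in another, then calling `merge`, yields a single collection whose `to_prometheus()` output has one sample line per label set. A JSON view for the REST API could be derived from the same map.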

My first impressions are:

- I should finish the current "stats overhaul" epic with our custom solution based on events. I think it won't take me long. After that we can plan a proper migration to one of these crates.
- I would use the generic one, `metrics`, if there are no drawbacks compared to using Prometheus directly.
- I guess that if we use `metrics` we will have to build our own recorder to expose the metrics in our APIs. However, we should review the idea of exposing the metrics through independent APIs. There could be other crates to expose metrics via REST or GraphQL. For example: https://github.com/naamancurtis/async_graphql_telemetry_extension
- Even if we end up using a third-party crate, events are going to be useful for other things, or to let people build their own stats. Events are decoupled from stats, so it would be easy to migrate to a metrics crate and keep events.
- I don't know how easy it would be to test code that uses these crates through macros. My experience with `tracing` is that testing is hard because it also relies on a global recorder. These crates also offer local recorders (like `tracing`), but with `tracing` I could not make that work. See https://github.com/torrust/torrust-tracker/pull/1147
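
To illustrate the testing concern in the last point, here is a minimal sketch of the alternative to a global recorder: passing the recorder explicitly so each test gets its own instance. Everything in it (`Recorder`, `InMemoryRecorder`, `AnnounceHandler`, the `announces_total` name) is hypothetical and written with the standard library only; it does not use or mirror the API of `metrics`, `prometheus`, or `tracing`.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical recorder abstraction (not the API of any crate mentioned above).
pub trait Recorder: Send + Sync {
    fn increment_counter(&self, name: &str, value: u64);
}

/// Test double that remembers every call it receives.
#[derive(Default)]
pub struct InMemoryRecorder {
    calls: Mutex<Vec<(String, u64)>>,
}

impl InMemoryRecorder {
    /// Returns a copy of the recorded (metric name, value) pairs, in call order.
    pub fn calls(&self) -> Vec<(String, u64)> {
        self.calls.lock().unwrap().clone()
    }
}

impl Recorder for InMemoryRecorder {
    fn increment_counter(&self, name: &str, value: u64) {
        self.calls.lock().unwrap().push((name.to_string(), value));
    }
}

/// Hypothetical handler that receives its recorder explicitly
/// instead of reaching for process-wide global state.
pub struct AnnounceHandler {
    recorder: Arc<dyn Recorder>,
}

impl AnnounceHandler {
    pub fn new(recorder: Arc<dyn Recorder>) -> Self {
        Self { recorder }
    }

    /// Handles one announce request and records it.
    pub fn handle(&self) {
        self.recorder.increment_counter("announces_total", 1);
    }
}
```

In a test, an `Arc<InMemoryRecorder>` can be cloned into `AnnounceHandler::new` and inspected afterwards through `calls()`, so tests running in parallel do not share state. The trade-off is that the recorder has to be threaded through constructors, which is exactly what macro-based facades avoid.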

I will:

- Read the `metrics` documentation.
- Compare the three alternatives:
  - Custom metrics
  - `metrics` crate
  - `prometheus` crate

And share my conclusions.

cc @da2ce7 


