Skip to content

Override default response failed error response in logs from TraceLayer in tower-http #1146

@josecelano

Description

@josecelano

Relates to: #1069

I'm working on issue #1069

We have some errors in the logs automatically added by the TraceLayer we use in Axum. Errors loke like this:

  2024-11-01T09:27:25.749262Z ERROR tower_http::trace::on_failure: response failed, classification: Status code: 500 Internal Server Error, latency: 0 ms
    at /home/josecelano/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-http-0.6.1/src/trace/on_failure.rs:93

As I explained on the other issue the error comes from using the TraceLayer in the router definition:

    router
        .layer(middleware::from_fn_with_state(state, v1::middlewares::auth::auth))
        .route(&format!("{api_url_prefix}/health_check"), get(health_check_handler))
        .layer(CompressionLayer::new())
        .layer(SetRequestIdLayer::x_request_id(MakeRequestUuid))
        .layer(PropagateHeaderLayer::new(HeaderName::from_static("x-request-id")))
        .layer(
            TraceLayer::new_for_http()
                .make_span_with(DefaultMakeSpan::new().level(Level::INFO))
                .on_request(|request: &Request<axum::body::Body>, _span: &Span| {
                    let method = request.method().to_string();
                    let uri = request.uri().to_string();
                    let request_id = request
                        .headers()
                        .get("x-request-id")
                        .map(|v| v.to_str().unwrap_or_default())
                        .unwrap_or_default();

                    tracing::span!(
                        target: API_LOG_TARGET,
                        tracing::Level::INFO, "request", method = %method, uri = %uri, request_id = %request_id);
                })
                .on_response(|response: &Response, latency: Duration, _span: &Span| {
                    let status_code = response.status();
                    let request_id = response
                        .headers()
                        .get("x-request-id")
                        .map(|v| v.to_str().unwrap_or_default())
                        .unwrap_or_default();
                    let latency_ms = latency.as_millis();

                    tracing::span!(
                        target: API_LOG_TARGET,
                        tracing::Level::INFO, "response", latency = %latency_ms, status = %status_code, request_id = %request_id);
                }),
        )
        .layer(SetRequestIdLayer::x_request_id(MakeRequestUuid))
        .layer(
            ServiceBuilder::new()
                // this middleware goes above `TimeoutLayer` because it will receive
                // errors returned by `TimeoutLayer`
                .layer(HandleErrorLayer::new(|_: BoxError| async { StatusCode::REQUEST_TIMEOUT }))
                .layer(TimeoutLayer::new(DEFAULT_TIMEOUT)),
        )

TraceLayer uses a default "on failure" function that prints that error when the response is a 500 response.

We have the possibility to override the error message calling the on_failure on the router TraceLayer builder:

.on_failure(
    |failure_classification: ServerErrorsFailureClass, _latency: Duration, _span: &Span| {
        tracing::error!(target: HTTP_TRACKER_LOG_TARGET, "response failed, classification: {failure_classification}");
    },
),

I would do it because:

  1. We can control what's written into the logs. We need that to write assertions in tests. See Remove tracing errors from tests execution output #1142. If we don't control the error message, the test could be broken by third-party changes.
  2. It's easier to find the code that is writing the message. It's our code, not external code.
  3. We migth need to add some extra fields to the log line to identify the request/response in order to implement this.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions