docs(issue): re-enable caddy http3 and document issue-29 rationale #31

@josecelano

Re-enable Caddy HTTP/3 and Document ISSUE-29 Rationale

Related: #29, #30, ISSUE-29-research-high-cpu-load-after-udp-fix.md

Overview

ISSUE-29 removed the Caddy UDP port mapping 443:443/udp (HTTP/3 over QUIC) during a controlled
production experiment. The observations showed no measurable CPU improvement after disabling
HTTP/3, while service availability remained stable.

This follow-up issue proposes re-enabling HTTP/3 at the Caddy edge and documenting why this
reversal is intentional. The goal is to restore HTTP/3 capability for present and future clients,
while keeping a controlled rollback path if resource cost or reliability regresses.

This issue also clarifies the protocol boundary in the current architecture:

  • Edge protocol (client -> Caddy) can include HTTP/3.
  • Backend protocol (Caddy -> tracker/grafana) remains reverse-proxy HTTP and does not require
    tracker native HTTP/3 support.
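
The edge side of this boundary can be observed from a client without touching the backend hop. The sketch below assumes Caddy advertises HTTP/3 through an `Alt-Svc` response header containing an `h3=` entry. The function name `advertises_h3` and the hostname in the usage comment are hypothetical.

```sh
# Read HTTP response headers on stdin and succeed if an Alt-Svc header
# advertises HTTP/3 (an "h3=" entry). Only the client -> Caddy hop is
# inspected; the Caddy -> tracker/grafana hop is not involved.
advertises_h3() {
    grep -Eiq '^alt-svc:.*h3='
}

# Usage (hypothetical hostname, replace with the real edge domain):
#   curl -sI https://tracker.example.com | advertises_h3 && echo "h3 advertised"
```

A curl build with HTTP/3 support can also request the protocol directly with `curl --http3`, but the header check works with any curl.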

Problem Statement

The current ISSUE-29 wording can be read as "HTTP/3 disabled for hygiene" even though the
experiment outcome was only that disabling HTTP/3 did not fix CPU pressure.

Keeping HTTP/3 disabled by default may also block automatic support for clients that prefer
or require HTTP/3 in the future. Since this demo already runs a Caddy edge proxy, re-enabling
UDP 443 is a low-complexity way to restore HTTP/3 capability without changing backend services.

The change should therefore be treated as a product-capability decision with operational
guardrails, not as a CPU-remediation tactic.

Goals

  1. Re-enable Caddy UDP 443 publish mapping for HTTP/3 at the edge.
  2. Keep backend application topology unchanged.
  3. Record explicitly that ISSUE-29 did not show CPU benefit from disabling HTTP/3.
  4. Document why HTTP/3 is being re-enabled now (future compatibility/capability) and how rollback
    will be handled if needed.

Proposed Change

  1. Re-add "443:443/udp" in Caddy service ports in
    server/opt/torrust/docker-compose.yml.
  2. Apply the same change to the live /opt/torrust/docker-compose.yml and recreate only Caddy.
  3. Observe immediate, T+1h, and T+next-day checkpoints with the same metrics used in ISSUE-29.
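
The first two steps above could look roughly like the following. This is a minimal sketch, assuming the Compose service is named `caddy` (the issue does not state the service name) and that the mapping appears as a plain list item under `ports`. Both function names are hypothetical.

```sh
# Succeed if the compose file contains a "443:443/udp" ports entry.
# Note: this only checks that the line exists somewhere in the file,
# not that it sits under the Caddy service.
compose_has_udp443() {
    pattern="^[[:space:]]*-[[:space:]]*[\"']?443:443/udp[\"']?[[:space:]]*(#.*)?\$"
    grep -Eq "$pattern" "$1"
}

# Recreate only the Caddy container, leaving other services untouched.
# "caddy" is an assumed service name.
recreate_caddy() {
    docker compose -f "$1" up -d --no-deps --force-recreate caddy
}

# Usage on the live host:
#   compose_has_udp443 /opt/torrust/docker-compose.yml \
#     && recreate_caddy /opt/torrust/docker-compose.yml
```

`--no-deps` keeps Compose from restarting dependent services, which matches the intent of recreating only Caddy.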

Rollback Triggers

If any trigger is met after re-enabling, revert by removing "443:443/udp" again and record
the rollback in the evidence notes:

  1. Caddy CPU increases by more than 20% sustained for 24h vs pre-change baseline.
  2. Host load average increases by more than 15% sustained for 24h vs pre-change baseline.
  3. A new external availability regression appears on tracked HTTP1 or UDP1 endpoints.
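
Triggers 1 and 2 are percentage comparisons against a baseline, which can be checked mechanically. The sketch below assumes the caller supplies a pre-change baseline average and a post-change 24h average for the same metric. How those averages are collected is not specified here, and the function names are hypothetical.

```sh
# Print the percentage increase of CURRENT over BASELINE, one decimal place.
pct_increase() {
    awk -v b="$1" -v c="$2" 'BEGIN {
        if (b <= 0) exit 2            # baseline must be positive
        printf "%.1f\n", (c - b) / b * 100
    }'
}

# Succeed (exit 0) if CURRENT exceeds BASELINE by strictly more than
# THRESHOLD percent; exit 1 otherwise, exit 2 on an invalid baseline.
exceeds_threshold() {
    awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN {
        if (b <= 0) exit 2
        exit !((c - b) / b * 100 > t)
    }'
}

# Example: Caddy CPU baseline 10, 24h average after the change 12.5
#   pct_increase 10 12.5           # prints 25.0
#   exceeds_threshold 10 12.5 20   # succeeds, so trigger 1 is met
```

The comparison is strict, so an increase of exactly 20% (or 15% for host load) does not trip the trigger, matching the "more than" wording.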

Deliverables

  • Compose change that re-enables Caddy UDP 443 publish mapping.
  • Evidence notes for post-change observations (immediate, T+1h, T+next-day).
  • Updated ISSUE-29 wording that clearly separates:
    • measured performance result,
    • capability/product decision,
    • rollback criteria and operational safeguards.

Implementation Plan

  • Re-add "443:443/udp" for Caddy in server/opt/torrust/docker-compose.yml.
  • Apply only that change on the live server and recreate only Caddy.
  • Validate Caddy health and confirm host UDP 443 listener exists after deploy.
  • Capture immediate post-change metrics: mpstat, docker stats, Prometheus HTTP1/UDP1
    rates, and newtrackon.com/raw sample.
  • Capture T+1h and T+next-day checkpoints with the same metrics.
  • Evaluate rollback triggers; if triggered, revert and record evidence.
  • Update ISSUE-29 text to explain why the earlier disablement is being reversed now.
  • Ensure ISSUE-29 states backend services do not need native HTTP/3 for edge HTTP/3 support.
  • Run ./scripts/lint.sh and fix any markdown/cspell issues.
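
The listener check and the metric snapshots in this plan can be scripted. This is a minimal sketch, assuming `ss` and `mpstat` are installed on the host and that Docker's published UDP port shows up as a host socket (it may not if the userland proxy is disabled). The function names and the output directory layout are hypothetical, and the Prometheus queries are left out because their endpoint is not given in this issue.

```sh
# Read `ss -H -lun` output on stdin and succeed if any local address
# ends in :443. The local address is column 4 or 5 depending on whether
# ss prints the Netid column.
has_udp_443() {
    awk '$4 ~ /:443$/ || $5 ~ /:443$/ { found = 1 } END { exit !found }'
}

# Capture one evidence snapshot into a directory named after the checkpoint
# label (for example: immediate, t-plus-1h, t-plus-next-day).
capture_snapshot() {
    out="evidence/$1"
    mkdir -p "$out"
    mpstat 1 5 > "$out/mpstat.txt"
    docker stats --no-stream > "$out/docker-stats.txt"
    # The https scheme is an assumption; the issue only names newtrackon.com/raw.
    curl -s https://newtrackon.com/raw > "$out/newtrackon-raw.txt"
}

# Usage on the live host:
#   ss -H -lun | has_udp_443 && capture_snapshot immediate
```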

Acceptance Criteria

  • Caddy HTTP/3 edge capability is re-enabled via 443:443/udp mapping.
  • Immediate, T+1h, and T+next-day evidence snapshots are recorded.
  • No rollback trigger is met during the observation window, or rollback is executed and
    documented if a trigger is met.
  • ISSUE-29 explicitly states that disabling HTTP/3 did not reduce CPU in prior observations.
  • ISSUE-29 explicitly states why HTTP/3 was re-enabled and under which conditions it may be
    disabled again.
  • Documentation clearly states edge HTTP/3 is independent from backend native HTTP/3 support.
  • All changed files pass ./scripts/lint.sh.
