Agreed observation windows for Phase 2:

| Checkpoint | Target time (UTC)        | Status   |
| ---------- | ------------------------ | -------- |
| T+1 h      | 2026-05-04 16:31         | complete |
| T+next day | 2026-05-05 (any morning) | pending  |

Capture the same metrics at each checkpoint: `mpstat`, `docker stats`, Prometheus
HTTP1/UDP1 rates, and a `newtrackon.com/raw` sample.

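Because each checkpoint repeats the same `mpstat` capture, the per-CPU table can be reduced to a dictionary and scanned for softirq-saturated CPUs instead of being read by eye. This is a minimal sketch that assumes the column layout printed by sysstat's `mpstat -P ALL` (a header row containing `CPU` and `%soft`); the function names and the 95% threshold are illustrative choices, not part of the existing runbook.

```python
"""Parse `mpstat -P ALL` text and flag CPUs saturated by softirq work."""

from typing import Dict, List


def parse_mpstat(text: str) -> Dict[str, Dict[str, float]]:
    """Return {cpu: {column: value}} from mpstat output.

    Rows are aligned from the right, so a one-token ("16:54:13") or
    two-token ("04:54:13 PM") time prefix does not matter. Later rows for
    the same CPU (e.g. the "Average:" block) overwrite earlier ones.
    """
    columns: List[str] = []
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        tokens = line.split()
        if "CPU" in tokens and "%soft" in tokens:
            # Header row: remember the metric columns that follow "CPU".
            columns = tokens[tokens.index("CPU") + 1:]
            continue
        if not columns or len(tokens) < len(columns) + 1:
            continue  # banner line, blank line, or data before any header
        tail = tokens[-(len(columns) + 1):]
        try:
            values = [float(v) for v in tail[1:]]
        except ValueError:
            continue  # not a numeric data row
        result[tail[0]] = dict(zip(columns, values))
    return result


def saturated_cpus(stats: Dict[str, Dict[str, float]],
                   threshold: float = 95.0) -> List[str]:
    """CPUs (excluding the 'all' aggregate) with %soft at or above threshold.

    The 95% default is an illustrative assumption, not a tuned value.
    """
    return [cpu for cpu, row in stats.items()
            if cpu != "all" and row.get("%soft", 0.0) >= threshold]
```

Fed the saved output of, for example, `mpstat -P ALL 1 5`, `saturated_cpus(parse_mpstat(text))` should include CPU2 for the T+1 h capture, given the `%soft=100.00` reading recorded for that CPU.
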
## T+1 h Observation (2026-05-04T16:54:13Z)

Capture timestamp (UTC): `2026-05-04T16:54:13Z` (~1 h 36 min after the change).

- Host load average: `8.52 / 8.25 / 8.03`
- `mpstat` all CPUs: `%usr=34.11`, `%sys=15.43`, `%soft=19.58`, `%idle=30.61`
- `mpstat` CPU2: `%soft=100.00`, `%idle=0.00` (**unchanged from pre-change**)
- Container CPU snapshot:
  - `caddy`: `321.33%`
  - `tracker`: `95.47%`
  - `mysql`: `7.06%`
  - `grafana`: `0.32%`
  - `prometheus`: `0.00%`
- `ps` top processes: `caddy 301%`, `torrust-tracker 88.9%`, `ksoftirqd/2 15.0%`
- Prometheus rates:
  - HTTP1 request rate: `1834.0 req/s`
  - UDP1 request rate: `2440.0 req/s`

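Normalising the container snapshot by request rate makes the Caddy figure easier to reason about. A minimal sketch, assuming `docker stats` percentages are relative to one core (so 321.33% means about 3.2 cores busy), that all Caddy CPU goes to HTTP1 traffic, and (hypothetically) that the tracker serves both the proxied HTTP1 and the UDP1 requests:

```python
"""Back-of-envelope CPU cost per request from a `docker stats` snapshot."""


def cpu_ms_per_request(cpu_percent: float, requests_per_s: float) -> float:
    """Milliseconds of CPU time spent per request.

    `docker stats` reports CPU as a percentage of one core. Cores busy
    (core-seconds per second) divided by requests per second gives
    core-seconds per request; multiply by 1000 for milliseconds.
    """
    if requests_per_s <= 0:
        raise ValueError("request rate must be positive")
    cores_busy = cpu_percent / 100.0
    return cores_busy / requests_per_s * 1000.0


# T+1 h snapshot values from this log.
# Assumption: all Caddy CPU is spent on HTTP1 traffic.
caddy_ms = cpu_ms_per_request(321.33, 1834.0)
# Assumption (hypothetical): the tracker handles HTTP1 (via Caddy) and UDP1.
tracker_ms = cpu_ms_per_request(95.47, 1834.0 + 2440.0)

print(f"caddy:   {caddy_ms:.2f} ms CPU per HTTP1 request")
print(f"tracker: {tracker_ms:.2f} ms CPU per request")
```

With these inputs Caddy spends roughly 1.75 ms of CPU per HTTP1 request against roughly 0.22 ms per request in the tracker, close to an eightfold gap for a reverse proxy in front of the application.
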
### External probe sample (newtrackon.com/raw)

- `https://http1.torrust-tracker-demo.com:443/announce` -> `Working`
- `udp://udp1.torrust-tracker-demo.com:6969/announce` -> `Working`

### Assessment

**No improvement observed.** CPU2 is still at 100% softirq, and `ksoftirqd/2` is
still busy. Load average, Caddy CPU (~320%), and tracker CPU (~95%) are all in the
same range as before the change. Removing Caddy's UDP 443 port had no measurable
effect on the softirq saturation, which rules out HTTP/3 (QUIC) on UDP 443 as the
root cause.

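`mpstat` shows that CPU2 is saturated but not which softirq is responsible. One way to check whether CPU2 is where the NIC's receive processing lands is to diff the `NET_RX` row of `/proc/softirqs` over a short interval. A minimal sketch, assuming the standard `/proc/softirqs` layout (a `CPUn` header row followed by `NAME: count ...` rows of cumulative counters); the function names are hypothetical:

```python
"""Estimate which CPUs absorb NET_RX softirq work, from /proc/softirqs."""

from typing import Dict, List


def parse_softirqs(text: str) -> Dict[str, List[int]]:
    """Map each softirq name (e.g. 'NET_RX') to its per-CPU counters."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table: Dict[str, List[int]] = {}
    for line in lines[1:]:
        name, _, counts = line.partition(":")
        table[name.strip()] = [int(c) for c in counts.split()[:ncpu]]
    return table


def net_rx_share(before: str, after: str) -> List[float]:
    """Fraction of NET_RX softirqs handled by each CPU between two snapshots.

    Counters are cumulative since boot, so only the delta between the two
    reads reflects current traffic.
    """
    b = parse_softirqs(before)["NET_RX"]
    a = parse_softirqs(after)["NET_RX"]
    deltas = [x - y for x, y in zip(a, b)]
    total = sum(deltas)
    return [d / total if total else 0.0 for d in deltas]
```

Pass it two reads of `/proc/softirqs` taken a few seconds apart. If nearly all of the `NET_RX` delta lands on CPU2, that would be consistent with a single receive queue or IRQ affinity funnelling packet processing onto one core, which is the situation RPS/RFS is meant to address.
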
The Phase 2 change is safe to keep (it is correct hygiene, since there is no HTTP/3
listener anyway), but it did not solve the CPU problem. The investigation should
continue with Phase 3 (RPS/RFS CPU affinity) or with a closer look at why Caddy
alone consumes ~300% CPU at ~1834 HTTP1 req/s.

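If Phase 3 goes ahead, the RPS/RFS knobs are plain sysfs/procfs files. The sketch below only computes the values and writes nothing. The device name `eth0`, the 8-CPU count, the single RX queue and the choice to leave CPU2 out of the mask are all hypothetical and must be checked against the real host (for example with `nproc` and `ls /sys/class/net/<dev>/queues/`). The 32768 flow-table size and the per-queue split (`rps_sock_flow_entries / N`) follow the kernel's networking scaling guide.

```python
"""Compute RPS/RFS sysfs values that spread receive processing across CPUs."""

from typing import Dict, Iterable


def cpumask_hex(cpus: Iterable[int]) -> str:
    """Format a CPU set as a kernel cpumask: hex, 32-bit groups joined by commas."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    groups = []
    while True:
        groups.append(mask & 0xFFFFFFFF)
        mask >>= 32
        if mask == 0:
            break
    groups.reverse()  # most-significant group first
    return ",".join([format(groups[0], "x")] + [format(g, "08x") for g in groups[1:]])


def rps_rfs_settings(dev: str, n_cpus: int, rx_queues: int,
                     exclude: Iterable[int] = (),
                     sock_flow_entries: int = 32768) -> Dict[str, str]:
    """Return {path: value} for RPS (rps_cpus) and RFS (flow tables).

    Each queue's rps_flow_cnt is rps_sock_flow_entries divided by the
    number of receive queues.
    """
    skip = set(exclude)
    mask = cpumask_hex(c for c in range(n_cpus) if c not in skip)
    settings = {"/proc/sys/net/core/rps_sock_flow_entries": str(sock_flow_entries)}
    for q in range(rx_queues):
        base = f"/sys/class/net/{dev}/queues/rx-{q}"
        settings[f"{base}/rps_cpus"] = mask
        settings[f"{base}/rps_flow_cnt"] = str(sock_flow_entries // rx_queues)
    return settings


# Hypothetical host: eth0, 8 CPUs, 1 RX queue; CPU2 assumed to be the IRQ CPU.
for path, value in rps_rfs_settings("eth0", n_cpus=8, rx_queues=1, exclude=[2]).items():
    print(f"echo {value} > {path}")
```

For the hypothetical 8-CPU host this yields `rps_cpus = fb` (every CPU except CPU2). Excluding the interrupt CPU is a design choice here, not a kernel requirement; the scaling guide also allows including it. The printed `echo` commands need root and do not persist across reboots.
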
Keep this single change in place until both checkpoints are complete, then decide
whether to keep HTTP/3 disabled permanently or revert it.