| doc-type | issue |
|---|---|
| issue-type | bug |
| status | done |
| priority | p2 |
| github-issue | 1740 |
| spec-path | docs/issues/closed/1740-fix-container-workflow-caching.md |
| branch | 1740-fix-container-workflow-caching |
| related-pr | |
| last-updated-utc | |
| semantic-links | |

The container workflow (`.github/workflows/container.yaml`) has a step-ordering bug and a
cache-scoping gap that prevent the GHA Docker layer cache from working reliably.

- GitHub issue: #1740
- Related workflow: `.github/workflows/container.yaml`
- Related: #1726 (Reduce Build Times with sccache)

The `test` job builds the container image with `docker/build-push-action` and uses
`cache-from: type=gha` / `cache-to: type=gha` to persist the Docker layer cache between runs.
The intent is that the `cargo chef cook` layer (dependency compilation, the slow part) is
only rebuilt when `Cargo.lock` or `Cargo.toml` files change.

In practice the cache provides little benefit because of several problems described below.

The current step order in the `test` job is:

```text
setup-buildx → build-push-action → inspect → checkout → compose
```

`docker/build-push-action` resolves `./Containerfile` relative to the workspace root, which
is only populated after `actions/checkout`. On a cold cache the job will either fail (no
Containerfile) or silently use a stale checked-out tree from a previous run.

The correct order is:

```text
checkout → setup-buildx → build-push-action → inspect → compose
```

The `test` job runs two targets in parallel, `debug` and `release`, and both write to the
same GHA cache scope. The two jobs race to update the cache; whichever finishes last overwrites
the other's entries. On the next run, only one target gets a warm cache.

GitHub's GHA cache is also capped at 10 GB per repository. The `debug` and `release` Docker
layer caches for a Rust workspace of this size can easily exceed that limit together, causing
evictions.

Scoping the cache per target with `scope=${{ matrix.target }}` isolates the two caches:

```yaml
cache-from: type=gha,scope=${{ matrix.target }}
cache-to: type=gha,scope=${{ matrix.target }},mode=max
```

Even with the above fixes, the `cargo nextest archive` step that compiles workspace crates will
recompile on every source change. This is expected: the `cargo chef` pattern intentionally
separates dependency compilation (cached) from workspace-crate compilation (not cached). On
GitHub's shared 2-core runners this step takes ~15–25 minutes for a full Rust workspace.
Reducing that cost is tracked separately in #1726.

The `docker-e2e` job in `.github/workflows/testing.yaml` also builds the tracker container
image, but it does so indirectly through two Rust binaries:

- `e2e_tests_runner` calls `Docker::build("./Containerfile", tag)`, which runs plain `docker build -f ./Containerfile -t <tag> .`
- `qbittorrent_e2e_runner` calls `compose.build()`, which runs `docker compose build`

Neither path goes through BuildKit with the GHA cache backend (`type=gha`), so the image is
always built from scratch on every run. `docker/setup-buildx-action` is not present in that
job, so the GHA cache backend is never available to the plain `docker` CLI calls.

Proposed fix: add an explicit pre-build step to the `docker-e2e` job using
`docker/setup-buildx-action` + `docker/build-push-action` with `cache-from`/`cache-to: type=gha`
before the Rust runners execute. The runners accept a `--tracker-image` flag, so they can be
pointed at the pre-built image tag instead of rebuilding it themselves. This keeps the changes
to the Rust source code small.

The step order would become:

```text
checkout → setup-buildx → build-tracker-image (cached) → run-e2e-tests → run-qbt-e2e-tests
```

The pre-build step produces a local image tag (e.g. `torrust-tracker:e2e-local`) that the
runners consume via `--tracker-image torrust-tracker:e2e-local`. A `--skip-build` flag (or
equivalent) would need to be added to the runners, or alternatively the runners can be made
to skip their own build when the image already exists in the local daemon cache.

The `.dockerignore` was created in the original container overhaul and has never been updated.
It correctly excludes `target/`, `.git/`, `storage/`, `.github/`, and a handful of top-level
files, but leaves several directories and files in the build context that have no role in
compiling or testing Rust code:

| Path | Size | Effect |
|---|---|---|
| `docs/` | 3.6 MB | Any doc edit busts `COPY . /build/src` |
| `.coverage/` | 888 KB | Coverage artifacts bust the source layer |
| `integration_tests_sqlite3.db` | 60 KB | Runtime DB busts the source layer |
| `AGENTS.md` | 24 KB | AI agent instructions not needed |
| `.githooks/` | 8 KB | Git hooks not needed at build time |
| `codecov.yaml`, `compose.*.yaml` | small | CI config not needed |
| `.markdownlint.json`, `.yamllint-ci.yml`, `.taplo.toml` | small | Linter config not needed |
| `project-words.txt` | small | Spell-checker dictionary not needed |

Because `COPY . /build/src` appears in the `recipe`, `build_debug`, `build`, `test_debug`, and
`test` stages, any file change in the unfiltered context invalidates those layers, triggering a
full `cargo nextest archive` recompile even when no Rust source changed.

Additionally, the existing entry `/cSpell.json` is incorrectly cased (the actual file is
`cspell.json`, lowercase), so it is not excluded on case-sensitive Linux filesystems.

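A mismatch like `/cSpell.json` versus `cspell.json` is easy to miss in review. As a rough illustration, the following Rust sketch flags ignore entries that match an existing file only when case is ignored. The function name `case_mismatched_entries` and the sample inputs mentioned afterwards are hypothetical, and the sketch handles literal entries only (no glob patterns, negations, or comments), so it is not a full `.dockerignore` parser.

```rust
/// Returns the ignore entries that match some file name only when ASCII case
/// is ignored, i.e. entries that are probably mis-cased.
///
/// `entries` are literal `.dockerignore` lines such as "/cSpell.json";
/// `files` are names found at the root of the build context.
fn case_mismatched_entries<'a>(entries: &[&'a str], files: &[&str]) -> Vec<&'a str> {
    entries
        .iter()
        .copied()
        .filter(|entry| {
            // Strip the leading "/" anchor and any trailing "/" directory marker.
            let name = entry.trim_start_matches('/').trim_end_matches('/');
            let exact = files.iter().any(|f| *f == name);
            let loose = files.iter().any(|f| f.eq_ignore_ascii_case(name));
            // Suspicious: no exact match, but a match once case is ignored.
            !exact && loose
        })
        .collect()
}
```

Called with the entries `/cSpell.json`, `/target/`, `/docs/` and the files `cspell.json`, `Cargo.toml`, `docs`, it reports only `/cSpell.json`.
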
The `publish_development` and `publish_release` jobs in `container.yaml` have a worse variant
of the checkout bug from Problem 1: `actions/checkout` is absent entirely. The step order
in both jobs is:

```text
meta → login → setup-buildx → build-and-push
```

`docker/build-push-action` therefore cannot find `./Containerfile` on a cold runner and will
fail or use a stale workspace from a previous run.

Both publish jobs also write to the default unscoped GHA cache (`type=gha` with no `scope=`
parameter), sharing the cache namespace with the `test` matrix jobs and with each other.

Even after applying Fix 2 (scoping the `test` job by `${{ matrix.target }}`), the
`publish_development` and `publish_release` jobs still write to the default unscoped namespace.
A cache write from `publish_release` (which builds the `release` target) overwrites the entry
written by the `test` `release` matrix target, and vice versa.

Using a consistent workflow-prefixed naming scheme for every `scope=` parameter prevents all
cross-job and cross-workflow collisions:

| Job | Recommended scope name |
|---|---|
| `container.yaml` `test` `debug` | `container-debug` |
| `container.yaml` `test` `release` | `container-release` |
| `container.yaml` `publish_development` | `container-publish-dev` |
| `container.yaml` `publish_release` | `container-publish-release` |
| `testing.yaml` `docker-e2e` (after Fix 3) | `testing-docker-e2e` |

GitHub's GHA cache is capped at 10 GB per repository. With multiple workflows and build
targets, the cache can grow quickly. Isolated scopes ensure that each layer cache is retained
independently of other jobs, so one job's write cannot displace another job's entries. They do
not raise the 10 GB limit, so the combined size of all scopes still has to fit within it to
avoid evictions.

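The naming scheme in the table can be read as `<workflow prefix>-<job suffix>`. The Rust sketch below only illustrates that convention and the collision check it makes possible; `scope_name` and `all_unique` are hypothetical helpers, not part of the repository or of any Docker tooling.

```rust
use std::collections::HashSet;

/// Builds a cache scope name as `<workflow prefix>-<job suffix>`,
/// e.g. ("container", "publish-dev") -> "container-publish-dev".
fn scope_name(workflow_prefix: &str, job_suffix: &str) -> String {
    format!("{}-{}", workflow_prefix, job_suffix)
}

/// Returns true when no two scope names in the slice are identical.
fn all_unique(scopes: &[String]) -> bool {
    let mut seen = HashSet::new();
    // `insert` returns false for a value that is already in the set.
    scopes.iter().all(|scope| seen.insert(scope.as_str()))
}
```

Feeding the five rows of the table through `scope_name` yields five distinct names, whereas two jobs that both fall back to the same unscoped default would fail the `all_unique` check.
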
In the `test` job, move the checkout step before `setup-buildx`:

```yaml
steps:
  - id: checkout
    name: Checkout Repository
    uses: actions/checkout@v6
  - id: setup
    name: Setup Toolchain
    uses: docker/setup-buildx-action@v4
  - id: build
    name: Build
    uses: docker/build-push-action@v7
    with:
      file: ./Containerfile
      push: false
      load: true
      target: ${{ matrix.target }}
      tags: torrust-tracker:local
      cache-from: type=gha,scope=container-${{ matrix.target }}
      cache-to: type=gha,scope=container-${{ matrix.target }},mode=max
  - id: inspect
    name: Inspect
    run: docker image inspect torrust-tracker:local
  - id: compose
    name: Compose
    run: |
      ...
```

Replace the unscoped `cache-from`/`cache-to` entries (in all jobs that build the image) with
workflow-prefixed scoped ones:

```yaml
cache-from: type=gha,scope=container-${{ matrix.target }}
cache-to: type=gha,scope=container-${{ matrix.target }},mode=max
```

Add `docker/setup-buildx-action` and a `docker/build-push-action` pre-build step to the
`docker-e2e` job in `.github/workflows/testing.yaml`, scoped to the `release` target
(the only target needed by the E2E runners):

```yaml
- id: setup-buildx
  name: Setup Buildx
  uses: docker/setup-buildx-action@v4
- id: build-tracker-image
  name: Build Tracker Image
  uses: docker/build-push-action@v7
  with:
    file: ./Containerfile
    push: false
    load: true
    target: release
    tags: torrust-tracker:e2e-local
    cache-from: type=gha,scope=testing-docker-e2e
    cache-to: type=gha,scope=testing-docker-e2e,mode=max
```

Then pass `--tracker-image torrust-tracker:e2e-local --skip-build` to both runners. A
`--skip-build` flag must be added to `e2e_tests_runner` (which calls `Docker::build()`) and
`qbittorrent_e2e_runner` (which calls `compose.build()`) to skip their internal image builds
when the image already exists locally.

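One possible shape for the `--skip-build` behaviour, as a minimal Rust sketch. The names `ImageAction`, `decide_image_action`, and `image_exists_locally` are hypothetical and do not come from the runners' source; the only external command used is `docker image inspect`, which exits with a non-zero status when the image is not present locally. The sketch assumes the runner should fail fast when `--skip-build` is passed but the image is missing, which is a design choice rather than something this issue specifies.

```rust
use std::process::{Command, Stdio};

/// What a runner should do about the tracker image before starting the tests.
#[derive(Debug, PartialEq, Eq)]
enum ImageAction {
    /// Build the image from the Containerfile (the current behaviour).
    Build,
    /// Reuse the image that is already present in the local daemon.
    UsePrebuilt,
    /// `--skip-build` was requested but the image is missing.
    MissingPrebuilt,
}

/// Pure decision logic, kept separate from the Docker call so it can be
/// unit tested without a Docker daemon.
fn decide_image_action(skip_build: bool, image_exists: bool) -> ImageAction {
    match (skip_build, image_exists) {
        (false, _) => ImageAction::Build,
        (true, true) => ImageAction::UsePrebuilt,
        (true, false) => ImageAction::MissingPrebuilt,
    }
}

/// Returns true if `docker image inspect <tag>` succeeds. Any failure,
/// including a missing `docker` binary, is treated as "image not present".
fn image_exists_locally(tag: &str) -> bool {
    Command::new("docker")
        .args(["image", "inspect", tag])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}
```

A runner could call `decide_image_action(skip_build, image_exists_locally(tag))` before its existing build call, build only on `ImageAction::Build`, and exit with an error on `ImageAction::MissingPrebuilt`.
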
Add all paths that do not contribute to building or testing the Rust workspace:

```text
/AGENTS.md
/codecov.yaml
/compose.*.yaml
/cspell.json
/docs/
/integration_tests_sqlite3.db
/project-words.txt
/.coverage/
/.githooks/
/.markdownlint.json
/.taplo.toml
/.yamllint-ci.yml
```

Also remove the stale `/cSpell.json` entry and replace it with the correctly-cased
`/cspell.json` above.

Add `actions/checkout` as the first step in both `publish_development` and `publish_release`,
add an explicit `target: release`, and replace the unscoped cache entries:

```yaml
steps:
  - id: checkout
    name: Checkout Repository
    uses: actions/checkout@v6
  - id: meta
    name: Docker Meta
    uses: docker/metadata-action@v6
    # ...
  - id: login
    name: Login to Docker Hub
    uses: docker/login-action@v4
    # ...
  - id: setup
    name: Setup Toolchain
    uses: docker/setup-buildx-action@v4
  - name: Build and push
    uses: docker/build-push-action@v7
    with:
      file: ./Containerfile
      push: true
      target: release
      tags: ${{ steps.meta.outputs.tags }}
      labels: ${{ steps.meta.outputs.labels }}
      cache-from: type=gha,scope=container-publish-dev
      cache-to: type=gha,scope=container-publish-dev,mode=max
```

For `publish_release`, use `scope=container-publish-release` instead to keep the caches
isolated.

Update the `scope=` parameter in Fix 2 and Fix 3 to use the full workflow-prefixed names
from Problem 7, so that no two jobs in any workflow can collide:

- `test` job: `scope=container-${{ matrix.target }}` (expands to `container-debug` or `container-release`)
- `publish_development`: `scope=container-publish-dev`
- `publish_release`: `scope=container-publish-release`
- `docker-e2e` job: `scope=testing-docker-e2e`

- Move `actions/checkout` to the first step in the `test` job
- Add `scope=container-${{ matrix.target }}` to `cache-from` and `cache-to` in the `test` job
- Verify that a second run on the same branch shows a cache hit for the `cargo chef cook` layer in the build log
- Confirm the `compose` step still works correctly after the reorder
- Add `docker/setup-buildx-action` + `docker/build-push-action` pre-build step to the `docker-e2e` job with `scope=testing-docker-e2e` GHA cache
- Add `--skip-build` flag to `e2e_tests_runner` and `qbittorrent_e2e_runner` so the pre-built image is used instead of rebuilding
- Pass `--tracker-image torrust-tracker:e2e-local --skip-build` to all three `qbittorrent_e2e_runner` invocations in `docker-e2e`
- Verify that the build logs show cache hits for layers by reviewing the workflow execution in the GitHub Actions tab after rerunning the jobs
- Update `.dockerignore` to exclude non-build files (`docs/`, `.coverage/`, compose files, linter configs, `AGENTS.md`, `integration_tests_sqlite3.db`, etc.) and fix the stale `/cSpell.json` entry (wrong case; actual file is `cspell.json`)
- Add inline comments to the two non-obvious Containerfile patterns discovered from git history:
  - The `cargo nextest archive ... ; rm -f /build/temp.tar.zst` line in `dependencies_debug` and `dependencies`. Explain that it is a deliberate pre-linking warm-up step: running the linker during the cached dep layer means the subsequent `build` stage link step is shorter on a cache hit; it is not a mistake or leftover.
  - The `COPY ./share/ ...` + `sqlite3 ... "VACUUM;"` block in `tester`. Explain that the default SQLite database must be initialized in the base image because tests depend on it at runtime, so it cannot be deferred to the `test`/`test_debug` stages.
- Add `actions/checkout` as the first step in `publish_development` and `publish_release`
- Add `target: release`, `cache-from: type=gha,scope=container-publish-dev` and `cache-to: type=gha,scope=container-publish-dev` to `publish_development`; use `container-publish-release` scope for `publish_release`
- Use workflow-prefixed scope names throughout all jobs: `container-debug`, `container-release`, `container-publish-dev`, `container-publish-release`, `testing-docker-e2e`
- Verify both publish jobs build and push successfully after the checkout and scope fixes

- `docker/build-push-action` caching docs: https://docs.docker.com/build/ci/github-actions/cache/
- GHA cache backend for BuildKit: https://github.com/moby/buildkit?tab=readme-ov-file#github-actions-cache-experimental
- `cargo-chef` repository: https://github.com/LukeMathWalker/cargo-chef
- `docker/setup-buildx-action`: https://github.com/docker/setup-buildx-action
- Related workflow: `.github/workflows/testing.yaml`