This document explains the architectural decisions behind the E2E testing system, including the split testing approach and Docker-based deployment workflow validation.
The split E2E testing architecture ensures reliable CI while maintaining comprehensive coverage:
```text
┌────────────────────────────────────────────────────────────────┐
│                        E2E Test Suites                         │
└─────┬──────────────────────┬──────────────────────┬────────────┘
      │                      │                      │
┌─────▼────────────┐   ┌─────▼────────────┐   ┌─────▼────────────┐
│ Provision        │   │ Configuration    │   │ Full Local       │
│ Tests            │   │ Tests            │   │ Tests            │
│                  │   │                  │   │                  │
│ LXD VMs          │   │ Docker           │   │ LXD VMs + Docker │
│ (CI Safe)        │   │ Containers       │   │ (Local Only)     │
│                  │   │ (CI Safe)        │   │                  │
└─────┬────────────┘   └─────┬────────────┘   └─────┬────────────┘
      │                      │                      │
┌─────▼────────────┐   ┌─────▼────────────┐   ┌─────▼────────────┐
│ OpenTofu/        │   │ Testcontainers   │   │ OpenTofu +       │
│ LXD              │   │ Docker           │   │ Ansible          │
│ Infrastructure   │   │ Management       │   │ (Full Stack)     │
│ Layer            │   │                  │   │                  │
└─────┬────────────┘   └─────┬────────────┘   └─────┬────────────┘
      │                      │                      │
┌─────▼────────────┐   ┌─────▼────────────┐   ┌─────▼────────────┐
│ VM Creation      │   │ Ansible Playbooks│   │ Complete Stack   │
│ Cloud-init       │   │ Configuration    │   │ Validation       │
│ Validation       │   │ Validation       │   │                  │
└──────────────────┘   └──────────────────┘   └──────────────────┘
```
- Infrastructure Lifecycle Tests: Infrastructure creation and basic VM setup validation
- Deployment Workflow Tests: Software installation and application deployment
- Complete Workflow Tests: End-to-end integration validation for comprehensive testing
This architecture provides:
- Reliability: Each test suite works independently in CI environments
- Speed: Focused testing reduces execution time
- Coverage: Combined suites provide complete deployment validation
- Debugging: Clear separation makes issue identification easier
The E2E testing system uses a Docker-based architecture for testing the deployment workflow commands (configure, release, run, test) efficiently and reliably in CI environments.
We use a single Docker image (`provisioned-instance`) representing the pre-provisioned state, and execute all deployment commands sequentially within that container during E2E tests.
Why Sequential Instead of Multi-Image?
Initially, we considered creating separate Docker images for each deployment phase (configured, released, running). However, this approach was rejected due to:
- High Maintenance Overhead: Every code change would require updating multiple Docker images
- Slower Execution: Building 4 images takes longer than running 4 commands sequentially
- Synchronization Complexity: Keeping multiple images in sync with code changes is error-prone
- No Real Benefit: Parallel test execution overhead (Docker build + startup) exceeds sequential execution time
Sequential Execution Benefits:
- ✅ Single Source of Truth: One Dockerfile to maintain
- ✅ Faster Overall: Sequential commands in one container (~48s) vs multiple image builds
- ✅ Realistic Testing: Matches real deployment workflow exactly
- ✅ Easy Debugging: Single container lifecycle with `--keep` flag
- ✅ Automatic Synchronization: Code changes tested via Ansible playbooks without image rebuilds
Trade-offs Accepted:
- ❌ Cannot test individual commands in isolation (use unit/integration tests for that)
- ❌ Cannot run E2E tests for different commands in parallel
- ❌ Must run full sequence to test later commands
See ADR: Single Docker Image for Sequential E2E Command Testing for the complete architectural decision.
Purpose: Represents the state after VM provisioning but before configuration.
Contents:
- Ubuntu 24.04 LTS base (matches production VMs)
- SSH server (via supervisor for container-native process management)
- `torrust` user with sudo access
- No application dependencies installed
- Ready for Ansible configuration
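A minimal sketch of what such an image definition might look like is shown below. The package names, user setup, and `supervisord.conf` file are illustrative assumptions, not the project's actual Dockerfile:

```dockerfile
# Hypothetical sketch of the provisioned-instance image (illustrative only).
FROM ubuntu:24.04

# SSH server plus supervisor for container-native process management.
RUN apt-get update && \
    apt-get install -y --no-install-recommends openssh-server supervisor sudo && \
    rm -rf /var/lib/apt/lists/* && \
    mkdir -p /var/run/sshd

# torrust user with passwordless sudo, mirroring a freshly provisioned VM.
RUN useradd --create-home --shell /bin/bash torrust && \
    echo 'torrust ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/torrust

# No application dependencies baked in: Ansible installs everything later,
# so code changes never require an image rebuild.
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
CMD ["/usr/bin/supervisord", "-n"]
```

Keeping the image this thin is what makes the "Automatic Synchronization" property above possible: the image only models the provisioned state, and everything after that is applied at test time.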
E2E Test Workflow:
```rust
// E2E deployment workflow tests (simplified)
async fn run_deployment_workflow_tests() -> Result<()> {
    // 1. Start single container (provisioned state)
    let container = start_provisioned_container().await?;

    // 2. Run deployment commands sequentially
    run_create_command()?;    // Create environment
    run_register_command()?;  // Register container IP
    run_configure_command()?; // Install dependencies (modifies container)
    run_release_command()?;   // Deploy applications (modifies container)
    run_run_command()?;       // Start services (modifies container)
    run_test_command()?;      // Validate deployment

    // 3. Cleanup
    container.stop().await?;
    Ok(())
}
```

Key Characteristics:
- Stateful Testing: Each command modifies the container state for the next command
- Complete Workflow: Tests the full deployment pipeline end-to-end
- Fast Execution: ~48 seconds total (container start + all commands + validation)
- CI Reliable: Avoids GitHub Actions connectivity issues with LXD VMs
- Low Maintenance: Single Dockerfile, changes propagate automatically via playbooks
- Realistic Testing: Sequential execution matches real deployment workflow exactly
- Fast Feedback: Faster than building multiple images, comparable to parallel execution
- Simple Debugging: Use the `--keep` flag to inspect final container state
- CI Reliability: Single container uses fewer resources, avoids VM networking issues
- Code Synchronization: Ansible playbooks ensure image reflects current code
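The stateful, fail-fast shape of this workflow can be sketched in plain Rust. The `State` type and step functions below are illustrative, not the project's actual API; they only demonstrate why each command depends on the one before it:

```rust
// Illustrative sketch: each step mutates shared state; the first failure
// aborts the sequence, because later commands depend on earlier ones.
#[derive(Default, Debug)]
struct State {
    configured: bool,
    released: bool,
}

type Step = (&'static str, fn(&mut State) -> Result<(), String>);

/// Run steps in order, stopping at the first error (fail-fast).
fn run_sequence(steps: &[Step]) -> Result<Vec<&'static str>, String> {
    let mut state = State::default();
    let mut completed = Vec::new();
    for (name, step) in steps {
        step(&mut state).map_err(|e| format!("{name} failed: {e}"))?;
        completed.push(*name);
    }
    Ok(completed)
}

fn configure(s: &mut State) -> Result<(), String> {
    s.configured = true;
    Ok(())
}

fn release(s: &mut State) -> Result<(), String> {
    // `release` requires `configure` to have run first: stateful testing.
    if !s.configured {
        return Err("not configured".into());
    }
    s.released = true;
    Ok(())
}

fn main() {
    // Running `release` before `configure` fails fast.
    assert!(run_sequence(&[("release", release), ("configure", configure)]).is_err());

    // The correct order succeeds and reports each completed step.
    let out = run_sequence(&[("configure", configure), ("release", release)]).unwrap();
    assert_eq!(out, vec!["configure", "release"]);
    println!("sequential workflow sketch ok");
}
```

This is also why later commands cannot be tested in isolation here: skipping `configure` leaves the state a later command needs unset, exactly as in the real container.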
What This Tests:
- ✅ Complete deployment workflow (create → register → configure → release → run → test)
- ✅ Command integration and state transitions
- ✅ Ansible playbook execution in container environment
- ✅ Service deployment and validation
What This Doesn't Test:
- ❌ Individual command isolation (use unit tests)
- ❌ Infrastructure provisioning (use
e2e-infrastructure-lifecycle-tests) - ❌ VM-specific features (use
e2e-complete-workflow-testslocally)
| Aspect | Docker Container | LXD VM |
|---|---|---|
| Network Reliability (CI) | ✅ Excellent | ❌ Poor (GitHub Actions issues) |
| Startup Time | ✅ ~2-3 seconds | ❌ Slower |
| Production Similarity | ❌ Container only | ✅ Full VM (matches production) |
| Resource Usage | ✅ Lightweight | ❌ Heavier |
| Best For | Configuration/deployment workflow | Infrastructure provisioning |
Result: Use Docker containers for deployment workflow tests, LXD VMs for infrastructure tests.
Problem: GitHub Actions runners experience intermittent network connectivity problems within LXD VMs that cause:
- Docker GPG key downloads to fail (`Network is unreachable` errors)
- Package repository access timeouts
- Generally flaky network behavior
Root Cause: This is a known issue with GitHub-hosted runners:
- GitHub Issue #13003 - Network connectivity issues with LXD VMs
- GitHub Issue #1187 - Original networking issue
- GitHub Issue #2890 - Specific apt repository timeout issues
Solution: We split E2E tests into separate suites:
- Infrastructure Lifecycle Tests: Use LXD VMs for infrastructure testing only (no network-heavy operations inside VM)
- Deployment Workflow Tests: Use Docker containers which have reliable network connectivity on GitHub Actions
- Complete Workflow Tests: Available for comprehensive local testing where network connectivity works
Implementation: Deployment workflow tests use Docker containers with:
- Direct internet access for package downloads
- Reliable networking for Ansible connectivity
- No nested virtualization issues
- Infrastructure tests: Focus on infrastructure readiness, minimal network dependencies
- Deployment tests: Focus on software functionality, reliable network access via containers
- Complete tests: Comprehensive validation for development workflows
- Independence: Each suite should be runnable independently without conflicts
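In CI, these principles might translate into independent jobs, sketched here as a hypothetical GitHub Actions fragment. Job names, runner labels, and commands are illustrative assumptions, not the project's actual workflow:

```yaml
# Hypothetical sketch: each E2E suite runs as an independent CI job.
jobs:
  infrastructure-lifecycle-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # LXD VM provisioning only; no network-heavy operations inside the VM.
      - run: cargo run --bin e2e-infrastructure-lifecycle-tests

  deployment-workflow-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Docker containers: reliable network access on GitHub-hosted runners.
      - run: cargo run --bin e2e-deployment-workflow-tests

  # Complete workflow tests run locally only, where LXD networking is
  # dependable, so they are intentionally absent from CI.
```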
The split E2E testing approach ensures reliable CI while maintaining comprehensive coverage of the entire deployment pipeline.