This guide explains how to run and understand the End-to-End (E2E) tests for the Torrust Tracker Deployer project.
The E2E tests validate the complete deployment process using two independent test suites:
- E2E Provision and Destroy Tests - Test infrastructure provisioning and destruction lifecycle using LXD VMs
- E2E Configuration Tests - Test software installation and configuration using Docker containers
This split approach ensures reliable testing in CI environments while maintaining comprehensive coverage.
Test infrastructure provisioning and destruction lifecycle (VM creation, cloud-init, and destruction):
```bash
cargo run --bin e2e-provision-and-destroy-tests
```

Test software installation and configuration (Ansible playbooks):

```bash
cargo run --bin e2e-config-tests
```

For local development, you can run the complete end-to-end test:

```bash
cargo run --bin e2e-tests-full
```

The `e2e-tests-full` binary cannot run on GitHub Actions due to network connectivity issues, but it is useful for local validation.
All test binaries support these options:
- `--keep` - Keep the test environment after completion (useful for debugging)
- `--templates-dir` - Specify a custom templates directory path
- `--help` - Show help information
```bash
# Run provision and destroy tests
cargo run --bin e2e-provision-and-destroy-tests

# Run provision and destroy tests with debugging (keep environment)
cargo run --bin e2e-provision-and-destroy-tests -- --keep

# Run configuration tests with debugging
cargo run --bin e2e-config-tests -- --keep

# Run full local tests with custom templates
cargo run --bin e2e-tests-full -- --templates-dir ./custom/templates
```

Tests the complete infrastructure lifecycle using LXD VMs:
1. **Preflight Cleanup**

   - Removes artifacts from previous test runs that may have failed to clean up

2. **Infrastructure Provisioning**

   - Uses OpenTofu configuration from `templates/tofu/lxd/`
   - Creates an LXD container with Ubuntu and cloud-init configuration

3. **Cloud-init Completion**

   - Waits for cloud-init to finish system initialization
   - Validates user accounts and SSH key setup
   - Verifies basic network interface setup

4. **Infrastructure Destruction**

   - Destroys infrastructure using `DestroyCommand` (application layer)
   - Falls back to manual cleanup if `DestroyCommand` fails
   - Ensures proper resource cleanup regardless of test success or failure
Validation:
- ✅ VM is created and running
- ✅ Cloud-init status is "done"
- ✅ Boot completion marker file exists (`/var/lib/cloud/instance/boot-finished`)
- ✅ Infrastructure is properly destroyed after tests complete
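The wait for cloud-init completion can be sketched as a small polling loop. This is a minimal sketch, not the project's actual test code: `status_cmd` below is a stub standing in for `lxc exec torrust-tracker-vm -- cloud-init status` so the snippet runs anywhere.

```shell
#!/bin/sh
# Stub standing in for `lxc exec torrust-tracker-vm -- cloud-init status`.
status_cmd() { echo "status: done"; }

# Poll the status command until it reports "done", with a bounded retry count.
wait_for_done() {
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    if status_cmd | grep -q "status: done"; then
      echo "cloud-init finished"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "timed out waiting for cloud-init" >&2
  return 1
}

wait_for_done
```

Against a real VM, replacing `status_cmd` with the actual `lxc exec` invocation yields the same bounded-wait behavior the test suite relies on.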
The provision and destroy tests use the `DestroyCommand` from the application layer to test the complete infrastructure lifecycle. This provides:
- Application Layer Testing: Tests the actual command that users will execute
- Idempotent Cleanup: Destroy command can be run multiple times safely
- Fallback Strategy: Manual cleanup if destroy command fails (ensures CI reliability)
Implementation:
```rust
// Import destroy command from application layer
use torrust_tracker_deployer_lib::application::commands::destroy::DestroyCommand;

// Execute destroy via application command
async fn cleanup_with_destroy_command(
    environment: Environment<Provisioned>,
    opentofu_client: Arc<OpenTofuClient>,
    repository: Arc<dyn EnvironmentRepository>,
) -> Result<(), DestroyCommandError> {
    let destroy_cmd = DestroyCommand::new(opentofu_client, repository);
    destroy_cmd.execute(environment)?;
    Ok(())
}
```

Fallback Cleanup:

If the `DestroyCommand` fails (e.g., due to infrastructure issues), the test suite falls back to manual cleanup:

```rust
// Try application layer destroy first
if let Err(e) = run_destroy_command(&context).await {
    error!("DestroyCommand failed: {}, falling back to manual cleanup", e);
    cleanup_test_infrastructure(&context).await?;
}
```

This ensures:
- CI tests always clean up resources
- Real-world destroy command is validated
- Infrastructure issues don't block CI
For detailed destroy command documentation, see:
Tests software installation and configuration using Docker containers:
1. **Container Setup**

   - Creates a Docker container from `docker/provisioned-instance/`
   - Configures SSH connectivity for Ansible

2. **Software Installation** (`install-docker.yml`)

   - Installs Docker Community Edition
   - Configures the Docker service
   - Validates that the Docker daemon is running

3. **Docker Compose Installation** (`install-docker-compose.yml`)

   - Installs the Docker Compose binary
   - Validates the installation with a test configuration
Validation:
- ✅ Container is accessible via SSH
- ✅ Docker version command works
- ✅ Docker daemon service is active
- ✅ Docker Compose version command works
- ✅ Can parse and validate a test docker-compose.yml file
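For the last check, a minimal fixture is enough. The file contents below are hypothetical (the actual test fixture may differ), and the `grep` is only a cheap structural sanity check that avoids invoking Docker:

```shell
#!/bin/sh
# Write a minimal docker-compose.yml fixture (hypothetical contents).
cat > /tmp/test-docker-compose.yml <<'EOF'
services:
  hello:
    image: hello-world
EOF

# Cheap structural sanity check without invoking Docker; the real tests
# validate the file through Docker Compose itself.
grep -q '^services:' /tmp/test-docker-compose.yml && echo "compose file looks valid"
```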
Combines both provision and configuration phases in a single LXD VM for comprehensive local testing.
The project provides a dependency installer tool that automatically detects and installs required dependencies:
```bash
# Install all required dependencies
cargo run --bin dependency-installer install

# Check which dependencies are installed
cargo run --bin dependency-installer check

# List all dependencies with status
cargo run --bin dependency-installer list
```

The installer supports:
- cargo-machete - Detects unused Rust dependencies
- OpenTofu - Infrastructure provisioning tool
- Ansible - Configuration management tool
- LXD - VM-based testing infrastructure
For detailed information, see packages/dependency-installer/README.md.
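Because the `check` subcommand signals success or failure through its exit code, it can gate a setup script directly. This sketch is self-contained: `check_deps` is a stub standing in for the real `cargo run --bin dependency-installer check` invocation.

```shell
#!/bin/sh
# Stub standing in for `cargo run --bin dependency-installer check`;
# the real command exits non-zero when a dependency is missing.
check_deps() { return 0; }

if check_deps; then
  echo "all dependencies installed"
else
  echo "missing dependencies; run: cargo run --bin dependency-installer install" >&2
  exit 1
fi
```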
If you prefer manual installation or need to troubleshoot:
1. **LXD installed and configured**

   ```bash
   sudo snap install lxd
   sudo lxd init  # Follow the setup prompts
   ```

2. **OpenTofu installed**

   ```bash
   # Installation instructions in docs/tech-stack/opentofu.md
   ```

3. **Docker installed**

   ```bash
   # Docker is available on most systems or in CI environments
   docker --version
   ```

4. **Ansible installed**

   ```bash
   # Installation instructions in docs/tech-stack/ansible.md
   ```
Requires all of the above: LXD, OpenTofu, Docker, and Ansible.
After setup (automated or manual), verify all dependencies are available:
```bash
# Quick check (exit code indicates success/failure)
cargo run --bin dependency-installer check

# Detailed check with logging
cargo run --bin dependency-installer check --verbose
```

If provision tests fail and leave LXD resources behind:
```bash
# Check running containers
lxc list

# Stop and delete the test container
lxc stop torrust-tracker-vm
lxc delete torrust-tracker-vm

# Or use OpenTofu to clean up
cd build/tofu/lxd
tofu destroy -auto-approve
```

If configuration tests fail and leave Docker resources behind:
```bash
# Check running containers
docker ps -a

# Stop and remove test containers
docker stop $(docker ps -q --filter "ancestor=torrust-provisioned-instance")
docker rm $(docker ps -aq --filter "ancestor=torrust-provisioned-instance")

# Remove test images if needed
docker rmi torrust-provisioned-instance
```

- LXD daemon not running: `sudo systemctl start lxd`
- Insufficient privileges: Ensure your user is in the `lxd` group
- OpenTofu state corruption: Delete `build/tofu/lxd/terraform.tfstate` and retry
- Cloud-init timeout: The VM may need more time; check with `lxc exec torrust-tracker-vm -- cloud-init status`
- Docker daemon not running: `sudo systemctl start docker`
- Container build failures: Check the Docker image build logs
- SSH connectivity to container: Verify container networking and SSH service
- Ansible connection errors: Check container SSH configuration and key permissions
- Network connectivity in VMs: Known limitation - use split test suites for reliable testing
- SSH connectivity failures: Usually means cloud-init is still running or SSH configuration failed
- Mixed infrastructure issues: Combines all provision and configuration issues above
Use Provision Tests (`e2e-provision-and-destroy-tests`) when:
- Testing infrastructure changes (OpenTofu, LXD configuration)
- Validating VM creation and cloud-init setup
- Working on provisioning-related features
Use Configuration Tests (`e2e-config-tests`) when:
- Testing Ansible playbooks and software installation
- Validating configuration management changes
- Working on application deployment features
Use Full Local Tests (`e2e-tests-full`) when:
- Comprehensive local validation before CI
- Integration testing of provision + configuration
- Debugging end-to-end deployment issues
Problem: GitHub Actions runners experience intermittent network connectivity problems within LXD VMs that cause:
- Docker GPG key downloads to fail (`Network is unreachable` errors)
- Package repository access timeouts
- Generally flaky network behavior
Root Cause: This is a known issue with GitHub-hosted runners:
- GitHub Issue #13003 - Network connectivity issues with LXD VMs
- GitHub Issue #1187 - Original networking issue
- GitHub Issue #2890 - Specific apt repository timeout issues
Solution: We split E2E tests into two suites:
- Provision Tests: Use LXD VMs for infrastructure testing only (no network-heavy operations inside VM)
- Configuration Tests: Use Docker containers which have reliable network connectivity on GitHub Actions
- Full Local Tests: Available for comprehensive local testing where network connectivity works
Implementation: Configuration tests use Docker containers with:
- Direct internet access for package downloads
- Reliable networking for Ansible connectivity
- No nested virtualization issues
Use the `--keep` flag to inspect the environment after test completion:

```bash
cargo run --bin e2e-provision-and-destroy-tests -- --keep

# After test completion, connect to the LXD container:
lxc exec torrust-tracker-vm -- /bin/bash
```

```bash
cargo run --bin e2e-config-tests -- --keep

# After test completion, find and connect to the Docker container:
docker ps
docker exec -it <container-id> /bin/bash
```

```bash
cargo run --bin e2e-tests-full -- --keep

# Connect to the LXD VM as above
lxc exec torrust-tracker-vm -- /bin/bash
```

The split E2E testing architecture ensures reliable CI while maintaining comprehensive coverage:
```text
┌───────────────────────────────────────────────────────────────────┐
│                          E2E Test Suites                          │
└─────┬────────────────┬──────────────────┬─────────────────────────┘
      │                │                  │
      │                │                  │
┌─────▼──────┐   ┌─────▼──────────┐   ┌───▼──────────────────┐
│ Provision  │   │ Configuration  │   │     Full Local       │
│   Tests    │   │     Tests      │   │       Tests          │
│            │   │                │   │                      │
│  LXD VMs   │   │    Docker      │   │  LXD VMs + Docker    │
│ (CI Safe)  │   │  Containers    │   │   (Local Only)       │
│            │   │   (CI Safe)    │   │                      │
└─────┬──────┘   └───────┬────────┘   └───┬──────────────────┘
      │                  │                │
┌─────▼────────┐   ┌─────▼────────┐   ┌───▼──────────────────┐
│  OpenTofu/   │   │ Testcontain- │   │ OpenTofu + Ansible   │
│     LXD      │   │     ers      │   │   (Full Stack)       │
│Infrastructure│   │   Docker     │   │                      │
│    Layer     │   │  Management  │   │                      │
└──────────────┘   └──────────────┘   └──────────────────────┘
       │                  │                        │
┌──────▼──────┐    ┌──────▼──────────┐   ┌─────────▼─────────┐
│ VM Creation │    │Ansible Playbooks│   │  Complete Stack   │
│ Cloud-init  │    │ Configuration   │   │    Validation     │
│ Validation  │    │  Validation     │   │                   │
└─────────────┘    └─────────────────┘   └───────────────────┘
```
- Provision Tests: Infrastructure creation and basic VM setup validation
- Configuration Tests: Software installation and application deployment
- Full Local Tests: End-to-end integration validation for comprehensive testing
This architecture provides:
- Reliability: Each test suite works independently in CI environments
- Speed: Focused testing reduces execution time
- Coverage: Combined suites provide complete deployment validation
- Debugging: Clear separation makes issue identification easier
The E2E testing system uses a Docker architecture representing different deployment phases, allowing for efficient testing of the configuration, release, and run phases of the deployment pipeline.
Purpose: Represents the state after VM provisioning but before configuration.
Contents:
- Ubuntu 24.04 LTS base (matches production VMs)
- SSH server (via supervisor for container-native process management)
- `torrust` user with sudo access
- No application dependencies installed
- Ready for Ansible configuration
Usage: E2E configuration testing - simulates a freshly provisioned VM ready for software installation.
The planned architecture uses separate directories for each deployment phase:
```text
docker/
├── provisioned-instance/    # ✅ Current - post-provision
│   ├── Dockerfile
│   ├── supervisord.conf
│   ├── entrypoint.sh
│   └── README.md
├── configured-instance/     # 🔄 Future - post-configure
│   ├── Dockerfile
│   ├── docker-compose.yml   # Example: Docker services
│   └── README.md
├── released-instance/       # 🔄 Future - post-release
│   ├── Dockerfile
│   ├── app-configs/         # Application configurations
│   └── README.md
└── running-instance/        # 🔄 Future - post-run
    ├── Dockerfile
    ├── service-configs/     # Service validation configs
    └── README.md
```
- Clear Separation: Each phase has its own directory and concerns
- Independent Evolution: Each Dockerfile can evolve independently
- Easier Maintenance: Simpler to understand and debug individual phases
- Flexible Building: Can build any phase independently
- Better Documentation: Each directory can have phase-specific docs
```bash
# Build specific phase containers
docker build -f docker/provisioned-instance/Dockerfile -t torrust-provisioned:latest .
docker build -f docker/configured-instance/Dockerfile -t torrust-configured:latest .
docker build -f docker/released-instance/Dockerfile -t torrust-released:latest .
docker build -f docker/running-instance/Dockerfile -t torrust-running:latest .
```

- `docker/provisioned-instance/` - Base system ready for configuration
- `docker/configured-instance/` - System with Docker and dependencies installed
  - Build FROM `torrust-provisioned-instance:latest`
  - Add Ansible playbook execution results
  - Verify Docker daemon and Docker Compose installation
- `docker/released-instance/` - System with applications deployed
  - Build FROM `torrust-configured-instance:latest`
  - Add application artifacts
  - Add service configurations
- `docker/running-instance/` - System with services started and validated
  - Build FROM `torrust-released-instance:latest`
  - Start all services
  - Run validation checks
- Test Coverage: Complete deployment pipeline testing
- Fast Feedback: Test individual phases quickly (~2-3 seconds vs ~17-30 seconds for LXD)
- Debugging: Isolate issues to specific deployment phases
- Scalability: Easy to add new phases or modify existing ones
- Documentation: Each phase self-documents its purpose and setup
- Reusability: Containers can be used outside of testing (demos, development)
- CI Reliability: Avoids GitHub Actions connectivity issues with nested VMs
Each deployment phase has distinct concerns that are tested appropriately:
- Provisioned Phase: Base system setup, user management, SSH connectivity
- Configured Phase: Software installation, system configuration, dependency management
- Released Phase: Application deployment, service configuration, artifact management
- Running Phase: Service validation, monitoring setup, operational readiness
This architecture enables:
- Testing Isolation: E2E tests can target specific phases independently
- Development Workflow: Teams can work on different phases independently
- Issue Isolation: Phase-specific containers make it easier to isolate problems
The Docker phase architecture complements the split E2E testing strategy by providing fast, reliable containers for configuration testing while maintaining comprehensive coverage of the entire deployment pipeline.
When adding new features or making changes:
For OpenTofu, LXD, or cloud-init modifications:
1. Update provision tests in `src/bin/e2e_provision_tests.rs`
2. Add validation methods for new infrastructure components
3. Test locally: `cargo run --bin e2e-provision-and-destroy-tests`
4. Verify CI passes on `.github/workflows/test-e2e-provision.yml`
For Ansible playbooks or software installation modifications:
1. Update configuration tests in `src/bin/e2e_config_tests.rs`
2. Add validation methods for new software components
3. Update the Docker image in `docker/provisioned-instance/` if needed
4. Test locally: `cargo run --bin e2e-config-tests`
5. Verify CI passes on `.github/workflows/test-e2e-config.yml`
For comprehensive changes affecting multiple components:
1. Test with the full local suite: `cargo run --bin e2e-tests-full`
2. Verify that both the provision and configuration suites pass independently
3. Update this documentation to reflect the changes
- Consider split approach: Can the change be tested in isolated suites?
- Provision tests: Focus on infrastructure readiness, minimal network dependencies
- Configuration tests: Focus on software functionality, reliable network access via containers
- Full local tests: Comprehensive validation for development workflows
- Independence: Each suite should be runnable independently without conflicts
The split E2E testing approach ensures reliable CI while maintaining comprehensive coverage of the entire deployment pipeline.