This guide explains how to run and understand the End-to-End (E2E) tests for the Torrust Tracker Deployer project.
The E2E tests validate the complete deployment process using two independent test suites:
- E2E Provision and Destroy Tests - Test infrastructure provisioning and destruction lifecycle using LXD VMs
- E2E Configuration Tests - Test software installation and configuration using Docker containers
This split approach ensures reliable testing in CI environments while maintaining comprehensive coverage.
Test infrastructure provisioning and destruction lifecycle (VM creation, cloud-init, and destruction):
```bash
cargo run --bin e2e-provision-and-destroy-tests
```

Test software installation and configuration (Ansible playbooks):

```bash
cargo run --bin e2e-config-tests
```

For local development, you can run the complete end-to-end test:

```bash
cargo run --bin e2e-tests-full
```

The `e2e-tests-full` binary cannot run on GitHub Actions due to network connectivity issues, but it is useful for local validation.
All test binaries support these options:
- `--keep` - Keep the test environment after completion (useful for debugging)
- `--templates-dir` - Specify a custom templates directory path
- `--help` - Show help information
```bash
# Run provision and destroy tests
cargo run --bin e2e-provision-and-destroy-tests

# Run provision and destroy tests with debugging (keep environment)
cargo run --bin e2e-provision-and-destroy-tests -- --keep

# Run configuration tests with debugging
cargo run --bin e2e-config-tests -- --keep

# Run full local tests with custom templates
cargo run --bin e2e-tests-full -- --templates-dir ./custom/templates
```

Tests the complete infrastructure lifecycle using LXD VMs:
1. **Preflight Cleanup**

   - Removes artifacts from previous test runs that may have failed to clean up

2. **Infrastructure Provisioning**

   - Uses OpenTofu configuration from `templates/tofu/lxd/`
   - Creates an LXD container with Ubuntu and cloud-init configuration

3. **Cloud-init Completion**

   - Waits for cloud-init to finish system initialization
   - Validates user accounts and SSH key setup
   - Verifies basic network interface setup

4. **Infrastructure Destruction**

   - Destroys infrastructure using `DestroyCommand` (application layer)
   - Falls back to manual cleanup if `DestroyCommand` fails
   - Ensures proper resource cleanup regardless of test success or failure
Validation:
- ✅ VM is created and running
- ✅ Cloud-init status is "done"
- ✅ Boot completion marker file exists (`/var/lib/cloud/instance/boot-finished`)
- ✅ Infrastructure is properly destroyed after tests complete
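The wait for cloud-init completion can be sketched as a small polling loop. This is a minimal sketch, not the project's actual test code: `status_cmd` below is a stub standing in for `lxc exec torrust-tracker-vm -- cloud-init status` so the snippet runs anywhere.

```shell
#!/bin/sh
# Stub standing in for `lxc exec torrust-tracker-vm -- cloud-init status`.
status_cmd() { echo "status: done"; }

# Poll the status command until it reports "done", with a bounded retry count.
wait_for_done() {
  attempts=0
  while [ "$attempts" -lt 30 ]; do
    if status_cmd | grep -q "status: done"; then
      echo "cloud-init finished"
      return 0
    fi
    attempts=$((attempts + 1))
    sleep 1
  done
  echo "timed out waiting for cloud-init" >&2
  return 1
}

wait_for_done
```

Against a real VM, replacing `status_cmd` with the actual `lxc exec` invocation yields the same bounded-wait behavior the test suite relies on.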
The provision and destroy tests use the `DestroyCommand` from the application layer to test the complete infrastructure lifecycle. This provides:
- Application Layer Testing: Tests the actual command that users will execute
- Idempotent Cleanup: Destroy command can be run multiple times safely
- Fallback Strategy: Manual cleanup if destroy command fails (ensures CI reliability)
Implementation:
```rust
// Import destroy command from application layer
use torrust_tracker_deployer_lib::application::commands::destroy::DestroyCommand;

// Execute destroy via application command
async fn cleanup_with_destroy_command(
    environment: Environment<Provisioned>,
    opentofu_client: Arc<OpenTofuClient>,
    repository: Arc<dyn EnvironmentRepository>,
) -> Result<(), DestroyCommandError> {
    let destroy_cmd = DestroyCommand::new(opentofu_client, repository);
    destroy_cmd.execute(environment)?;
    Ok(())
}
```

Fallback Cleanup:

If the `DestroyCommand` fails (e.g., due to infrastructure issues), the test suite falls back to manual cleanup:

```rust
// Try application layer destroy first
if let Err(e) = run_destroy_command(&context).await {
    error!("DestroyCommand failed: {}, falling back to manual cleanup", e);
    cleanup_test_infrastructure(&context).await?;
}
```

This ensures:
- CI tests always clean up resources
- Real-world destroy command is validated
- Infrastructure issues don't block CI
For detailed destroy command documentation, see:
Tests software installation and configuration using Docker containers:
1. **Container Setup**

   - Creates a Docker container from `docker/provisioned-instance/`
   - Configures SSH connectivity for Ansible

2. **Software Installation** (`install-docker.yml`)

   - Installs Docker Community Edition
   - Configures the Docker service
   - Validates that the Docker daemon is running

3. **Docker Compose Installation** (`install-docker-compose.yml`)

   - Installs the Docker Compose binary
   - Validates the installation with a test configuration
Validation:
- ✅ Container is accessible via SSH
- ✅ Docker version command works
- ✅ Docker daemon service is active
- ✅ Docker Compose version command works
- ✅ Can parse and validate a test docker-compose.yml file
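For the last check, a minimal fixture is enough. The file contents below are hypothetical (the actual test fixture may differ), and the `grep` is only a cheap structural sanity check that avoids invoking Docker:

```shell
#!/bin/sh
# Write a minimal docker-compose.yml fixture (hypothetical contents).
cat > /tmp/test-docker-compose.yml <<'EOF'
services:
  hello:
    image: hello-world
EOF

# Cheap structural sanity check without invoking Docker; the real tests
# validate the file through Docker Compose itself.
grep -q '^services:' /tmp/test-docker-compose.yml && echo "compose file looks valid"
```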
Combines both provision and configuration phases in a single LXD VM for comprehensive local testing.
The project provides a dependency installer tool that automatically detects and installs required dependencies:
```bash
# Install all required dependencies
cargo run --bin dependency-installer install

# Check which dependencies are installed
cargo run --bin dependency-installer check

# List all dependencies with status
cargo run --bin dependency-installer list
```

The installer supports:
- cargo-machete - Detects unused Rust dependencies
- OpenTofu - Infrastructure provisioning tool
- Ansible - Configuration management tool
- LXD - VM-based testing infrastructure
For detailed information, see packages/dependency-installer/README.md.
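Because the `check` subcommand signals success or failure through its exit code, it can gate a setup script directly. This sketch is self-contained: `check_deps` is a stub standing in for the real `cargo run --bin dependency-installer check` invocation.

```shell
#!/bin/sh
# Stub standing in for `cargo run --bin dependency-installer check`;
# the real command exits non-zero when a dependency is missing.
check_deps() { return 0; }

if check_deps; then
  echo "all dependencies installed"
else
  echo "missing dependencies; run: cargo run --bin dependency-installer install" >&2
  exit 1
fi
```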
If you prefer manual installation or need to troubleshoot:
1. **LXD installed and configured**

   ```bash
   sudo snap install lxd
   sudo lxd init  # Follow the setup prompts
   ```

2. **OpenTofu installed**

   ```bash
   # Installation instructions in docs/tech-stack/opentofu.md
   ```

3. **Docker installed**

   ```bash
   # Docker is available on most systems or in CI environments
   docker --version
   ```

4. **Ansible installed**

   ```bash
   # Installation instructions in docs/tech-stack/ansible.md
   ```
Requires all of the above: LXD, OpenTofu, Docker, and Ansible.
After setup (automated or manual), verify all dependencies are available:
```bash
# Quick check (exit code indicates success/failure)
cargo run --bin dependency-installer check

# Detailed check with logging
cargo run --bin dependency-installer check --verbose
```

If provision tests fail and leave LXD resources behind:
```bash
# Check running containers
lxc list

# Stop and delete the test container
lxc stop torrust-tracker-vm
lxc delete torrust-tracker-vm

# Or use OpenTofu to clean up
cd build/tofu/lxd
tofu destroy -auto-approve
```

If configuration tests fail and leave Docker resources behind:
```bash
# Check running containers
docker ps -a

# Stop and remove test containers
docker stop $(docker ps -q --filter "ancestor=torrust-provisioned-instance")
docker rm $(docker ps -aq --filter "ancestor=torrust-provisioned-instance")

# Remove test images if needed
docker rmi torrust-provisioned-instance
```

- LXD daemon not running: `sudo systemctl start lxd`
- Insufficient privileges: Ensure your user is in the `lxd` group
- OpenTofu state corruption: Delete `build/tofu/lxd/terraform.tfstate` and retry
- Cloud-init timeout: The VM may need more time; check with `lxc exec torrust-tracker-vm -- cloud-init status`
- Docker daemon not running: `sudo systemctl start docker`
- Container build failures: Check the Docker image build logs
- SSH connectivity to container: Verify container networking and SSH service
- Ansible connection errors: Check container SSH configuration and key permissions
- Network connectivity in VMs: Known limitation - use split test suites for reliable testing
- SSH connectivity failures: Usually means cloud-init is still running or SSH configuration failed
- Mixed infrastructure issues: Combines all provision and configuration issues above
Use Provision Tests (`e2e-provision-and-destroy-tests`) when:
- Testing infrastructure changes (OpenTofu, LXD configuration)
- Validating VM creation and cloud-init setup
- Working on provisioning-related features
Use Configuration Tests (`e2e-config-tests`) when:
- Testing Ansible playbooks and software installation
- Validating configuration management changes
- Working on application deployment features
Use Full Local Tests (`e2e-tests-full`) when:
- Comprehensive local validation before CI
- Integration testing of provision + configuration
- Debugging end-to-end deployment issues
Problem: GitHub Actions runners experience intermittent network connectivity problems within LXD VMs that cause:
- Docker GPG key downloads to fail (`Network is unreachable` errors)
- Package repository access timeouts
- Generally flaky network behavior
Root Cause: This is a known issue with GitHub-hosted runners:
- GitHub Issue #13003 - Network connectivity issues with LXD VMs
- GitHub Issue #1187 - Original networking issue
- GitHub Issue #2890 - Specific apt repository timeout issues
Solution: We split E2E tests into two suites:
- Provision Tests: Use LXD VMs for infrastructure testing only (no network-heavy operations inside VM)
- Configuration Tests: Use Docker containers which have reliable network connectivity on GitHub Actions
- Full Local Tests: Available for comprehensive local testing where network connectivity works
Implementation: Configuration tests use Docker containers with:
- Direct internet access for package downloads
- Reliable networking for Ansible connectivity
- No nested virtualization issues
Use the `--keep` flag to inspect the environment after test completion:

```bash
cargo run --bin e2e-provision-and-destroy-tests -- --keep

# After test completion, connect to the LXD container:
lxc exec torrust-tracker-vm -- /bin/bash
```

```bash
cargo run --bin e2e-config-tests -- --keep

# After test completion, find and connect to the Docker container:
docker ps
docker exec -it <container-id> /bin/bash
```

```bash
cargo run --bin e2e-tests-full -- --keep

# Connect to the LXD VM as above
lxc exec torrust-tracker-vm -- /bin/bash
```

The split E2E testing architecture ensures reliable CI while maintaining comprehensive coverage:
```text
┌───────────────────────────────────────────────────────────────────┐
│                          E2E Test Suites                          │
└─────┬────────────────┬──────────────────┬─────────────────────────┘
      │                │                  │
      │                │                  │
┌─────▼──────┐   ┌─────▼──────────┐   ┌───▼──────────────────┐
│ Provision  │   │ Configuration  │   │     Full Local       │
│   Tests    │   │     Tests      │   │       Tests          │
│            │   │                │   │                      │
│  LXD VMs   │   │    Docker      │   │  LXD VMs + Docker    │
│ (CI Safe)  │   │  Containers    │   │   (Local Only)       │
│            │   │   (CI Safe)    │   │                      │
└─────┬──────┘   └───────┬────────┘   └───┬──────────────────┘
      │                  │                │
┌─────▼────────┐   ┌─────▼────────┐   ┌───▼──────────────────┐
│  OpenTofu/   │   │ Testcontain- │   │ OpenTofu + Ansible   │
│     LXD      │   │     ers      │   │   (Full Stack)       │
│Infrastructure│   │   Docker     │   │                      │
│    Layer     │   │  Management  │   │                      │
└──────────────┘   └──────────────┘   └──────────────────────┘
       │                  │                        │
┌──────▼──────┐    ┌──────▼──────────┐   ┌─────────▼─────────┐
│ VM Creation │    │Ansible Playbooks│   │  Complete Stack   │
│ Cloud-init  │    │ Configuration   │   │    Validation     │
│ Validation  │    │  Validation     │   │                   │
└─────────────┘    └─────────────────┘   └───────────────────┘
```
- Provision Tests: Infrastructure creation and basic VM setup validation
- Configuration Tests: Software installation and application deployment
- Full Local Tests: End-to-end integration validation for comprehensive testing
This architecture provides:
- Reliability: Each test suite works independently in CI environments
- Speed: Focused testing reduces execution time
- Coverage: Combined suites provide complete deployment validation
- Debugging: Clear separation makes issue identification easier
The E2E testing system uses a Docker architecture representing different deployment phases, allowing for efficient testing of the configuration, release, and run phases of the deployment pipeline.
Purpose: Represents the state after VM provisioning but before configuration.
Contents:
- Ubuntu 24.04 LTS base (matches production VMs)
- SSH server (via supervisor for container-native process management)
- `torrust` user with sudo access
- No application dependencies installed
- Ready for Ansible configuration
Usage: E2E configuration testing - simulates a freshly provisioned VM ready for software installation.
The planned architecture uses separate directories for each deployment phase:
```text
docker/
├── provisioned-instance/    # ✅ Current - post-provision
│   ├── Dockerfile
│   ├── supervisord.conf
│   ├── entrypoint.sh
│   └── README.md
├── configured-instance/     # 🔄 Future - post-configure
│   ├── Dockerfile
│   ├── docker-compose.yml   # Example: Docker services
│   └── README.md
├── released-instance/       # 🔄 Future - post-release
│   ├── Dockerfile
│   ├── app-configs/         # Application configurations
│   └── README.md
└── running-instance/        # 🔄 Future - post-run
    ├── Dockerfile
    ├── service-configs/     # Service validation configs
    └── README.md
```
- Clear Separation: Each phase has its own directory and concerns
- Independent Evolution: Each Dockerfile can evolve independently
- Easier Maintenance: Simpler to understand and debug individual phases
- Flexible Building: Can build any phase independently
- Better Documentation: Each directory can have phase-specific docs
```bash
# Build specific phase containers
docker build -f docker/provisioned-instance/Dockerfile -t torrust-provisioned:latest .
docker build -f docker/configured-instance/Dockerfile -t torrust-configured:latest .
docker build -f docker/released-instance/Dockerfile -t torrust-released:latest .
docker build -f docker/running-instance/Dockerfile -t torrust-running:latest .
```

- `docker/provisioned-instance/` - Base system ready for configuration
- `docker/configured-instance/` - System with Docker and dependencies installed
  - Build FROM `torrust-provisioned-instance:latest`
  - Add Ansible playbook execution results
  - Verify Docker daemon and Docker Compose installation
- `docker/released-instance/` - System with applications deployed
  - Build FROM `torrust-configured-instance:latest`
  - Add application artifacts
  - Add service configurations
- `docker/running-instance/` - System with services started and validated
  - Build FROM `torrust-released-instance:latest`
  - Start all services
  - Run validation checks
- Test Coverage: Complete deployment pipeline testing
- Fast Feedback: Test individual phases quickly (~2-3 seconds vs ~17-30 seconds for LXD)
- Debugging: Isolate issues to specific deployment phases
- Scalability: Easy to add new phases or modify existing ones
- Documentation: Each phase self-documents its purpose and setup
- Reusability: Containers can be used outside of testing (demos, development)
- CI Reliability: Avoids GitHub Actions connectivity issues with nested VMs
Each deployment phase has distinct concerns that are tested appropriately:
- Provisioned Phase: Base system setup, user management, SSH connectivity
- Configured Phase: Software installation, system configuration, dependency management
- Released Phase: Application deployment, service configuration, artifact management
- Running Phase: Service validation, monitoring setup, operational readiness
This architecture enables:
- Testing Isolation: E2E tests can target specific phases independently
- Development Workflow: Teams can work on different phases independently
- Issue Isolation: Phase-specific containers make it easier to isolate problems
The Docker phase architecture complements the split E2E testing strategy by providing fast, reliable containers for configuration testing while maintaining comprehensive coverage of the entire deployment pipeline.
When adding new features or making changes:
For OpenTofu, LXD, or cloud-init modifications:
1. Update provision tests in `src/bin/e2e_provision_tests.rs`
2. Add validation methods for new infrastructure components
3. Test locally: `cargo run --bin e2e-provision-and-destroy-tests`
4. Verify CI passes on `.github/workflows/test-e2e-provision.yml`
For Ansible playbooks or software installation modifications:
1. Update configuration tests in `src/bin/e2e_config_tests.rs`
2. Add validation methods for new software components
3. Update the Docker image in `docker/provisioned-instance/` if needed
4. Test locally: `cargo run --bin e2e-config-tests`
5. Verify CI passes on `.github/workflows/test-e2e-config.yml`
For comprehensive changes affecting multiple components:
1. Test with the full local suite: `cargo run --bin e2e-tests-full`
2. Verify that both the provision and configuration suites pass independently
3. Update this documentation to reflect the changes
- Consider split approach: Can the change be tested in isolated suites?
- Provision tests: Focus on infrastructure readiness, minimal network dependencies
- Configuration tests: Focus on software functionality, reliable network access via containers
- Full local tests: Comprehensive validation for development workflows
- Independence: Each suite should be runnable independently without conflicts
The split E2E testing approach ensures reliable CI while maintaining comprehensive coverage of the entire deployment pipeline.