Accepted
2025-01-22
The tracker deployment system needed to add Prometheus as a metrics collection service. Several design decisions were required:
- Enablement Strategy: Should Prometheus be mandatory, opt-in, or enabled-by-default?
- Template Rendering: How should Prometheus templates be rendered in the release workflow?
- Service Validation: How should E2E tests validate optional services like Prometheus?
The decision impacts:
- User experience (ease of getting started with monitoring)
- System architecture (template rendering patterns)
- Testing patterns (extensibility for future optional services)
Prometheus is included by default in generated environment templates but can be disabled by removing the configuration section.
Implementation:
pub struct UserInputs {
pub prometheus: Option<PrometheusConfig>, // Some by default, None to disable
}Configuration:
{
"prometheus": {
"scrape_interval": 15
}
}Disabling: Remove the entire prometheus section from the environment config.
Rationale:
- Monitoring is a best practice - users should get it by default
- Opt-out is simple - just remove the config section
- No complex feature flags or enablement parameters needed
- Follows principle of least surprise (monitoring expected for production deployments)
Each service renders its templates independently in the release handler, not from within other service's template rendering.
Architecture:
ReleaseCommandHandler::execute()
├─ Step 1: Create tracker storage
├─ Step 2: Render tracker templates (tracker/*.toml)
├─ Step 3: Deploy tracker configs
├─ Step 4: Create Prometheus storage (if enabled)
├─ Step 5: Render Prometheus templates (prometheus.yml) - INDEPENDENT STEP
├─ Step 6: Deploy Prometheus configs
├─ Step 7: Render Docker Compose templates (docker-compose.yml)
└─ Step 8: Deploy compose files
Rationale:
- Each service is responsible for its own template rendering
- Docker Compose templates only define service orchestration, not content generation
- Environment configuration is the source of truth for which services are enabled
- Follows Single Responsibility Principle (each step does one thing)
- Makes it easy to add future services (Grafana, Alertmanager, etc.)
Anti-Pattern Avoided: Rendering Prometheus templates from within Docker Compose template rendering step.
E2E validation uses a ServiceValidation struct with boolean flags instead of function parameters.
Implementation:
pub struct ServiceValidation {
pub prometheus: bool,
// Future: pub grafana: bool,
// Future: pub alertmanager: bool,
}
pub fn run_release_validation(
socket_addr: SocketAddr,
ssh_credentials: &SshCredentials,
services: Option<ServiceValidation>,
) -> Result<(), String>Rationale:
- Extensible for future services without API changes
- More semantic than boolean parameters
- Clear intent:
ServiceValidation { prometheus: true } - Follows Open-Closed Principle (open for extension, closed for modification)
Anti-Pattern Avoided: run_release_validation_with_prometheus_check(addr, creds, true) - too specific and not extensible.
-
Better User Experience:
- Users get monitoring by default without manual setup
- Simple opt-out (remove config section)
- Production-ready deployments out of the box
-
Cleaner Architecture:
- Each service manages its own templates independently
- Clear separation of concerns in release handler
- Easy to add future services (Grafana, Alertmanager, Loki, etc.)
-
Extensible Testing:
- ServiceValidation struct easily extended for new services
- Consistent pattern for optional service validation
- Type-safe validation configuration
-
Maintenance Benefits:
- Independent template rendering simplifies debugging
- Each service's templates can be modified independently
- Clear workflow steps make issues easier to trace
-
Default Overhead:
- Users who don't want monitoring must manually remove the section
- Prometheus container always included in default deployments
- Slightly more disk/memory usage for minimal deployments
-
Configuration Discovery:
- Users must learn that removing the section disables the service
- Not immediately obvious from JSON schema alone
- Requires documentation of the opt-out pattern
- Breaking Changes: Future Prometheus config schema changes require careful migration planning
- Service Dependencies: Adding services that depend on Prometheus requires proper ordering logic
- Template Complexity: As services grow, need to ensure independent rendering doesn't duplicate logic
Approach: Always deploy Prometheus, no opt-out.
Rejected Because:
- Forces monitoring on users who don't want it
- Increases minimum resource requirements
- Violates principle of least astonishment for minimal deployments
Approach: Prometheus disabled by default, enabled with "prometheus": { "enabled": true }.
Rejected Because:
- Requires users to discover and enable monitoring manually
- Most production deployments should have monitoring - opt-in makes it less likely
- Adds complexity with enabled/disabled flags
Approach: Docker Compose template rendering step also renders Prometheus templates.
Rejected Because:
- Violates Single Responsibility Principle
- Makes Docker Compose step dependent on Prometheus internals
- Harder to add future services independently
- Couples service orchestration with service configuration
Approach: run_release_validation(addr, creds, check_prometheus: bool).
Rejected Because:
- Not extensible - adding Grafana requires API change
- Less semantic - what does
truemean? - Becomes unwieldy with multiple services
- Violates Open-Closed Principle
- Template System Architecture - Project Generator pattern
- Environment Variable Injection - Configuration passing
- DDD Layer Placement - Module organization
- Issue: #238 - Prometheus Slice - Release and Run Commands
- Manual Testing Guide: Prometheus Verification
- Prometheus Documentation: https://prometheus.io/docs/
- torrust-demo Reference: Existing Prometheus integration patterns