Accepted
2025-10-06
Commands need to persist error context when failures occur. We need to decide what information to store and where.
When a command like ProvisionCommand fails:
- We need to know what failed (which step)
- We need to know why it failed (error details)
- We need complete information for debugging
- We need actionable guidance for users
- Type-safe error handling (not string-based)
- Complete error information (full error chain)
- No serialization constraints on error types
- Error history preservation (multiple attempts)
- Actionable user guidance
Use structured error context with independent trace files.
State File (data/{env}/state.json) contains:
- Enum-based context (type-safe)
- Essential metadata (timing, step, kind)
- Reference to trace file
Trace Files (data/{env}/traces/{timestamp}-{command}.log) contain:
- Complete error chain (all nested errors)
- Root cause analysis
- Suggested actions
- Full debugging context
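The split between the state file and the trace file can be sketched as a small Rust type. This is a minimal sketch that assumes hypothetical variant names (`OpenTofuInit`, `OpenTofuApply`, `WaitForSsh`, `InfrastructureProvisioning`, `NetworkConnectivity`) and hypothetical field names. Only `ProvisionFailureContext` comes from this ADR. The state file keeps the enums and timing, and points at the trace file rather than embedding the error.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Which step of ProvisionCommand failed (variant names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvisionStep {
    OpenTofuInit,
    OpenTofuApply,
    WaitForSsh,
}

/// Coarse classification of the failure (variant names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    InfrastructureProvisioning,
    NetworkConnectivity,
}

/// Essential metadata kept in data/{env}/state.json.
/// The full error chain is not stored here; it lives in the referenced trace file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProvisionFailureContext {
    pub failed_step: ProvisionStep,
    pub error_kind: ErrorKind,
    /// How long the command ran before failing (hypothetical field).
    pub execution_duration: Duration,
    /// Path to data/{env}/traces/{timestamp}-{command}.log, if one was written.
    pub trace_file_path: Option<PathBuf>,
}
```

Serialization of this small struct (for example via serde derives) is left out so the sketch compiles with the standard library only. Because the struct holds only enums, a duration, and a path, making it serializable would not constrain the error types themselves.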
Instead of requiring errors to implement Serialize, we use a custom Traceable trait:
```rust
/// Formats an error (including its nested causes) for a trace file.
trait Traceable {
    fn trace_format(&self) -> String;
}
```

This allows:
- Custom formatting per error type
- No serialization constraints
- Errors can contain non-serializable data (file handles, sockets, etc.)
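One way an error type could implement `Traceable` is sketched below. It walks the standard `std::error::Error::source()` chain, so the trace contains every nested cause. `TofuError`, `ProvisionCommandError`, and their fields are hypothetical stand-ins. The `log_handle: Option<File>` field shows an error carrying data that is not serializable, which the trait approach permits.

```rust
use std::error::Error;
use std::fmt;
use std::fs::File;

/// The trait from this ADR: each error decides how it is rendered into a trace file.
pub trait Traceable {
    fn trace_format(&self) -> String;
}

/// Hypothetical low-level error (stand-in for an OpenTofu failure).
#[derive(Debug)]
pub struct TofuError {
    pub exit_code: i32,
}

impl fmt::Display for TofuError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "tofu apply exited with code {}", self.exit_code)
    }
}

impl Error for TofuError {}

/// Hypothetical command error holding a non-serializable field (an open file handle).
#[derive(Debug)]
pub struct ProvisionCommandError {
    pub cause: TofuError,
    pub log_handle: Option<File>, // File has no Serialize impl; Traceable does not need one
}

impl fmt::Display for ProvisionCommandError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "provision command failed")
    }
}

impl Error for ProvisionCommandError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.cause)
    }
}

impl Traceable for ProvisionCommandError {
    fn trace_format(&self) -> String {
        // Start with the top-level error, then follow source() until the chain ends.
        let mut out = format!("Error: {}\n", self);
        let mut current = self.source();
        let mut depth = 1;
        while let Some(err) = current {
            out.push_str(&format!("  {}. Caused by: {}\n", depth, err));
            current = err.source();
            depth += 1;
        }
        out
    }
}
```

Each error type remains free to add extra sections, such as root-cause analysis or suggested actions, inside its own `trace_format`.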
Location: data/{env}/traces/ (not build/)
- data/ = internal application state
- build/ = generated artifacts (OpenTofu, Ansible configs)
Naming: {timestamp}-{command}.log (timestamp first)
- Example: 20251003-103045-provision.log
- Chronological sorting by default
- Easy to find latest/oldest
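The naming rule can be sketched with the standard library alone. This sketch assumes timestamps are UTC, taken as Unix seconds. It converts days to a calendar date using Howard Hinnant's `civil_from_days` algorithm in place of a date/time crate. The function names `format_timestamp`, `trace_file_path`, and `latest_trace` are hypothetical. Because the timestamp comes first, the latest trace is simply the greatest file name under ordinary string ordering.

```rust
use std::path::{Path, PathBuf};

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar.
/// Howard Hinnant's civil_from_days algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year starting in March, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2); // Jan/Feb belong to the next year
    (year, month, day)
}

/// Format Unix seconds (assumed UTC) as YYYYMMDD-HHMMSS.
pub fn format_timestamp(unix_secs: u64) -> String {
    let days = (unix_secs / 86_400) as i64;
    let secs_of_day = unix_secs % 86_400;
    let (y, m, d) = civil_from_days(days);
    format!(
        "{:04}{:02}{:02}-{:02}{:02}{:02}",
        y,
        m,
        d,
        secs_of_day / 3_600,
        (secs_of_day % 3_600) / 60,
        secs_of_day % 60
    )
}

/// Build {base}/{env}/traces/{timestamp}-{command}.log.
pub fn trace_file_path(base: &Path, env: &str, unix_secs: u64, command: &str) -> PathBuf {
    base.join(env)
        .join("traces")
        .join(format!("{}-{}.log", format_timestamp(unix_secs), command))
}

/// Timestamp-first names sort chronologically, so the latest trace is the maximum name.
pub fn latest_trace<'a>(names: &[&'a str]) -> Option<&'a str> {
    names.iter().copied().max()
}
```

For example, `trace_file_path(Path::new("data"), "e2e-full", 1_759_487_445, "provision")` yields `data/e2e-full/traces/20251003-103045-provision.log`.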
Generation: Only on command failure (not success)
- Traces are for debugging failures
- Success cases don't need error details
- Keeps trace directory focused
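The "only on failure" rule could be expressed as a small helper that any command calls with its result. The helper `write_trace_on_failure` is hypothetical, and the file name is assumed to be computed by the caller in the {timestamp}-{command}.log format. A successful result writes nothing. A failure writes the `Traceable` output and returns the path, so the caller can store it in state.json.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// The trait from this ADR.
pub trait Traceable {
    fn trace_format(&self) -> String;
}

/// Writes {traces_dir}/{trace_name} only when `result` is an error.
/// Returns the written path so it can be referenced from the state file.
pub fn write_trace_on_failure<T, E: Traceable>(
    result: &Result<T, E>,
    traces_dir: &Path,
    trace_name: &str,
) -> io::Result<Option<PathBuf>> {
    match result {
        // Success: no trace, so the directory only ever contains failures.
        Ok(_) => Ok(None),
        Err(err) => {
            fs::create_dir_all(traces_dir)?;
            let path = traces_dir.join(trace_name);
            // fs::write truncates a file with the same name; distinct timestamps
            // in the names are what keep earlier traces from being overwritten.
            fs::write(&path, err.trace_format())?;
            Ok(Some(path))
        }
    }
}
```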
✅ No Serialization Constraints
- Error types don't need Serialize + Deserialize
- Can use custom Traceable trait instead
- Errors can contain non-serializable data
✅ Complete Error History
- Multiple failures preserved
- Not overwritten on retry
- Full audit trail maintained
✅ Type-Safe Context
- Pattern matching on enums
- Compile-time guarantees
- No string typos
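The type-safety benefit shows up when turning the failed step into user guidance. In the sketch below, `ProvisionStep` and its variants are hypothetical, and the guidance strings are illustrative only. The `match` is exhaustive, so adding a new step fails to compile until guidance is written for it. A misspelled variant is also a compile error, whereas a misspelled step string would silently fall through.

```rust
/// Which step failed (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvisionStep {
    OpenTofuInit,
    OpenTofuApply,
    WaitForSsh,
}

/// Maps each step to actionable guidance; the compiler enforces that every step is covered.
pub fn suggested_action(step: ProvisionStep) -> &'static str {
    match step {
        ProvisionStep::OpenTofuInit => "Check that OpenTofu is installed and the provider configuration is valid.",
        ProvisionStep::OpenTofuApply => "Read the trace file for the OpenTofu error, then retry provisioning.",
        ProvisionStep::WaitForSsh => "Verify the instance is running and reachable over SSH.",
    }
}
```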
✅ Independent Concern
- Trace generation separate from state management
- Any command can generate traces
- Flexible implementation
The data/ vs build/ distinction is important:
- data/{env}/ = Internal application state
  - state.json - current environment state
  - traces/ - error trace history
  - Managed by the application
  - Should be backed up
- build/{env}/ = Generated artifacts
  - OpenTofu .tf files
  - Ansible inventory files
  - Can be regenerated from templates
  - Safe to delete
The {timestamp}-{command}.log format enables:

```
# List all traces chronologically
ls data/e2e-full/traces/
20251003-103045-provision.log
20251003-105230-provision.log
20251003-110015-configure.log

# Find latest trace
ls -1 data/e2e-full/traces/ | tail -1

# Find all provision failures
ls data/e2e-full/traces/*-provision.log
```

Alternatives considered:

```rust
pub struct ProvisionFailed {
    failed_step: String,
}
```

Pros: Simple, easy to implement
Cons: No type safety, no error details, typo-prone
Verdict: Good for MVP, insufficient for production
```rust
pub struct ProvisionFailed {
    error: ProvisionCommandError, // Requires Serialize
}
```

Pros: Complete information, type-safe
Cons: Forces all errors to be serializable, tight coupling
Verdict: Too restrictive
```rust
pub struct ProvisionFailureContext {
    captured_logs: Vec<LogRecord>,
}
```

Pros: Complete execution context
Cons: Complex, indirect, format-dependent
Verdict: Error trace is more direct
```rust
pub struct ProvisionFailureContext {
    trace_id: String, // Search logs manually
}
```

Pros: Minimal state
Cons: Requires log retention, not self-contained
Verdict: Trace files provide better UX
✅ Type-safe error handling with enums
✅ Complete error information preserved
✅ No serialization constraints via Traceable trait
✅ Error history maintained
✅ Better user experience
✅ Independent trace management
ℹ️ Will need trace cleanup policy eventually
ℹ️ Requires writable filesystem
This will be implemented in a separate refactor after Phase 5:
- Complete Phase 5 with string-based approach
- Define ProvisionFailureContext struct with enums
- Define TraceId newtype (wrapping Uuid) for type safety
- Define Traceable trait for error formatting
- Implement trace file writer with PathBuf for file paths
- Update commands to generate traces on failure
- Add tests and documentation
Refactor plan: docs/refactors/error-context-with-trace-files.md
We chose structured context + independent trace files because it:
- Provides type-safe pattern matching (enums)
- Captures complete error chains (Traceable trait, not Serialize)
- Maintains lightweight state files
- Preserves error history
- Stores traces in data/ (internal state), not build/ (artifacts)
- Uses timestamp-first naming for easy sorting
- Generates traces only on failure

This balances simplicity, completeness, and usability better than the alternatives.