docs: [#310] research database backup strategies #312
Merged
josecelano merged 31 commits into main on Jan 30, 2026
Research documentation covering:

SQLite backup strategies:
- Backup approaches (.backup command, VACUUM INTO, file copy risks)
- WAL mode analysis with checkpointing behavior
- Backup verification and restore procedures
- Torrust Live Demo analysis (current unsafe cp, proposed .backup)

Container backup architectures:
- 5 patterns documented (Host Crontab, Centralized, Sidecar, Orchestrator, External Tool)
- Comparison matrix with pros/cons
- Decision flowchart for pattern selection

Backup tools evaluation:
- Restic: Recommended - mature, encrypted, deduplicated, Docker support
- Kopia: Alternative - newer, more features (GUI, ECC, server mode)
- Rustic: Discarded - beta status, not production-ready
- Two-phase backup approach (DB dump → file backup)

Key findings:
- Use .backup command for SQLite (Online Backup API, safe during writes)
- WAL mode is not required for safe backups (it is mainly useful for read performance)
- Restic is best fit: battle-tested, simple, Docker-native, sufficient features

Related issues created on torrust-demo:
- Issue #85: Use .backup instead of cp
- Issue #86: Evaluate WAL mode for high-traffic scenario

Key conclusions:
- SQLite: Use .backup command (Online Backup API), WAL mode optional
- Tool: Restic recommended (mature, encrypted, Docker-native)
- Scope: Document best practices but don't automate in deployer yet

Rationale for not automating:
- Backup strategies are opinionated and vary by user preference
- Cloud providers offer native backup/snapshot tools
- Some users prefer infrastructure-level over application-level backups
- Adding backup automation increases configuration complexity

Recommended approach:
- Document best practices (done)
- Implement manually in Torrust Live Demo
- Provide templates/examples for users who want to implement

- Add MySQL backup approaches documentation (mysqldump, Percona XtraBackup)
- Document InnoDB lock-free backup with --single-transaction
- Add sidecar container backup solution as recommended pattern
- Document files to backup with host-to-container path mapping
- Add Restic best practices section (staging pattern, tags, verification)
- Update issue spec to mark SQLite research goals complete
- Add technical terms to project dictionary

- Add proof-of-concept implementation plan for sidecar container
- Document performance/scalability considerations (17GB database)
- Answer all open questions with decisions
- Add backup execution flexibility requirements

- Environment manual-test-sidecar-backup created and running
- Verified all 4 MySQL tables use InnoDB engine
- Tracker API accessible and responding
- Documented instance details and validation results

- Unified backup.sh script handles MySQL and config backups
- Configuration-driven via environment variables (no rebuild needed)
- backup-paths.txt file for flexible path specification
- Standardized storage structure: etc/ lib/ log/
- Backs up: .env, docker-compose.yml, tracker/etc, prometheus/etc, grafana/provisioning
- Removed separate backup-mysql.sh and entrypoint.sh scripts

- Add production-considerations.md documenting security, performance, reliability, and operational issues to address for production use
- Update Dockerfile to run as torrust user (uid=1000) instead of root
- Matches host app user for correct backup file ownership

- Rename from 'Archive Creation' to 'Backup Maintenance'
- Two-phase approach: raw backup then compress/cleanup
- Add compression for config files older than 1 hour
- Add retention policy with BACKUP_RETENTION_DAYS env var
- Document no-overlap behavior of sequential loop
- Explain when restic would be needed vs simple bash

…ntion)
- Add run_maintenance() function after each backup cycle
- Implement compress_old_config_backups() - package configs older than 1 hour
- Implement apply_retention_policy() - delete backups older than N days
- Add BACKUP_RETENTION_DAYS env var (default: 7) to docker-compose
- Update script and Dockerfile headers with new env var documentation
- MySQL dumps compressed immediately during backup for efficiency
- Config files packaged later in maintenance phase for storage efficiency

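The retention step in this commit can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: only the names apply_retention_policy, get_retention_days and BACKUP_RETENTION_DAYS (default 7) appear in the commit messages, while the function bodies, the directory argument and the fallback for invalid values are assumptions.

```shell
# Hypothetical sketch of a retention policy driven by BACKUP_RETENTION_DAYS.
# Function bodies are assumptions; only the names come from the commit messages.

# Print the retention period in days, falling back to 7 when unset or invalid.
get_retention_days() {
    days="${BACKUP_RETENTION_DAYS:-7}"
    case "$days" in
        ''|*[!0-9]*) days=7 ;;  # not a non-negative integer: use the default
    esac
    printf '%s\n' "$days"
}

# Delete regular files under directory $1 that are older than the retention period.
apply_retention_policy() {
    dir="$1"
    [ -d "$dir" ] || return 0  # nothing to clean up
    # -mtime +N matches files whose age, in whole 24-hour periods, exceeds N
    find "$dir" -type f -mtime +"$(get_retention_days)" -delete
}
```

Because the age is read from the file modification time, a restored or touched backup file counts as new again, which may or may not be the desired behavior.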
- Add comprehensive function documentation with Arguments, Returns, Side Effects
- Document all 25+ functions with consistent style
- Add explanatory comments for complex logic (packaging rationale, streaming)
- Fix counting bugs: replace grep -c with wc -l for reliable integer results
- Simplify delete_old_files_from using find -delete
- Script is now ~46% documentation (264/570 lines)

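The commit does not say which counting bug was hit, so the following is only one plausible illustration of why grep -c can be unreliable in scripts: it prints 0 and also exits with status 1 when nothing matches, so a fallback such as "|| echo 0" produces two lines instead of one integer. The function names are hypothetical.

```shell
# Illustration only: one way grep -c can yield a non-integer result.
# Both function names are hypothetical and not taken from the PR.

# Fragile: on zero matches grep -c prints "0" and exits 1, so the
# fallback prints a second "0" and the caller receives two lines.
count_matches_fragile() {
    printf '%s\n' "$1" | grep -c -- "$2" || echo 0
}

# More predictable: count matching lines with wc -l, which always prints
# exactly one integer; tr strips the padding some wc implementations add.
count_matches() {
    printf '%s\n' "$1" | grep -- "$2" | wc -l | tr -d '[:space:]'
}
```

With the second form the result can be used directly in arithmetic or numeric comparisons without special handling for the zero-match case.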
- Add 44 unit tests covering all helper functions
- Test naming follows project convention: it_should_{behavior}_when_{condition}
- Tests run during Docker build - build fails if tests fail
- Multi-stage Dockerfile: test stage creates marker file, production stage requires it
- Make constants configurable for test isolation (BACKUP_DIR_MYSQL, etc.)
- Fix is_comment_or_empty to handle whitespace-only lines
Tested functions:
- Text processing: is_comment_or_empty, trim_whitespace
- Configuration: get_interval, get_retention_days, get_paths_file, is_mysql_enabled
- File system: ensure_directory_exists, get_file_size, has_valid_paths_file
- MySQL: generate_mysql_backup_path, validate_mysql_configuration
- Maintenance: cleanup_empty_directories, delete_old_files_from
- Logging: log, log_header, log_item, log_error
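As an illustration of the text-processing helpers listed above, here is a minimal sketch of how is_comment_or_empty could treat whitespace-only lines as empty. Only the two function names come from the commit message; the bodies are assumptions and may differ from the PR's script.

```shell
# Hypothetical implementations; only the function names appear in the PR.

# Print $1 with leading and trailing whitespace removed.
trim_whitespace() {
    s="$1"
    s="${s#"${s%%[![:space:]]*}"}"  # drop the leading run of whitespace
    s="${s%"${s##*[![:space:]]}"}"  # drop the trailing run of whitespace
    printf '%s' "$s"
}

# Succeed when the line is empty, whitespace-only, or a "#" comment.
is_comment_or_empty() {
    line="$(trim_whitespace "$1")"
    case "$line" in
        ''|'#'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Trimming before the check is what makes a line containing only spaces or tabs count as empty, which matters when reading a hand-edited file such as backup-paths.txt.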
Tested and documented all restore procedures:
- MySQL restore to test database (validation)
- MySQL restore to production database
- Config file restore
- Full disaster recovery simulation

Key findings:
- RTO ~15 seconds for small databases
- All 4 tables restored correctly
- Tracker healthy after restore
- Hidden files (.env) need explicit copy

Documented issues:
- cp -r dir/* doesn't copy hidden files
- MySQL 'keys' is a reserved word (but backup handles this)

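The hidden-file issue comes from shell globbing: dir/* is expanded by the shell and skips names starting with a dot. A small sketch of one way around it, copying dir/. instead of relying on a glob; the function name restore_config_dir and its arguments are hypothetical, not part of the PR.

```shell
# Hypothetical helper: copy the contents of $1 into $2, including dotfiles.
# "src/." is a plain path, not a glob, so hidden files such as .env are copied.
restore_config_dir() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" && cp -R "$src/." "$dest/"
}
```

For example, restore_config_dir /backups/config /opt/torrust would also restore a top-level .env file; both paths here are placeholders.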
Real-world testing on 17GB Torrust Demo production database:
- SQLite .backup command is unusable for large databases under load
  - Ran 16+ hours, stalled at 10% (1.7GB of 17GB)
  - Effective rate: ~37 MB/hour vs disk capable of 445 MB/s
  - Never completed due to constant restart-on-modification
- Maintenance window approach tested and verified
  - 72 seconds for complete 17GB backup (with tracker stopped)
  - ~90 seconds total downtime including stop/start
  - Off-site transfer: 9 minutes at 32.3 MB/s
- Added size-based scalability recommendations
  - <1GB: use .backup (works well)
  - 1-10GB: consider maintenance window
  - >10GB: must use alternatives (LVM/ZFS snapshots, Litestream)
- Documents alternative approaches for large databases
  - Filesystem snapshots (instant, no downtime)
  - VACUUM INTO (compacted copy)
  - Litestream (continuous replication)
  - WAL mode with checkpoint control

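The size-based recommendations above could be encoded as a small decision helper. This is a sketch under assumptions: the function name and the returned labels are hypothetical, 1 GB is taken as 1024^3 bytes, and the boundaries (exactly 1 GB, exactly 10 GB) are assigned arbitrarily since the commit message does not specify them.

```shell
# Hypothetical helper: suggest a SQLite backup approach from the size in bytes.
# Thresholds follow the commit message; labels and boundary handling are assumptions.
suggest_sqlite_backup_strategy() {
    size_bytes="$1"
    one_gb=1073741824  # 1024^3 bytes
    if [ "$size_bytes" -lt "$one_gb" ]; then
        echo "online-backup"        # under 1GB: sqlite3 .backup works well
    elif [ "$size_bytes" -le $((10 * one_gb)) ]; then
        echo "maintenance-window"   # 1-10GB: consider a maintenance window
    else
        echo "alternatives"         # over 10GB: snapshots, Litestream, etc.
    fi
}
```

The input could come from a file-size lookup on the database file; write load matters as much as size, so the result is a starting point rather than a rule.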
- Complete Phase 7 (Documentation Update) with lessons learned
- Update preliminary conclusions with critical large database warning
- Mark POC as complete (all 7 phases done)

Key findings documented:
- Sidecar container pattern only practical for databases < 1GB
- SQLite .backup stalls for large databases under concurrent load
- Maintenance window backup (72s for 17GB) is the practical alternative
- 44 unit tests validate backup script behavior

Reorganize the backup-strategies research documentation:
- Move database-specific docs to databases/ (mysql/, sqlite/)
- Move container-backup-architectures.md to architectures/container-patterns.md
- Rename preliminary-conclusions.md to conclusions.md
- Rename requirements-notes.md to requirements.md
- Move POC files to solutions/sidecar-container/
- Move sidecar-container.md to solutions/sidecar-container/design.md
- Delete redundant proof-of-concept.md

New structure:
- databases/mysql/ - MySQL backup approaches
- databases/sqlite/ - SQLite backup approaches and large DB findings
- architectures/ - Container backup patterns
- tools/ - Backup tool evaluations (restic, etc.)
- solutions/sidecar-container/ - Complete POC with phases and artifacts

All internal links updated to reflect new paths.

Add two proposed solutions for handling large database backups:
- exclude-statistics: Backup only essential data, exclude stats tables
- maintenance-window: Host-level backup with service stop/restart

These alternatives address the finding that sidecar container backup is only practical for databases < 1GB.

Analyzed 17GB production database:
- torrents table: 161M rows (~8 GB, 99.8% of DB)
- 96.9% of torrents have completed=0 (never downloaded)
- Excluding these reduces backup to ~247 MB (98.5% reduction)

Key limitation documented: This reduces backup SIZE but NOT backup TIME under heavy load due to SQLite locking contention.

…ault backup interval
- Add maintenance-window artifacts folder with:
  - maintenance-backup.sh: host-level orchestration script
  - maintenance-backup.cron: crontab entry for daily 3 AM backup
  - backup-container/: Dockerfile and backup script with BACKUP_MODE support
  - docker-compose files and environment config
- Update default BACKUP_INTERVAL from 120s to 86400s (24 hours)
- Replace inline script in README with artifacts folder reference
- Both sidecar and maintenance-window solutions share the same backup script

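The host-level orchestration described here (stop the tracker, run the backup container once, start the tracker again) might look roughly like the sketch below. It is not the maintenance-backup.sh from the PR: the function name, the COMPOSE_CMD override and the service names tracker and backup are assumptions made for illustration.

```shell
# Hypothetical sketch of a maintenance-window backup run from the host.
# Service names "tracker" and "backup" and the COMPOSE_CMD variable are assumptions.
run_maintenance_backup() {
    compose="${COMPOSE_CMD:-docker compose}"
    # $compose is intentionally unquoted so "docker compose" splits into two words.
    $compose stop tracker || return 1
    $compose run --rm backup
    backup_status=$?
    # Restart the tracker even if the backup failed, to keep downtime bounded.
    $compose start tracker || return 1
    return "$backup_status"
}
```

Setting COMPOSE_CMD=echo gives a dry run that only prints the three steps, which is a cheap way to check the ordering before scheduling the script from cron.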
- Add backup_test.bats for maintenance-window backup script (48 tests)
- Update sidecar-container tests with BACKUP_MODE tests
- Update default interval tests from 120s to 86400s (24 hours)
- Add shellcheck directives for bats-specific warnings
- Add clarifying comments about /data mount point in docker-compose files
- Document why we mount entire deployment directory (root .env + storage/)
- Add 'subshells' to project dictionary

- Update conclusions.md with recommended solution section
- Compare maintenance-window vs sidecar approaches
- Update solutions/README.md to recommend maintenance-window
- Update main README.md with new recommendation
- Update sidecar-container README.md with limitation warning
- Fix reference paths in conclusions.md

The maintenance-window hybrid approach is now the recommended solution:
- 95%+ of logic in portable container
- Works for databases of any size (17GB in ~90s vs 16+ hours)
- Simple crontab + ~50 lines of host script
- Could be automated by deployer in Configure phase

- Add SQLite backup functionality alongside MySQL support
- Add BACKUP_SQLITE_ENABLED and SQLITE_DATABASE_PATH environment variables
- Add backup_sqlite(), generate_sqlite_backup_path(), dump_sqlite_database() functions
- Add is_sqlite_enabled(), get_sqlite_database_path(), validate_sqlite_configuration()
- Add sqlite3 package to Dockerfile for database backup
- Change default BACKUP_MODE from continuous to single
- Change default BACKUP_INTERVAL from 120s to 86400s (24 hours)
- Make logging consistent: both MySQL and SQLite show database details conditionally
- Add 10 new SQLite unit tests (58 total tests)
- Rename docker-compose-with-backup.yml to docker-compose-with-backup-mysql.yml
- Add docker-compose-with-backup-sqlite.yml for SQLite deployments
- Add maintenance-backup-test.cron for testing with 2-minute interval

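A minimal sketch of how the SQLite configuration helpers named in this commit could fit together. The function and variable names come from the commit message; the bodies, the accepted value "true", the default of disabled and the error messages are assumptions.

```shell
# Hypothetical bodies for the configuration helpers named in the commit.

# Succeed only when BACKUP_SQLITE_ENABLED is exactly "true" (assumed convention).
is_sqlite_enabled() {
    [ "${BACKUP_SQLITE_ENABLED:-false}" = "true" ]
}

# Print the configured database path (empty when unset).
get_sqlite_database_path() {
    printf '%s\n' "${SQLITE_DATABASE_PATH:-}"
}

# Fail with a message on stderr when SQLite backup is enabled but misconfigured.
validate_sqlite_configuration() {
    is_sqlite_enabled || return 0  # nothing to validate when disabled
    db_path="$(get_sqlite_database_path)"
    if [ -z "$db_path" ]; then
        echo "ERROR: SQLITE_DATABASE_PATH is required when BACKUP_SQLITE_ENABLED=true" >&2
        return 1
    fi
    if [ ! -f "$db_path" ]; then
        echo "ERROR: SQLite database not found: $db_path" >&2
        return 1
    fi
}
```

Validating up front lets the container exit early with a clear message instead of failing partway through a backup run.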
Captures practical concerns and edge cases discovered during research:
- Template complexity for MySQL vs SQLite env vars
- Path translation between host/container contexts
- SSH agent key selection issues
- Backup container single mode behavior
- Configuration validation discoveries
- Crontab and log rotation considerations
- Open questions for future implementation

ACK 1168780
Summary
Comprehensive research documentation for database backup strategies as part of Epic #309 (Add backup support).
This PR includes complete research for SQLite and MySQL backup strategies, backup tools evaluation, container backup architectures, a working proof-of-concept backup container with 58 bats-core unit tests, and a recommended solution (Maintenance Window Hybrid approach).
What's Included
Database Backup Strategies
SQLite
- .backup command (Online Backup API), VACUUM INTO, file copy risks
- … cp), proposed improvements
- .backup stalls at 10% after 16+ hours for 17GB database (~37 MB/hour effective rate). Maintenance window backup completes in 72 seconds.
MySQL
- mysqldump, physical backups, binary log backups
Container Backup Architectures
Backup Tools Evaluation
Solution Comparison (NEW)
Four backup solutions evaluated with detailed trade-off analysis:
Recommended Solution: Maintenance Window Hybrid (95% container, 5% host script)
Maintenance Window Backup POC (Complete - NEW)
A working proof-of-concept with 58 bats-core unit tests supporting both MySQL and SQLite:
POC Artifacts:
- backup.sh script with modular functions
- maintenance-backup.sh host orchestration script
Key Findings
- .backup command (Online Backup API) - safe during concurrent writes
- .backup impractical for DBs > 1GB due to locking overhead (~37 MB/hour)
- … .backup)
- mysqldump works reliably for containerized deployments
Lessons Learned (Implementation Concerns)
Key pain points discovered during POC that affect future implementation:
- … IdentitiesOnly=yes
Related Issues
- … .backup instead of cp
Checklist
Research Complete
POC Complete
Future Work (out of scope for this PR)
Documentation Structure