Backup writer

A periodic snapshot of the volatile state under programfiles/ — every authored document plus the user store — written as a zip archive on a configurable interval, with a configurable retention cap. The shape mirrors the sitemap writer: one tokio task per process, kicked off from main after the runtime is up.

Bootstrap

backup::spawn_writer() is called from main.rs right after the runtime comes online. It’s a no-op when either backup_interval_secs or backup_retain is missing, zero, or negative: config_u64 only returns Some for strictly positive integers, and both gates return before any task is spawned. A deployment that hasn’t set those keys stays completely quiet; nothing is written and no background task exists.
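The gating logic can be sketched roughly as follows. This is a minimal illustration, not the code in src/backup.rs: positive_u64 is a hypothetical stand-in for config_u64 that takes an already-parsed integer instead of reading config.json, and writer_settings is an invented name for the two gates combined.

```rust
/// Hypothetical stand-in for `config_u64`: only strictly positive integers
/// count as "set"; missing, zero, and negative values all map to `None`.
fn positive_u64(raw: Option<i64>) -> Option<u64> {
    raw.filter(|v| *v > 0).map(|v| v as u64)
}

/// Returns `(interval_secs, retain)` only when both gates pass.
/// A `None` here corresponds to "return before spawning the task".
fn writer_settings(interval: Option<i64>, retain: Option<i64>) -> Option<(u64, u64)> {
    let interval_secs = positive_u64(interval)?;
    let retain = positive_u64(retain)?;
    Some((interval_secs, retain))
}
```

Under these assumptions, a caller would spawn the task only when writer_settings returns Some, which keeps the "disabled" case free of any background work.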

What’s archived

INCLUDE_PATHS in src/backup.rs lists exactly two directories:

  • programfiles/content/ — the doc tree (every resource’s ctx.json, body files, perms).
  • programfiles/local_auth/ — the user store.

What’s NOT archived

  • programfiles/op/, the checked-in static config (config.json, navbar.json, support_lang.json, robots.txt). Losing it costs nothing because it can be restored from git.
  • The backup destination itself. Including it would compound — every cycle’s archive would grow by the size of every previous archive.

Archive naming

backup-<unix_ts>.zip, where <unix_ts> is unix_now() at the start of the cycle. Embedding the timestamp in the filename makes lexical sort match chronological order, which list_archives and prune rely on. This holds as long as every timestamp has the same number of digits, which is true for Unix seconds until the year 2286. The on-disk mtime additionally preserves a human-readable timestamp for ls -l.
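A small sketch of the naming scheme, assuming nothing beyond the backup-<unix_ts>.zip shape described above. The helper names archive_name and archive_ts are hypothetical and are not taken from src/backup.rs.

```rust
/// Builds the archive file name for a cycle that started at `unix_ts`.
fn archive_name(unix_ts: u64) -> String {
    format!("backup-{}.zip", unix_ts)
}

/// Recovers the timestamp from a file name, or `None` if the name does not
/// match the `backup-<digits>.zip` shape (for example a stray `.tmp`).
fn archive_ts(name: &str) -> Option<u64> {
    let digits = name.strip_prefix("backup-")?.strip_suffix(".zip")?;
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}
```

Sorting a list of such names as plain strings yields oldest-first order, provided the timestamps all have the same digit count.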

Atomic write

run_cycle writes to <archive>.tmp and then renames over the final path. Same atomicity trick routes/sitemap.rs::persist uses. If archive_paths_to_file fails partway through, the .tmp file is unlinked and the cycle returns without renaming, so a half-written archive never appears under a canonical backup-<ts>.zip name. If the rename itself fails, the .tmp is also unlinked — list_archives filters by the backup-*.zip pattern so a stray .tmp wouldn’t be counted anyway, but the cleanup keeps the destination tidy.
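The write-then-rename sequence might look like the sketch below. It uses only the standard library; write_atomically is a hypothetical name, and the write_body closure stands in for archive_paths_to_file, whose real signature is not shown in this document.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes via `<final_path>.tmp`, then renames into place.
/// `write_body` is a hypothetical stand-in for `archive_paths_to_file`.
fn write_atomically<F>(final_path: &Path, write_body: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    // Append ".tmp" to the full file name, giving backup-<ts>.zip.tmp.
    let mut tmp_name = final_path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    // Write first, then rename; on either failure remove the temp file.
    let result = write_body(&tmp_path).and_then(|()| fs::rename(&tmp_path, final_path));
    if result.is_err() {
        let _ = fs::remove_file(&tmp_path);
    }
    result
}
```

The rename is only atomic when the temporary file and the final path live on the same filesystem, which placing the .tmp next to the final archive is meant to ensure.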

Retention and pruning

prune(dir, retain) lists archives via list_archives (which sorts lexically, i.e. oldest first thanks to the timestamped names), then deletes everything except the newest retain entries. Individual remove_file errors are logged and swallowed — a single undeletable file shouldn’t block the pruning of the rest, and certainly shouldn’t block the next cycle’s write.
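As an illustration of the retention logic, here is a standard-library sketch. It reuses the names list_archives and prune for readability, but the signatures, return types, and logging call are assumptions and may differ from the real functions in src/backup.rs.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists `backup-*.zip` files in `dir`, sorted lexically (oldest first).
fn list_archives(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut archives = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let matches = path
            .file_name()
            .and_then(|n| n.to_str())
            .map_or(false, |n| n.starts_with("backup-") && n.ends_with(".zip"));
        if matches {
            archives.push(path);
        }
    }
    archives.sort();
    Ok(archives)
}

/// Deletes all but the newest `retain` archives; returns how many were removed.
fn prune(dir: &Path, retain: usize) -> io::Result<usize> {
    let archives = list_archives(dir)?;
    let excess = archives.len().saturating_sub(retain);
    let mut removed = 0;
    for path in &archives[..excess] {
        match fs::remove_file(path) {
            Ok(()) => removed += 1,
            // Log and continue: one stuck file must not block the rest.
            Err(err) => eprintln!("backup: could not remove {}: {}", path.display(), err),
        }
    }
    Ok(removed)
}
```

Using saturating_sub means a directory holding fewer than retain archives results in an empty slice and no deletions.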

Initial snapshot

spawn_writer runs one refresh().await before entering the tokio::time::sleep loop, so a fresh restart always has a recent snapshot even on hosts that get restarted more often than the configured interval.

Config keys

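Read fresh from programfiles/op/config.json each cycle (the writer doesn’t cache config beyond a single function call), so editing the file takes effect on the next tick with no restart required. The one exception is enabling a writer that was disabled at startup: no task exists in that case, so the new values are only picked up after a restart.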

  • backup_interval_secs — seconds between snapshots. Goes through config_u64, which means zero / negative / missing all mean “disabled”. When disabled, spawn_writer returns immediately without spawning the task.
  • backup_retain — keep at most this many archives. Same config_u64 semantics; same “disabled = no task” behavior. Required because without a cap, archives grow unbounded.
  • backup_dir — destination. Optional string. Empty / missing falls back to DEFAULT_BACKUP_DIR (programfiles/backup). Relative paths resolve from the working directory, and absolute paths are used as given, so an external mount works without code changes: just point it at /mnt/backups/fds or similar.
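The backup_dir fallback can be sketched as below. The constant value comes from the description above; resolve_backup_dir is a hypothetical helper name, and the sketch takes the raw config string as an argument rather than reading config.json.

```rust
use std::path::PathBuf;

const DEFAULT_BACKUP_DIR: &str = "programfiles/backup";

/// Empty or missing `backup_dir` falls back to the default.
/// Relative paths are left relative, so they resolve from the working
/// directory when used; absolute paths pass through unchanged.
fn resolve_backup_dir(configured: Option<&str>) -> PathBuf {
    match configured {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(DEFAULT_BACKUP_DIR),
    }
}
```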

Restore procedure

  1. Stop the server.
  2. Extract backup-<ts>.zip into programfiles/ (for example, unzip backup-<ts>.zip -d programfiles/ from the repo root). Archive entry paths preserve each include’s leading folder (content/foo.md for the programfiles/content include; see src/zip.rs::archive_paths_to_file for how strip_root is set to the include’s parent), so extracting there recreates the programfiles/content/... and programfiles/local_auth/... layout.
  3. Restart.

No metadata is stripped; perms files, ctx.jsons, and body files all round-trip byte-for-byte. The test run_cycle_writes_archive_with_payload in src/backup.rs verifies the round-trip on a synthetic workspace.

Tests

The #[cfg(test)] mod tests block in src/backup.rs shows a pattern worth reusing for any module that touches the filesystem. Each test owns a fresh target/test_backup_<name>/ directory built by a workspace helper that calls remove_dir_all on the path and then re-creates it, so cargo’s default parallel test execution doesn’t cause cross-test interference. target/ is already gitignored, so the test workspace can’t leak into a commit even if cleanup fails.

Four tests are included. The first three cover the contract, and the fourth pins an edge case:

  • run_cycle_writes_archive_with_payload (single cycle, round-trip payload).
  • prune_keeps_n_newest (retention behaviour with 10 fake archives pruned to 3).
  • list_archives_ignores_unrelated_files (filter rejects .tmp and unrelated user files).
  • run_cycle_no_op_when_no_paths_present (the empty-includes case must “do nothing” rather than “write an empty archive”).
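A workspace helper along the lines described above could look like this. It is a sketch based only on the description, so the exact signature and error handling in src/backup.rs may differ.

```rust
use std::fs;
use std::path::PathBuf;

/// Returns a fresh, empty `target/test_backup_<name>/` directory.
/// Giving each test its own `name` keeps parallel tests from colliding.
fn workspace(name: &str) -> PathBuf {
    let dir = PathBuf::from("target").join(format!("test_backup_{}", name));
    // Ignore the error: the directory may simply not exist yet.
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir).expect("create test workspace");
    dir
}
```

A test would typically call workspace with its own function name and then build any fixture files underneath the returned path.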

See SEO and sitemap for the parallel background-writer + atomic-rename pattern.