nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-06-21 02:03:59 +00:00


hussein1362 75c2506c07 fix(cron): atomic write for jobs.json + don't silently overwrite corrupt store

Two related bugs that together caused scheduled jobs to disappear after
a container restart:

1. `_save_store()` used `Path.write_text(...)`, which truncates the
   destination in place.  A SIGKILL or shutdown mid-write left
   `jobs.json` either truncated or corrupt.

2. `_load_jobs()` caught any parse error, logged at WARNING, and
   returned an empty list.  `start()` then called `_save_store()`
   immediately, overwriting the corrupt-but-recoverable file with an
   empty job array.  Every scheduled job was silently lost with only a
   single warning line in the log.
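The two bugs compound, and the failure mode is easy to reproduce in isolation. Below is a minimal sketch of the pre-fix pattern. It assumes a simplified store layout of `{"version": 1, "jobs": [...]}`, and the helpers `save_store_unsafe`, `load_jobs_lenient` and `start_unsafe` are hypothetical, not nanobot's code. It simulates a write killed halfway by truncating the file, then shows start-up turning that damage into an empty store on disk:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical helpers mirroring the pre-fix behaviour described above
# (names and file layout are assumptions, not nanobot's actual code).


def save_store_unsafe(path: Path, jobs: list) -> None:
    # Path.write_text opens the file in "w" mode: it is truncated first, then written.
    path.write_text(json.dumps({"version": 1, "jobs": jobs}), encoding="utf-8")


def load_jobs_lenient(path: Path) -> list:
    # Pre-fix behaviour: any parse error is treated as "no jobs".
    try:
        return json.loads(path.read_text(encoding="utf-8"))["jobs"]
    except (ValueError, KeyError):
        return []


def start_unsafe(path: Path) -> list:
    jobs = load_jobs_lenient(path)
    save_store_unsafe(path, jobs)  # overwrites whatever was on disk
    return jobs


store = Path(tempfile.mkdtemp()) / "jobs.json"
save_store_unsafe(store, [{"id": "daily-report"}])

# Simulate a write killed halfway: keep only the first half of the bytes.
raw = store.read_bytes()
store.write_bytes(raw[: len(raw) // 2])

jobs = start_unsafe(store)
print(jobs)  # []
print(store.read_text(encoding="utf-8"))  # {"version": 1, "jobs": []}
```

The truncated file was still evidence of which jobs existed; the lenient load followed by an immediate save is what makes the loss permanent.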

Observed in production: after a container restart at 18:08, a job
that had fired correctly on two consecutive days never fired again,
and jobs.json on disk no longer contained the job at all.

Fix:
- `_save_store()` now writes via temp file + `os.replace` + `fsync`
  (matches the session manager pattern from 512bf59,
  "fix(session): fsync sessions on graceful shutdown to prevent data
  loss").  An interrupted write cannot corrupt the live file.
- `_load_jobs()` now moves a corrupt store aside as
  `jobs.json.corrupt-<ts>` and returns `None` instead of `[]`.
- `start()` aborts with a `RuntimeError` when the on-disk store is
  corrupt, instead of starting empty and overwriting.
- `_load_store()` falls back to the previous in-memory snapshot when
  a hot reload encounters a corrupt file, so a transient corruption
  after start does not drop live jobs.
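The four fixes can be sketched together as follows. This assumes the same simplified `{"version": 1, "jobs": [...]}` layout; `save_store_atomic`, `load_jobs` and the `CronStore` class with its `start()` and `reload()` methods are hypothetical stand-ins for `_save_store()`, `_load_jobs()`, `start()` and `_load_store()`, and the nanosecond timestamp in the `.corrupt-<ts>` name is also an assumption. Inside the atomic write, the temp file is flushed and fsynced before `os.replace` swaps it in, so the live file is only ever the complete old version or the complete new one:

```python
from __future__ import annotations

import json
import os
import tempfile
import time
from pathlib import Path

# Hypothetical sketch; names and file layout are assumptions, not nanobot's actual code.


def save_store_atomic(path: Path, jobs: list[dict]) -> None:
    """Replace `path` so readers see either the old file or the new one, never a partial write."""
    payload = json.dumps({"version": 1, "jobs": jobs}, indent=2)
    # Temp file in the same directory, so os.replace is a same-filesystem rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # contents reach the disk before the rename
        os.replace(tmp, path)  # swap in the new file (atomic on POSIX)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # drop the half-written temp file
        raise


def load_jobs(path: Path) -> list[dict] | None:
    """Return stored jobs, [] when no store exists yet, or None when the store is corrupt."""
    if not path.exists():
        return []
    try:
        return list(json.loads(path.read_text(encoding="utf-8"))["jobs"])
    except (ValueError, KeyError, TypeError):
        # Move the damaged file aside for manual recovery instead of overwriting it later.
        os.replace(path, path.with_name(f"{path.name}.corrupt-{time.time_ns()}"))
        return None


class CronStore:
    """Hypothetical stand-in for the cron service's persistence layer."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.jobs: list[dict] = []

    def start(self) -> None:
        jobs = load_jobs(self.path)
        if jobs is None:
            raise RuntimeError(f"cron store {self.path} was corrupt; refusing to start empty")
        self.jobs = jobs
        save_store_atomic(self.path, self.jobs)

    def reload(self) -> None:
        jobs = load_jobs(self.path)
        if jobs is not None:
            self.jobs = jobs
        # On corruption, keep the previous in-memory snapshot instead of dropping live jobs.


# Usage: a hot reload keeps live jobs when the file goes bad.
live = Path(tempfile.mkdtemp()) / "jobs.json"
svc = CronStore(live)
svc.start()  # no file yet: starts empty and writes jobs.json
svc.jobs.append({"id": "daily-report"})
save_store_atomic(live, svc.jobs)
live.write_text('{"version": 1, "jobs": [', encoding="utf-8")  # simulate corruption
svc.reload()
print(svc.jobs)  # [{'id': 'daily-report'}]

# Usage: a fresh start refuses to run over a corrupt store.
bad = Path(tempfile.mkdtemp()) / "jobs.json"
bad.write_text("{not json", encoding="utf-8")
try:
    CronStore(bad).start()
except RuntimeError as exc:
    print("refused:", exc)
```

Keeping the temp file in the same directory matters because `os.replace` may fail when source and destination are on different filesystems. On POSIX, making the rename itself durable across a power loss also requires fsyncing the containing directory, which this sketch leaves out.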

Tests cover the atomic-write path, the corrupt-file preservation,
the start-time refusal, the in-memory fallback, and a basic save/load
round trip across two service instances.  Existing 79 cron tests and
full suite (2553 tests) still pass.

2026-05-04 00:16:39 +08:00

test_cron_persistence.py  ·  fix(cron): atomic write for jobs.json + don't silently overwrite corrupt store  ·  2026-05-04 00:16:39 +08:00
test_cron_service.py  ·  fix(cron): persist channel_meta and session_key across reloads  ·  2026-04-27 12:45:00 +08:00
test_cron_tool_list.py  ·  fix(slack): preserve thread context for proactive replies  ·  2026-04-27 02:10:38 +08:00
test_cron_tool_schema_contract.py  ·  fix(cron): state per-action requirements in descriptions, keep list/remove callable  ·  2026-04-17 22:52:48 +08:00