Two related bugs that together caused scheduled jobs to disappear after
a container restart:
1. `_save_store()` used `Path.write_text(...)`, which truncates the
destination in place. A SIGKILL or shutdown mid-write left
`jobs.json` either truncated or corrupt.
2. `_load_jobs()` caught any parse error, logged at WARNING, and
returned an empty list. `start()` then called `_save_store()`
immediately, overwriting the corrupt-but-recoverable file with an
empty job array. Every scheduled job was silently lost with only a
single warning line in the log.
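A minimal sketch of how the two bugs compound (function names mirror the
ones above; the bodies are illustrative, not the actual service code): a
corrupt store parses as nothing, the error is swallowed, and the next save
replaces the recoverable file with an empty array.

```python
import json
from pathlib import Path

def _load_jobs(path: Path):
    try:
        return json.loads(path.read_text())
    except (json.JSONDecodeError, OSError):
        # Bug 2: the corrupt-but-recoverable file is still on disk here,
        # but the caller only ever sees an empty list.
        return []

def _save_store(path: Path, jobs) -> None:
    # Bug 1: write_text() truncates the destination in place, so a
    # SIGKILL mid-write leaves a truncated or empty file behind.
    path.write_text(json.dumps(jobs))

def start(path: Path):
    jobs = _load_jobs(path)
    _save_store(path, jobs)  # overwrites the corrupt file with []
    return jobs
```

Starting against a truncated `jobs.json` returns `[]` and rewrites the
file as `[]`, destroying the only remaining copy of the jobs.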
Reproduction in production: a container restart at 18:08, after which a
job that had fired correctly on two consecutive days never fired again;
`jobs.json` on disk was missing the job entirely.
Fix:
- `_save_store()` now writes via temp file + `os.replace` + `fsync`
(matches the session manager pattern from 512bf59,
"fix(session): fsync sessions on graceful shutdown to prevent data
loss"). An interrupted write cannot corrupt the live file.
- `_load_jobs()` now moves a corrupt store aside as
`jobs.json.corrupt-<ts>` and returns `None` instead of `[]`.
- `start()` aborts with a `RuntimeError` when the on-disk store is
corrupt, instead of starting empty and overwriting.
- `_load_store()` falls back to the previous in-memory snapshot when
a hot reload encounters a corrupt file, so a transient corruption
after start does not drop live jobs.
Tests cover the atomic-write path, the corrupt-file preservation,
the start-time refusal, the in-memory fallback, and a basic save/load
round trip across two service instances. All 79 existing cron tests and
the full suite (2553 tests) still pass.
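The round-trip case might look like this (a self-contained sketch, where
`Store` is a hypothetical minimal stand-in for the real service class):

```python
import json
import os
from pathlib import Path

class Store:
    """Minimal stand-in: atomic save, plain load."""
    def __init__(self, path: Path):
        self.path = path

    def save(self, jobs) -> None:
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(jobs))
        os.replace(tmp, self.path)  # atomic swap into place

    def load(self):
        return json.loads(self.path.read_text())

def test_round_trip(tmp_path):
    jobs = [{"id": "daily-report", "cron": "0 18 * * *"}]
    Store(tmp_path / "jobs.json").save(jobs)
    # A second instance pointed at the same path sees the same jobs,
    # i.e. state survives a restart.
    assert Store(tmp_path / "jobs.json").load() == jobs
```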