mirror of https://github.com/HKUDS/nanobot.git synced 2026-06-15 07:14:08 +00:00

chengyongru 271b3651d7 refactor: use cron turn naming internally

2026-06-12 11:57:35 +08:00

9.1 KiB

Raw Blame History

Cron / Session / Memory Design Decisions

This note records the agreed design direction for fixing the mismatch between scheduled cron jobs and chat session memory.

Problem

User-created cron jobs currently run their agent turn under an internal key such as cron:{job.id} and only deliver the final response back to the user channel. That splits the turn's working memory from the session where the user sees and continues the conversation.

The visible failure mode is awkward: a cron job reports something into a chat, the user discusses it in that chat, and the next cron run behaves as if that discussion never happened.

The fix is not to make cron a separate delivery system. A user cron job should be a scheduled input into a session.

Core Model

For new user-created cron jobs, payload.session_key is the canonical anchor.

The cron job belongs to that session.
The cron job reads that session's memory/history.
The cron job produces a normal session turn.
There is no separate delivery target concept for new jobs.

Legacy fields remain in the store only for compatibility:

payload.channel
payload.to
payload.channel_meta
payload.deliver

These fields are legacy-only. New cron creation should not depend on them.

Job Categories

Use explicit branching:

Bound user cron job: payload.kind == "agent_turn", payload.session_key is present, and no legacy delivery fields (deliver, channel, to, or channel_meta) are set. This uses the new session-turn model.
Legacy unbound cron job: user job with no payload.session_key. Keep the existing behavior. Do not migrate, infer, bind, or add UI for these jobs in this change.
System job: payload.kind == "system_event" or known internal jobs such as dream / heartbeat. Keep their specialized paths.

The project should not grow a compatibility subsystem for legacy jobs. Missing session_key means old behavior.

New Job Creation

CronTool must create user cron jobs with a session_key.

If no request/session context exists, cron action=add should fail.
Do not create new unbound jobs.
Do not infer session_key from channel/to for new jobs.
Remove deliver from the advertised tool schema. It can remain as a Python compatibility argument, but it must not affect new bound jobs.
New bound jobs should persist message and session_key; legacy delivery fields should not be populated as part of the new path.

Execution Path

Bound user cron jobs should execute through AgentLoop as internal inbound session events, not as an out-of-band agent.process_direct() call.

The intended flow is:

cron due -> create cron inbound -> AgentLoop dispatches session turn

The inbound event should carry metadata identifying the cron run, such as:

job id
job name
run id
prompt reference
persisted trigger content

This keeps locking, runtime status, session persistence, and WebUI behavior on the same path as normal chat turns.

session_key is the ownership anchor, but an InboundMessage still needs an execution context. Bound cron must resolve channel, chat_id, and any channel metadata from the target session/session metadata. It must not fall back to legacy payload.channel, payload.to, or payload.channel_meta for bound jobs. Those fields are only for the legacy unbound path.

The scheduler must not mark a bound job run as complete just because the inbound event was queued. It should either wait for the cron turn to complete and record the real outcome, or explicitly model the run as separate states such as queued and turn_completed. A failed cron turn must be reflected in the cron run record/job state, not hidden behind a successful enqueue.

Active Session Behavior

Cron must not interrupt an active session turn.

If the target session is idle, run the cron turn immediately.
If the target session is running, defer the cron turn until the current turn completes.
Do not inject the cron turn into the active turn's runtime context.
Do not route cron messages into the existing mid-turn pending injection queue.
UI/runtime status may show that a cron run is queued, but the current LLM call should not see the queued cron turn.

Cron inbound events need explicit metadata, for example _cron_trigger plus _cron_defer_until_session_idle. AgentLoop.run() must recognize that metadata before the existing _pending_queues mid-turn injection branch. If the session is active, the event goes to a deferred cron queue, not the pending injection queue.

The user experience goal is: cron can run after the current answer, but it should not take over an answer already in progress.

Session History

Do not persist the raw internal execution prompt as a normal user message.

Instead, persist a readable cron trigger event, for example:

{
  "role": "user",
  "content": "Scheduled cron job triggered: daily monitor\n\nCheck ...",
  "_cron_turn": true,
  "cron_job_id": "abc123",
  "cron_job_name": "daily monitor",
  "cron_run_id": "abc123:1770000000000",
  "cron_prompt_ref": {
    "id": "cron.agent_turn.reminder",
    "version": 1,
    "sha256": "..."
  }
}

The assistant result should be saved as the normal assistant response for that turn, with source metadata suitable for WebUI rendering.

This gives future turns useful context without leaking internal instruction text into the transcript.

Prompt Traceability

The rendered execution prompt should remain traceable, but it should not be part of normal session history.

Use a named/versioned prompt reference in session history and save the full rendered prompt in an internal run record.

Preferred direction:

Move the cron execution prompt out of commands.py into a named template.
Use a stable prompt id such as cron.agent_turn.reminder.
Store prompt_ref and cron_run_id in session history.
Store the full rendered prompt, prompt variables, and errors in an internal run record.

Avoid putting full prompt text into jobs.json; run records should not make the cron store grow without bound.

Visibility and Evaluation

A bound user cron job is a real session turn.

If it succeeds, save and publish the assistant response.
Do not pass bound cron responses through evaluate_response().
Keep evaluate_response() only for system/legacy paths where the old behavior still applies.
Avoid states where session history contains a response the user never saw.

If a bound cron job starts executing, it must leave a visible closure in the session:

success response
short failure message
or an empty-result status message

Full exceptions and diagnostic details belong in the internal run record, not in the user-facing transcript.

Deleting Sessions

Deleting a session with bound cron jobs should be a two-step operation.

Default delete behavior should block and return the associated cron jobs. Existing WebUI/API response field names are kept for compatibility:

{
  "deleted": false,
  "blocked_by_automations": true,
  "automations": [
    {"id": "abc123", "name": "daily monitor", "enabled": true}
  ]
}

After explicit confirmation, the API may delete the bound user cron jobs and then delete the session/thread.

Rules:

Only block on user-created bound jobs whose payload.session_key equals the session being deleted.
Do not block on system jobs.
Do not block on legacy unbound jobs.
In unified-session mode, WebUI chats display cron jobs owned by unified:default, but deleting an individual websocket:* thread should not block on or delete those unified cron jobs.
If the user manually deletes files outside the WebUI/API, do not try to compensate.

Unified Session Mode

When unified_session is enabled, WebUI-created cron jobs should bind to the same unified session as normal WebUI chat turns: unified:default.

All WebUI chats should display cron jobs owned by unified:default.
Individual WebUI thread deletion should remain scoped to the concrete websocket:* thread being deleted.
Toggling unified_session does not migrate existing cron jobs. Existing jobs keep their stored payload.session_key and continue to execute against that owner until explicitly removed or recreated.

WebUI Scope

This change should not grow into a global scheduler/task manager.

Keep the scope focused:

Fix cron/session/memory semantics for new bound jobs.
Preserve legacy job behavior.
Add deletion protection for sessions with bound cron jobs.
Update the existing WebUI panel that lists scheduled jobs only as needed for the new bound-job status.

Do not add deterministic legacy migration, legacy binding UI, or a global calendar/task manager in this change.

Manual Run

Do not add a user-visible "run now" feature as part of this design.

CronService.run_job() may remain an internal/test helper. It should not become a product surface, and the implementation should avoid creating a separate execution path that behaves differently from scheduled runs.

Non-Goals

No legacy migration.
No automatic binding of legacy jobs.
No runtime-context prompt asking the model to bind jobs.
No new global scheduler/task manager.
No new delivery-target abstraction.
No user-visible manual cron run.

9.1 KiB Raw Blame History