Three failure modes addressed:
1. Model reflects HEARTBEAT.md instructions back as output instead of
executing them ("HEARTBEAT.md has active tasks listed...")
2. Model narrates decision logic ("Best judgment call: stay quiet")
3. Model produces empty output for silence, runner treats it as failure,
finalization retry generates "couldn't produce a final answer" which
gets delivered to the user
Changes:
- Add _is_deliverable() pre-filter in HeartbeatService._tick() that catches
finalization fallback messages and leaked reasoning patterns before they
reach the evaluator
- Wrap Phase 2 task input with a delivery-awareness preamble telling the
model its output goes directly to the user's messaging app
- Add meta-reasoning suppression criterion to evaluator template
No changes to agent/loop.py, runner.py, providers, or config schema.
The earlier commits picked up a large amount of Black-style reformatting
(multi-line frozenset / keyword-arg wrapping / docstring blanks / removed
parens) on top of the actual guard fix. @chengyongru flagged it; the
first pass reverted some but not all.
This restores nanobot/providers/base.py, runner.py, heartbeat/service.py,
and utils/evaluator.py to origin/main and reapplies only the guard logic:
- base.py: add should_execute_tools property
- runner.py / heartbeat/service.py / utils/evaluator.py: route through it
+ log a warning when has_tool_calls but finish_reason is anomalous
Net diff vs main is now +87/-4 (was +211/-102) — roughly 30 lines of real
logic, which is what the PR is actually about.
Behavior unchanged from previous HEAD; full suite still 2014 passed.
Made-with: Cursor
Add agent-level timezone configuration with a UTC default, propagate it into runtime context and heartbeat prompts, and document valid IANA timezone usage in the README.
Phase 1 _decide() now includes "Current date/time: YYYY-MM-DD HH:MM UTC"
in the user prompt and instructs the LLM to use it for time-aware scheduling.
Without this, the LLM defaults to 'run' for any task description regardless
of whether it is actually due, defeating Phase 1's pre-screening purpose.
Closes#1929
- Add nanobot/utils/evaluator.py: lightweight LLM tool-call to decide notify/silent after background task execution
- Remove magic token injection from heartbeat and cron prompts
- Clean session history (no more <SILENT_OK> pollution)
- Add tests for evaluator and updated heartbeat three-phase flow
Appends a strict instruction to background task prompts (cron and heartbeat)
directing the agent to return a `<SILENT_OK>` token if there is nothing
material to report. Adds conditional logic to intercept this token and
suppress the outbound message to the user, preventing notification spam
from autonomous background checks.
The HEARTBEAT_OK_TOKEN comparison was broken because the token
itself ("HEARTBEAT_OK" with underscore) was being compared against
a response string that had underscores removed. This made the
condition always fail, preventing the heartbeat service from
recognizing "no tasks" responses.
Now both sides of the comparison remove underscores consistently,
allowing proper matching of the HEARTBEAT_OK token.