Replaces PR #3493's blanket fatal abort with a "tell the model + throttle
the bypass loop" policy. Workspace-bound rejections are now ordinary
recoverable tool errors enriched with a structured "this is a hard policy
boundary" instruction; SSRF stays the only marker that aborts the turn.
Why the fatal-abort approach broke
----------------------------------
PR #3493 promoted every shell `_guard_command` and filesystem path-resolution
rejection to a turn-fatal RuntimeError. Two of those messages (`path
outside working dir` and `path traversal detected`) are heuristic substring
scans on the raw command, so legitimate commands like `rm <ws>/x.txt
2>/dev/null` or `find . -type f` killed the user's turn (#3599). On
channels with outbound dedupe (Telegram) the user just saw silence (#3605),
and the noise polluted the LLM's context until it started hallucinating
guard rejections on plain relative paths (#3597).
Why we still need *some* throttle
---------------------------------
The original #3493 pain point was real: the LLM, refused once, would
swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c
-> ln -sf -> python -c open(...). Just removing the fatal escape lets
that loop run wild until max_iterations.
What this commit does
---------------------
- `nanobot/utils/runtime.py`: add `workspace_violation_signature` and
`repeated_workspace_violation_error`. The signature normalizes
filesystem `path` arguments and the first absolute path inside an
exec command, so swapping tools against the same outside target hits
the same throttle bucket. Two soft attempts are allowed; the third
attempt's tool result is replaced with a hard "stop trying to bypass"
message that quotes the target path and tells the model to ask the
user for help.
- `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation`
(still fatal) and `_is_workspace_violation` (now soft). All three
failure branches in `_run_tool` (prep_error / exception / Error
result) route through a shared `_classify_violation` that bumps the
per-turn workspace_violation_counts dict and either keeps the tool's
own message or substitutes the throttle escalation. `_execute_tools`
now threads that dict alongside the existing external_lookup_counts.
- `nanobot/agent/tools/shell.py`: append a structured boundary note to
every workspace-bound guard rejection (`working_dir could not be
resolved`, `working_dir is outside`, `path outside working dir`,
`path traversal detected`). SSRF errors stay short and direct so the
model doesn't try to "phrase around" them. Existing `2>/dev/null`
allow-list and benign device passthrough from the previous commit
remain.
- `nanobot/agent/tools/filesystem.py`: append the same boundary note to
the `outside allowed directory` PermissionError so read_file / write_file
/ list_dir errors give the LLM the same explicit hint.
Tests
-----
- `tests/utils/test_workspace_violation_throttle.py` (new): signature
collapses across read_file/exec/python -c against the same path,
different paths get independent budgets, escalation only fires after
the third attempt.
- `tests/agent/test_runner.py`:
- `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2
contract: filesystem PermissionError is now soft, runner moves to
the next iteration and finalizes cleanly.
- `test_is_ssrf_violation_remains_fatal` + the existing
`test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the
first attempt.
- `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end
to end recovery from `path outside working dir`.
- `test_runner_throttles_repeated_workspace_bypass_attempts` -- four
bypass attempts against the same outside target produce at least
one `workspace_violation_escalated` event and the run completes
naturally without aborting the turn.
- The two `_execute_tools` direct-call tests now pass the new
workspace_violation_counts dict.
- `tests/tools/test_tool_validation.py`: relax three `==` assertions
to `startswith` + "hard policy boundary" substring check to match
the new structured error messages.
- `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null`
regression and the `> /etc/issue` negative case from the previous
commit on this branch -- they still pass under the new policy.
Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2
on origin/main). Ruff is clean for every file touched in this commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
PR #3493 promoted every shell `_guard_command` rejection to a turn-fatal
RuntimeError. The two heuristic outputs in that list -- `path outside
working dir` and `path traversal detected` -- routinely false-positive on
benign constructs (e.g. `2>/dev/null`, quoted `..` arguments to sed/find,
absolute paths inside inline scripts), so legitimate workspace commands
silently kill the user's turn (#3599) and the agent never gets a chance
to retry with a different approach (#3605).
Two changes, both narrowly scoped:
- `ExecTool._guard_command` now skips a small allow-list of kernel device
files (`/dev/null`, the standard streams, `/dev/random`, `/dev/fd/N`,
...) before the workspace path check, matched against the pre-resolve
string so symlinks like `/dev/stderr -> /proc/self/fd/2` still hit the
allow-list. Real outside writes such as `> /etc/issue` remain blocked.
- `AgentRunner._WORKSPACE_BLOCK_MARKERS` keeps only the four hard
path-resolution errors from filesystem.py / shell.py and the SSRF
marker. The two heuristic substrings move out of the fatal list, so
the LLM sees them as ordinary tool errors and can self-correct in the
next iteration. SSRF stays fatal because retrying an internal URL
with a different phrasing would defeat the safety boundary.
Tests:
- `tests/tools/test_exec_security.py`: parametrized regression for the
exact #3599 command sample plus other stdio redirects and device
reads; explicit negative case asserts `> /etc/issue` is still blocked.
- `tests/agent/test_runner.py`: `_is_workspace_violation` no longer
fatals on the two heuristic markers, plus an end-to-end case proving
the runner hands the guard error back to the LLM and finalizes the
next turn cleanly.
Keep the new turn-end signal scoped to WebSocket clients, preserve pending tool-call state across trailing tool result rows, and drop the accidental npm lockfile from the Bun-based WebUI.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: allow_patterns take priority over deny_patterns in ExecTool
Previously deny_patterns were checked first with no bypass, meaning
allow_patterns could never exempt commands from the built-in deny list.
This made it impossible to whitelist destructive commands for specific
directories (e.g. build/cleanup tasks).
Changes:
- shell.py: check allow_patterns first; if matched, skip deny check
- shell.py: deny_patterns now appends to built-in list (not replaces)
- schema.py: add allow_patterns/deny_patterns to ExecToolConfig
- loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool
- Add test_exec_allow_patterns.py covering priority semantics
* fix: separate deny pattern errors from workspace violation detection
The deny pattern error message "Command blocked by safety guard" was
included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be
misclassified as fatal workspace violations. This meant LLMs had no
chance to retry with a different command — the turn was aborted
immediately.
Changes:
- shell.py: deny/allowlist error messages now use distinct phrasing
("blocked by deny pattern filter" / "blocked by allowlist filter")
- runner.py: remove "blocked by safety guard" from
_WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal
tool errors (LLM can retry) instead of fatal violations
- workspace path errors still use "blocked by safety guard" and remain
fatal as intended
* fix: update test assertions to match new deny pattern error message
* fix: indentation error in test file
* fix: restore SSRF fatal classification and tidy exec pattern plumbing
Address review feedback on the deny/allow_patterns rework:
- runner.py: re-add "internal/private url detected" to
_WORKSPACE_BLOCK_MARKERS. The earlier marker removal also stripped
fatal classification from SSRF / internal-URL rejections (whose
message still says "blocked by safety guard"), turning a hard
security boundary into something the LLM could retry.
- loop.py / subagent.py: drop `or None` between ExecToolConfig and
ExecTool. The schema default is an empty list and ExecTool already
normalizes None back to [], so the indirection was a no-op.
- shell.py: extract `explicitly_allowed` flag in _guard_command so
allow_patterns are scanned once instead of twice and the control
flow no longer relies on a no-op `pass + else` branch.
- tests/agent/test_runner.py: add a regression test asserting that
the SSRF block message is treated as fatal, while deny/allowlist
filter messages are deliberately non-fatal.
* fix: remove unused exec allow-pattern test import
Keep the new ExecTool allow-pattern coverage clean under ruff.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Xubin Ren <xubinrencs@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.
Made-with: Cursor
The max_messages config field in AgentDefaults was accepted by the
schema but never threaded through to the actual get_history() calls
in the agent loop. Both call sites in _process_message hardcoded the
default, so sessions with slow or local models accumulated unbounded
history that inflated prompt tokens and caused LLM timeouts.
Changes:
- Add max_messages field to AgentDefaults (default 0 = use built-in
constant, any positive value caps history replay)
- Store the value on AgentLoop and pass it to get_history() when
non-zero
- Wire the config through all three AgentLoop construction sites in
commands.py (gateway, API server, CLI chat)
- 14 focused tests covering schema validation, init storage, history
slicing, boundary alignment, integration wiring, and the
zero/default path
Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and
sessionFileMaxMessages out of user-facing config into internal
constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000).
- Remove 3 fields from AgentDefaults and config pipeline
- Sink enforce_file_cap into Session (was AgentLoop)
- Auto-derive token budget from context window (was configurable)
- Net -113 lines across 7 files; 723 tests green
Made-with: Cursor
Past assistant turns in history were prefixed with "[Message Time: ...]"
just like user turns. The model treated these as in-context demos and
started prefixing its own replies with the same marker, leaking
metadata to the user. Prompt-level warnings could not beat dozens of
prior assistant samples.
Annotate only user turns and proactive deliveries
(_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is
the whole point and which are too infrequent to act as demos). Adjacent
user-side timestamps still pin every normal assistant reply for
relative-time reasoning. The now-redundant identity.md warning is
removed along with the demonstration source.
MCP resource/prompt/tool names containing spaces or special characters
(e.g. "PostgreSQL System Information") were forwarded verbatim to model
provider APIs, causing validation errors from both Anthropic and OpenAI
which require names matching ^[a-zA-Z0-9_-]{1,128}$.
Add _sanitize_name() that replaces invalid characters with underscores
and collapses consecutive underscores. Applied in MCPToolWrapper,
MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools
filtering logic.
Closes#3468
Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context.
Made-with: Cursor
Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors.
Made-with: Cursor
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl,
so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
archive() fell back to raw_archive, causing duplicate [RAW] dumps of
the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
each history entry without caps, so any legacy oversized record (or an
unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
one forgotten-cap-away from the same vector.
Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
it summarized or raw-archived — both outcomes materialize the chunk;
still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in
append_history with a once-per-store warning, so any new caller that
forgets its own tighter cap gets caught and observable.
Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt
Made-with: Cursor
Root cause: when consolidation LLM fails, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.
Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.
Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.
Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
public entry points; _cli_progress updated to accept and ignore
tool_events
The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Discord threads use their own channel IDs, so allowChannels was blocking
thread replies unless each thread ID was listed explicitly.
- Include the thread parent channel ID as an allowlist candidate
- Enforce allow_channels on slash commands (previously bypassed)
- Show parent channel ID in runtime context, reply to the thread
- Fix subagent cancel key via effective_key propagation
- Detect bot mentions via raw_mentions and reply-to-bot references
- Cache seen thread channels for outbound delivery
- Ignore system messages that become empty prompts
When the main agent spawns multiple sub-agents, each completion
independently triggered a new _dispatch, causing 3-4 user-visible
responses instead of a single comprehensive report.
- Extend _drain_pending to block-wait on pending_queue when sub-agents
are still running, keeping the runner loop alive for in-order injection
- Pass pending_queue in the system message path so subsequent sub-agent
results can still be injected mid-turn via a new dispatch
Move the non-int cursor guard out of the two consumer sites and into a
shared ``_iter_valid_entries`` iterator so the invariant lives in one
place. Closes three gaps left by the original fix:
* ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in
Python, so the previous guard silently treated ``{"cursor": true}`` as
cursor ``1``.
* Recovery now returns ``max(valid cursors) + 1``. Under adversarial
corruption "first int scanning in reverse" is not the same thing, and
only ``max`` keeps the recovered cursor strictly greater than every
legitimate cursor still on disk.
* Non-int cursors are logged exactly once per ``MemoryStore``. Silently
dropping corrupted entries hides the root cause (an external writer
to ``memory/history.jsonl``); rate-limiting keeps the log clean when
the same poisoned file is read every turn.
All 7 tests from the original fix pass unchanged; 3 new tests pin the
invariants above.
Made-with: Cursor