nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-20 00:22:31 +00:00

Author	SHA1	Message	Date
Xubin Ren	7c29a738a5	test(long-task): expect wrapped completion message after validation Align assertions with LongTaskTool final return shape on main. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 17:45:06 +00:00
Xubin Ren	ab3d0b7146	chore: merge origin/main into pr-3460	2026-05-13 17:44:00 +00:00
Xubin Ren	78e8cc3e55	fix(long-task): honor final signal and file tracking Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 16:53:58 +00:00
Xubin Ren	5efd67919b	feat(runner): support fallback candidates Resolve fallbackModels as preset references or explicit inline provider configs so failover uses complete model settings without exposing fallback logic to the agent loop. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 15:34:03 +00:00
Xubin Ren	43db848db0	Revert "feat(runner): support structured fallback models" This reverts commit 02b059a616dc6dc82ad15282102c7b27a5a34e40.	2026-05-13 14:11:08 +00:00
Xubin Ren	02b059a616	feat(runner): support structured fallback models Bind fallback model chains to the active model configuration so defaults and presets do not inherit or merge fallback behavior implicitly. Require explicit fallback providers while preserving per-fallback generation overrides and context-window safety. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 13:57:30 +00:00
Xubin Ren	eaa8ebd5d3	Merge remote-tracking branch 'origin/main' into pr-3756	2026-05-13 13:12:56 +00:00
Xubin Ren	fb508a302a	feat(webui): refresh session titles from live updates	2026-05-13 13:10:21 +00:00
chengyongru	913b0774d8	feat(runner): add model failover with fallback_models When the primary model returns a non-transient error and no content has been streamed yet, the runner now tries each model listed in the active preset's fallback_models in order. Each fallback model may reside on a different provider — a temporary provider instance is created on-the-fly via make_provider(config, model=...). Key design: - Failover is request-scoped (does not affect subagents/dream/consolidator) - Provider is restored via try/finally after each fallback attempt - Skipped when content was already streamed to avoid duplicate output - Recursive failover prevented by clearing fallback_models on fallback spec - Circuit breaker trips open after 3 consecutive primary failures (60s cooldown) - Cross-provider routing: fallback model prefix (e.g. groq/) determines provider Fixes: cross-provider fallback was broken because the factory passed the original preset (with provider forced to primary's provider) when creating fallback providers. Now uses provider="auto" so the model string prefix correctly routes to the right provider. Also fixes: log messages now distinguish between primary-failed, previous-fallback-failed, and circuit-open scenarios. closes: https://github.com/HKUDS/nanobot/issues/3376	2026-05-13 17:30:49 +08:00
Xubin Ren	458b4ba235	feat(reasoning): stream reasoning content as a first-class channel Reasoning now flows as its own stream — symmetric to the answer's ``delta`` / ``stream_end`` pair — instead of being shipped as one oversized progress message. This lets WebUI render a live "Thinking…" bubble that updates in place, then auto-collapses when the stream closes. Other channels remain plugin no-ops by default. ## Protocol New metadata: ``_reasoning_delta`` (chunk) and ``_reasoning_end`` (close marker). ChannelManager routes both to the dedicated plugin hooks below; the legacy one-shot ``_reasoning`` is kept for back-compat and BaseChannel expands it into a single delta + end pair so plugins only ever implement the streaming primitives. WebSocket emits two new events: - ``reasoning_delta`` (event, chat_id, text, optional stream_id) - ``reasoning_end`` (event, chat_id, optional stream_id) ## BaseChannel surface - ``send_reasoning_delta(chat_id, delta, metadata)`` — no-op default - ``send_reasoning_end(chat_id, metadata)`` — no-op default - ``send_reasoning(msg)`` — back-compat wrapper, base impl forwards to the streaming primitives A channel adds reasoning support by overriding the two streaming primitives. Telegram / Slack / Discord / Feishu / WeChat / Matrix keep the base no-ops until their bubble UIs are adapted; reasoning silently drops at dispatch, never as a stray text message. ## AgentHook Adds ``emit_reasoning_end`` to the hook lifecycle. ``_LoopHook`` tracks whether a reasoning segment is open and closes it on: - the first answer delta arriving (so the UI locks the bubble before the answer renders below), - ``on_stream_end``, - one-shot ``reasoning_content`` / ``thinking_blocks`` after a single non-streaming response. ## WebUI - ``UIMessage.reasoning`` is now a single accumulated string with a companion ``reasoningStreaming`` flag. - ``useNanobotStream`` consumes ``reasoning_delta`` / ``reasoning_end``; legacy ``kind: "reasoning"`` is auto-translated to a delta + end. - New ``ReasoningBubble``: shimmer header + auto-expanded while streaming, collapses to a clickable "Thinking" pill once closed, respects ``prefers-reduced-motion``. - Answer deltas adopt the reasoning placeholder so the bubble and the answer share one assistant row. ## Tests - ``tests/channels/test_channel_manager_reasoning.py`` — manager routes delta + end, drops on channel opt-out, expands one-shot back-compat. - ``tests/channels/test_websocket_channel.py`` — new ``reasoning_delta`` / ``reasoning_end`` frames, empty-chunk safety, no-subscriber safety, back-compat expansion. - ``tests/agent/test_runner_reasoning.py`` — runner closes the segment on streaming answer start and after one-shot reasoning. - WebUI ``useNanobotStream`` + ``message-bubble`` cover the new protocol and the shimmer styling. ## Docs ``docs/configuration.md`` and ``docs/websocket.md`` document the new events and the plugin contract. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 07:13:43 +00:00
Xubin Ren	01fa362c03	Merge origin/main into feat/show-reasoning Resolves conflicts after main landed the state-machine turn refactor and the test_runner.py 9-file split: - nanobot/agent/loop.py: take main's `_state_build`/`_persist_user_message_early` flow; restore the `reasoning: bool` parameter on `_build_bus_progress_callback` so the loop hook can mark progress as reasoning-channel without coupling to the answer stream. - nanobot/cli/stream.py: keep main's configurable `bot_name`/`bot_icon` header while preserving the PR's `transient=True` Live + `self._console` routing + `_renderable()` final-render path that fixed TUI duplication. - tests/agent/test_runner.py was deleted on main and split into 9 focused files; relocated all 6 reasoning tests into a new `test_runner_reasoning.py` matching the new layout, deduplicated the per-test `ReasoningHook` boilerplate through a shared `_RecordingHook` helper. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 05:07:14 +00:00
chengyongru	99cc6ee808	test(agent): expand coverage and refactor test structure - Add 42 tests for ContextBuilder (context.py: 0→42 tests) - Add 37 tests for SubagentManager lifecycle (subagent.py: 2→37 tests) - Add 42 unit tests for AutoCompact in isolation - Split monolithic test_runner.py (3313 lines) into 9 focused files: test_runner_core, test_runner_hooks, test_runner_errors, test_runner_safety, test_runner_persistence, test_runner_governance, test_runner_tool_execution, test_runner_injections, test_loop_runner_integration - Add 3 config passthrough tests (temperature/max_tokens/reasoning_effort) - Fix fragile patch.object(__init__) in test_stop_preserves_context - Create shared conftest.py with make_provider/make_loop factories Total: 934 tests passing, 0 regressions	2026-05-13 12:49:17 +08:00
chengyongru	5acae58a13	test(long-task): add boundary tests and fix race conditions - Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries - Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format - Remove asyncio timing hack from test_state_exposure - Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation - All 34 tests passing	2026-05-13 01:26:01 +08:00
Xubin Ren	352aaf0627	refactor(reasoning): unify reasoning extraction across providers Reasoning surfacing was split across three branches in runner.py plus two separate streaming buffers (loop hook and runner progress stream), with three independent display-side gates in the CLI. This collapsed the policy into one source of truth and fixed two real bugs: - Structured `reasoning_content` was suppressed whenever the answer was streamed, because the runner gated emission on `streamed_content`. Providers don't stream `reasoning_content`; it only arrives on the final response, so the answer stream and the reasoning channel are independent. Added `streamed_reasoning` to `AgentHookContext` to track the right bit. - `channels.showReasoning` was subordinated to `sendProgress`. They are orthogonal — turning off progress streaming shouldn't silence reasoning. Reworked the CLI gates accordingly. Single-helper consolidation: - `extract_reasoning(reasoning_content, thinking_blocks, content)` returns `(reasoning_text, cleaned_content)` with a defined fallback order: dedicated field → Anthropic thinking_blocks → inline `<think>`/`<thought>` tags. Models that expose none of these short-circuit to `(None, content)` — zero overhead. - `IncrementalThinkExtractor` replaces the ad-hoc `emit_incremental_think` function and its hand-rolled "emitted cursor" state in both the loop hook and the runner progress stream. Also documented the new `showReasoning` channel option in docs/configuration.md and noted its independence from sendProgress. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 17:14:19 +00:00
chengyongru	78ecb2a99a	feat(long-task): major overhaul with structured handoffs, validation, and observability - Structured HandoffState: HandoffTool now accepts files_created, files_modified, next_step_hint, and verification fields instead of a plain string. Progress is passed between steps as structured data. - Completion validation round: After complete() is called, a dedicated validator step runs to verify the claim against the original goal. If validation fails, the task continues rather than returning a false completion. - Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle, step_final) selected based on step number. Final steps get tighter budget and stronger "wrap up" guidance. - Automatic file change tracking: Extracts write_file/edit_file events from tool_events and injects them into the next step's context if the subagent forgot to report them explicitly. - Budget tracking & adaptive strategy: Cumulative token usage is tracked across steps. Per-step tool budget drops from 8 to 4 in the last two steps to force handoff/completion. - Crash retry with graceful degradation: A step that crashes is retried once. Persistent crashes terminate the task and return partial progress. - Full observability hooks for future WebUI integration: - set_hooks() with on_step_start, on_step_complete, on_handoff, on_validation_started, on_validation_passed, on_validation_failed, on_task_complete, on_task_error, and catch-all on_event. - Readable state properties: current_step, total_steps, status, last_handoff, cumulative_usage, goal. - inject_correction() allows external code to send user corrections that are injected into the next step's prompt. - run_step() accepts optional max_iterations for dynamic budget control. All 27 long-task tests and 11 subagent tests pass.	2026-05-13 00:55:52 +08:00
chengyongru	e7214d96ed	fix(long-task): add debug logging for step-level observability	2026-05-12 23:37:00 +08:00
chengyongru	bf5762a3d4	feat(long-task): add LongTaskTool for multi-step agent tasks Implements a meta-ReAct loop where long-running tasks are broken into sequential subagent steps, each starting fresh with the original goal and progress from the previous step. This prevents context drift when agents work on complex, multi-step tasks. - Extract build_tool_registry() from SubagentManager for reuse - Add run_step() for synchronous subagent execution (no bus announcement) - Add HandoffTool and CompleteTool as signal mechanisms via shared dict - Add LongTaskTool orchestrator with simplified prompt (8 iterations/step) - Register LongTaskTool in main agent loop - Add _extract_handoff_from_messages fallback for robustness	2026-05-12 23:37:00 +08:00
Flinn Xie	3a851f8f8d	feat(reasoning): add inline think tag extraction and Anthropic thinking_blocks support Add extract_think() and emit_incremental_think() helpers to extract thinking content from inline <think> and <thought> tags in the content field. This handles models served via Ollama, self-hosted vLLM, or other compatible endpoints that embed reasoning as inline tags instead of using the dedicated reasoning_content API field. Also adds Anthropic thinking_blocks support for extended thinking via the thinking content blocks array. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>	2026-05-12 23:02:59 +08:00
chengyongru	9e15925cf4	refactor(agent): remove ask_user tool The ask_user tool used AskUserInterrupt(BaseException) for mid-turn blocking, creating heavy coupling across runner, loop, and session management. The model now asks questions naturally in response text, the turn ends normally, and the user's next message starts a new turn with session history providing continuity. Removed: - nanobot/agent/tools/ask.py (tool, interrupt, helpers) - tests/agent/test_ask_user.py - webui/src/components/thread/AskUserPrompt.tsx - AskUserInterrupt handling in runner.py - Dual-path message building in loop.py - Pending ask detection via history scanning - button_prompt/buttons emission in WebSocket channel - ask_user references in Slack channel docstrings Preserved (MessageTool uses these independently): - OutboundMessage.buttons field - Channel button rendering (Telegram, Slack, WebSocket)	2026-05-12 22:48:26 +08:00
Xubin Ren	079b37aac5	test(config): cover legacy model defaults without presets Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	13eede5803	refactor(agent): inject runtime model publisher Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	e6103d9312	fix(agent): separate preset snapshots from config reload Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	8fcb24bb7c	refactor(agent): trim model preset runtime wiring Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	c9b84c7b11	fix(config): reserve implicit default model preset Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	bcc4b97183	fix(webui): broadcast runtime model updates Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	b61c6304c3	fix(config): reconcile presets with settings reload Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
Xubin Ren	c450d6fd3f	fix(config): make model preset switching atomic Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:06:22 +08:00
chengyongru	6f78267c82	feat(config): add ModelPresetConfig and runtime preset switching - Add `ModelPresetConfig` schema for named model presets - Add `model_presets` dict to `Config` and `model_preset` field to `AgentDefaults` - Add `resolve_preset()` to return effective model params from preset or defaults - Add `@model_validator` to reject unknown preset names - Update `_match_provider()` to use resolved preset model/provider - Update `make_provider()` and `provider_signature()` to use `resolve_preset()` - Add `model_preset` property to `AgentLoop` for atomic runtime switching - Update `AgentLoop.from_config()` to inject a runtime `default` preset - Wire self-tool to inspect/clear preset state - Update CLI display strings to show active preset	2026-05-12 20:06:22 +08:00
Xubin Ren	23312d683e	fix(tools): isolate plugin runtime state Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 11:28:20 +08:00
chengyongru	043f0e67f7	feat(tools): introduce plugin-based tool discovery and runtime context protocol This commit implements a progressive refactoring of the tool system to support plugin discovery, scoped loading, and protocol-driven runtime context injection. Key changes: - Add Tool ABC metadata (tool_name, _scopes) and ToolContext dataclass for dependency injection. - Introduce ToolLoader with pkgutil-based builtin discovery and entry_points-based third-party plugin loading. - Add scope filtering (core/subagent/memory) so different contexts load appropriate tool sets. - Introduce ContextAware protocol and RequestContext dataclass to replace hardcoded per-tool context injection in AgentLoop. - Add RuntimeState / MutableRuntimeState protocols to decouple MyTool from AgentLoop. - Migrate all built-in tools to declare scopes and implement create()/enabled() hooks. - Migrate MessageTool, SpawnTool, CronTool, and MyTool to ContextAware. - Refactor AgentLoop to use ToolLoader and protocol-driven context injection. - Refactor SubagentManager to use ToolLoader(scope="subagent") with per-run FileStates isolation. - Register all built-in tools via pyproject.toml entry_points. - Add comprehensive tests for loader scopes, entry_points, ContextAware, subagent tools, and runtime state sync.	2026-05-12 11:28:20 +08:00
chengyongru	a6e993df25	fix(agent): move archived summary into system prompt for KV cache stability - Append [Archived Context Summary] to system prompt instead of injecting it into the user message runtime context, improving KV cache reuse across turns and avoiding consecutive same-role messages. - _last_summary persists in metadata (no pop) for restart survival; summary is re-injected every turn via the stable system prompt. - Remove dynamic "Inactive for X minutes" from _format_summary — use static last_active timestamp instead to preserve KV cache stability. - Pass session_summary through build_messages() so both normal and ask_user paths receive the archived summary in the system prompt. - estimate_session_prompt_tokens now reads _last_summary from metadata to include the summary in token budget estimation. - Remove obsolete session_summary parameter from maybe_consolidate_by_tokens and estimate_session_prompt_tokens call sites in loop.py (summary flows through build_messages instead). - Ensure /new (session.clear()) clears _last_summary from metadata.	2026-05-11 01:25:15 +08:00
Flinn Xie	3a27af0018	feat(cli): display model reasoning content during streaming Add show_reasoning config (default: False) to display model thinking/reasoning content in the TUI during streaming. Reasoning is emitted via a new emit_reasoning hook on AgentHook, gated by the channels config. Display uses ✻ prefix with dim italic styling. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 01:02:49 +08:00
Xubin Ren	9252f4d826	Revert "fix(agent): persist _last_summary across restarts with used sentinel" This reverts commit e5a1416a37b423de95b0fa279e9473110a678112.	2026-05-09 15:00:54 +08:00
chengyongru	e5a1416a37	fix(agent): persist _last_summary across restarts with used sentinel The previous implementation popped _last_summary from session.metadata after injecting it into the prompt, then saved the session. This caused the summary to be permanently lost after a process restart, making the AI forget archived context and appear to ignore memory or reference non-existent previous messages. Replace the destructive pop with a _last_summary_used sentinel: - _last_summary stays in metadata for restart survival - _last_summary_used prevents duplicate injection within the same turn - Clear the sentinel whenever a new summary is generated Updates tests to match the new persistence behavior.	2026-05-09 14:58:38 +08:00
Xubin Ren	3231aaf9ee	fix(image): prevent duplicate delivery and replay artifacts	2026-05-09 05:45:13 +00:00
Xubin Ren	cbd5b06075	fix(memory): align replay overflow with history trimming Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:37:03 +08:00
Xubin Ren	24daf9a51c	test(memory): accept replay window in consolidation assertion Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:37:03 +08:00
Xubin Ren	91ade9eaac	fix(memory): consolidate history hidden by replay window Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:37:03 +08:00
Xubin Ren	e936ed48bd	feat: add image generation tool and WebUI mode Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:06:23 +08:00
chengyongru	3a2f47d720	fix(onboard): allow empty strings and falsy values in input fields Fixes two related input-handling bugs in the onboard wizard: 1. _input_text treated "" as None, preventing users from clearing optional string fields or entering empty strings intentionally. 2. _input_model_with_autocomplete used `if value else None`, which discarded falsy values such as empty strings or 0. To support clearing optional string fields, add _is_str_or_none() and normalize empty strings to None inside _configure_pydantic_model only when the field annotation is `str \| None`. Required str fields keep "" as a valid value. Also included: - Remember last selected item in provider/channel/model menus for better UX when configuring multiple items. - Rename _SIMPLE_TYPES and _MENU_DISPATCH to lowercase to follow Python naming conventions (they are local variables, not constants). - Remove unused imports in test file. Extracted from PR #3358.	2026-05-08 13:21:51 +08:00
Jefsky	44a341335a	fix(dream): restore cursor with memory state Track the Dream cursor in memory versioning so restores do not skip history after rolling back Dream commits. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-07 01:06:05 +08:00
Xubin Ren	790a03ec28	feat(webui): polish chat layout and titles Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-06 22:20:35 +08:00
Tim O'Brien	99209a806d	fix(tool_hints): pass max_length to abbreviate_path for is_path tools The is_path branch in _fmt_known was not passing max_length to abbreviate_path, so read_file, write_file, edit, list_dir, and web_fetch always truncated paths at 40 chars regardless of config. Now all three branches (is_path, is_command, fallback) honor the configured toolHintMaxLength.	2026-05-06 21:18:39 +08:00
Tim O'Brien	daa4a25c9b	feat(config): add toolHintMaxLength to control tool hint truncation Add to config (default: 40, range: 20-500). Controls how many characters of tool hints are shown in progress updates (e.g. '$ cd …/project && npm test'). Set to 120+ to see full commands instead of truncated hints: ```json { "agents": { "defaults": { "toolHintMaxLength": 120 } } } ``` - Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback - Make path abbreviation in _abbreviate_command proportional to max_length - Add TestToolHintMaxLength test class with 5 tests - All 41 existing tests pass	2026-05-06 21:18:39 +08:00
hanyuanling	653de4a7ef	fix(agent): gate provider progress deltas	2026-05-06 21:18:30 +08:00
chengyongru	05e0106592	refactor(logging): preserve tracebacks and add channel context - Preserve tracebacks: logger.error in except blocks → logger.exception - Channel context: BaseChannel injects self.logger = logger.bind(channel=name) - Third-party bridge: redirect_lib_logging() replaces ad-hoc stdlib-to-loguru bridges - Log levels: network timeouts downgraded from ERROR → WARNING - Fix --verbose flag to actually work with loguru (set handler to DEBUG)	2026-05-06 21:17:45 +08:00
Xubin Ren	db14685a69	fix(agent): soften SSRF guard recovery Keep private URL access blocked at the tool boundary, but return a clear non-retryable hint so the agent can recover conversationally instead of aborting the turn. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-06 00:43:00 +08:00
chengyongru	c30e4d86f3	refactor(agent): simplify subagent concurrency with rejection over semaphore Replace the asyncio.Semaphore queueing approach with a simple count check in SpawnTool.execute(). When the concurrency limit is reached, the tool returns an error string so the agent can perceive the reason and adjust its behavior instead of silently queueing. - Remove max_concurrent_subagents parameter threading through AgentLoop, commands.py, and nanobot.py - SubagentManager reads the limit directly from AgentDefaults - SpawnTool checks get_running_count() before calling spawn() - Simplify tests to verify rejection behavior	2026-05-05 22:22:04 +08:00
Xubin Ren	614b21368f	fix(agent): tighten safety guard edge cases Keep the /dev workspace guard exception scoped to the known benign device paths already handled by ExecTool, and add coverage that non-benign /dev targets still get blocked. Also add a streaming regression for tool_error responses so fatal tool failures are delivered by channels instead of being marked as already streamed. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:25:52 +08:00
Xubin Ren	b8406be215	fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605 ) Replaces PR #3493's blanket fatal abort with a "tell the model + throttle the bypass loop" policy. Workspace-bound rejections are now ordinary recoverable tool errors enriched with a structured "this is a hard policy boundary" instruction; SSRF stays the only marker that aborts the turn. Why the fatal-abort approach broke ---------------------------------- PR #3493 promoted every shell `_guard_command` and filesystem path-resolution rejection to a turn-fatal RuntimeError. Two of those messages (`path outside working dir` and `path traversal detected`) are heuristic substring scans on the raw command, so legitimate commands like `rm <ws>/x.txt 2>/dev/null` or `find . -type f` killed the user's turn (#3599). On channels with outbound dedupe (Telegram) the user just saw silence (#3605), and the noise polluted the LLM's context until it started hallucinating guard rejections on plain relative paths (#3597). Why we still need some throttle --------------------------------- The original #3493 pain point was real: the LLM, refused once, would swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c -> ln -sf -> python -c open(...). Just removing the fatal escape lets that loop run wild until max_iterations. What this commit does --------------------- - `nanobot/utils/runtime.py`: add `workspace_violation_signature` and `repeated_workspace_violation_error`. The signature normalizes filesystem `path` arguments and the first absolute path inside an exec command, so swapping tools against the same outside target hits the same throttle bucket. Two soft attempts are allowed; the third attempt's tool result is replaced with a hard "stop trying to bypass" message that quotes the target path and tells the model to ask the user for help. - `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation` (still fatal) and `_is_workspace_violation` (now soft). All three failure branches in `_run_tool` (prep_error / exception / Error result) route through a shared `_classify_violation` that bumps the per-turn workspace_violation_counts dict and either keeps the tool's own message or substitutes the throttle escalation. `_execute_tools` now threads that dict alongside the existing external_lookup_counts. - `nanobot/agent/tools/shell.py`: append a structured boundary note to every workspace-bound guard rejection (`working_dir could not be resolved`, `working_dir is outside`, `path outside working dir`, `path traversal detected`). SSRF errors stay short and direct so the model doesn't try to "phrase around" them. Existing `2>/dev/null` allow-list and benign device passthrough from the previous commit remain. - `nanobot/agent/tools/filesystem.py`: append the same boundary note to the `outside allowed directory` PermissionError so read_file / write_file / list_dir errors give the LLM the same explicit hint. Tests ----- - `tests/utils/test_workspace_violation_throttle.py` (new): signature collapses across read_file/exec/python -c against the same path, different paths get independent budgets, escalation only fires after the third attempt. - `tests/agent/test_runner.py`: - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2 contract: filesystem PermissionError is now soft, runner moves to the next iteration and finalizes cleanly. - `test_is_ssrf_violation_remains_fatal` + the existing `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the first attempt. - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end to end recovery from `path outside working dir`. - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four bypass attempts against the same outside target produce at least one `workspace_violation_escalated` event and the run completes naturally without aborting the turn. - The two `_execute_tools` direct-call tests now pass the new workspace_violation_counts dict. - `tests/tools/test_tool_validation.py`: relax three `==` assertions to `startswith` + "hard policy boundary" substring check to match the new structured error messages. - `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null` regression and the `> /etc/issue` negative case from the previous commit on this branch -- they still pass under the new policy. Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2 on origin/main). Ruff is clean for every file touched in this commit. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:18:39 +08:00

1 2 3 4

199 Commits