194 Commits

Author SHA1 Message Date
chengyongru
f57e324ea5 test(pairing): cover _PENDING_USER_TURN_KEY cleanup and None allow_from
- Assert pending_user_turn is cleared from session metadata after
  shortcut commands (e.g. /help) in test_auto_compact.py.
- Add test for None allow_from / allowFrom values in
  test_base_channel.py to prevent TypeError regressions.
2026-05-15 10:05:30 +08:00
chengyongru
26665823e3 fix(agent): persist shortcut commands without polluting LLM context
Shortcut commands (e.g. /help, /pairing) skip BUILD and SAVE states,
so their turns were never persisted to the session.  This caused WebUI
chats to appear empty after _turn_end because history hydration reads
from the session file.

Fix by persisting the user message and assistant response inside
_state_command, but tag them with _command=True so Session.get_history
filters them out of LLM context.  /new is excluded because it
intentionally clears the session.

- AgentLoop._persist_user_message_early now accepts **kwargs so
  _state_command can pass _command=True for the user turn.
- Session.get_history skips messages with _command=True.
2026-05-14 23:51:58 +08:00
Xubin Ren
5efd67919b feat(runner): support fallback candidates
Resolve fallbackModels as preset references or explicit inline provider configs so failover uses complete model settings without exposing fallback logic to the agent loop.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 15:34:03 +00:00
Xubin Ren
43db848db0 Revert "feat(runner): support structured fallback models"
This reverts commit 02b059a616dc6dc82ad15282102c7b27a5a34e40.
2026-05-13 14:11:08 +00:00
Xubin Ren
02b059a616 feat(runner): support structured fallback models
Bind fallback model chains to the active model configuration so defaults and presets do not inherit or merge fallback behavior implicitly. Require explicit fallback providers while preserving per-fallback generation overrides and context-window safety.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 13:57:30 +00:00
Xubin Ren
eaa8ebd5d3 Merge remote-tracking branch 'origin/main' into pr-3756 2026-05-13 13:12:56 +00:00
Xubin Ren
fb508a302a feat(webui): refresh session titles from live updates 2026-05-13 13:10:21 +00:00
chengyongru
913b0774d8 feat(runner): add model failover with fallback_models
When the primary model returns a non-transient error and no content
has been streamed yet, the runner now tries each model listed in the
active preset's fallback_models in order.  Each fallback model may
reside on a different provider — a temporary provider instance is
created on-the-fly via make_provider(config, model=...).

Key design:
- Failover is request-scoped (does not affect subagents/dream/consolidator)
- Provider is restored via try/finally after each fallback attempt
- Skipped when content was already streamed to avoid duplicate output
- Recursive failover prevented by clearing fallback_models on fallback spec
- Circuit breaker trips open after 3 consecutive primary failures (60s cooldown)
- Cross-provider routing: fallback model prefix (e.g. groq/) determines provider

Fixes: cross-provider fallback was broken because the factory passed the
original preset (with provider forced to primary's provider) when creating
fallback providers.  Now uses provider="auto" so the model string prefix
correctly routes to the right provider.

Also fixes: log messages now distinguish between primary-failed,
previous-fallback-failed, and circuit-open scenarios.

closes: https://github.com/HKUDS/nanobot/issues/3376
2026-05-13 17:30:49 +08:00
Xubin Ren
458b4ba235 feat(reasoning): stream reasoning content as a first-class channel
Reasoning now flows as its own stream — symmetric to the answer's
``delta`` / ``stream_end`` pair — instead of being shipped as one
oversized progress message. This lets WebUI render a live "Thinking…"
bubble that updates in place, then auto-collapses when the stream
closes. Other channels remain plugin no-ops by default.

## Protocol

New metadata: ``_reasoning_delta`` (chunk) and ``_reasoning_end``
(close marker). ChannelManager routes both to the dedicated plugin
hooks below; the legacy one-shot ``_reasoning`` is kept for back-compat
and BaseChannel expands it into a single delta + end pair so plugins
only ever implement the streaming primitives.

WebSocket emits two new events:

- ``reasoning_delta`` (event, chat_id, text, optional stream_id)
- ``reasoning_end`` (event, chat_id, optional stream_id)

## BaseChannel surface

- ``send_reasoning_delta(chat_id, delta, metadata)`` — no-op default
- ``send_reasoning_end(chat_id, metadata)`` — no-op default
- ``send_reasoning(msg)`` — back-compat wrapper, base impl forwards
  to the streaming primitives

A channel adds reasoning support by overriding the two streaming
primitives. Telegram / Slack / Discord / Feishu / WeChat / Matrix keep
the base no-ops until their bubble UIs are adapted; reasoning silently
drops at dispatch, never as a stray text message.

## AgentHook

Adds ``emit_reasoning_end`` to the hook lifecycle. ``_LoopHook`` tracks
whether a reasoning segment is open and closes it on:

- the first answer delta arriving (so the UI locks the bubble before
  the answer renders below),
- ``on_stream_end``,
- one-shot ``reasoning_content`` / ``thinking_blocks`` after a single
  non-streaming response.

## WebUI

- ``UIMessage.reasoning`` is now a single accumulated string with a
  companion ``reasoningStreaming`` flag.
- ``useNanobotStream`` consumes ``reasoning_delta`` / ``reasoning_end``;
  legacy ``kind: "reasoning"`` is auto-translated to a delta + end.
- New ``ReasoningBubble``: shimmer header + auto-expanded while
  streaming, collapses to a clickable "Thinking" pill once closed,
  respects ``prefers-reduced-motion``.
- Answer deltas adopt the reasoning placeholder so the bubble and the
  answer share one assistant row.

## Tests

- ``tests/channels/test_channel_manager_reasoning.py`` — manager routes
  delta + end, drops on channel opt-out, expands one-shot back-compat.
- ``tests/channels/test_websocket_channel.py`` — new ``reasoning_delta``
  / ``reasoning_end`` frames, empty-chunk safety, no-subscriber safety,
  back-compat expansion.
- ``tests/agent/test_runner_reasoning.py`` — runner closes the segment
  on streaming answer start and after one-shot reasoning.
- WebUI ``useNanobotStream`` + ``message-bubble`` cover the new
  protocol and the shimmer styling.

## Docs

``docs/configuration.md`` and ``docs/websocket.md`` document the new
events and the plugin contract.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 07:13:43 +00:00
Xubin Ren
01fa362c03 Merge origin/main into feat/show-reasoning
Resolves conflicts after main landed the state-machine turn refactor
and the test_runner.py 9-file split:

- nanobot/agent/loop.py: take main's `_state_build`/`_persist_user_message_early`
  flow; restore the `reasoning: bool` parameter on `_build_bus_progress_callback`
  so the loop hook can mark progress as reasoning-channel without coupling to
  the answer stream.
- nanobot/cli/stream.py: keep main's configurable `bot_name`/`bot_icon` header
  while preserving the PR's `transient=True` Live + `self._console` routing
  + `_renderable()` final-render path that fixed TUI duplication.
- tests/agent/test_runner.py was deleted on main and split into 9 focused
  files; relocated all 6 reasoning tests into a new `test_runner_reasoning.py`
  matching the new layout, deduplicated the per-test `ReasoningHook` boilerplate
  through a shared `_RecordingHook` helper.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 05:07:14 +00:00
chengyongru
99cc6ee808 test(agent): expand coverage and refactor test structure
- Add 42 tests for ContextBuilder (context.py: 0→42 tests)
- Add 37 tests for SubagentManager lifecycle (subagent.py: 2→37 tests)
- Add 42 unit tests for AutoCompact in isolation
- Split monolithic test_runner.py (3313 lines) into 9 focused files:
  test_runner_core, test_runner_hooks, test_runner_errors,
  test_runner_safety, test_runner_persistence, test_runner_governance,
  test_runner_tool_execution, test_runner_injections,
  test_loop_runner_integration
- Add 3 config passthrough tests (temperature/max_tokens/reasoning_effort)
- Fix fragile patch.object(__init__) in test_stop_preserves_context
- Create shared conftest.py with make_provider/make_loop factories

Total: 934 tests passing, 0 regressions
2026-05-13 12:49:17 +08:00
Xubin Ren
352aaf0627 refactor(reasoning): unify reasoning extraction across providers
Reasoning surfacing was split across three branches in runner.py plus
two separate streaming buffers (loop hook and runner progress stream),
with three independent display-side gates in the CLI. This collapsed
the policy into one source of truth and fixed two real bugs:

- Structured `reasoning_content` was suppressed whenever the answer was
  streamed, because the runner gated emission on `streamed_content`.
  Providers don't stream `reasoning_content`; it only arrives on the
  final response, so the answer stream and the reasoning channel are
  independent. Added `streamed_reasoning` to `AgentHookContext` to track
  the right bit.
- `channels.showReasoning` was subordinated to `sendProgress`. They are
  orthogonal — turning off progress streaming shouldn't silence
  reasoning. Reworked the CLI gates accordingly.

Single-helper consolidation:

- `extract_reasoning(reasoning_content, thinking_blocks, content)`
  returns `(reasoning_text, cleaned_content)` with a defined fallback
  order: dedicated field → Anthropic thinking_blocks → inline
  `<think>`/`<thought>` tags. Models that expose none of these
  short-circuit to `(None, content)` — zero overhead.
- `IncrementalThinkExtractor` replaces the ad-hoc `emit_incremental_think`
  function and its hand-rolled "emitted cursor" state in both the loop
  hook and the runner progress stream.

Also documented the new `showReasoning` channel option in
docs/configuration.md and noted its independence from sendProgress.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 17:14:19 +00:00
Flinn Xie
3a851f8f8d feat(reasoning): add inline think tag extraction and Anthropic thinking_blocks support
Add extract_think() and emit_incremental_think() helpers to extract thinking content from inline <think> and <thought> tags in the content field. This handles models served via Ollama, self-hosted vLLM, or other compatible endpoints that embed reasoning as inline tags instead of using the dedicated reasoning_content API field.

Also adds Anthropic thinking_blocks support for extended thinking via the thinking content blocks array.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-12 23:02:59 +08:00
chengyongru
9e15925cf4 refactor(agent): remove ask_user tool
The ask_user tool used AskUserInterrupt(BaseException) for mid-turn
blocking, creating heavy coupling across runner, loop, and session
management. The model now asks questions naturally in response text,
the turn ends normally, and the user's next message starts a new turn
with session history providing continuity.

Removed:
- nanobot/agent/tools/ask.py (tool, interrupt, helpers)
- tests/agent/test_ask_user.py
- webui/src/components/thread/AskUserPrompt.tsx
- AskUserInterrupt handling in runner.py
- Dual-path message building in loop.py
- Pending ask detection via history scanning
- button_prompt/buttons emission in WebSocket channel
- ask_user references in Slack channel docstrings

Preserved (MessageTool uses these independently):
- OutboundMessage.buttons field
- Channel button rendering (Telegram, Slack, WebSocket)
2026-05-12 22:48:26 +08:00
Xubin Ren
079b37aac5 test(config): cover legacy model defaults without presets
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
13eede5803 refactor(agent): inject runtime model publisher
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
e6103d9312 fix(agent): separate preset snapshots from config reload
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
8fcb24bb7c refactor(agent): trim model preset runtime wiring
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
c9b84c7b11 fix(config): reserve implicit default model preset
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
bcc4b97183 fix(webui): broadcast runtime model updates
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
b61c6304c3 fix(config): reconcile presets with settings reload
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
Xubin Ren
c450d6fd3f fix(config): make model preset switching atomic
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 20:06:22 +08:00
chengyongru
6f78267c82 feat(config): add ModelPresetConfig and runtime preset switching
- Add `ModelPresetConfig` schema for named model presets
- Add `model_presets` dict to `Config` and `model_preset` field to `AgentDefaults`
- Add `resolve_preset()` to return effective model params from preset or defaults
- Add `@model_validator` to reject unknown preset names
- Update `_match_provider()` to use resolved preset model/provider
- Update `make_provider()` and `provider_signature()` to use `resolve_preset()`
- Add `model_preset` property to `AgentLoop` for atomic runtime switching
- Update `AgentLoop.from_config()` to inject a runtime `default` preset
- Wire self-tool to inspect/clear preset state
- Update CLI display strings to show active preset
2026-05-12 20:06:22 +08:00
Xubin Ren
23312d683e fix(tools): isolate plugin runtime state
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 11:28:20 +08:00
chengyongru
043f0e67f7 feat(tools): introduce plugin-based tool discovery and runtime context protocol
This commit implements a progressive refactoring of the tool system to support
plugin discovery, scoped loading, and protocol-driven runtime context injection.

Key changes:
- Add Tool ABC metadata (tool_name, _scopes) and ToolContext dataclass for
dependency injection.
- Introduce ToolLoader with pkgutil-based builtin discovery and
entry_points-based third-party plugin loading.
- Add scope filtering (core/subagent/memory) so different contexts load
appropriate tool sets.
- Introduce ContextAware protocol and RequestContext dataclass to replace
hardcoded per-tool context injection in AgentLoop.
- Add RuntimeState / MutableRuntimeState protocols to decouple MyTool from
AgentLoop.
- Migrate all built-in tools to declare scopes and implement create()/enabled()
hooks.
- Migrate MessageTool, SpawnTool, CronTool, and MyTool to ContextAware.
- Refactor AgentLoop to use ToolLoader and protocol-driven context injection.
- Refactor SubagentManager to use ToolLoader(scope="subagent") with per-run
FileStates isolation.
- Register all built-in tools via pyproject.toml entry_points.
- Add comprehensive tests for loader scopes, entry_points, ContextAware,
subagent tools, and runtime state sync.
2026-05-12 11:28:20 +08:00
chengyongru
a6e993df25 fix(agent): move archived summary into system prompt for KV cache stability
- Append [Archived Context Summary] to system prompt instead of injecting
  it into the user message runtime context, improving KV cache reuse across
  turns and avoiding consecutive same-role messages.
- _last_summary persists in metadata (no pop) for restart survival;
  summary is re-injected every turn via the stable system prompt.
- Remove dynamic "Inactive for X minutes" from _format_summary — use
  static last_active timestamp instead to preserve KV cache stability.
- Pass session_summary through build_messages() so both normal and
  ask_user paths receive the archived summary in the system prompt.
- estimate_session_prompt_tokens now reads _last_summary from metadata
  to include the summary in token budget estimation.
- Remove obsolete session_summary parameter from
  maybe_consolidate_by_tokens and estimate_session_prompt_tokens
  call sites in loop.py (summary flows through build_messages instead).
- Ensure /new (session.clear()) clears _last_summary from metadata.
2026-05-11 01:25:15 +08:00
Flinn Xie
3a27af0018 feat(cli): display model reasoning content during streaming
Add show_reasoning config (default: False) to display model
thinking/reasoning content in the TUI during streaming.  Reasoning
is emitted via a new emit_reasoning hook on AgentHook, gated by the
channels config.  Display uses ✻ prefix with dim italic styling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 01:02:49 +08:00
Xubin Ren
9252f4d826 Revert "fix(agent): persist _last_summary across restarts with used sentinel"
This reverts commit e5a1416a37b423de95b0fa279e9473110a678112.
2026-05-09 15:00:54 +08:00
chengyongru
e5a1416a37 fix(agent): persist _last_summary across restarts with used sentinel
The previous implementation popped _last_summary from session.metadata
after injecting it into the prompt, then saved the session. This caused
the summary to be permanently lost after a process restart, making the
AI forget archived context and appear to ignore memory or reference
non-existent previous messages.

Replace the destructive pop with a _last_summary_used sentinel:
- _last_summary stays in metadata for restart survival
- _last_summary_used prevents duplicate injection within the same turn
- Clear the sentinel whenever a new summary is generated

Updates tests to match the new persistence behavior.
2026-05-09 14:58:38 +08:00
Xubin Ren
3231aaf9ee fix(image): prevent duplicate delivery and replay artifacts 2026-05-09 05:45:13 +00:00
Xubin Ren
cbd5b06075 fix(memory): align replay overflow with history trimming
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 20:37:03 +08:00
Xubin Ren
24daf9a51c test(memory): accept replay window in consolidation assertion
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 20:37:03 +08:00
Xubin Ren
91ade9eaac fix(memory): consolidate history hidden by replay window
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 20:37:03 +08:00
Xubin Ren
e936ed48bd feat: add image generation tool and WebUI mode
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 20:06:23 +08:00
chengyongru
3a2f47d720 fix(onboard): allow empty strings and falsy values in input fields
Fixes two related input-handling bugs in the onboard wizard:

1. _input_text treated "" as None, preventing users from clearing
   optional string fields or entering empty strings intentionally.

2. _input_model_with_autocomplete used `if value else None`, which
   discarded falsy values such as empty strings or 0.

To support clearing optional string fields, add _is_str_or_none() and
normalize empty strings to None inside _configure_pydantic_model only
when the field annotation is `str | None`. Required str fields keep
"" as a valid value.

Also included:
- Remember last selected item in provider/channel/model menus for
  better UX when configuring multiple items.
- Rename _SIMPLE_TYPES and _MENU_DISPATCH to lowercase to follow
  Python naming conventions (they are local variables, not constants).
- Remove unused imports in test file.

Extracted from PR #3358.
2026-05-08 13:21:51 +08:00
Jefsky
44a341335a fix(dream): restore cursor with memory state
Track the Dream cursor in memory versioning so restores do not skip history after rolling back Dream commits.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-07 01:06:05 +08:00
Xubin Ren
790a03ec28 feat(webui): polish chat layout and titles
Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-06 22:20:35 +08:00
Tim O'Brien
99209a806d fix(tool_hints): pass max_length to abbreviate_path for is_path tools
The is_path branch in _fmt_known was not passing max_length to
abbreviate_path, so read_file, write_file, edit, list_dir, and
web_fetch always truncated paths at 40 chars regardless of config.

Now all three branches (is_path, is_command, fallback) honor the
configured toolHintMaxLength.
2026-05-06 21:18:39 +08:00
Tim O'Brien
daa4a25c9b feat(config): add toolHintMaxLength to control tool hint truncation
Add  to  config (default: 40, range: 20-500).
Controls how many characters of tool hints are shown in progress updates
(e.g. '$ cd …/project && npm test').

Set to 120+ to see full commands instead of truncated hints:

```json
{
  "agents": {
    "defaults": {
      "toolHintMaxLength": 120
    }
  }
}
```

- Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback
- Make path abbreviation in _abbreviate_command proportional to max_length
- Add TestToolHintMaxLength test class with 5 tests
- All 41 existing tests pass
2026-05-06 21:18:39 +08:00
hanyuanling
653de4a7ef fix(agent): gate provider progress deltas 2026-05-06 21:18:30 +08:00
chengyongru
05e0106592 refactor(logging): preserve tracebacks and add channel context
- Preserve tracebacks: logger.error in except blocks → logger.exception
- Channel context: BaseChannel injects self.logger = logger.bind(channel=name)
- Third-party bridge: redirect_lib_logging() replaces ad-hoc stdlib-to-loguru bridges
- Log levels: network timeouts downgraded from ERROR → WARNING
- Fix --verbose flag to actually work with loguru (set handler to DEBUG)
2026-05-06 21:17:45 +08:00
Xubin Ren
db14685a69 fix(agent): soften SSRF guard recovery
Keep private URL access blocked at the tool boundary, but return a clear non-retryable hint so the agent can recover conversationally instead of aborting the turn.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-06 00:43:00 +08:00
chengyongru
c30e4d86f3 refactor(agent): simplify subagent concurrency with rejection over semaphore
Replace the asyncio.Semaphore queueing approach with a simple count
check in SpawnTool.execute(). When the concurrency limit is reached,
the tool returns an error string so the agent can perceive the reason
and adjust its behavior instead of silently queueing.

- Remove max_concurrent_subagents parameter threading through
  AgentLoop, commands.py, and nanobot.py
- SubagentManager reads the limit directly from AgentDefaults
- SpawnTool checks get_running_count() before calling spawn()
- Simplify tests to verify rejection behavior
2026-05-05 22:22:04 +08:00
Xubin Ren
614b21368f fix(agent): tighten safety guard edge cases
Keep the /dev workspace guard exception scoped to the known benign device paths already handled by ExecTool, and add coverage that non-benign /dev targets still get blocked. Also add a streaming regression for tool_error responses so fatal tool failures are delivered by channels instead of being marked as already streamed.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 01:25:52 +08:00
Xubin Ren
b8406be215 fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605)
Replaces PR #3493's blanket fatal abort with a "tell the model + throttle
the bypass loop" policy.  Workspace-bound rejections are now ordinary
recoverable tool errors enriched with a structured "this is a hard policy
boundary" instruction; SSRF stays the only marker that aborts the turn.

Why the fatal-abort approach broke
----------------------------------
PR #3493 promoted every shell `_guard_command` and filesystem path-resolution
rejection to a turn-fatal RuntimeError.  Two of those messages (`path
outside working dir` and `path traversal detected`) are heuristic substring
scans on the raw command, so legitimate commands like `rm <ws>/x.txt
2>/dev/null` or `find . -type f` killed the user's turn (#3599).  On
channels with outbound dedupe (Telegram) the user just saw silence (#3605),
and the noise polluted the LLM's context until it started hallucinating
guard rejections on plain relative paths (#3597).

Why we still need *some* throttle
---------------------------------
The original #3493 pain point was real: the LLM, refused once, would
swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c
-> ln -sf -> python -c open(...).  Just removing the fatal escape lets
that loop run wild until max_iterations.

What this commit does
---------------------
- `nanobot/utils/runtime.py`: add `workspace_violation_signature` and
  `repeated_workspace_violation_error`.  The signature normalizes
  filesystem `path` arguments and the first absolute path inside an
  exec command, so swapping tools against the same outside target hits
  the same throttle bucket.  Two soft attempts are allowed; the third
  attempt's tool result is replaced with a hard "stop trying to bypass"
  message that quotes the target path and tells the model to ask the
  user for help.

- `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation`
  (still fatal) and `_is_workspace_violation` (now soft).  All three
  failure branches in `_run_tool` (prep_error / exception / Error
  result) route through a shared `_classify_violation` that bumps the
  per-turn workspace_violation_counts dict and either keeps the tool's
  own message or substitutes the throttle escalation.  `_execute_tools`
  now threads that dict alongside the existing external_lookup_counts.

- `nanobot/agent/tools/shell.py`: append a structured boundary note to
  every workspace-bound guard rejection (`working_dir could not be
  resolved`, `working_dir is outside`, `path outside working dir`,
  `path traversal detected`).  SSRF errors stay short and direct so the
  model doesn't try to "phrase around" them.  Existing `2>/dev/null`
  allow-list and benign device passthrough from the previous commit
  remain.

- `nanobot/agent/tools/filesystem.py`: append the same boundary note to
  the `outside allowed directory` PermissionError so read_file / write_file
  / list_dir errors give the LLM the same explicit hint.

Tests
-----
- `tests/utils/test_workspace_violation_throttle.py` (new): signature
  collapses across read_file/exec/python -c against the same path,
  different paths get independent budgets, escalation only fires after
  the third attempt.

- `tests/agent/test_runner.py`:
  - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2
    contract: filesystem PermissionError is now soft, runner moves to
    the next iteration and finalizes cleanly.
  - `test_is_ssrf_violation_remains_fatal` + the existing
    `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the
    first attempt.
  - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end
    to end recovery from `path outside working dir`.
  - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four
    bypass attempts against the same outside target produce at least
    one `workspace_violation_escalated` event and the run completes
    naturally without aborting the turn.
  - The two `_execute_tools` direct-call tests now pass the new
    workspace_violation_counts dict.

- `tests/tools/test_tool_validation.py`: relax three `==` assertions
  to `startswith` + "hard policy boundary" substring check to match
  the new structured error messages.

- `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null`
  regression and the `> /etc/issue` negative case from the previous
  commit on this branch -- they still pass under the new policy.

Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2
on origin/main).  Ruff is clean for every file touched in this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 01:18:39 +08:00
Xubin Ren
7742f8fbdc fix(runner): narrow workspace_violation fatal classification (#3599, helps #3605 #3597)
PR #3493 promoted every shell `_guard_command` rejection to a turn-fatal
RuntimeError. The two heuristic outputs in that list -- `path outside
working dir` and `path traversal detected` -- routinely false-positive on
benign constructs (e.g. `2>/dev/null`, quoted `..` arguments to sed/find,
absolute paths inside inline scripts), so legitimate workspace commands
silently kill the user's turn (#3599) and the agent never gets a chance
to retry with a different approach (#3605).

Two changes, both narrowly scoped:

- `ExecTool._guard_command` now skips a small allow-list of kernel device
  files (`/dev/null`, the standard streams, `/dev/random`, `/dev/fd/N`,
  ...) before the workspace path check, matched against the pre-resolve
  string so symlinks like `/dev/stderr -> /proc/self/fd/2` still hit the
  allow-list. Real outside writes such as `> /etc/issue` remain blocked.
- `AgentRunner._WORKSPACE_BLOCK_MARKERS` keeps only the four hard
  path-resolution errors from filesystem.py / shell.py and the SSRF
  marker. The two heuristic substrings move out of the fatal list, so
  the LLM sees them as ordinary tool errors and can self-correct in the
  next iteration. SSRF stays fatal because retrying an internal URL
  with a different phrasing would defeat the safety boundary.

Tests:
- `tests/tools/test_exec_security.py`: parametrized regression for the
  exact #3599 command sample plus other stdio redirects and device
  reads; explicit negative case asserts `> /etc/issue` is still blocked.
- `tests/agent/test_runner.py`: `_is_workspace_violation` no longer
  fatals on the two heuristic markers, plus an end-to-end case proving
  the runner hands the guard error back to the LLM and finalizes the
  next turn cleanly.
2026-05-04 01:18:39 +08:00
Xubin Ren
96da6d8190 fix(webui): tighten turn completion handling
Keep the new turn-end signal scoped to WebSocket clients, preserve pending tool-call state across trailing tool result rows, and drop the accidental npm lockfile from the Bun-based WebUI.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 22:28:40 +08:00
ramonpaolo
be83525f99 test(webui): cover turn-end streaming regressions 2026-05-03 22:28:40 +08:00
chengyongru
5853d5dfda
fix: allow_patterns take priority over deny_patterns in ExecTool (#3594)
* fix: allow_patterns take priority over deny_patterns in ExecTool

Previously deny_patterns were checked first with no bypass, meaning
allow_patterns could never exempt commands from the built-in deny list.
This made it impossible to whitelist destructive commands for specific
directories (e.g. build/cleanup tasks).

Changes:
- shell.py: check allow_patterns first; if matched, skip deny check
- shell.py: deny_patterns now appends to built-in list (not replaces)
- schema.py: add allow_patterns/deny_patterns to ExecToolConfig
- loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool
- Add test_exec_allow_patterns.py covering priority semantics

* fix: separate deny pattern errors from workspace violation detection

The deny pattern error message "Command blocked by safety guard" was
included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be
misclassified as fatal workspace violations. This meant LLMs had no
chance to retry with a different command — the turn was aborted
immediately.

Changes:
- shell.py: deny/allowlist error messages now use distinct phrasing
  ("blocked by deny pattern filter" / "blocked by allowlist filter")
- runner.py: remove "blocked by safety guard" from
  _WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal
  tool errors (LLM can retry) instead of fatal violations
- workspace path errors still use "blocked by safety guard" and remain
  fatal as intended

* fix: update test assertions to match new deny pattern error message

* fix: indentation error in test file

* fix: restore SSRF fatal classification and tidy exec pattern plumbing

Address review feedback on the deny/allow_patterns rework:

- runner.py: re-add "internal/private url detected" to
  _WORKSPACE_BLOCK_MARKERS. The earlier marker removal also stripped
  fatal classification from SSRF / internal-URL rejections (whose
  message still says "blocked by safety guard"), turning a hard
  security boundary into something the LLM could retry.
- loop.py / subagent.py: drop `or None` between ExecToolConfig and
  ExecTool. The schema default is an empty list and ExecTool already
  normalizes None back to [], so the indirection was a no-op.
- shell.py: extract `explicitly_allowed` flag in _guard_command so
  allow_patterns are scanned once instead of twice and the control
  flow no longer relies on a no-op `pass + else` branch.
- tests/agent/test_runner.py: add a regression test asserting that
  the SSRF block message is treated as fatal, while deny/allowlist
  filter messages are deliberately non-fatal.

* fix: remove unused exec allow-pattern test import

Keep the new ExecTool allow-pattern coverage clean under ruff.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Xubin Ren <xubinrencs@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-03 00:27:17 +08:00
Xubin Ren
188e6df757 fix(utils): cover complete trailing think markers
Made-with: Cursor
2026-05-01 20:09:59 +08:00