Reasoning now flows as its own stream — symmetric to the answer's
``delta`` / ``stream_end`` pair — instead of being shipped as one
oversized progress message. This lets WebUI render a live "Thinking…"
bubble that updates in place, then auto-collapses when the stream
closes. Other channels remain plugin no-ops by default.
## Protocol
New metadata: ``_reasoning_delta`` (chunk) and ``_reasoning_end``
(close marker). ChannelManager routes both to the dedicated plugin
hooks below; the legacy one-shot ``_reasoning`` is kept for back-compat
and BaseChannel expands it into a single delta + end pair so plugins
only ever implement the streaming primitives.
WebSocket emits two new events:
- ``reasoning_delta`` (event, chat_id, text, optional stream_id)
- ``reasoning_end`` (event, chat_id, optional stream_id)
## BaseChannel surface
- ``send_reasoning_delta(chat_id, delta, metadata)`` — no-op default
- ``send_reasoning_end(chat_id, metadata)`` — no-op default
- ``send_reasoning(msg)`` — back-compat wrapper, base impl forwards
to the streaming primitives
A channel adds reasoning support by overriding the two streaming
primitives. Telegram / Slack / Discord / Feishu / WeChat / Matrix keep
the base no-ops until their bubble UIs are adapted; reasoning silently
drops at dispatch, never as a stray text message.
## AgentHook
Adds ``emit_reasoning_end`` to the hook lifecycle. ``_LoopHook`` tracks
whether a reasoning segment is open and closes it on:
- the first answer delta arriving (so the UI locks the bubble before
the answer renders below),
- ``on_stream_end``,
- one-shot ``reasoning_content`` / ``thinking_blocks`` after a single
non-streaming response.
## WebUI
- ``UIMessage.reasoning`` is now a single accumulated string with a
companion ``reasoningStreaming`` flag.
- ``useNanobotStream`` consumes ``reasoning_delta`` / ``reasoning_end``;
legacy ``kind: "reasoning"`` is auto-translated to a delta + end.
- New ``ReasoningBubble``: shimmer header + auto-expanded while
streaming, collapses to a clickable "Thinking" pill once closed,
respects ``prefers-reduced-motion``.
- Answer deltas adopt the reasoning placeholder so the bubble and the
answer share one assistant row.
## Tests
- ``tests/channels/test_channel_manager_reasoning.py`` — manager routes
delta + end, drops on channel opt-out, expands one-shot back-compat.
- ``tests/channels/test_websocket_channel.py`` — new ``reasoning_delta``
/ ``reasoning_end`` frames, empty-chunk safety, no-subscriber safety,
back-compat expansion.
- ``tests/agent/test_runner_reasoning.py`` — runner closes the segment
on streaming answer start and after one-shot reasoning.
- WebUI ``useNanobotStream`` + ``message-bubble`` cover the new
protocol and the shimmer styling.
## Docs
``docs/configuration.md`` and ``docs/websocket.md`` document the new
events and the plugin contract.
Co-authored-by: Cursor <cursoragent@cursor.com>
Resolves conflicts after main landed the state-machine turn refactor
and the test_runner.py 9-file split:
- nanobot/agent/loop.py: take main's `_state_build`/`_persist_user_message_early`
flow; restore the `reasoning: bool` parameter on `_build_bus_progress_callback`
so the loop hook can mark progress as reasoning-channel without coupling to
the answer stream.
- nanobot/cli/stream.py: keep main's configurable `bot_name`/`bot_icon` header
while preserving the PR's `transient=True` Live + `self._console` routing
+ `_renderable()` final-render path that fixed TUI duplication.
- tests/agent/test_runner.py was deleted on main and split into 9 focused
files; relocated all 6 reasoning tests into a new `test_runner_reasoning.py`
matching the new layout, deduplicated the per-test `ReasoningHook` boilerplate
through a shared `_RecordingHook` helper.
Co-authored-by: Cursor <cursoragent@cursor.com>
Reasoning surfacing was split across three branches in runner.py plus
two separate streaming buffers (loop hook and runner progress stream),
with three independent display-side gates in the CLI. This collapsed
the policy into one source of truth and fixed two real bugs:
- Structured `reasoning_content` was suppressed whenever the answer was
streamed, because the runner gated emission on `streamed_content`.
Providers don't stream `reasoning_content`; it only arrives on the
final response, so the answer stream and the reasoning channel are
independent. Added `streamed_reasoning` to `AgentHookContext` to track
the right bit.
- `channels.showReasoning` was subordinated to `sendProgress`. They are
orthogonal — turning off progress streaming shouldn't silence
reasoning. Reworked the CLI gates accordingly.
Single-helper consolidation:
- `extract_reasoning(reasoning_content, thinking_blocks, content)`
returns `(reasoning_text, cleaned_content)` with a defined fallback
order: dedicated field → Anthropic thinking_blocks → inline
`<think>`/`<thought>` tags. Models that expose none of these
short-circuit to `(None, content)` — zero overhead.
- `IncrementalThinkExtractor` replaces the ad-hoc `emit_incremental_think`
function and its hand-rolled "emitted cursor" state in both the loop
hook and the runner progress stream.
Also documented the new `showReasoning` channel option in
docs/configuration.md and noted its independence from sendProgress.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add extract_think() and emit_incremental_think() helpers to extract thinking content from inline <think> and <thought> tags in the content field. This handles models served via Ollama, self-hosted vLLM, or other compatible endpoints that embed reasoning as inline tags instead of using the dedicated reasoning_content API field.
Also adds Anthropic thinking_blocks support for extended thinking via the thinking content blocks array.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The ask_user tool used AskUserInterrupt(BaseException) for mid-turn
blocking, creating heavy coupling across runner, loop, and session
management. The model now asks questions naturally in response text,
the turn ends normally, and the user's next message starts a new turn
with session history providing continuity.
Removed:
- nanobot/agent/tools/ask.py (tool, interrupt, helpers)
- tests/agent/test_ask_user.py
- webui/src/components/thread/AskUserPrompt.tsx
- AskUserInterrupt handling in runner.py
- Dual-path message building in loop.py
- Pending ask detection via history scanning
- button_prompt/buttons emission in WebSocket channel
- ask_user references in Slack channel docstrings
Preserved (MessageTool uses these independently):
- OutboundMessage.buttons field
- Channel button rendering (Telegram, Slack, WebSocket)
Add show_reasoning config (default: False) to display model
thinking/reasoning content in the TUI during streaming. Reasoning
is emitted via a new emit_reasoning hook on AgentHook, gated by the
channels config. Display uses ✻ prefix with dim italic styling.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keep private URL access blocked at the tool boundary, but return a clear non-retryable hint so the agent can recover conversationally instead of aborting the turn.
Co-authored-by: Cursor <cursoragent@cursor.com>
Keep the workspace-boundary changes easier to review by trimming long explanatory comments down to short local notes. Also make the #3599 POSIX command regression skip on Windows and normalize workspace violation signatures to POSIX separators so the throttle tests are platform-stable.
Tests:
- uv run pytest tests/tools/test_exec_security.py tests/utils/test_workspace_violation_throttle.py -q
- uv run pytest -q
Co-authored-by: Cursor <cursoragent@cursor.com>
Replaces PR #3493's blanket fatal abort with a "tell the model + throttle
the bypass loop" policy. Workspace-bound rejections are now ordinary
recoverable tool errors enriched with a structured "this is a hard policy
boundary" instruction; SSRF stays the only marker that aborts the turn.
Why the fatal-abort approach broke
----------------------------------
PR #3493 promoted every shell `_guard_command` and filesystem path-resolution
rejection to a turn-fatal RuntimeError. Two of those messages (`path
outside working dir` and `path traversal detected`) are heuristic substring
scans on the raw command, so legitimate commands like `rm <ws>/x.txt
2>/dev/null` or `find . -type f` killed the user's turn (#3599). On
channels with outbound dedupe (Telegram) the user just saw silence (#3605),
and the noise polluted the LLM's context until it started hallucinating
guard rejections on plain relative paths (#3597).
Why we still need *some* throttle
---------------------------------
The original #3493 pain point was real: the LLM, refused once, would
swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c
-> ln -sf -> python -c open(...). Just removing the fatal escape lets
that loop run wild until max_iterations.
What this commit does
---------------------
- `nanobot/utils/runtime.py`: add `workspace_violation_signature` and
`repeated_workspace_violation_error`. The signature normalizes
filesystem `path` arguments and the first absolute path inside an
exec command, so swapping tools against the same outside target hits
the same throttle bucket. Two soft attempts are allowed; the third
attempt's tool result is replaced with a hard "stop trying to bypass"
message that quotes the target path and tells the model to ask the
user for help.
- `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation`
(still fatal) and `_is_workspace_violation` (now soft). All three
failure branches in `_run_tool` (prep_error / exception / Error
result) route through a shared `_classify_violation` that bumps the
per-turn workspace_violation_counts dict and either keeps the tool's
own message or substitutes the throttle escalation. `_execute_tools`
now threads that dict alongside the existing external_lookup_counts.
- `nanobot/agent/tools/shell.py`: append a structured boundary note to
every workspace-bound guard rejection (`working_dir could not be
resolved`, `working_dir is outside`, `path outside working dir`,
`path traversal detected`). SSRF errors stay short and direct so the
model doesn't try to "phrase around" them. Existing `2>/dev/null`
allow-list and benign device passthrough from the previous commit
remain.
- `nanobot/agent/tools/filesystem.py`: append the same boundary note to
the `outside allowed directory` PermissionError so read_file / write_file
/ list_dir errors give the LLM the same explicit hint.
Tests
-----
- `tests/utils/test_workspace_violation_throttle.py` (new): signature
collapses across read_file/exec/python -c against the same path,
different paths get independent budgets, escalation only fires after
the third attempt.
- `tests/agent/test_runner.py`:
- `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2
contract: filesystem PermissionError is now soft, runner moves to
the next iteration and finalizes cleanly.
- `test_is_ssrf_violation_remains_fatal` + the existing
`test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the
first attempt.
- `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end
to end recovery from `path outside working dir`.
- `test_runner_throttles_repeated_workspace_bypass_attempts` -- four
bypass attempts against the same outside target produce at least
one `workspace_violation_escalated` event and the run completes
naturally without aborting the turn.
- The two `_execute_tools` direct-call tests now pass the new
workspace_violation_counts dict.
- `tests/tools/test_tool_validation.py`: relax three `==` assertions
to `startswith` + "hard policy boundary" substring check to match
the new structured error messages.
- `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null`
regression and the `> /etc/issue` negative case from the previous
commit on this branch -- they still pass under the new policy.
Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2
on origin/main). Ruff is clean for every file touched in this commit.
Co-authored-by: Cursor <cursoragent@cursor.com>
PR #3493 promoted every shell `_guard_command` rejection to a turn-fatal
RuntimeError. The two heuristic outputs in that list -- `path outside
working dir` and `path traversal detected` -- routinely false-positive on
benign constructs (e.g. `2>/dev/null`, quoted `..` arguments to sed/find,
absolute paths inside inline scripts), so legitimate workspace commands
silently kill the user's turn (#3599) and the agent never gets a chance
to retry with a different approach (#3605).
Two changes, both narrowly scoped:
- `ExecTool._guard_command` now skips a small allow-list of kernel device
files (`/dev/null`, the standard streams, `/dev/random`, `/dev/fd/N`,
...) before the workspace path check, matched against the pre-resolve
string so symlinks like `/dev/stderr -> /proc/self/fd/2` still hit the
allow-list. Real outside writes such as `> /etc/issue` remain blocked.
- `AgentRunner._WORKSPACE_BLOCK_MARKERS` keeps only the four hard
path-resolution errors from filesystem.py / shell.py and the SSRF
marker. The two heuristic substrings move out of the fatal list, so
the LLM sees them as ordinary tool errors and can self-correct in the
next iteration. SSRF stays fatal because retrying an internal URL
with a different phrasing would defeat the safety boundary.
Tests:
- `tests/tools/test_exec_security.py`: parametrized regression for the
exact #3599 command sample plus other stdio redirects and device
reads; explicit negative case asserts `> /etc/issue` is still blocked.
- `tests/agent/test_runner.py`: `_is_workspace_violation` no longer
fatals on the two heuristic markers, plus an end-to-end case proving
the runner hands the guard error back to the LLM and finalizes the
next turn cleanly.
* fix: allow_patterns take priority over deny_patterns in ExecTool
Previously deny_patterns were checked first with no bypass, meaning
allow_patterns could never exempt commands from the built-in deny list.
This made it impossible to whitelist destructive commands for specific
directories (e.g. build/cleanup tasks).
Changes:
- shell.py: check allow_patterns first; if matched, skip deny check
- shell.py: deny_patterns now appends to built-in list (not replaces)
- schema.py: add allow_patterns/deny_patterns to ExecToolConfig
- loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool
- Add test_exec_allow_patterns.py covering priority semantics
* fix: separate deny pattern errors from workspace violation detection
The deny pattern error message "Command blocked by safety guard" was
included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be
misclassified as fatal workspace violations. This meant LLMs had no
chance to retry with a different command — the turn was aborted
immediately.
Changes:
- shell.py: deny/allowlist error messages now use distinct phrasing
("blocked by deny pattern filter" / "blocked by allowlist filter")
- runner.py: remove "blocked by safety guard" from
_WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal
tool errors (LLM can retry) instead of fatal violations
- workspace path errors still use "blocked by safety guard" and remain
fatal as intended
* fix: update test assertions to match new deny pattern error message
* fix: indentation error in test file
* fix: restore SSRF fatal classification and tidy exec pattern plumbing
Address review feedback on the deny/allow_patterns rework:
- runner.py: re-add "internal/private url detected" to
_WORKSPACE_BLOCK_MARKERS. The earlier marker removal also stripped
fatal classification from SSRF / internal-URL rejections (whose
message still says "blocked by safety guard"), turning a hard
security boundary into something the LLM could retry.
- loop.py / subagent.py: drop `or None` between ExecToolConfig and
ExecTool. The schema default is an empty list and ExecTool already
normalizes None back to [], so the indirection was a no-op.
- shell.py: extract `explicitly_allowed` flag in _guard_command so
allow_patterns are scanned once instead of twice and the control
flow no longer relies on a no-op `pass + else` branch.
- tests/agent/test_runner.py: add a regression test asserting that
the SSRF block message is treated as fatal, while deny/allowlist
filter messages are deliberately non-fatal.
* fix: remove unused exec allow-pattern test import
Keep the new ExecTool allow-pattern coverage clean under ruff.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Xubin Ren <xubinrencs@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.
Made-with: Cursor
The earlier commits picked up a large amount of Black-style reformatting
(multi-line frozenset / keyword-arg wrapping / docstring blanks / removed
parens) on top of the actual guard fix. @chengyongru flagged it; the
first pass reverted some but not all.
This restores nanobot/providers/base.py, runner.py, heartbeat/service.py,
and utils/evaluator.py to origin/main and reapplies only the guard logic:
- base.py: add should_execute_tools property
- runner.py / heartbeat/service.py / utils/evaluator.py: route through it
+ log a warning when has_tool_calls but finish_reason is anomalous
Net diff vs main is now +87/-4 (was +211/-102) — roughly 30 lines of real
logic, which is what the PR is actually about.
Behavior unchanged from previous HEAD; full suite still 2014 passed.
Made-with: Cursor
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法").
Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
"(conversation continued)" when the first non-system message is a bare
assistant (no tool_calls), serving as a safety net for any edge cases
that slip through.
Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
Keep late follow-up injections observable when they are drained during max-iteration shutdown so loop-level response suppression still makes the right decision.
Made-with: Cursor
- Migrate "after tools" inline drain to use _try_drain_injections,
completing the refactoring (all 6 drain sites now use the helper).
- Move checkpoint emission into _try_drain_injections via optional
iteration parameter, eliminating the leaky split between helper
and caller for the final-response path.
- Extract _make_injection_callback() test helper to replace 7
identical inject_cb function bodies.
- Add test_injection_cycle_cap_on_error_path to verify the cycle
cap is enforced on error exit paths.
When the agent runner exits due to LLM error, tool error, empty response,
or max_iterations, it breaks out of the iteration loop without draining
the pending injection queue. This causes leftover messages to be
re-published as independent inbound messages, resulting in duplicate or
confusing replies to the user.
Extract the injection drain logic into a `_try_drain_injections` helper
and call it before each break in the error/edge-case paths. If injections
are found, continue the loop instead of breaking. For max_iterations
(where the loop is exhausted), drain injections to prevent re-publish
without continuing.
* feat(agent): add mid-turn message injection for responsive follow-ups
Allow user messages sent during an active agent turn to be injected
into the running LLM context instead of being queued behind a
per-session lock. Inspired by Claude Code's mid-turn queue drain
mechanism (query.ts:1547-1643).
Key design decisions:
- Messages are injected as natural user messages between iterations,
no tool cancellation or special system prompt needed
- Two drain checkpoints: after tool execution and after final LLM
response ("last-mile" to prevent dropping late arrivals)
- Bounded by MAX_INJECTION_CYCLES (5) to prevent consuming the
iteration budget on rapid follow-ups
- had_injections flag bypasses _sent_in_turn suppression so follow-up
responses are always delivered
Closes#1609
* fix(agent): harden mid-turn injection with streaming fix, bounded queue, and message safety
- Fix streaming protocol violation: Checkpoint 2 now checks for injections
BEFORE calling on_stream_end, passing resuming=True when injections found
so streaming channels (Feishu) don't prematurely finalize the card
- Bound pending queue to maxsize=20 with QueueFull handling
- Add warning log when injection batch exceeds _MAX_INJECTIONS_PER_TURN
- Re-publish leftover queue messages to bus in _dispatch finally block to
prevent silent message loss on early exit (max_iterations, tool_error, cancel)
- Fix PEP 8 blank line before dataclass and logger.info indentation
- Add 12 new tests covering drain, checkpoints, cycle cap, queue routing,
cleanup, and leftover re-publish
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
- Adjusted message handling in AgentRunner to ensure that historical messages remain unchanged during context governance.
- Introduced tests to verify that backfill operations do not alter the saved message boundary, maintaining the integrity of the conversation history.
When the LLM returns an error (e.g. 429 quota exceeded, stream timeout),
streaming channels silently drop the error message because `_streamed=True`
is set in metadata even though no content was actually streamed.
This change:
- Skips setting `_streamed` when stop_reason is "error", so error messages
go through the normal channel.send() path and reach the user
- Stops appending error content to session history, preventing error
messages from polluting subsequent conversation context
- Exposes stop_reason from _run_agent_loop to enable the above check
- Added Jinja2 template support for various agent responses, including identity, skills, and memory consolidation.
- Introduced new templates for evaluating notifications, handling subagent announcements, and managing platform policies.
- Updated the agent context and memory modules to utilize the new templating system for improved readability and maintainability.
- Added a new dependency on Jinja2 in pyproject.toml.