Previously _validate_allow_from raised SystemExit when allowFrom was
missing, forcing every channel to declare an explicit allowlist.
With the pairing feature this is no longer necessary: a channel with
no allowFrom simply operates in pairing-only mode, letting users
approve senders via /pairing approve <code> from the WebUI or CLI.
- Replace SystemExit with an info log in _validate_allow_from
- Add test_validate_allow_from_allows_missing_allow_from
- Assert pending_user_turn is cleared from session metadata after
shortcut commands (e.g. /help) in test_auto_compact.py.
- Add test for None allow_from / allowFrom values in
test_base_channel.py to prevent TypeError regressions.
- AgentLoop._state_command now persists user message and assistant
response for shortcut commands (e.g. /pairing) so WebUI history
hydration after _turn_end no longer shows an empty chat. /new is
excluded because it intentionally clears the session.
- Feishu _on_message sends pairing codes for unauthorized DMs before
any media side effects (reactions, downloads, transcription).
Group chat unauthorized senders are still silently ignored early.
- Update test_feishu_reply to assert the new DM pairing behavior.
/pairing is now a first-class built-in command dispatched through
CommandRouter, just like /status, /model, /dream, etc.
Benefits:
- WebUI automatically shows /pairing in the slash command palette
(because builtin_command_palette() feeds /api/commands).
- All channels (Telegram, Discord, WebSocket, etc.) use the same
dispatch path for /pairing; no more channel-level interception.
- The command still only works for already-authorised users because
is_allowed() gates message ingestion before the bus.
Changes:
- Add handle_pairing_command() to nanobot.pairing.store — pure
function callable from CLI, CommandRouter, and tests.
- Add cmd_pairing to nanobot.command.builtin and register in
BUILTIN_COMMAND_SPECS + register_builtin_commands().
- Remove BaseChannel._handle_pairing_command() and the /pairing
interception logic from _handle_message().
- Clean up unused pairing imports from base.py.
- Add unit tests for handle_pairing_command and cmd_pairing dispatch.
- Extract format_pairing_reply() and format_expiry() to eliminate
duplication between BaseChannel and SlackChannel.
- Use _write_text_atomic() from helpers.py instead of hand-rolled
fsync logic in pairing store.
- Convert approved lists to in-memory sets for O(1) lookup.
- Remove collision retry loop (8-char entropy is sufficient).
- Fix /pairing command parsing to split prefix exactly.
- Remove unused import time from base.py.
- Fix tests to pass subcommand_text, not full /pairing string.
- Add os.fsync with Windows-compatible directory flush in pairing store
- Increase pairing code length from 6 -> 8 characters for higher entropy
- Remove SystemExit on empty allowFrom; empty list now defers to pairing
- Update is_allowed docstring to document pairing fallback semantics
- Propagate is_dm to Matrix (direct rooms) and Slack (im channels)
- Slack _is_allowed now checks pairing store for DM allowlist mode
- Fix /pairing revoke to accept optional channel argument
- Move inline import time to module top-level
- Add WebSocket comment explaining is_dm=True assumption
- Add comprehensive tests for store and BaseChannel pairing integration
- Fix existing tests that expected empty allowFrom to hard-exit
Refs #3774
Shortcut commands (e.g. /help, /pairing) skip BUILD and SAVE states,
so their turns were never persisted to the session. This caused WebUI
chats to appear empty after _turn_end because history hydration reads
from the session file.
Fix by persisting the user message and assistant response inside
_state_command, but tag them with _command=True so Session.get_history
filters them out of LLM context. /new is excluded because it
intentionally clears the session.
- AgentLoop._persist_user_message_early now accepts **kwargs so
_state_command can pass _command=True for the user turn.
- Session.get_history skips messages with _command=True.
When an MCP server configured as streamableHttp or SSE is unreachable,
streamable_http_client's anyio task group cleanup raises RuntimeError /
ExceptionGroup that escapes the caller's try/except and crashes the
event loop with "Unhandled exception in event loop".
Fix: add a lightweight TCP probe (_probe_http_url) before entering the
MCP SDK transport. If the port is closed, the server is skipped with a
warning instead of crashing. stdio transport is not probed (local
process).
Closes#3739
Resolve fallbackModels as preset references or explicit inline provider configs so failover uses complete model settings without exposing fallback logic to the agent loop.
Co-authored-by: Cursor <cursoragent@cursor.com>
Bind fallback model chains to the active model configuration so defaults and presets do not inherit or merge fallback behavior implicitly. Require explicit fallback providers while preserving per-fallback generation overrides and context-window safety.
Co-authored-by: Cursor <cursoragent@cursor.com>
When the primary model returns a non-transient error and no content
has been streamed yet, the runner now tries each model listed in the
active preset's fallback_models in order. Each fallback model may
reside on a different provider — a temporary provider instance is
created on-the-fly via make_provider(config, model=...).
Key design:
- Failover is request-scoped (does not affect subagents/dream/consolidator)
- Provider is restored via try/finally after each fallback attempt
- Skipped when content was already streamed to avoid duplicate output
- Recursive failover prevented by clearing fallback_models on fallback spec
- Circuit breaker trips open after 3 consecutive primary failures (60s cooldown)
- Cross-provider routing: fallback model prefix (e.g. groq/) determines provider
Fixes: cross-provider fallback was broken because the factory passed the
original preset (with provider forced to primary's provider) when creating
fallback providers. Now uses provider="auto" so the model string prefix
correctly routes to the right provider.
Also fixes: log messages now distinguish between primary-failed,
previous-fallback-failed, and circuit-open scenarios.
closes: https://github.com/HKUDS/nanobot/issues/3376
Reasoning now flows as its own stream — symmetric to the answer's
``delta`` / ``stream_end`` pair — instead of being shipped as one
oversized progress message. This lets WebUI render a live "Thinking…"
bubble that updates in place, then auto-collapses when the stream
closes. Other channels remain plugin no-ops by default.
## Protocol
New metadata: ``_reasoning_delta`` (chunk) and ``_reasoning_end``
(close marker). ChannelManager routes both to the dedicated plugin
hooks below; the legacy one-shot ``_reasoning`` is kept for back-compat
and BaseChannel expands it into a single delta + end pair so plugins
only ever implement the streaming primitives.
WebSocket emits two new events:
- ``reasoning_delta`` (event, chat_id, text, optional stream_id)
- ``reasoning_end`` (event, chat_id, optional stream_id)
## BaseChannel surface
- ``send_reasoning_delta(chat_id, delta, metadata)`` — no-op default
- ``send_reasoning_end(chat_id, metadata)`` — no-op default
- ``send_reasoning(msg)`` — back-compat wrapper, base impl forwards
to the streaming primitives
A channel adds reasoning support by overriding the two streaming
primitives. Telegram / Slack / Discord / Feishu / WeChat / Matrix keep
the base no-ops until their bubble UIs are adapted; reasoning silently
drops at dispatch, never as a stray text message.
## AgentHook
Adds ``emit_reasoning_end`` to the hook lifecycle. ``_LoopHook`` tracks
whether a reasoning segment is open and closes it on:
- the first answer delta arriving (so the UI locks the bubble before
the answer renders below),
- ``on_stream_end``,
- one-shot ``reasoning_content`` / ``thinking_blocks`` after a single
non-streaming response.
## WebUI
- ``UIMessage.reasoning`` is now a single accumulated string with a
companion ``reasoningStreaming`` flag.
- ``useNanobotStream`` consumes ``reasoning_delta`` / ``reasoning_end``;
legacy ``kind: "reasoning"`` is auto-translated to a delta + end.
- New ``ReasoningBubble``: shimmer header + auto-expanded while
streaming, collapses to a clickable "Thinking" pill once closed,
respects ``prefers-reduced-motion``.
- Answer deltas adopt the reasoning placeholder so the bubble and the
answer share one assistant row.
## Tests
- ``tests/channels/test_channel_manager_reasoning.py`` — manager routes
delta + end, drops on channel opt-out, expands one-shot back-compat.
- ``tests/channels/test_websocket_channel.py`` — new ``reasoning_delta``
/ ``reasoning_end`` frames, empty-chunk safety, no-subscriber safety,
back-compat expansion.
- ``tests/agent/test_runner_reasoning.py`` — runner closes the segment
on streaming answer start and after one-shot reasoning.
- WebUI ``useNanobotStream`` + ``message-bubble`` cover the new
protocol and the shimmer styling.
## Docs
``docs/configuration.md`` and ``docs/websocket.md`` document the new
events and the plugin contract.
Co-authored-by: Cursor <cursoragent@cursor.com>
Reasoning was being shipped to every channel as a generic progress
message with a `_reasoning: true` flag. Two problems with that:
1. Channels without a low-emphasis UI primitive (Telegram, Slack,
Discord, Feishu...) would dump raw model thoughts as ordinary
replies, polluting the conversation.
2. The agent loop double-gated by inspecting `channels_config`, which
coupled the loop to display policy.
Treat reasoning as its own plugin action — `BaseChannel.send_reasoning`
defaults to a documented no-op; channels that have a fitting affordance
override. ChannelManager routes `_reasoning` outbounds to that method
only when the channel opts in via `show_reasoning` (camelCase alias
`showReasoning` mirrors `sendProgress`). Plugins that don't override
silently drop reasoning — "no fit, no leak" is the contract.
Reference implementation lands for WebSocket / WebUI: a new
`kind: "reasoning"` frame, parked on the active assistant bubble as a
collapsible `Thinking` group above the answer. CLI keeps its existing
direct path (it doesn't go through the bus). `ChannelsConfig.show_reasoning`
flips to `true` by default — only adapted channels surface anything,
others stay quiet.
Loop net diff is -3 lines: the `channels_config.show_reasoning` check
moves out, leaving emit_reasoning a one-liner that publishes and trusts
the channel to decide.
Co-authored-by: Cursor <cursoragent@cursor.com>
Resolves conflicts after main landed the state-machine turn refactor
and the test_runner.py 9-file split:
- nanobot/agent/loop.py: take main's `_state_build`/`_persist_user_message_early`
flow; restore the `reasoning: bool` parameter on `_build_bus_progress_callback`
so the loop hook can mark progress as reasoning-channel without coupling to
the answer stream.
- nanobot/cli/stream.py: keep main's configurable `bot_name`/`bot_icon` header
while preserving the PR's `transient=True` Live + `self._console` routing
+ `_renderable()` final-render path that fixed TUI duplication.
- tests/agent/test_runner.py was deleted on main and split into 9 focused
files; relocated all 6 reasoning tests into a new `test_runner_reasoning.py`
matching the new layout, deduplicated the per-test `ReasoningHook` boilerplate
through a shared `_RecordingHook` helper.
Co-authored-by: Cursor <cursoragent@cursor.com>
Reasoning surfacing was split across three branches in runner.py plus
two separate streaming buffers (loop hook and runner progress stream),
with three independent display-side gates in the CLI. This collapsed
the policy into one source of truth and fixed two real bugs:
- Structured `reasoning_content` was suppressed whenever the answer was
streamed, because the runner gated emission on `streamed_content`.
Providers don't stream `reasoning_content`; it only arrives on the
final response, so the answer stream and the reasoning channel are
independent. Added `streamed_reasoning` to `AgentHookContext` to track
the right bit.
- `channels.showReasoning` was subordinated to `sendProgress`. They are
orthogonal — turning off progress streaming shouldn't silence
reasoning. Reworked the CLI gates accordingly.
Single-helper consolidation:
- `extract_reasoning(reasoning_content, thinking_blocks, content)`
returns `(reasoning_text, cleaned_content)` with a defined fallback
order: dedicated field → Anthropic thinking_blocks → inline
`<think>`/`<thought>` tags. Models that expose none of these
short-circuit to `(None, content)` — zero overhead.
- `IncrementalThinkExtractor` replaces the ad-hoc `emit_incremental_think`
function and its hand-rolled "emitted cursor" state in both the loop
hook and the runner progress stream.
Also documented the new `showReasoning` channel option in
docs/configuration.md and noted its independence from sendProgress.
Co-authored-by: Cursor <cursoragent@cursor.com>
Add extract_think() and emit_incremental_think() helpers to extract thinking content from inline <think> and <thought> tags in the content field. This handles models served via Ollama, self-hosted vLLM, or other compatible endpoints that embed reasoning as inline tags instead of using the dedicated reasoning_content API field.
Also adds Anthropic thinking_blocks support for extended thinking via the thinking content blocks array.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The ask_user tool used AskUserInterrupt(BaseException) for mid-turn
blocking, creating heavy coupling across runner, loop, and session
management. The model now asks questions naturally in response text,
the turn ends normally, and the user's next message starts a new turn
with session history providing continuity.
Removed:
- nanobot/agent/tools/ask.py (tool, interrupt, helpers)
- tests/agent/test_ask_user.py
- webui/src/components/thread/AskUserPrompt.tsx
- AskUserInterrupt handling in runner.py
- Dual-path message building in loop.py
- Pending ask detection via history scanning
- button_prompt/buttons emission in WebSocket channel
- ask_user references in Slack channel docstrings
Preserved (MessageTool uses these independently):
- OutboundMessage.buttons field
- Channel button rendering (Telegram, Slack, WebSocket)
Remove unused code confirmed dead via vulture scan, grep verification,
and coverage analysis:
- _get_bridge_dir (cli/commands.py): 82-line function with zero callers
- add_assistant_message (agent/context.py): method body never executed,
also removed now-unused build_assistant_message import
- _tool_parameters_schema (agent/tools/base.py): redundant copy of schema
already exposed via the `parameters` property
- MSTEAMS_REF_TTL_S (channels/msteams.py): unused constant (production
uses config.ref_ttl_days directly); inlined in test
- MESSAGE_TYPE_USER (channels/weixin.py): unused constant
- Add `ModelPresetConfig` schema for named model presets
- Add `model_presets` dict to `Config` and `model_preset` field to `AgentDefaults`
- Add `resolve_preset()` to return effective model params from preset or defaults
- Add `@model_validator` to reject unknown preset names
- Update `_match_provider()` to use resolved preset model/provider
- Update `make_provider()` and `provider_signature()` to use `resolve_preset()`
- Add `model_preset` property to `AgentLoop` for atomic runtime switching
- Update `AgentLoop.from_config()` to inject a runtime `default` preset
- Wire self-tool to inspect/clear preset state
- Update CLI display strings to show active preset
This commit implements a progressive refactoring of the tool system to support
plugin discovery, scoped loading, and protocol-driven runtime context injection.
Key changes:
- Add Tool ABC metadata (tool_name, _scopes) and ToolContext dataclass for
dependency injection.
- Introduce ToolLoader with pkgutil-based builtin discovery and
entry_points-based third-party plugin loading.
- Add scope filtering (core/subagent/memory) so different contexts load
appropriate tool sets.
- Introduce ContextAware protocol and RequestContext dataclass to replace
hardcoded per-tool context injection in AgentLoop.
- Add RuntimeState / MutableRuntimeState protocols to decouple MyTool from
AgentLoop.
- Migrate all built-in tools to declare scopes and implement create()/enabled()
hooks.
- Migrate MessageTool, SpawnTool, CronTool, and MyTool to ContextAware.
- Refactor AgentLoop to use ToolLoader and protocol-driven context injection.
- Refactor SubagentManager to use ToolLoader(scope="subagent") with per-run
FileStates isolation.
- Register all built-in tools via pyproject.toml entry_points.
- Add comprehensive tests for loader scopes, entry_points, ContextAware,
subagent tools, and runtime state sync.
The hosted Xiaomi MiMo API accepts {"thinking": {"type": "enabled"|"disabled"}}
to toggle reasoning, which is exactly the shape produced by the existing
thinking_type style. The xiaomi_mimo ProviderSpec just needed to opt in.
Before this fix, setting reasoning_effort="none" had no effect on MiMo
because no thinking_style was configured, so the disable signal never
reached the server. Default-on models (mimo-v2.5-pro and friends) kept
reasoning regardless of user configuration.
Source: https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api
Co-authored with Claude Opus 4.7. Strategy and review via Claude Desktop,
implementation via Claude Code.
Six tests covering:
- AgentDefaults preserves 'nanobot' and the cat icon by default
- camelCase config keys (botName/botIcon) bind to the new fields
- Empty bot_icon is accepted (opt-out of the leading icon)
- ThinkingSpinner uses bot_name in its status text
- StreamRenderer header combines icon and name when icon is set
- StreamRenderer header is just the name when icon is empty
- Append [Archived Context Summary] to system prompt instead of injecting
it into the user message runtime context, improving KV cache reuse across
turns and avoiding consecutive same-role messages.
- _last_summary persists in metadata (no pop) for restart survival;
summary is re-injected every turn via the stable system prompt.
- Remove dynamic "Inactive for X minutes" from _format_summary — use
static last_active timestamp instead to preserve KV cache stability.
- Pass session_summary through build_messages() so both normal and
ask_user paths receive the archived summary in the system prompt.
- estimate_session_prompt_tokens now reads _last_summary from metadata
to include the summary in token budget estimation.
- Remove obsolete session_summary parameter from
maybe_consolidate_by_tokens and estimate_session_prompt_tokens
call sites in loop.py (summary flows through build_messages instead).
- Ensure /new (session.clear()) clears _last_summary from metadata.