nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-06-14 14:54:06 +00:00

Author	SHA1	Message	Date
chengyongru	409afe1a3d	test(tools): add basic regression tests for ContextVar routing context	2026-04-21 13:25:30 +08:00
jr_blue_551	ff8c28d5a8	agent: use ContextVar for tool routing context	2026-04-21 13:25:30 +08:00
Xubin Ren	82aa9efc02	test(mcp): pin CancelledError short-circuits the retry loop The retry branch is only reachable via `except Exception`, and `CancelledError` inherits from `BaseException`, so today it naturally bypasses the retry path and /stop still works. Add one focused regression test so any future refactor that widens the retry catch to `BaseException`, re-orders the handlers, or adds `CancelledError` to `_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop. Made-with: Cursor	2026-04-21 13:24:40 +08:00
hussein1362	368752e707	fix(mcp): retry once on transient connection errors When an MCP server restarts or a network connection drops between tool calls, the existing session throws ClosedResourceError, BrokenPipeError, ConnectionResetError, etc. Currently these are caught as generic exceptions and returned as permanent failures to the LLM, which then tells the user 'my tools are broken.' This change adds a single automatic retry with a 1-second backoff for transient connection-class errors in MCPToolWrapper, MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors (ValueError, RuntimeError, McpError, etc.) are not retried. The retry is conservative: - Only 1 retry (not configurable, to keep the change minimal) - Only for a specific set of connection-class exceptions - Matched by exception class name to avoid importing anyio/etc. - 1s sleep between attempts to allow the server to recover - Clear logging distinguishes retried vs permanent failures In production this eliminates most 'MCP tool call failed: ClosedResourceError' noise when MCP bridge processes restart (e.g. after config changes or OOM kills). Tests: 22 new tests covering retry, exhaustion, non-transient bypass, timeout bypass, and all three wrapper types.	2026-04-21 13:24:40 +08:00
Xubin Ren	6c24f24e9e	feat(models): add support for kimi-k2.6 with temperature override and update documentation	2026-04-20 18:18:06 +00:00
Xubin Ren	009cce78ad	fix(anthropic): also enforce leading-user + empty-array recovery Extend `_merge_consecutive` so the three invariants from `LLMProvider._enforce_role_alternation` all hold for Anthropic: 1. collapse consecutive same-role turns (unchanged) 2. no trailing assistant — Anthropic rejects prefill (unchanged) 3. no leading assistant — Anthropic requires the first turn be user 4. non-empty messages array — recover the last stripped assistant as a user turn when every turn got stripped, so callers don't hit a secondary "messages array empty" 400 Anthropic-specific wrinkle: `tool_use` blocks live inside `content` (not a separate `tool_calls` field) and are illegal inside user turns, so both recovery paths skip any message carrying them rather than silently producing a malformed request. Adds 4 unit tests covering the new branches, including the tool_use opt-outs, and updates the existing `test_single_assistant_stripped` to reflect the new rerouting contract. Made-with: Cursor	2026-04-21 01:32:32 +08:00
hussein1362	2f02342083	fix(anthropic): strip trailing assistant messages to prevent prefill error Anthropic does not support assistant-message prefill and returns a 400 error when the conversation ends with an assistant turn. This commonly happens when heartbeat/system messages accumulate trailing assistant replies in the session history. The _merge_consecutive method already handles same-role merging but did not strip trailing assistant messages. The base provider's _enforce_role_alternation (used by OpenAI-compat) does strip them, but AnthropicProvider uses its own _merge_consecutive instead. Add a trailing-assistant stripping loop to _merge_consecutive, matching the behavior already present in _enforce_role_alternation. Includes 7 new tests covering merge + strip behavior.	2026-04-21 01:32:32 +08:00
Xubin Ren	00de55072d	test(agent): exercise /stop cancellation through _dispatch Add a regression test that actually runs the CancelledError branch of AgentLoop._dispatch end-to-end and asserts the in-flight checkpoint is materialized into session.messages before the cancellation unwinds. The three existing tests call _restore_runtime_checkpoint directly, so they pass even if the cancel-time restore is ever removed from _dispatch. This new test is the one that actually locks the fix in place. Made-with: Cursor	2026-04-21 01:14:41 +08:00
hussein1362	847c50b2de	fix(loop): preserve partial context when /stop cancels a task When a user sends /stop to interrupt an active agent turn, the task is cancelled via CancelledError. Previously, the cancellation handler just logged and re-raised, discarding any tool results and assistant messages accumulated during the interrupted turn. The runtime checkpoint mechanism already persists partial turn state (assistant messages, completed tool results, pending tool calls) into session metadata via _emit_checkpoint. However, this checkpoint was only materialized into session history on the NEXT incoming message via _restore_runtime_checkpoint — not at cancellation time. Now the CancelledError handler in _dispatch calls _restore_runtime_checkpoint immediately, so the partial context is preserved in session history. This means the next message the user sends will see all the work that was done before /stop, rather than starting from scratch. Fixes #2966 Includes 3 tests verifying checkpoint restoration on cancellation.	2026-04-21 01:14:41 +08:00
hlg	899a9073ce	fix(memory): do not fall back to raw entry when strip_think empties it `append_history` previously used `strip_think(entry) or entry.rstrip()` as a safety net, so if the entire entry was a template-token leak (e.g. `<think>reasoning</think>` or `<channel|>` alone), the raw leaked text was still persisted to history — later re-introducing the very content `strip_think` was meant to scrub, via consolidation / replay. Persist the cleaned content directly. When cleanup empties a non-empty entry, log at debug and store an empty-content record (cursor continuity preserved). Adds 3 regression tests in test_memory_store.py covering: - Well-formed thinking blocks are stripped before persistence. - Pure-leak entries persist as empty, not as raw text. - Malformed prefix leaks (`<channel|>`) also persist as empty.	2026-04-20 17:04:48 +08:00
hlg	8e7d8bef6a	fix(utils): handle malformed think tags and channel markers in strip_think Some models / Ollama renderers occasionally emit tokenizer-level template leaks that the existing regexes miss: 1. Malformed opening tags with no closing `>`, running straight into user-facing content — e.g. `<think广场照明灯目前…` (observed with Gemma 4 via Ollama). The earlier `<think>[\s\S]*?</think>` and `^\s*<think>[\s\S]*$` patterns both require `>`, so these leak into rendered messages. 2. Harmony-style channel markers like `<channel|>` / `<|channel|>` at the start of a response. 3. Orphan `</think>` / `</thought>` closing tags left behind when only the opener was consumed upstream. Handles each case conservatively: - Malformed `<think` / `<thought` only match when the next char is NOT a tag-name continuation (`[A-Za-z0-9_\-:>/]`). Explicit ASCII class instead of `\w` because Python's Unicode `\w` matches CJK and would defeat the primary fix. - Orphan closing tags and channel markers are stripped only at the start or end of the text. `strip_think` is also applied before persisting history (memory.py), so mid-text stripping would silently rewrite transcripts where the tokens themselves are discussed. Preserves: `<thinker>`, `<think-foo>`, `<think_foo>`, `<think1>`, `<think:foo>`, `<thought/>`, literal `` `</think>` `` / `` `<channel|>` `` inside prose or code blocks. Adds 16 new regression tests covering both the leak cases and the preserved-prose cases.	2026-04-20 17:04:48 +08:00
chengyongru	f900c5bb8e	fix(telegram): address code review issues from cherry-pick merge - Fix critical plain-text fallback that was sending raw HTML tags to users: keep raw markdown available for the fallback path - Extract TELEGRAM_HTML_MAX_LEN (4096) constant to replace hardcoded magic number and document the difference from TELEGRAM_MAX_MESSAGE_LEN - Add fallback to _send_text for extra HTML chunks when HTML parse fails - Add missing @pytest.mark.asyncio decorator on test_send_delta_stream_end_html_expansion_does_not_overflow	2026-04-20 16:58:46 +08:00
stutiredboy	2eea82f5ee	fix(telegram): split oversized stream buffer mid-flight Cherry-picked from #3311 (stutiredboy). Streaming edits called edit_message_text(text=buf.text) without chunking, so once accumulated deltas crossed Telegram's 4096-char limit an ongoing stream would fail with BadRequest. Extracts _flush_stream_overflow helper that edits the first chunk in place, sends any middle chunks, and re-anchors the buffer to a new message for the tail so subsequent deltas keep streaming. Co-Authored-By: stutiredboy <stutiredboy@users.noreply.github.com>	2026-04-20 16:58:46 +08:00
himax12	fd8f08cc83	fix(telegram): convert markdown to HTML before splitting to avoid message length overflow Cherry-picked from #3316 (himax12). When streaming completes in send_delta(), the code was splitting raw markdown text by 4000, then converting to HTML. The markdown-to-HTML conversion adds 10-33% characters, which could push the result over Telegram's 4096 character limit. The fix converts markdown to HTML first, then splits by 4096 (actual Telegram limit), ensuring the edited message always fits. Fixes #3315	2026-04-20 16:58:46 +08:00
chengyongru	68466b1c2a	fix(agent): propagate effective session key through subagent pipeline The previous fix hardcoded session_key_override as channel:chat_id which broke unified session mode where pending queues use "unified:default". Propagate the effective key from _set_tool_context through SpawnTool into the origin dict so _announce_result routes to the correct pending queue in both normal and unified session modes.	2026-04-20 14:47:14 +08:00
chengyongru	79821a571f	fix: suppress intermediate progress output in cron jobs Cron jobs now pass on_progress=_silent to process_direct, matching the heartbeat pattern. Previously, tool hints and streaming deltas were published to the user channel via bus during execution, but the final response could be rejected by evaluate_response — leaving users with confusing partial output and no conclusion. Closes #3319	2026-04-20 11:43:54 +08:00
Xubin Ren	56a779c128	fix(session): repair read-only corrupt session paths	2026-04-20 00:17:50 +08:00
aiguozhi123456	efb04a1712	fix(session): use atomic writes and add corrupt-file repair SessionManager.save() previously used bare open("w") which could truncate the JSONL file if the process crashed mid-write. Now writes to a .tmp file and atomically replaces via os.replace(), matching the pattern already used in qq.py. _load() now attempts _repair() before returning None, recovering valid lines from partially-written files. 12 new tests cover atomic save correctness, temp-file cleanup on failure, and repair of truncated/corrupt JSONL. cowork-with:opencode(glm-5.1)	2026-04-20 00:17:50 +08:00
Alfredo Arenas	5d976d79ff	test(discord): update tests for bot-to-bot fix (#3217) The old test `test_on_message_ignores_bot_messages` asserted the previous (incorrect) contract that ALL bot-authored messages are dropped. With #3217 only self-loops are dropped, so this test was replaced with three more precise tests: - test_on_message_ignores_self_messages: verifies self-loop guard (author_id == _bot_user_id is dropped) - test_on_message_accepts_messages_from_other_bots: new test for the fix itself — other bots' messages flow through - test_on_message_stops_typing_on_handle_exception: preserves the typing cleanup assertion from the original test Net result: +1 behavior tested, same behaviors retained. Co-authored with Claude Opus 4.7	2026-04-19 23:32:40 +08:00
coldxiangyu	7527961b19	fix(cron): drop top-level oneOf so OpenAI Codex/Responses accept tool schema PR #3125 added a top-level `oneOf` branch to `_CRON_PARAMETERS` to advertise per-action required fields. OpenAI Codex/Responses rejects `oneOf`/`anyOf`/`allOf`/`enum`/`not` at the root of function parameters, so any agent that registers the cron tool now fails to start with: HTTP 400: Invalid schema for function 'cron': schema must have type 'object' and not have 'oneOf'/'anyOf'/'allOf'/'enum'/'not' at the top level. Remove the top-level `oneOf`. The original intent of #3125 (stop LLMs from looping on the #3113 contract mismatch) is preserved by: - `validate_params` — runtime-enforces `message` for `action='add'` and `job_id` for `action='remove'` - field descriptions — each schema field already flags "REQUIRED when action='...'" so the LLM sees the contract The regression test is updated to lock the invariant in the other direction: the top-level schema must not contain `oneOf`/`anyOf`/`allOf`/`not`, and the REQUIRED hints must stay on `message` and `job_id`. Verified: - tests/cron/ 70 passed - tests/agent/test_loop_cron_timezone.py + tests/providers/ 232 passed Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>	2026-04-19 21:54:38 +08:00
Xubin Ren	b3049f7323	fix(webui): stabilize empty session history state	2026-04-19 13:38:47 +00:00
Xubin Ren	c4b3837c5f	Merge remote-tracking branch 'origin/main' into nanobot-webui	2026-04-19 12:36:52 +00:00
Xubin Ren	46e11a68a7	test: speed up cron and restart timing tests Replace fixed sleep-based waits with condition polling in cron tests and mock the restart delay in CLI restart tests to reduce suite runtime without changing behavior.	2026-04-19 12:35:57 +00:00
Xubin Ren	b6d63fb1ec	fix: normalize responses circuit breaker keys Made-with: Cursor	2026-04-19 20:16:25 +08:00
Mohamed Elkholy	baba3b2160	fix(providers): add circuit breaker for Responses API fallback When the Responses API fails repeatedly (3 consecutive compatibility errors), skip it and fall back directly to Chat Completions. Unlike a permanent disable, the circuit re-probes after 5 minutes so recovery is automatic when the API comes back. Success resets the counter. Keyed per (model, reasoning_effort) so a failure with one model does not affect others.	2026-04-19 20:16:25 +08:00
Xubin Ren	ccd6c05f71	fix: include pending summaries in consolidation estimates Made-with: Cursor	2026-04-19 20:06:11 +08:00
Xubin Ren	54b659929e	test: cover summary persistence after token consolidation Made-with: Cursor	2026-04-19 20:06:11 +08:00
Xubin Ren	be10ba1f0d	Merge remote-tracking branch 'origin/main' into nanobot-webui	2026-04-19 05:15:27 +00:00
Alfredo Arenas	2d0442976e	test(cli): update _make_console tests for isatty-based fix (#3265) The old test `test_make_console_uses_force_terminal` hardcoded `force_terminal is True`, which contradicts the fix: we now defer to sys.stdout.isatty() so piped / non-TTY output gets plain text instead of ANSI escape codes. Split into two tests covering both branches: - test_make_console_force_terminal_when_stdout_is_tty: TTY path (force_terminal=True, rich output) - test_make_console_force_terminal_false_when_stdout_is_not_tty: non-TTY path (force_terminal=False, plain text) — regression guard for the bug reported in #3265 Co-authored with Claude Opus 4.7	2026-04-19 04:19:59 +08:00
Xubin Ren	384bad17b4	Merge origin/main into fix/config-default-api-base Made-with: Cursor	2026-04-18 20:08:21 +00:00
Xubin Ren	9c0dc8b276	fix: drop generic repeated tool-call guard The global guard changed baseline agent and subagent behavior without proving a real no-progress loop. Keep this PR focused on the cron contract hardening and validation fixes. Made-with: Cursor	2026-04-18 19:59:58 +00:00
Xubin Ren	adc1e843b4	Merge origin/main into fix/cron-contract-repeat-guard Made-with: Cursor	2026-04-18 19:42:48 +00:00
Xubin Ren	e08507f3ce	fix: handle git worktrees in GitStore nested repo protection Treat `.git` files the same as `.git` directories so GitStore refuses to initialize inside git worktrees, and add a focused regression test for that checkout shape. Made-with: Cursor	2026-04-19 03:38:22 +08:00
longle325	fb28678b64	fix: prevent GitStore from creating nested repos and overwriting .gitignore (#2980) GitStore.init() now checks if the workspace is already inside a git repository before calling porcelain.init(). If so, it refuses to create a nested repo. Additionally, existing .gitignore files are preserved by appending only missing Dream-specific entries rather than overwriting. Closes #2980	2026-04-19 03:38:22 +08:00
Xubin Ren	1b211c7d3a	Merge branch 'main' into nanobot-webui Made-with: Cursor	2026-04-18 19:17:16 +00:00
Xubin Ren	9ed3031a42	feat(webui): add initial webui with websocket chat flow	2026-04-18 18:51:53 +00:00
chengyongru	5818569e8f	feat(wizard): auto-detect Literal fields as select menus Literal["standard", "persistent"] fields are now rendered as select dropdowns instead of free-text input. This makes provider_retry_mode and any future Literal fields self-documenting in the wizard.	2026-04-18 21:56:10 +08:00
chengyongru	ebb5179cab	feat(wizard): add Channel Common, API Server menus and field constraint validation - Add [H] Channel Common menu to configure send_progress, send_tool_hints, send_max_retries, and transcription_provider - Add [I] API Server menu to configure host, port, timeout - Add real-time Pydantic field constraint validation (ge/gt/le/lt/min_length/max_length) with constraint hints shown in field display (e.g. "Send Max Retries (0-10)") - Add _pause() to View Configuration Summary to prevent immediate screen clear - Fix _format_value dict branch to handle BaseModel instances without crashing	2026-04-18 21:56:10 +08:00
chengyongru	34e8f97b1f	refactor(templates): separate identity and SOUL responsibilities Move all behavioral instructions out of identity.md into SOUL.md so that each file has a single clear purpose: - identity.md: capability facts only (runtime, workspace, format hints, tool guidance, untrusted content warning) - SOUL.md: behavioral rules (name, personality, execution rules) The "Act, don't narrate" rule is refined into layered behavior: act immediately on single-step tasks, plan first for multi-step tasks. This eliminates the contradiction where identity said "never end with a plan" but user SOUL.md said "always plan first".	2026-04-18 21:55:56 +08:00
Xubin Ren	6bfb75ed03	feat(websocket): multiplex multiple chat_ids over a single connection	2026-04-18 16:49:12 +08:00
Xubin Ren	70a1279b86	test: pin retry-wait callback routing so internal heartbeats stay off channels Add two focused regression tests for the retry-wait leak this PR fixes: - tests/agent/test_runner.py::test_runner_binds_on_retry_wait_to_retry_callback_not_progress locks in that `AgentRunSpec.retry_wait_callback` (not `progress_callback`) is what `_build_request_kwargs` forwards to the provider as `on_retry_wait`. - tests/channels/test_channel_manager_delta_coalescing.py::TestRetryWaitFiltering runs `_dispatch_outbound` end-to-end and asserts that `_retry_wait: True` messages never reach channel send. Both tests fail on origin/main and pass with this PR's fix applied. Made-with: Cursor	2026-04-18 13:50:05 +08:00
Xubin Ren	c8d834a504	fix(loop): document subagent-followup persistence and guard empty content - Add inline rationale for persisting before ContextBuilder and for passing current_message="" on subagent follow-ups (avoids double-projection after merge). - Skip persistence for empty subagent content (no-op messages should not pollute history). - Add regression test covering the empty-content guard. Made-with: Cursor	2026-04-18 13:30:22 +08:00
xzq.xu	1c939e8a5f	fix(loop): persist subagent follow-up events in history	2026-04-18 13:30:22 +08:00
04cb	c27b4d07c4	fix(utils): recurse into PPTX groups and tables when extracting text (#3250)	2026-04-18 12:30:42 +08:00
JunghwanNA	34fccb2ee9	Prevent self-inspection from leaking configured secrets MyTool blocks direct access to sensitive nested paths, but its formatter still printed scalar fields for small config objects. That let `my(action="check", key="web_config.search")` expose `api_key` in plain text even though the docs promise sensitive sub-fields are protected. This keeps the change narrow: sensitive nested config fields are omitted from MyTool's formatted output, and regression coverage locks the behavior in. Constraint: Must preserve existing read-only inspection behavior for non-sensitive fields Constraint: Keep scope limited to MyTool rather than introducing broader redaction plumbing Rejected: Rework global context/tool redaction around MyTool | broader than needed for the leak path Confidence: high Scope-risk: narrow Reversibility: clean Directive: If more nested config rendering is added later, filter sensitive field names at the formatter boundary as well as the path resolver Tested: PYTHONPATH=$PWD pytest -q tests/agent/tools/test_self_tool.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_my_tool_secret_leak_regression.py Not-tested: Full repository test suite Related: #3259	2026-04-18 00:59:08 +08:00
JunghwanNA	c196b5b0c2	Prevent failed SSE requests from masquerading as successful completions The streaming API currently logs backend exceptions but still emits the same `finish_reason: "stop"` + `[DONE]` terminator used for successful responses. That makes a failed streamed request look successful to OpenAI-compatible clients. This keeps the fix narrow: track whether the stream backend failed and suppress the success terminator in that case. A regression test locks in the expected behavior. Constraint: Keep the non-streaming response path untouched Constraint: Follow up on the known limitation called out during PR #3222 review without redesigning the SSE protocol Rejected: Introduce a custom SSE error event shape in the same patch | expands API surface and review scope Confidence: high Scope-risk: narrow Reversibility: clean Directive: If explicit streamed error events are added later, keep them distinct from the success stop+[DONE] terminator to preserve client retry semantics Tested: PYTHONPATH=$PWD pytest -q tests/test_api_stream.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_api_stream_error_regression.py Not-tested: Full repository test suite Related: #3260 Related: #3222	2026-04-18 00:44:44 +08:00
Steve	39dd59f2ba	fix(cron): state per-action requirements in descriptions, keep list/remove callable The previous patch promoted `message` into top-level `required`, which solved the `add` loop but broke `list` and `remove`: `ToolRegistry.prepare_call` enforces `required` via `validate_params`, so `cron(action="list")` and `cron(action="remove", job_id=...)` — both documented in `SKILL.md` — started failing schema validation with the same "missing required message" shape that #3113 describes for `add`. Instead: - Keep `required=["action"]` so `list`/`remove` stay callable. - Prefix `message`'s description with `REQUIRED when action='add'.` and `job_id`'s with `REQUIRED when action='remove'.` so LLMs see the real per-action contract up front. - Keep the improved runtime error message from the previous commit for the case an LLM still omits `message` on `add`. Also add `tests/cron/test_cron_tool_schema_contract.py` to lock in: - `list` and `remove` pass schema validation with no `message` - `add` with `message` passes - `add` without `message` surfaces the actionable runtime error - field descriptions carry the REQUIRED hints - top-level `required` stays `["action"]` Existing `tests/cron/test_cron_tool_list.py` cases bypass schema validation by calling `_list_jobs()` / `_remove_job()` directly, which is why CI didn't catch the regression; the new test goes through `ToolRegistry.prepare_call`.	2026-04-17 22:52:48 +08:00
Xubin Ren	b8d327dc41	test + docs: lock should_execute_tools guard semantics (#3220) Two small follow-ups to the guard: 1. Fix the should_execute_tools docstring so it matches the actual code. The previous version said "Only execute when finish_reason explicitly signals tool intent" but the code also accepts finish_reason == "stop". Explain why (some compliant providers emit "stop" with legitimate tool calls — openai_compat_provider.py already mirrors this at lines ~633 / ~678 where ("tool_calls", "stop") are both treated as the terminal tool-call state). Without this, a strict "tool_calls"-only guard would regress 15 existing runner tests that construct LLMResponse with tool_calls but no explicit finish_reason (default = "stop"). 2. Add tests/providers/test_llm_response.py. This locks the three cases: - no tool calls -> never executes - tool calls + "tool_calls"/stop -> executes - tool calls + refusal / content_filter / error / length / ... -> blocked These are exactly the boundary cases the #3220 fix is about; without a test here a future refactor could silently revert the guard. Body + tests only, no behavior change beyond the existing PR's intent. Made-with: Cursor	2026-04-17 20:39:46 +08:00
Cheng Yongru	aabc3d5017	fix(memory): fall back to raw_archive on LLM error response When chat_with_retry returns an error response (finish_reason='error') instead of raising an exception, archive() previously treated the error message as a valid summary and wrote it to history.jsonl, while the original session data was already cleared by /new — causing irreversible data loss. Fix: check finish_reason after the LLM call and raise RuntimeError on error responses, which naturally falls through to the existing raw_archive fallback. This preserves the original messages in history.jsonl instead of losing them. Fixes #3244	2026-04-17 20:15:07 +08:00
Mariano Campo	d0e65ebf70	fix(exec): pass allowed_env_keys to exec tool calls in subagents	2026-04-17 16:32:25 +08:00
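
Commit efb04a1712 above describes writing session data to a `.tmp` file, swapping it into place with `os.replace()`, and recovering valid lines from a partially written JSONL file. The pattern might look roughly like the sketch below. This is not the repository's code: the function names `atomic_write_jsonl` and `load_jsonl_with_repair` are hypothetical, and the real `SessionManager.save()` / `_repair()` may differ in detail.

```python
import json
import os
from pathlib import Path


def atomic_write_jsonl(path: Path, records: list) -> None:
    """Write records as JSONL to a sibling .tmp file, then swap it into place.

    Hypothetical helper: a crash mid-write leaves the old file untouched,
    because the target is only replaced once the temp file is complete.
    """
    tmp_path = path.with_name(path.name + ".tmp")
    try:
        with open(tmp_path, "w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the swap
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        tmp_path.unlink(missing_ok=True)
        raise


def load_jsonl_with_repair(path: Path) -> list:
    """Return every line that parses as a JSON object, skipping corrupt ones."""
    records = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                value = json.loads(line)
            except json.JSONDecodeError:
                continue  # truncated or corrupt line: drop it, keep the rest
            if isinstance(value, dict):
                records.append(value)
    return records
```

Writing the temp file next to the target, rather than in a system temp directory, keeps both paths on the same filesystem, which is what `os.replace()` needs to be atomic.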


577 Commits