nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-04 16:55:54 +00:00

Author	SHA1	Message	Date
chengyongru	5853d5dfda	fix: allow_patterns take priority over deny_patterns in ExecTool (#3594) * fix: allow_patterns take priority over deny_patterns in ExecTool Previously deny_patterns were checked first with no bypass, meaning allow_patterns could never exempt commands from the built-in deny list. This made it impossible to whitelist destructive commands for specific directories (e.g. build/cleanup tasks). Changes: - shell.py: check allow_patterns first; if matched, skip deny check - shell.py: deny_patterns now appends to built-in list (not replaces) - schema.py: add allow_patterns/deny_patterns to ExecToolConfig - loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool - Add test_exec_allow_patterns.py covering priority semantics * fix: separate deny pattern errors from workspace violation detection The deny pattern error message "Command blocked by safety guard" was included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be misclassified as fatal workspace violations. This meant LLMs had no chance to retry with a different command; the turn was aborted immediately. Changes: - shell.py: deny/allowlist error messages now use distinct phrasing ("blocked by deny pattern filter" / "blocked by allowlist filter") - runner.py: remove "blocked by safety guard" from _WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal tool errors (LLM can retry) instead of fatal violations - workspace path errors still use "blocked by safety guard" and remain fatal as intended * fix: update test assertions to match new deny pattern error message * fix: indentation error in test file * fix: restore SSRF fatal classification and tidy exec pattern plumbing Address review feedback on the deny/allow_patterns rework: - runner.py: re-add "internal/private url detected" to _WORKSPACE_BLOCK_MARKERS. The earlier marker removal also stripped fatal classification from SSRF / internal-URL rejections (whose message still says "blocked by safety guard"), turning a hard security boundary into something the LLM could retry. - loop.py / subagent.py: drop `or None` between ExecToolConfig and ExecTool. The schema default is an empty list and ExecTool already normalizes None back to [], so the indirection was a no-op. - shell.py: extract `explicitly_allowed` flag in _guard_command so allow_patterns are scanned once instead of twice and the control flow no longer relies on a no-op `pass + else` branch. - tests/agent/test_runner.py: add a regression test asserting that the SSRF block message is treated as fatal, while deny/allowlist filter messages are deliberately non-fatal. * fix: remove unused exec allow-pattern test import Keep the new ExecTool allow-pattern coverage clean under ruff. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Xubin Ren <xubinrencs@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-03 00:27:17 +08:00
Xubin Ren	188e6df757	fix(utils): cover complete trailing think markers Made-with: Cursor	2026-05-01 20:09:59 +08:00
bravel	2c397ad442	fix: strip partial think tags in streaming output	2026-05-01 20:09:59 +08:00
Xubin Ren	e157392250	fix(agent): scope subagent reply dedupe to origin message Made-with: Cursor	2026-05-01 11:47:24 +00:00
yorkhellen	08f326ec55	test: Add tests for sender_id runtime context injection	2026-05-01 19:43:38 +08:00
Xubin Ren	306958d6e6	add native Bedrock Converse provider Made-with: Cursor	2026-05-01 18:52:03 +08:00
hanyuanling	3c20d16117	fix subagent max iteration limit	2026-04-30 13:45:40 +08:00
Xubin Ren	3d7099b421	fix(memory): clean atomic write test hygiene Made-with: Cursor	2026-04-29 16:57:50 +08:00
yorkhellen	2af45945e2	fix(memory): ensure atomic write for history.jsonl Use temp file + os.replace + fsync to prevent partial writes on crash. Add tests for atomic write behavior and tmp file cleanup on exception.	2026-04-29 16:57:50 +08:00
Xubin Ren	48f3cc6390	fix(agent): stop on workspace violations from tool errors Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions. Made-with: Cursor	2026-04-28 15:13:27 +08:00
Xubin Ren	ad4802600e	refactor(config): make max messages default explicit Use 120 as the config-level default and normalize zero back to that limit so session replay always receives an explicit message cap. Made-with: Cursor	2026-04-28 14:54:32 +08:00
hussein1362	d45ffcf519	feat(config): wire max_messages into session history replay The max_messages config field in AgentDefaults was accepted by the schema but never threaded through to the actual get_history() calls in the agent loop. Both call sites in _process_message hardcoded the default, so sessions with slow or local models accumulated unbounded history that inflated prompt tokens and caused LLM timeouts. Changes: - Add max_messages field to AgentDefaults (default 0 = use built-in constant, any positive value caps history replay) - Store the value on AgentLoop and pass it to get_history() when non-zero - Wire the config through all three AgentLoop construction sites in commands.py (gateway, API server, CLI chat) - 14 focused tests covering schema validation, init storage, history slicing, boundary alignment, integration wiring, and the zero/default path	2026-04-28 14:54:32 +08:00
Xubin Ren	fdfecd3ba6	refactor(codex): name progress delta capability semantically Use a provider capability name that describes user-visible progress delta support instead of the runner implementation detail. Made-with: Cursor	2026-04-27 18:48:05 +08:00
hanyuanling	ae14142a87	fix(codex): stream progress deltas to channels	2026-04-27 18:48:05 +08:00
Xubin Ren	e31273ebaa	Merge origin/main into fix/discord-allow-channel-threads Made-with: Cursor	2026-04-27 09:26:24 +00:00
Xubin Ren	eb4b3d9e26	refactor(session): internalize history/file-cap knobs as constants Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and sessionFileMaxMessages out of user-facing config into internal constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000). - Remove 3 fields from AgentDefaults and config pipeline - Sink enforce_file_cap into Session (was AgentLoop) - Auto-derive token budget from context window (was configurable) - Net -113 lines across 7 files; 723 tests green Made-with: Cursor	2026-04-27 08:06:50 +00:00
Xubin Ren	29ebc2d355	Merge origin/main into feat/session-replay-file-cap-invariants Preserve main's timestamp/tool-context replay semantics while keeping the PR's session history and file-cap budgets. Made-with: Cursor	2026-04-27 07:32:00 +00:00
Xubin Ren	311a7fe36e	fix(session): stop training the model to parrot [Message Time: ...] Past assistant turns in history were prefixed with "[Message Time: ...]" just like user turns. The model treated these as in-context demos and started prefixing its own replies with the same marker, leaking metadata to the user. Prompt-level warnings could not beat dozens of prior assistant samples. Annotate only user turns and proactive deliveries (_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is the whole point and which are too infrequent to act as demos). Adjacent user-side timestamps still pin every normal assistant reply for relative-time reasoning. The now-redundant identity.md warning is removed along with the demonstration source.	2026-04-27 07:11:20 +00:00
Xubin Ren	7dcf83e389	test(agent): cover threaded subagent routing Made-with: Cursor	2026-04-27 14:37:36 +08:00
Xubin Ren	eeaec1f951	fix(agent): prevent message time metadata from leaking into replies	2026-04-27 06:23:43 +00:00
chengyongru	6eb178113e	fix(mcp): sanitize MCP capability names for model API compatibility MCP resource/prompt/tool names containing spaces or special characters (e.g. "PostgreSQL System Information") were forwarded verbatim to model provider APIs, causing validation errors from both Anthropic and OpenAI which require names matching ^[a-zA-Z0-9_-]{1,128}$. Add _sanitize_name() that replaces invalid characters with underscores and collapses consecutive underscores. Applied in MCPToolWrapper, MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools filtering logic. Closes #3468	2026-04-27 11:49:50 +08:00
Xubin Ren	4a4ba1efc1	Merge branch 'main' into fix/session-history-timestamps Made-with: Cursor	2026-04-26 18:13:11 +00:00
Xubin Ren	038a140ad3	fix(slack): preserve thread context for proactive replies Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context. Made-with: Cursor	2026-04-27 02:10:38 +08:00
Xubin Ren	df37a36174	fix(agent): expose session timestamps in model context Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors. Made-with: Cursor	2026-04-26 17:42:58 +00:00
hanyuanling	59dfd74842	feat(session): enforce replay/file-cap invariants for history lifecycle	2026-04-27 00:53:32 +08:00
Xubin Ren	b2aec5528a	refactor(agent): move provider refresh into subsystem owners	2026-04-26 14:18:37 +00:00
Xubin Ren	f670da6c70	refactor(providers): move provider snapshot creation into factory	2026-04-26 14:05:13 +00:00
Xubin Ren	65b0ae81af	Merge origin/main into webui-settings Made-with: Cursor	2026-04-26 13:05:32 +00:00
Xubin Ren	727086ddac	test: tighten consolidation ratio coverage Made-with: Cursor	2026-04-26 20:24:42 +08:00
chengyongru	fca56d324a	test: add unit tests for configurable consolidation_ratio Cover ratio propagation, schema validation, and consolidation behavior with different ratio values (0.1, 0.5, 0.9).	2026-04-26 20:24:42 +08:00
Xubin Ren	b440e76d2f	feat(webui): add model settings runtime refresh	2026-04-25 18:05:06 +00:00
Xubin Ren	a58d9fd357	feat(webui): render ask_user choices Made-with: Cursor	2026-04-25 15:46:47 +00:00
Xubin Ren	403ce23d22	fix(agent): tighten ask_user CLI handling Made-with: Cursor	2026-04-25 22:10:19 +08:00
Xubin Ren	3b1ea99ee1	fix(agent): render ask_user options without buttons Made-with: Cursor	2026-04-25 22:10:19 +08:00
Xubin Ren	cfc76ffbbf	feat(agent): add ask_user tool Made-with: Cursor	2026-04-25 22:10:19 +08:00
yorkhellen	076e4166d7	fix(agent): add LLM request timeout to prevent session lock starvation	2026-04-25 03:40:34 +08:00
Xubin Ren	4531167c12	fix(agent): bound remaining memory/history pollution paths from #3412 #3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain: - archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the #3412 bug from the happy path. - maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call. - Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream. - append_history itself had no default cap, leaving future new callers one forgotten-cap-away from the same vector. Changes: - Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl. - Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered. - Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file). - Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and observable. Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions. Tests: +9 regression tests covering each fix. Full suite: 2372 passed. Made-with: Cursor	2026-04-24 04:17:19 +08:00
Xubin Ren	81a5af2352	test(consolidation): add regression tests for tiktoken truncation path and history char cap Cover two untested boundaries from #3412: - _truncate_to_token_budget with positive budget exercises tiktoken - _MAX_HISTORY_CHARS caps Recent History section in system prompt Made-with: Cursor	2026-04-24 03:57:59 +08:00
chengyongru	2848f69897	fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation Root cause: when consolidation LLM fails, raw_archive() dumped full message content (~1MB) into history.jsonl with no size limit. Since build_system_prompt() injects history.jsonl into every system prompt, all subsequent LLM calls exceeded the 200K context window with error 1261. Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), triggering the raw_archive fallback in the first place. Three-layer fix: - Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive chunk sizing based solely on token budget - Truncate archive() input: use tiktoken to cap formatted text to the model's input token budget before sending to consolidation LLM - Truncate raw_archive() output: cap history.jsonl entries at 16K chars	2026-04-24 03:57:59 +08:00
Xubin Ren	469fc90fe6	fix(agent): on_progress tool_events only when callback accepts; align progress tests with main Made-with: Cursor	2026-04-23 20:06:11 +08:00
Pablo Cabeza	c23d719780	feat(agent): emit structured _tool_events progress metadata Extend the existing on_progress callback to carry structured tool-event payloads alongside the plain-text hint, so channels can render rich tool execution state (start/finish/error, arguments, results, file attachments) rather than only the pre-formatted hint string. Changes ------- - AgentLoop._tool_event_start_payload() — builds a version-1 start payload from a ToolCallRequest - AgentLoop._tool_event_result_extras() — extracts files/embeds from a tool result dict - AgentLoop._tool_event_finish_payloads() — maps tool_calls + tool_results + tool_events from AgentHookContext into finish payloads - _LoopHook.before_execute_tools() — passes tool_events=[...] to on_progress together with the existing tool_hint flag - _LoopHook.after_iteration() — emits a second on_progress call with the finish payloads once tool results are available - _bus_progress() — forwards tool_events as _tool_events in OutboundMessage metadata so channel implementations can read them - on_progress type widened to Callable[..., Awaitable[None]] on all public entry points; _cli_progress updated to accept and ignore tool_events The contract is additive: callers that only accept (content, *, tool_hint) continue to work unchanged. Callers that also accept tool_events receive the structured data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 20:06:11 +08:00
Bongjin Lee	93ca791ac6	fix(discord): full thread support with session isolation and allowlist enforcement Discord threads use their own channel IDs, so allowChannels was blocking thread replies unless each thread ID was listed explicitly. - Include the thread parent channel ID as an allowlist candidate - Enforce allow_channels on slash commands (previously bypassed) - Show parent channel ID in runtime context, reply to the thread - Fix subagent cancel key via effective_key propagation - Detect bot mentions via raw_mentions and reply-to-bot references - Cache seen thread channels for outbound delivery - Ignore system messages that become empty prompts	2026-04-23 04:05:39 +09:00
Xubin Ren	61a28c2c0a	feat(webui): support image uploads in composer and message bubbles	2026-04-23 00:07:27 +08:00
chengyongru	42c4af2118	fix(agent): prevent duplicate responses when sub-agents complete concurrently When the main agent spawns multiple sub-agents, each completion independently triggered a new _dispatch, causing 3-4 user-visible responses instead of a single comprehensive report. - Extend _drain_pending to block-wait on pending_queue when sub-agents are still running, keeping the runner loop alive for in-order injection - Pass pending_queue in the system message path so subsequent sub-agent results can still be injected mid-turn via a new dispatch	2026-04-22 20:02:19 +08:00
chengyongru	e15705b471	fix(tests): add _cancel_active_tasks mock to cmd_new test fixtures The existing test_unified_session tests construct a SimpleNamespace loop mock that now needs _cancel_active_tasks since cmd_new calls it.	2026-04-21 21:50:37 +08:00
Xubin Ren	c1957e14ff	refactor(memory): centralize cursor validation behind a single gate Move the non-int cursor guard out of the two consumer sites and into a shared ``_iter_valid_entries`` iterator so the invariant lives in one place. Closes three gaps left by the original fix: * ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in Python, so the previous guard silently treated ``{"cursor": true}`` as cursor ``1``. * Recovery now returns ``max(valid cursors) + 1``. Under adversarial corruption "first int scanning in reverse" is not the same thing, and only ``max`` keeps the recovered cursor strictly greater than every legitimate cursor still on disk. * Non-int cursors are logged exactly once per ``MemoryStore``. Silently dropping corrupted entries hides the root cause (an external writer to ``memory/history.jsonl``); rate-limiting keeps the log clean when the same poisoned file is read every turn. All 7 tests from the original fix pass unchanged; 3 new tests pin the invariants above. Made-with: Cursor	2026-04-21 14:02:53 +08:00
Muata Kamdibe	c0a11c7cf4	fix(memory): harden cursor recovery against non-integer corruption _next_cursor now checks isinstance(cursor, int) before arithmetic, falling back to a reverse scan of all entries when the last entry's cursor is corrupted. read_unprocessed_history skips entries with non-int cursors instead of crashing on comparison. Root cause: external callers (cron jobs, plugins) occasionally wrote string cursors to history.jsonl, which blocked all subsequent append_history calls with TypeError/ValueError. Includes 7 regression tests covering string, float, null, and list cursor types.	2026-04-21 14:02:53 +08:00
Xubin Ren	82aa9efc02	test(mcp): pin CancelledError short-circuits the retry loop The retry branch is only reachable via `except Exception`, and `CancelledError` inherits from `BaseException`, so today it naturally bypasses the retry path and /stop still works. Add one focused regression test so any future refactor that widens the retry catch to `BaseException`, re-orders the handlers, or adds `CancelledError` to `_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop. Made-with: Cursor	2026-04-21 13:24:40 +08:00
hussein1362	368752e707	fix(mcp): retry once on transient connection errors When an MCP server restarts or a network connection drops between tool calls, the existing session throws ClosedResourceError, BrokenPipeError, ConnectionResetError, etc. Currently these are caught as generic exceptions and returned as permanent failures to the LLM, which then tells the user 'my tools are broken.' This change adds a single automatic retry with a 1-second backoff for transient connection-class errors in MCPToolWrapper, MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors (ValueError, RuntimeError, McpError, etc.) are not retried. The retry is conservative: - Only 1 retry (not configurable, to keep the change minimal) - Only for a specific set of connection-class exceptions - Matched by exception class name to avoid importing anyio/etc. - 1s sleep between attempts to allow the server to recover - Clear logging distinguishes retried vs permanent failures In production this eliminates most 'MCP tool call failed: ClosedResourceError' noise when MCP bridge processes restart (e.g. after config changes or OOM kills). Tests: 22 new tests covering retry, exhaustion, non-transient bypass, timeout bypass, and all three wrapper types.	2026-04-21 13:24:40 +08:00
Xubin Ren	00de55072d	test(agent): exercise /stop cancellation through _dispatch Add a regression test that actually runs the CancelledError branch of AgentLoop._dispatch end-to-end and asserts the in-flight checkpoint is materialized into session.messages before the cancellation unwinds. The three existing tests call _restore_runtime_checkpoint directly, so they pass even if the cancel-time restore is ever removed from _dispatch. This new test is the one that actually locks the fix in place. Made-with: Cursor	2026-04-21 01:14:41 +08:00
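Commit 6eb178113e in the table above describes sanitizing MCP capability names so that they match `^[a-zA-Z0-9_-]{1,128}$`, by replacing invalid characters with underscores and collapsing consecutive underscores. The sketch below illustrates that rule only. The function name `sanitize_name`, the truncation to 128 characters, and the `"_"` fallback for empty input are assumptions made for this example; they are not taken from the repository's `_sanitize_name()`.

```python
import re

# Pattern quoted in the commit message for provider-side name validation.
VALID_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")


def sanitize_name(name: str, max_len: int = 128) -> str:
    """Hypothetical stand-in for the helper the commit message mentions."""
    # Replace every character outside the allowed set with an underscore.
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", name)
    # Collapse runs of underscores into a single underscore.
    cleaned = re.sub(r"_+", "_", cleaned)
    # Assumption: truncate to the maximum length and never return "".
    cleaned = cleaned[:max_len]
    return cleaned or "_"


print(sanitize_name("PostgreSQL System Information"))
```

With the example name from the commit message, the result is `PostgreSQL_System_Information`, which satisfies the quoted pattern.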

146 Commits
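Commit 2af45945e2 in the log above mentions writing `history.jsonl` through a temp file, `fsync`, and `os.replace` so that a crash cannot leave a partial file, and cleaning up the temp file on exception. A minimal sketch of that general pattern follows. The function name `atomic_write_text` and the temp-file naming are assumptions for illustration; the repository's actual code may differ.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Illustrative helper: replace `path` with `text` in one atomic step."""
    path = Path(path)
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem, where the rename is atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # readers see the old or the new file
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A caller would build the full new file contents and pass them in, for example `atomic_write_text(Path("memory/history.jsonl"), new_contents)`, where `new_contents` is whatever the caller has assembled.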