nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-06-15 07:14:08 +00:00

Author	SHA1	Message	Date
Xubin Ren	403ce23d22	fix(agent): tighten ask_user CLI handling Made-with: Cursor	2026-04-25 22:10:19 +08:00
Xubin Ren	3b1ea99ee1	fix(agent): render ask_user options without buttons Made-with: Cursor	2026-04-25 22:10:19 +08:00
Xubin Ren	cfc76ffbbf	feat(agent): add ask_user tool Made-with: Cursor	2026-04-25 22:10:19 +08:00
yorkhellen	076e4166d7	fix(agent): add LLM request timeout to prevent session lock starvation	2026-04-25 03:40:34 +08:00
Xubin Ren	4531167c12	fix(agent): bound remaining memory/history pollution paths from #3412 #3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain: - archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the #3412 bug from the happy path. - maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call. - Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream. - append_history itself had no default cap, leaving future new callers one forgotten-cap-away from the same vector. Changes: - Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl. - Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered. - Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file). - Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and observable. Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions. Tests: +9 regression tests covering each fix. Full suite: 2372 passed. Made-with: Cursor	2026-04-24 04:17:19 +08:00
Xubin Ren	81a5af2352	test(consolidation): add regression tests for tiktoken truncation path and history char cap Cover two untested boundaries from #3412: - _truncate_to_token_budget with positive budget exercises tiktoken - _MAX_HISTORY_CHARS caps Recent History section in system prompt Made-with: Cursor	2026-04-24 03:57:59 +08:00
chengyongru	2848f69897	fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation Root cause: when consolidation LLM fails, raw_archive() dumped full message content (~1MB) into history.jsonl with no size limit. Since build_system_prompt() injects history.jsonl into every system prompt, all subsequent LLM calls exceeded the 200K context window with error 1261. Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), triggering the raw_archive fallback in the first place. Three-layer fix: - Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive chunk sizing based solely on token budget - Truncate archive() input: use tiktoken to cap formatted text to the model's input token budget before sending to consolidation LLM - Truncate raw_archive() output: cap history.jsonl entries at 16K chars	2026-04-24 03:57:59 +08:00
Xubin Ren	469fc90fe6	fix(agent): on_progress tool_events only when callback accepts; align progress tests with main Made-with: Cursor	2026-04-23 20:06:11 +08:00
Pablo Cabeza	c23d719780	feat(agent): emit structured _tool_events progress metadata Extend the existing on_progress callback to carry structured tool-event payloads alongside the plain-text hint, so channels can render rich tool execution state (start/finish/error, arguments, results, file attachments) rather than only the pre-formatted hint string. Changes ------- - AgentLoop._tool_event_start_payload() — builds a version-1 start payload from a ToolCallRequest - AgentLoop._tool_event_result_extras() — extracts files/embeds from a tool result dict - AgentLoop._tool_event_finish_payloads() — maps tool_calls + tool_results + tool_events from AgentHookContext into finish payloads - _LoopHook.before_execute_tools() — passes tool_events=[...] to on_progress together with the existing tool_hint flag - _LoopHook.after_iteration() — emits a second on_progress call with the finish payloads once tool results are available - _bus_progress() — forwards tool_events as _tool_events in OutboundMessage metadata so channel implementations can read them - on_progress type widened to Callable[..., Awaitable[None]] on all public entry points; _cli_progress updated to accept and ignore tool_events The contract is additive: callers that only accept (content, *, tool_hint) continue to work unchanged. Callers that also accept tool_events receive the structured data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 20:06:11 +08:00
Xubin Ren	61a28c2c0a	feat(webui): support image uploads in composer and message bubbles	2026-04-23 00:07:27 +08:00
chengyongru	42c4af2118	fix(agent): prevent duplicate responses when sub-agents complete concurrently When the main agent spawns multiple sub-agents, each completion independently triggered a new _dispatch, causing 3-4 user-visible responses instead of a single comprehensive report. - Extend _drain_pending to block-wait on pending_queue when sub-agents are still running, keeping the runner loop alive for in-order injection - Pass pending_queue in the system message path so subsequent sub-agent results can still be injected mid-turn via a new dispatch	2026-04-22 20:02:19 +08:00
chengyongru	e15705b471	fix(tests): add _cancel_active_tasks mock to cmd_new test fixtures The existing test_unified_session tests construct a SimpleNamespace loop mock that now needs _cancel_active_tasks since cmd_new calls it.	2026-04-21 21:50:37 +08:00
Xubin Ren	c1957e14ff	refactor(memory): centralize cursor validation behind a single gate Move the non-int cursor guard out of the two consumer sites and into a shared ``_iter_valid_entries`` iterator so the invariant lives in one place. Closes three gaps left by the original fix: * ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in Python, so the previous guard silently treated ``{"cursor": true}`` as cursor ``1``. * Recovery now returns ``max(valid cursors) + 1``. Under adversarial corruption "first int scanning in reverse" is not the same thing, and only ``max`` keeps the recovered cursor strictly greater than every legitimate cursor still on disk. * Non-int cursors are logged exactly once per ``MemoryStore``. Silently dropping corrupted entries hides the root cause (an external writer to ``memory/history.jsonl``); rate-limiting keeps the log clean when the same poisoned file is read every turn. All 7 tests from the original fix pass unchanged; 3 new tests pin the invariants above. Made-with: Cursor	2026-04-21 14:02:53 +08:00
Muata Kamdibe	c0a11c7cf4	fix(memory): harden cursor recovery against non-integer corruption _next_cursor now checks isinstance(cursor, int) before arithmetic, falling back to a reverse scan of all entries when the last entry's cursor is corrupted. read_unprocessed_history skips entries with non-int cursors instead of crashing on comparison. Root cause: external callers (cron jobs, plugins) occasionally wrote string cursors to history.jsonl, which blocked all subsequent append_history calls with TypeError/ValueError. Includes 7 regression tests covering string, float, null, and list cursor types.	2026-04-21 14:02:53 +08:00
Xubin Ren	82aa9efc02	test(mcp): pin CancelledError short-circuits the retry loop The retry branch is only reachable via `except Exception`, and `CancelledError` inherits from `BaseException`, so today it naturally bypasses the retry path and /stop still works. Add one focused regression test so any future refactor that widens the retry catch to `BaseException`, re-orders the handlers, or adds `CancelledError` to `_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop. Made-with: Cursor	2026-04-21 13:24:40 +08:00
hussein1362	368752e707	fix(mcp): retry once on transient connection errors When an MCP server restarts or a network connection drops between tool calls, the existing session throws ClosedResourceError, BrokenPipeError, ConnectionResetError, etc. Currently these are caught as generic exceptions and returned as permanent failures to the LLM, which then tells the user 'my tools are broken.' This change adds a single automatic retry with a 1-second backoff for transient connection-class errors in MCPToolWrapper, MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors (ValueError, RuntimeError, McpError, etc.) are not retried. The retry is conservative: - Only 1 retry (not configurable, to keep the change minimal) - Only for a specific set of connection-class exceptions - Matched by exception class name to avoid importing anyio/etc. - 1s sleep between attempts to allow the server to recover - Clear logging distinguishes retried vs permanent failures In production this eliminates most 'MCP tool call failed: ClosedResourceError' noise when MCP bridge processes restart (e.g. after config changes or OOM kills). Tests: 22 new tests covering retry, exhaustion, non-transient bypass, timeout bypass, and all three wrapper types.	2026-04-21 13:24:40 +08:00
Xubin Ren	00de55072d	test(agent): exercise /stop cancellation through _dispatch Add a regression test that actually runs the CancelledError branch of AgentLoop._dispatch end-to-end and asserts the in-flight checkpoint is materialized into session.messages before the cancellation unwinds. The three existing tests call _restore_runtime_checkpoint directly, so they pass even if the cancel-time restore is ever removed from _dispatch. This new test is the one that actually locks the fix in place. Made-with: Cursor	2026-04-21 01:14:41 +08:00
hussein1362	847c50b2de	fix(loop): preserve partial context when /stop cancels a task When a user sends /stop to interrupt an active agent turn, the task is cancelled via CancelledError. Previously, the cancellation handler just logged and re-raised, discarding any tool results and assistant messages accumulated during the interrupted turn. The runtime checkpoint mechanism already persists partial turn state (assistant messages, completed tool results, pending tool calls) into session metadata via _emit_checkpoint. However, this checkpoint was only materialized into session history on the NEXT incoming message via _restore_runtime_checkpoint — not at cancellation time. Now the CancelledError handler in _dispatch calls _restore_runtime_checkpoint immediately, so the partial context is preserved in session history. This means the next message the user sends will see all the work that was done before /stop, rather than starting from scratch. Fixes #2966 Includes 3 tests verifying checkpoint restoration on cancellation.	2026-04-21 01:14:41 +08:00
hlg	899a9073ce	fix(memory): do not fall back to raw entry when strip_think empties it `append_history` previously used `strip_think(entry) or entry.rstrip()` as a safety net, so if the entire entry was a template-token leak (e.g. `<think>reasoning</think>` or `<channel\|>` alone), the raw leaked text was still persisted to history — later re-introducing the very content `strip_think` was meant to scrub, via consolidation / replay. Persist the cleaned content directly. When cleanup empties a non-empty entry, log at debug and store an empty-content record (cursor continuity preserved). Adds 3 regression tests in test_memory_store.py covering: - Well-formed thinking blocks are stripped before persistence. - Pure-leak entries persist as empty, not as raw text. - Malformed prefix leaks (`<channel\|>`) also persist as empty.	2026-04-20 17:04:48 +08:00
chengyongru	68466b1c2a	fix(agent): propagate effective session key through subagent pipeline The previous fix hardcoded session_key_override as channel:chat_id which broke unified session mode where pending queues use "unified:default". Propagate the effective key from _set_tool_context through SpawnTool into the origin dict so _announce_result routes to the correct pending queue in both normal and unified session modes.	2026-04-20 14:47:14 +08:00
Xubin Ren	56a779c128	fix(session): repair read-only corrupt session paths	2026-04-20 00:17:50 +08:00
aiguozhi123456	efb04a1712	fix(session): use atomic writes and add corrupt-file repair SessionManager.save() previously used bare open("w") which could truncate the JSONL file if the process crashed mid-write. Now writes to a .tmp file and atomically replaces via os.replace(), matching the pattern already used in qq.py. _load() now attempts _repair() before returning None, recovering valid lines from partially-written files. 12 new tests cover atomic save correctness, temp-file cleanup on failure, and repair of truncated/corrupt JSONL. cowork-with:opencode(glm-5.1)	2026-04-20 00:17:50 +08:00
Xubin Ren	c4b3837c5f	Merge remote-tracking branch 'origin/main' into nanobot-webui	2026-04-19 12:36:52 +00:00
Xubin Ren	ccd6c05f71	fix: include pending summaries in consolidation estimates Made-with: Cursor	2026-04-19 20:06:11 +08:00
Xubin Ren	54b659929e	test: cover summary persistence after token consolidation Made-with: Cursor	2026-04-19 20:06:11 +08:00
Xubin Ren	1b211c7d3a	Merge branch 'main' into nanobot-webui Made-with: Cursor	2026-04-18 19:17:16 +00:00
Xubin Ren	9ed3031a42	feat(webui): add initial webui with websocket chat flow	2026-04-18 18:51:53 +00:00
chengyongru	5818569e8f	feat(wizard): auto-detect Literal fields as select menus Literal["standard", "persistent"] fields are now rendered as select dropdowns instead of free-text input. This makes provider_retry_mode and any future Literal fields self-documenting in the wizard.	2026-04-18 21:56:10 +08:00
chengyongru	ebb5179cab	feat(wizard): add Channel Common, API Server menus and field constraint validation - Add [H] Channel Common menu to configure send_progress, send_tool_hints, send_max_retries, and transcription_provider - Add [I] API Server menu to configure host, port, timeout - Add real-time Pydantic field constraint validation (ge/gt/le/lt/min_length/max_length) with constraint hints shown in field display (e.g. "Send Max Retries (0-10)") - Add _pause() to View Configuration Summary to prevent immediate screen clear - Fix _format_value dict branch to handle BaseModel instances without crashing	2026-04-18 21:56:10 +08:00
chengyongru	34e8f97b1f	refactor(templates): separate identity and SOUL responsibilities Move all behavioral instructions out of identity.md into SOUL.md so that each file has a single clear purpose: - identity.md: capability facts only (runtime, workspace, format hints, tool guidance, untrusted content warning) - SOUL.md: behavioral rules (name, personality, execution rules) The "Act, don't narrate" rule is refined into layered behavior: act immediately on single-step tasks, plan first for multi-step tasks. This eliminates the contradiction where identity said "never end with a plan" but user SOUL.md said "always plan first".	2026-04-18 21:55:56 +08:00
Xubin Ren	70a1279b86	test: pin retry-wait callback routing so internal heartbeats stay off channels Add two focused regression tests for the retry-wait leak this PR fixes: - tests/agent/test_runner.py::test_runner_binds_on_retry_wait_to_retry_callback_not_progress locks in that `AgentRunSpec.retry_wait_callback` (not `progress_callback`) is what `_build_request_kwargs` forwards to the provider as `on_retry_wait`. - tests/channels/test_channel_manager_delta_coalescing.py::TestRetryWaitFiltering runs `_dispatch_outbound` end-to-end and asserts that `_retry_wait: True` messages never reach channel send. Both tests fail on origin/main and pass with this PR's fix applied. Made-with: Cursor	2026-04-18 13:50:05 +08:00
Xubin Ren	c8d834a504	fix(loop): document subagent-followup persistence and guard empty content - Add inline rationale for persisting before ContextBuilder and for passing current_message="" on subagent follow-ups (avoids double-projection after merge). - Skip persistence for empty subagent content (no-op messages should not pollute history). - Add regression test covering the empty-content guard. Made-with: Cursor	2026-04-18 13:30:22 +08:00
xzq.xu	1c939e8a5f	fix(loop): persist subagent follow-up events in history	2026-04-18 13:30:22 +08:00
JunghwanNA	34fccb2ee9	Prevent self-inspection from leaking configured secrets MyTool blocks direct access to sensitive nested paths, but its formatter still printed scalar fields for small config objects. That let `my(action="check", key="web_config.search")` expose `api_key` in plain text even though the docs promise sensitive sub-fields are protected. This keeps the change narrow: sensitive nested config fields are omitted from MyTool's formatted output, and regression coverage locks the behavior in. Constraint: Must preserve existing read-only inspection behavior for non-sensitive fields Constraint: Keep scope limited to MyTool rather than introducing broader redaction plumbing Rejected: Rework global context/tool redaction around MyTool \| broader than needed for the leak path Confidence: high Scope-risk: narrow Reversibility: clean Directive: If more nested config rendering is added later, filter sensitive field names at the formatter boundary as well as the path resolver Tested: PYTHONPATH=$PWD pytest -q tests/agent/tools/test_self_tool.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_my_tool_secret_leak_regression.py Not-tested: Full repository test suite Related: #3259	2026-04-18 00:59:08 +08:00
Cheng Yongru	aabc3d5017	fix(memory): fall back to raw_archive on LLM error response When chat_with_retry returns an error response (finish_reason='error') instead of raising an exception, archive() previously treated the error message as a valid summary and wrote it to history.jsonl, while the original session data was already cleared by /new — causing irreversible data loss. Fix: check finish_reason after the LLM call and raise RuntimeError on error responses, which naturally falls through to the existing raw_archive fallback. This preserves the original messages in history.jsonl instead of losing them. Fixes #3244	2026-04-17 20:15:07 +08:00
Mariano Campo	d0e65ebf70	fix(exec): pass allowed_env_keys to exec tool calls in subagents	2026-04-17 16:32:25 +08:00
chengyongru	8c0c4e5b31	refactor(agent): tighten comments, extract constant, strengthen edge case test - Extract synthetic user message string to module-level constant - Tighten comments in _snip_history recovery branch - Strengthen no-user edge case test to verify safety net interaction	2026-04-17 16:20:53 +08:00
chengyongru	44b526c4ee	fix(agent): preserve user message in _snip_history to prevent GLM error 1214 When _snip_history truncates the message history and the only user message ends up outside the kept window, providers like GLM reject the resulting system→assistant sequence with error 1214 ("messages 参数非法"). Two-layer fix: 1. _snip_history now walks backwards through non_system messages to recover the nearest user message when none exists in the kept window. 2. _enforce_role_alternation inserts a synthetic user message "(conversation continued)" when the first non-system message is a bare assistant (no tool_calls), serving as a safety net for any edge cases that slip through. Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>	2026-04-17 16:20:53 +08:00
Xubin Ren	cc5a666d5d	review(dream): harden line-age annotation per review feedback Follow-up to #3212, fully backward compatible: - Extract the 14-day staleness threshold as `_STALE_THRESHOLD_DAYS` module constant and pass it into the Phase 1 prompt template as `{{ stale_threshold_days }}`. The number lived in three places before (code threshold, prompt instruction, docstring); now there is one. - Add `DreamConfig.annotate_line_ages` (default True = current behavior) and propagate it through `Dream.__init__` and the gateway wiring in cli/commands.py. Gives users a knob to disable the feature without a code patch if an LLM reacts poorly to the `← Nd` suffix. - Harden `_annotate_with_ages` against dirty working trees: when HEAD blob line count disagrees with the working-tree content length, skip annotation entirely instead of assigning ages to the wrong lines. The previous `i >= len(ages)` guard only handled one direction of the mismatch. - Inline-comment the `max_iterations` 10→15 bump with a pointer to exp002 so future blame has context. - Add 4 regression tests: end-to-end `← 30d` reaches prompt, 14/15 threshold boundary, `annotate_line_ages=False` bypasses git entirely (verified via `assert_not_called`), length-mismatch defense, and template-var rendering. Made-with: Cursor	2026-04-17 13:45:38 +08:00
chengyongru	35f3084c03	feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15 Three improvements to Dream's memory consolidation: 1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent. LLM uses content judgment, not just age, to decide what to prune. 2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts + deduplicate existing files) with explicit redundancy patterns to scan for. Validated through 20 experiments (exp-002 prompt + max_iter=15 was best, averaging -1643 chars/5.4% compression per run). 3. Phase 1 analysis as commit body: dream git commits now include the full Phase 1 analysis for transparency via /dream-log. 4. max_iterations raised from 10 to 15: 30% improvement over 10 with no risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).	2026-04-17 13:45:38 +08:00
Xubin Ren	90b7d940e8	refactor(config): nest MyTool settings under tools.my (with legacy-key migration)	2026-04-16 15:58:20 +00:00
chengyongru	b51da93cbb	feat(agent): add SelfTool for runtime self-inspection and configuration Add a built-in tool that lets the agent inspect and modify its own runtime state (model, iterations, context window, etc.). Key features: - inspect: view current config, usage stats, and subagent status - modify: adjust parameters at runtime (protected by type/range validation) - Subagent observability: inspect running subagent tasks (phase, iteration, tool events, errors) — subagents are no longer a black box - Watchdog corrects out-of-bounds values on each iteration - Enabled by default in read-only mode (self_modify: false) - All changes are in-memory only; restart restores defaults - Comprehensive test suite (90 tests) Includes a self-awareness skill (always-on) with progressive disclosure: SKILL.md for core rules, references/examples.md for detailed scenarios.	2026-04-16 23:44:26 +08:00
Xubin Ren	92a5125108	Merge PR #3141 : fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions	2026-04-16 20:07:15 +08:00
chengyongru	d64e963258	test(memory): add regression tests for missing cursor key Cover read_unprocessed_history skipping cursorless entries and _next_cursor safe fallback when last entry has no cursor.	2026-04-16 12:32:38 +08:00
chengyongru	015833e34b	Merge branch 'main' into fix/skills-yaml-frontmatter	2026-04-15 16:56:23 +08:00
chengyongru	6fbada5363	refactor(context): deduplicate system prompt — markdown skills index, skip template MEMORY.md - Convert skills summary from verbose XML (4-5 lines/skill) to compact markdown list (1 line/skill) with inline path for read_file lookup - Exclude always-loaded skills (e.g. memory) from the skills index to avoid duplicating content already in the Active Skills section - Skip injecting the Memory section when MEMORY.md still matches the bundled template (i.e. Dream hasn't populated it yet)	2026-04-15 15:49:30 +08:00
yanghan-cyber	a1b544fd23	fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions The hand-rolled line-by-line YAML parser treated each line independently, so YAML multiline scalars (folded `>` and literal `\|`) were captured as the literal characters ">" or "\|" instead of the actual text content.	2026-04-14 15:29:59 +08:00
yeyitech	65a15f39ee	test(loop): cover /stop checkpoint recovery	2026-04-14 14:15:22 +08:00
yeyitech	ee061f0595	fix(web): serialize duckduckgo search calls	2026-04-14 14:10:06 +08:00
Xubin Ren	a38bc637bd	fix(runner): preserve injection flag after max-iteration drain Keep late follow-up injections observable when they are drained during max-iteration shutdown so loop-level response suppression still makes the right decision. Made-with: Cursor	2026-04-14 00:30:30 +08:00

1 2 3

113 Commits