nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-20 00:22:31 +00:00

Author	SHA1	Message	Date
Xubin Ren	7c29a738a5	test(long-task): expect wrapped completion message after validation Align assertions with LongTaskTool final return shape on main. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 17:45:06 +00:00
Xubin Ren	78e8cc3e55	fix(long-task): honor final signal and file tracking Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-13 16:53:58 +00:00
chengyongru	5acae58a13	test(long-task): add boundary tests and fix race conditions - Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries - Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format - Remove asyncio timing hack from test_state_exposure - Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation - All 34 tests passing	2026-05-13 01:26:01 +08:00
chengyongru	78ecb2a99a	feat(long-task): major overhaul with structured handoffs, validation, and observability - Structured HandoffState: HandoffTool now accepts files_created, files_modified, next_step_hint, and verification fields instead of a plain string. Progress is passed between steps as structured data. - Completion validation round: After complete() is called, a dedicated validator step runs to verify the claim against the original goal. If validation fails, the task continues rather than returning a false completion. - Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle, step_final) selected based on step number. Final steps get tighter budget and stronger "wrap up" guidance. - Automatic file change tracking: Extracts write_file/edit_file events from tool_events and injects them into the next step's context if the subagent forgot to report them explicitly. - Budget tracking & adaptive strategy: Cumulative token usage is tracked across steps. Per-step tool budget drops from 8 to 4 in the last two steps to force handoff/completion. - Crash retry with graceful degradation: A step that crashes is retried once. Persistent crashes terminate the task and return partial progress. - Full observability hooks for future WebUI integration: - set_hooks() with on_step_start, on_step_complete, on_handoff, on_validation_started, on_validation_passed, on_validation_failed, on_task_complete, on_task_error, and catch-all on_event. - Readable state properties: current_step, total_steps, status, last_handoff, cumulative_usage, goal. - inject_correction() allows external code to send user corrections that are injected into the next step's prompt. - run_step() accepts optional max_iterations for dynamic budget control. All 27 long-task tests and 11 subagent tests pass.	2026-05-13 00:55:52 +08:00
chengyongru	e7214d96ed	fix(long-task): add debug logging for step-level observability	2026-05-12 23:37:00 +08:00
chengyongru	bf5762a3d4	feat(long-task): add LongTaskTool for multi-step agent tasks Implements a meta-ReAct loop where long-running tasks are broken into sequential subagent steps, each starting fresh with the original goal and progress from the previous step. This prevents context drift when agents work on complex, multi-step tasks. - Extract build_tool_registry() from SubagentManager for reuse - Add run_step() for synchronous subagent execution (no bus announcement) - Add HandoffTool and CompleteTool as signal mechanisms via shared dict - Add LongTaskTool orchestrator with simplified prompt (8 iterations/step) - Register LongTaskTool in main agent loop - Add _extract_handoff_from_messages fallback for robustness	2026-05-12 23:37:00 +08:00
chengyongru	043f0e67f7	feat(tools): introduce plugin-based tool discovery and runtime context protocol This commit implements a progressive refactoring of the tool system to support plugin discovery, scoped loading, and protocol-driven runtime context injection. Key changes: - Add Tool ABC metadata (tool_name, _scopes) and ToolContext dataclass for dependency injection. - Introduce ToolLoader with pkgutil-based builtin discovery and entry_points-based third-party plugin loading. - Add scope filtering (core/subagent/memory) so different contexts load appropriate tool sets. - Introduce ContextAware protocol and RequestContext dataclass to replace hardcoded per-tool context injection in AgentLoop. - Add RuntimeState / MutableRuntimeState protocols to decouple MyTool from AgentLoop. - Migrate all built-in tools to declare scopes and implement create()/enabled() hooks. - Migrate MessageTool, SpawnTool, CronTool, and MyTool to ContextAware. - Refactor AgentLoop to use ToolLoader and protocol-driven context injection. - Refactor SubagentManager to use ToolLoader(scope="subagent") with per-run FileStates isolation. - Register all built-in tools via pyproject.toml entry_points. - Add comprehensive tests for loader scopes, entry_points, ContextAware, subagent tools, and runtime state sync.	2026-05-12 11:28:20 +08:00
chengyongru	c30e4d86f3	refactor(agent): simplify subagent concurrency with rejection over semaphore Replace the asyncio.Semaphore queueing approach with a simple count check in SpawnTool.execute(). When the concurrency limit is reached, the tool returns an error string so the agent can perceive the reason and adjust its behavior instead of silently queueing. - Remove max_concurrent_subagents parameter threading through AgentLoop, commands.py, and nanobot.py - SubagentManager reads the limit directly from AgentDefaults - SpawnTool checks get_running_count() before calling spawn() - Simplify tests to verify rejection behavior	2026-05-05 22:22:04 +08:00
hanyuanling	3c20d16117	fix subagent max iteration limit	2026-04-30 13:45:40 +08:00
chengyongru	42c4af2118	fix(agent): prevent duplicate responses when sub-agents complete concurrently When the main agent spawns multiple sub-agents, each completion independently triggered a new _dispatch, causing 3-4 user-visible responses instead of a single comprehensive report. - Extend _drain_pending to block-wait on pending_queue when sub-agents are still running, keeping the runner loop alive for in-order injection - Pass pending_queue in the system message path so subsequent sub-agent results can still be injected mid-turn via a new dispatch	2026-04-22 20:02:19 +08:00
JunghwanNA	34fccb2ee9	Prevent self-inspection from leaking configured secrets MyTool blocks direct access to sensitive nested paths, but its formatter still printed scalar fields for small config objects. That let `my(action="check", key="web_config.search")` expose `api_key` in plain text even though the docs promise sensitive sub-fields are protected. This keeps the change narrow: sensitive nested config fields are omitted from MyTool's formatted output, and regression coverage locks the behavior in. Constraint: Must preserve existing read-only inspection behavior for non-sensitive fields Constraint: Keep scope limited to MyTool rather than introducing broader redaction plumbing Rejected: Rework global context/tool redaction around MyTool | broader than needed for the leak path Confidence: high Scope-risk: narrow Reversibility: clean Directive: If more nested config rendering is added later, filter sensitive field names at the formatter boundary as well as the path resolver Tested: PYTHONPATH=$PWD pytest -q tests/agent/tools/test_self_tool.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_my_tool_secret_leak_regression.py Not-tested: Full repository test suite Related: #3259	2026-04-18 00:59:08 +08:00
Mariano Campo	d0e65ebf70	fix(exec): pass allowed_env_keys to exec tool calls in subagents	2026-04-17 16:32:25 +08:00
Xubin Ren	90b7d940e8	refactor(config): nest MyTool settings under tools.my (with legacy-key migration)	2026-04-16 15:58:20 +00:00
chengyongru	b51da93cbb	feat(agent): add SelfTool for runtime self-inspection and configuration Add a built-in tool that lets the agent inspect and modify its own runtime state (model, iterations, context window, etc.). Key features: - inspect: view current config, usage stats, and subagent status - modify: adjust parameters at runtime (protected by type/range validation) - Subagent observability: inspect running subagent tasks (phase, iteration, tool events, errors) — subagents are no longer a black box - Watchdog corrects out-of-bounds values on each iteration - Enabled by default in read-only mode (self_modify: false) - All changes are in-memory only; restart restores defaults - Comprehensive test suite (90 tests) Includes a self-awareness skill (always-on) with progressive disclosure: SKILL.md for core rules, references/examples.md for detailed scenarios.	2026-04-16 23:44:26 +08:00
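
Commit c30e4d86f3 describes replacing semaphore queueing with a plain count check that returns an error string when the subagent limit is reached. The pattern can be sketched roughly as below. The names `SubagentManager`, `SpawnTool`, `get_running_count()`, `spawn()` and `execute()` come from the commit message; their signatures, the `max_concurrent` attribute, the return strings, and the `demo()` helper are assumptions made for illustration and are not taken from the repository.

```python
import asyncio
from typing import Awaitable, Callable


class SubagentManager:
    """Hypothetical stand-in; per the commit, the real class reads its limit from AgentDefaults."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self.max_concurrent = max_concurrent  # assumed attribute name
        self._running: set = set()

    def get_running_count(self) -> int:
        return len(self._running)

    def spawn(self, coro: Awaitable[None]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._running.add(task)
        # Drop the task from the running set as soon as it finishes.
        task.add_done_callback(self._running.discard)
        return task


class SpawnTool:
    """Rejects new work instead of queueing it behind a semaphore."""

    def __init__(self, manager: SubagentManager) -> None:
        self.manager = manager

    async def execute(self, work: Callable[[], Awaitable[None]]) -> str:
        if self.manager.get_running_count() >= self.manager.max_concurrent:
            # Returning a string lets the calling agent see why the spawn failed.
            return "Error: subagent concurrency limit reached, try again later"
        self.manager.spawn(work())
        return "Subagent started"


async def demo() -> list:
    manager = SubagentManager(max_concurrent=2)
    tool = SpawnTool(manager)

    async def work() -> None:
        await asyncio.sleep(0.01)

    results = []
    for _ in range(3):
        results.append(await tool.execute(work))
    # Wait for the accepted subagents so no task is left pending.
    await asyncio.gather(*list(manager._running))
    return results
```

Calling `asyncio.run(demo())` with a limit of two accepts the first two spawns and rejects the third with the error string, which is the behavior the commit says the simplified tests verify.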

14 Commits
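
Commit 78ecb2a99a states that the per-step tool budget drops from 8 to 4 in the last two steps, and that a crashing step is retried once before the task terminates and returns partial progress. A minimal sketch of those two rules follows. The function names `step_budget` and `run_with_retry`, the 1-indexed step convention, and the `(ok, result)` return shape are hypothetical; only the numbers 8 and 4 and the retry-once behavior come from the commit message.

```python
from typing import Callable, Tuple


def step_budget(step: int, total_steps: int, normal: int = 8, final: int = 4) -> int:
    """Return the tool budget for a 1-indexed step (assumed convention).

    The last two steps get the tighter budget to push toward handoff or completion.
    """
    return final if step > total_steps - 2 else normal


def run_with_retry(step_fn: Callable[[], str], retries: int = 1) -> Tuple[bool, str]:
    """Run a step, retrying on a crash up to `retries` extra times.

    Returns (True, result) on success, or (False, error text) if every attempt crashed,
    so the caller can stop the task and report partial progress.
    """
    last_error = ""
    for _ in range(retries + 1):
        try:
            return True, step_fn()
        except Exception as exc:  # a crashed step, whatever the cause
            last_error = f"{type(exc).__name__}: {exc}"
    return False, last_error


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a hypothetical step that crashes `failures` times, then succeeds."""
    state = {"left": failures}

    def step() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("step crashed")
        return "step done"

    return step
```

Under these assumptions a five-step task uses a budget of 8 for steps 1 to 3 and 4 for steps 4 and 5, and a one-step task starts directly on the tighter budget, which matches the "final budget for max_steps=1" case listed in commit 5acae58a13. A step that crashes once recovers on the retry; one that crashes twice ends the run with the error text.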