nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-19 16:12:30 +00:00

Author	SHA1	Message	Date
chengyongru	584072cf63	refactor: restrict fallback_models to preset-only and clean up provider factory - Restrict fallback_models to only reference preset names in model_presets. - Add schema validation to reject unknown preset names in fallback_models. - Remove build_provider_for_model() since bare model fallback is no longer supported. - Simplify make_provider_factory() to only look up presets by name. - Update onboard UI to remove "Add custom model" option from fallback chain. - Update tests to use preset names instead of bare model strings in fallback chains. - Fix test imports referencing deleted _make_provider function.	2026-05-08 20:16:06 +08:00
hanyuanling	7c270577e1	Refine fallback routing on model presets	2026-05-08 20:16:06 +08:00
LeftX	2e5930e355	feat: add fallback_models support for automatic model failover When the primary model fails (finish_reason="error" after exhausting provider-level retries), automatically try each model in the configured fallback_models list. Supports cross-provider fallback via a cached provider_factory that resolves the correct provider for each model string. Config: agents.defaults.fallback_models: ["model-b", "provider/model-c"] Changes: - AgentDefaults: add fallback_models field - AgentRunSpec: add fallback_models field - AgentRunner: add provider_factory, _call_provider, _resolve_fallback_provider - AgentLoop: accept and forward fallback_models + provider_factory - nanobot.py: extract _make_provider_for_model, add _make_provider_factory - cli/commands.py: add _make_cli_provider_factory, wire all AgentLoop sites - tests/agent/test_runner_fallback.py: 8 test cases covering primary success, single/multi fallback, cross-provider, no-factory reuse, caching Made-with: Cursor	2026-05-08 20:16:06 +08:00
chengyongru	83f437a088	feat(config): add model preset support for runtime model switching Add ModelPresetConfig schema and model_presets dictionary to config, enabling named bundles of model parameters (model, temperature, max_tokens, reasoning_effort, context_window_tokens) that can be switched atomically at runtime via the self tool.	2026-05-08 20:16:06 +08:00
chengyongru	e34b7fd086	fix(onboard): allow empty strings and falsy values in input fields Fixes two related input-handling bugs in the onboard wizard: 1. _input_text treated "" as None, preventing users from clearing optional string fields or entering empty strings intentionally. 2. _input_model_with_autocomplete used `if value else None`, which discarded falsy values such as empty strings or 0. To support clearing optional string fields, add _is_str_or_none() and normalize empty strings to None inside _configure_pydantic_model only when the field annotation is `str | None`. Required str fields keep "" as a valid value. Also included: - Remember last selected item in provider/channel/model menus for better UX when configuring multiple items. - Rename _SIMPLE_TYPES and _MENU_DISPATCH to lowercase to follow Python naming conventions (they are local variables, not constants). - Remove unused imports in test file. Extracted from PR #3358.	2026-05-08 13:13:20 +08:00
Xubin Ren	790a03ec28	feat(webui): polish chat layout and titles Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-06 22:20:35 +08:00
Tim O'Brien	99209a806d	fix(tool_hints): pass max_length to abbreviate_path for is_path tools The is_path branch in _fmt_known was not passing max_length to abbreviate_path, so read_file, write_file, edit, list_dir, and web_fetch always truncated paths at 40 chars regardless of config. Now all three branches (is_path, is_command, fallback) honor the configured toolHintMaxLength.	2026-05-06 21:18:39 +08:00
Tim O'Brien	daa4a25c9b	feat(config): add toolHintMaxLength to control tool hint truncation Add `toolHintMaxLength` to config (default: 40, range: 20-500). Controls how many characters of tool hints are shown in progress updates (e.g. '$ cd …/project && npm test'). Set to 120+ to see full commands instead of truncated hints: ```json { "agents": { "defaults": { "toolHintMaxLength": 120 } } } ``` - Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback - Make path abbreviation in _abbreviate_command proportional to max_length - Add TestToolHintMaxLength test class with 5 tests - All 41 existing tests pass	2026-05-06 21:18:39 +08:00
hanyuanling	653de4a7ef	fix(agent): gate provider progress deltas	2026-05-06 21:18:30 +08:00
chengyongru	05e0106592	refactor(logging): preserve tracebacks and add channel context - Preserve tracebacks: logger.error in except blocks → logger.exception - Channel context: BaseChannel injects self.logger = logger.bind(channel=name) - Third-party bridge: redirect_lib_logging() replaces ad-hoc stdlib-to-loguru bridges - Log levels: network timeouts downgraded from ERROR → WARNING - Fix --verbose flag to actually work with loguru (set handler to DEBUG)	2026-05-06 21:17:45 +08:00
Xubin Ren	db14685a69	fix(agent): soften SSRF guard recovery Keep private URL access blocked at the tool boundary, but return a clear non-retryable hint so the agent can recover conversationally instead of aborting the turn. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-06 00:43:00 +08:00
chengyongru	c30e4d86f3	refactor(agent): simplify subagent concurrency with rejection over semaphore Replace the asyncio.Semaphore queueing approach with a simple count check in SpawnTool.execute(). When the concurrency limit is reached, the tool returns an error string so the agent can perceive the reason and adjust its behavior instead of silently queueing. - Remove max_concurrent_subagents parameter threading through AgentLoop, commands.py, and nanobot.py - SubagentManager reads the limit directly from AgentDefaults - SpawnTool checks get_running_count() before calling spawn() - Simplify tests to verify rejection behavior	2026-05-05 22:22:04 +08:00
Xubin Ren	614b21368f	fix(agent): tighten safety guard edge cases Keep the /dev workspace guard exception scoped to the known benign device paths already handled by ExecTool, and add coverage that non-benign /dev targets still get blocked. Also add a streaming regression for tool_error responses so fatal tool failures are delivered by channels instead of being marked as already streamed. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:25:52 +08:00
Xubin Ren	b8406be215	fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605 ) Replaces PR #3493's blanket fatal abort with a "tell the model + throttle the bypass loop" policy. Workspace-bound rejections are now ordinary recoverable tool errors enriched with a structured "this is a hard policy boundary" instruction; SSRF stays the only marker that aborts the turn. Why the fatal-abort approach broke ---------------------------------- PR #3493 promoted every shell `_guard_command` and filesystem path-resolution rejection to a turn-fatal RuntimeError. Two of those messages (`path outside working dir` and `path traversal detected`) are heuristic substring scans on the raw command, so legitimate commands like `rm <ws>/x.txt 2>/dev/null` or `find . -type f` killed the user's turn (#3599). On channels with outbound dedupe (Telegram) the user just saw silence (#3605), and the noise polluted the LLM's context until it started hallucinating guard rejections on plain relative paths (#3597). Why we still need some throttle --------------------------------- The original #3493 pain point was real: the LLM, refused once, would swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c -> ln -sf -> python -c open(...). Just removing the fatal escape lets that loop run wild until max_iterations. What this commit does --------------------- - `nanobot/utils/runtime.py`: add `workspace_violation_signature` and `repeated_workspace_violation_error`. The signature normalizes filesystem `path` arguments and the first absolute path inside an exec command, so swapping tools against the same outside target hits the same throttle bucket. Two soft attempts are allowed; the third attempt's tool result is replaced with a hard "stop trying to bypass" message that quotes the target path and tells the model to ask the user for help. - `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation` (still fatal) and `_is_workspace_violation` (now soft). 
All three failure branches in `_run_tool` (prep_error / exception / Error result) route through a shared `_classify_violation` that bumps the per-turn workspace_violation_counts dict and either keeps the tool's own message or substitutes the throttle escalation. `_execute_tools` now threads that dict alongside the existing external_lookup_counts. - `nanobot/agent/tools/shell.py`: append a structured boundary note to every workspace-bound guard rejection (`working_dir could not be resolved`, `working_dir is outside`, `path outside working dir`, `path traversal detected`). SSRF errors stay short and direct so the model doesn't try to "phrase around" them. Existing `2>/dev/null` allow-list and benign device passthrough from the previous commit remain. - `nanobot/agent/tools/filesystem.py`: append the same boundary note to the `outside allowed directory` PermissionError so read_file / write_file / list_dir errors give the LLM the same explicit hint. Tests ----- - `tests/utils/test_workspace_violation_throttle.py` (new): signature collapses across read_file/exec/python -c against the same path, different paths get independent budgets, escalation only fires after the third attempt. - `tests/agent/test_runner.py`: - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2 contract: filesystem PermissionError is now soft, runner moves to the next iteration and finalizes cleanly. - `test_is_ssrf_violation_remains_fatal` + the existing `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the first attempt. - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end to end recovery from `path outside working dir`. - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four bypass attempts against the same outside target produce at least one `workspace_violation_escalated` event and the run completes naturally without aborting the turn. - The two `_execute_tools` direct-call tests now pass the new workspace_violation_counts dict. 
- `tests/tools/test_tool_validation.py`: relax three `==` assertions to `startswith` + "hard policy boundary" substring check to match the new structured error messages. - `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null` regression and the `> /etc/issue` negative case from the previous commit on this branch -- they still pass under the new policy. Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2 on origin/main). Ruff is clean for every file touched in this commit. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:18:39 +08:00
Xubin Ren	7742f8fbdc	fix(runner): narrow workspace_violation fatal classification (#3599 , helps #3605 #3597 ) PR #3493 promoted every shell `_guard_command` rejection to a turn-fatal RuntimeError. The two heuristic outputs in that list -- `path outside working dir` and `path traversal detected` -- routinely false-positive on benign constructs (e.g. `2>/dev/null`, quoted `..` arguments to sed/find, absolute paths inside inline scripts), so legitimate workspace commands silently kill the user's turn (#3599) and the agent never gets a chance to retry with a different approach (#3605). Two changes, both narrowly scoped: - `ExecTool._guard_command` now skips a small allow-list of kernel device files (`/dev/null`, the standard streams, `/dev/random`, `/dev/fd/N`, ...) before the workspace path check, matched against the pre-resolve string so symlinks like `/dev/stderr -> /proc/self/fd/2` still hit the allow-list. Real outside writes such as `> /etc/issue` remain blocked. - `AgentRunner._WORKSPACE_BLOCK_MARKERS` keeps only the four hard path-resolution errors from filesystem.py / shell.py and the SSRF marker. The two heuristic substrings move out of the fatal list, so the LLM sees them as ordinary tool errors and can self-correct in the next iteration. SSRF stays fatal because retrying an internal URL with a different phrasing would defeat the safety boundary. Tests: - `tests/tools/test_exec_security.py`: parametrized regression for the exact #3599 command sample plus other stdio redirects and device reads; explicit negative case asserts `> /etc/issue` is still blocked. - `tests/agent/test_runner.py`: `_is_workspace_violation` no longer fatals on the two heuristic markers, plus an end-to-end case proving the runner hands the guard error back to the LLM and finalizes the next turn cleanly.	2026-05-04 01:18:39 +08:00
Xubin Ren	96da6d8190	fix(webui): tighten turn completion handling Keep the new turn-end signal scoped to WebSocket clients, preserve pending tool-call state across trailing tool result rows, and drop the accidental npm lockfile from the Bun-based WebUI. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-03 22:28:40 +08:00
ramonpaolo	be83525f99	test(webui): cover turn-end streaming regressions	2026-05-03 22:28:40 +08:00
chengyongru	5853d5dfda	fix: allow_patterns take priority over deny_patterns in ExecTool (#3594 ) * fix: allow_patterns take priority over deny_patterns in ExecTool Previously deny_patterns were checked first with no bypass, meaning allow_patterns could never exempt commands from the built-in deny list. This made it impossible to whitelist destructive commands for specific directories (e.g. build/cleanup tasks). Changes: - shell.py: check allow_patterns first; if matched, skip deny check - shell.py: deny_patterns now appends to built-in list (not replaces) - schema.py: add allow_patterns/deny_patterns to ExecToolConfig - loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool - Add test_exec_allow_patterns.py covering priority semantics * fix: separate deny pattern errors from workspace violation detection The deny pattern error message "Command blocked by safety guard" was included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be misclassified as fatal workspace violations. This meant LLMs had no chance to retry with a different command — the turn was aborted immediately. Changes: - shell.py: deny/allowlist error messages now use distinct phrasing ("blocked by deny pattern filter" / "blocked by allowlist filter") - runner.py: remove "blocked by safety guard" from _WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal tool errors (LLM can retry) instead of fatal violations - workspace path errors still use "blocked by safety guard" and remain fatal as intended * fix: update test assertions to match new deny pattern error message * fix: indentation error in test file * fix: restore SSRF fatal classification and tidy exec pattern plumbing Address review feedback on the deny/allow_patterns rework: - runner.py: re-add "internal/private url detected" to _WORKSPACE_BLOCK_MARKERS. 
The earlier marker removal also stripped fatal classification from SSRF / internal-URL rejections (whose message still says "blocked by safety guard"), turning a hard security boundary into something the LLM could retry. - loop.py / subagent.py: drop `or None` between ExecToolConfig and ExecTool. The schema default is an empty list and ExecTool already normalizes None back to [], so the indirection was a no-op. - shell.py: extract `explicitly_allowed` flag in _guard_command so allow_patterns are scanned once instead of twice and the control flow no longer relies on a no-op `pass + else` branch. - tests/agent/test_runner.py: add a regression test asserting that the SSRF block message is treated as fatal, while deny/allowlist filter messages are deliberately non-fatal. * fix: remove unused exec allow-pattern test import Keep the new ExecTool allow-pattern coverage clean under ruff. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Xubin Ren <xubinrencs@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-03 00:27:17 +08:00
Xubin Ren	188e6df757	fix(utils): cover complete trailing think markers Made-with: Cursor	2026-05-01 20:09:59 +08:00
bravel	2c397ad442	fix: strip partial think tags in streaming output	2026-05-01 20:09:59 +08:00
Xubin Ren	e157392250	fix(agent): scope subagent reply dedupe to origin message Made-with: Cursor	2026-05-01 11:47:24 +00:00
yorkhellen	08f326ec55	test: Add tests for sender_id runtime context injection	2026-05-01 19:43:38 +08:00
Xubin Ren	306958d6e6	add native Bedrock Converse provider Made-with: Cursor	2026-05-01 18:52:03 +08:00
hanyuanling	3c20d16117	fix subagent max iteration limit	2026-04-30 13:45:40 +08:00
Xubin Ren	3d7099b421	fix(memory): clean atomic write test hygiene Made-with: Cursor	2026-04-29 16:57:50 +08:00
yorkhellen	2af45945e2	fix(memory): ensure atomic write for history.jsonl Use temp file + os.replace + fsync to prevent partial writes on crash. Add tests for atomic write behavior and tmp file cleanup on exception.	2026-04-29 16:57:50 +08:00
Xubin Ren	48f3cc6390	fix(agent): stop on workspace violations from tool errors Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions. Made-with: Cursor	2026-04-28 15:13:27 +08:00
Xubin Ren	ad4802600e	refactor(config): make max messages default explicit Use 120 as the config-level default and normalize zero back to that limit so session replay always receives an explicit message cap. Made-with: Cursor	2026-04-28 14:54:32 +08:00
hussein1362	d45ffcf519	feat(config): wire max_messages into session history replay The max_messages config field in AgentDefaults was accepted by the schema but never threaded through to the actual get_history() calls in the agent loop. Both call sites in _process_message hardcoded the default, so sessions with slow or local models accumulated unbounded history that inflated prompt tokens and caused LLM timeouts. Changes: - Add max_messages field to AgentDefaults (default 0 = use built-in constant, any positive value caps history replay) - Store the value on AgentLoop and pass it to get_history() when non-zero - Wire the config through all three AgentLoop construction sites in commands.py (gateway, API server, CLI chat) - 14 focused tests covering schema validation, init storage, history slicing, boundary alignment, integration wiring, and the zero/default path	2026-04-28 14:54:32 +08:00
Xubin Ren	fdfecd3ba6	refactor(codex): name progress delta capability semantically Use a provider capability name that describes user-visible progress delta support instead of the runner implementation detail. Made-with: Cursor	2026-04-27 18:48:05 +08:00
hanyuanling	ae14142a87	fix(codex): stream progress deltas to channels	2026-04-27 18:48:05 +08:00
Xubin Ren	e31273ebaa	Merge origin/main into fix/discord-allow-channel-threads Made-with: Cursor	2026-04-27 09:26:24 +00:00
Xubin Ren	eb4b3d9e26	refactor(session): internalize history/file-cap knobs as constants Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and sessionFileMaxMessages out of user-facing config into internal constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000). - Remove 3 fields from AgentDefaults and config pipeline - Sink enforce_file_cap into Session (was AgentLoop) - Auto-derive token budget from context window (was configurable) - Net -113 lines across 7 files; 723 tests green Made-with: Cursor	2026-04-27 08:06:50 +00:00
Xubin Ren	29ebc2d355	Merge origin/main into feat/session-replay-file-cap-invariants Preserve main's timestamp/tool-context replay semantics while keeping the PR's session history and file-cap budgets. Made-with: Cursor	2026-04-27 07:32:00 +00:00
Xubin Ren	311a7fe36e	fix(session): stop training the model to parrot [Message Time: ...] Past assistant turns in history were prefixed with "[Message Time: ...]" just like user turns. The model treated these as in-context demos and started prefixing its own replies with the same marker, leaking metadata to the user. Prompt-level warnings could not beat dozens of prior assistant samples. Annotate only user turns and proactive deliveries (_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is the whole point and which are too infrequent to act as demos). Adjacent user-side timestamps still pin every normal assistant reply for relative-time reasoning. The now-redundant identity.md warning is removed along with the demonstration source.	2026-04-27 07:11:20 +00:00
Xubin Ren	7dcf83e389	test(agent): cover threaded subagent routing Made-with: Cursor	2026-04-27 14:37:36 +08:00
Xubin Ren	eeaec1f951	fix(agent): prevent message time metadata from leaking into replies	2026-04-27 06:23:43 +00:00
chengyongru	6eb178113e	fix(mcp): sanitize MCP capability names for model API compatibility MCP resource/prompt/tool names containing spaces or special characters (e.g. "PostgreSQL System Information") were forwarded verbatim to model provider APIs, causing validation errors from both Anthropic and OpenAI which require names matching ^[a-zA-Z0-9_-]{1,128}$. Add _sanitize_name() that replaces invalid characters with underscores and collapses consecutive underscores. Applied in MCPToolWrapper, MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools filtering logic. Closes #3468	2026-04-27 11:49:50 +08:00
Xubin Ren	4a4ba1efc1	Merge branch 'main' into fix/session-history-timestamps Made-with: Cursor	2026-04-26 18:13:11 +00:00
Xubin Ren	038a140ad3	fix(slack): preserve thread context for proactive replies Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context. Made-with: Cursor	2026-04-27 02:10:38 +08:00
Xubin Ren	df37a36174	fix(agent): expose session timestamps in model context Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors. Made-with: Cursor	2026-04-26 17:42:58 +00:00
hanyuanling	59dfd74842	feat(session): enforce replay/file-cap invariants for history lifecycle	2026-04-27 00:53:32 +08:00
Xubin Ren	b2aec5528a	refactor(agent): move provider refresh into subsystem owners	2026-04-26 14:18:37 +00:00
Xubin Ren	f670da6c70	refactor(providers): move provider snapshot creation into factory	2026-04-26 14:05:13 +00:00
Xubin Ren	65b0ae81af	Merge origin/main into webui-settings Made-with: Cursor	2026-04-26 13:05:32 +00:00
Xubin Ren	727086ddac	test: tighten consolidation ratio coverage Made-with: Cursor	2026-04-26 20:24:42 +08:00
chengyongru	fca56d324a	test: add unit tests for configurable consolidation_ratio Cover ratio propagation, schema validation, and consolidation behavior with different ratio values (0.1, 0.5, 0.9).	2026-04-26 20:24:42 +08:00
Xubin Ren	b440e76d2f	feat(webui): add model settings runtime refresh	2026-04-25 18:05:06 +00:00
Xubin Ren	a58d9fd357	feat(webui): render ask_user choices Made-with: Cursor	2026-04-25 15:46:47 +00:00
Xubin Ren	403ce23d22	fix(agent): tighten ask_user CLI handling Made-with: Cursor	2026-04-25 22:10:19 +08:00
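
Commit 6eb178113e above describes sanitizing MCP capability names so they match `^[a-zA-Z0-9_-]{1,128}$`: invalid characters become underscores and consecutive underscores are collapsed. The following is a minimal sketch of that idea, not the repository's code. The function name `sanitize_name`, the truncation to 128 characters, and the decision to leave an empty input empty are assumptions made for illustration.

```python
import re

# Pattern quoted in the commit message for provider-accepted names.
VALID_NAME = re.compile(r"[a-zA-Z0-9_-]{1,128}")

# Anything outside the allowed character set.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")
# Two or more underscores in a row.
_REPEATED_UNDERSCORES = re.compile(r"_{2,}")


def sanitize_name(name: str, max_length: int = 128) -> str:
    """Hypothetical helper: map an arbitrary name onto the allowed alphabet.

    Truncating to max_length is an assumption; the commit message only
    mentions replacement and collapsing.
    """
    cleaned = _INVALID_CHARS.sub("_", name)
    cleaned = _REPEATED_UNDERSCORES.sub("_", cleaned)
    return cleaned[:max_length]


if __name__ == "__main__":
    example = sanitize_name("PostgreSQL System Information")
    print(example, bool(VALID_NAME.fullmatch(example)))
```

Note that two distinct original names can collapse to the same sanitized name under this scheme, so a caller that needs uniqueness would have to handle collisions separately.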


163 Commits
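
Commit 2af45945e2 in the log describes making writes to history.jsonl atomic with a temp file, fsync, and os.replace, and cleaning up the temp file when an exception occurs. A rough sketch of that pattern follows, using only the standard library. The function name `atomic_write_jsonl` and the choice to rewrite the whole file from a list of records are assumptions for illustration; the actual implementation may differ.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_jsonl(path: Path, records: list[dict]) -> None:
    """Hypothetical helper: replace `path` with `records`, one JSON object per line.

    Readers see either the old file or the complete new file, never a
    partially written one.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem, which is what makes the rename atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for record in records:
                tmp.write(json.dumps(record, ensure_ascii=False) + "\n")
            tmp.flush()
            # Push the data to disk before the rename makes it visible.
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file on any failure so no stray files accumulate.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If a record fails to serialize midway, the original file is left untouched and the temp file is removed, which is the property the commit's tests are described as checking.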