nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-19 16:12:30 +00:00

Author	SHA1	Message	Date
ykstart	f97b960433	fix(exec): refine format command deny pattern to allow URL parameters The previous regex r"(?:^|[;&|]\s*)format\b" incorrectly blocked commands containing URL parameters like &format=json. Added negative lookahead (?!=) so format= (URL param key=value) is allowed while standalone format commands (e.g. ;format, &format, |format) remain blocked. Added test cases for both blocking and allowing scenarios.	2026-05-16 18:52:42 +08:00
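A minimal sketch of the before/after behaviour this commit describes. The old pattern is quoted from the message; placing the `(?!=)` lookahead directly after `\b`, the helper name `is_blocked`, and lower-casing the command are assumptions, not the repository's actual code.

```python
import re

# Old pattern as quoted in the commit message.
OLD_FORMAT_DENY = re.compile(r"(?:^|[;&|]\s*)format\b")
# Assumed placement of the negative lookahead: reject a match when "=" follows.
NEW_FORMAT_DENY = re.compile(r"(?:^|[;&|]\s*)format\b(?!=)")


def is_blocked(pattern: re.Pattern, command: str) -> bool:
    """Return True when the deny pattern matches anywhere in the command."""
    return pattern.search(command.strip().lower()) is not None


url_command = 'curl "https://api.example.com/data?x=1&format=json"'
print(is_blocked(OLD_FORMAT_DENY, url_command))  # old pattern flags the URL parameter
print(is_blocked(NEW_FORMAT_DENY, url_command))  # new pattern lets it through
print(is_blocked(NEW_FORMAT_DENY, "echo hi; format c:"))  # standalone command still flagged
```

The lookahead only inspects the character right after `format`, so `;format`, `&format`, and `|format` keep matching while `format=` does not.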
Xubin Ren	1c2ea1aad2	feat(goal): /goal command & long-running tasks (long_task) * feat(long-task): add LongTaskTool for multi-step agent tasks Implements a meta-ReAct loop where long-running tasks are broken into sequential subagent steps, each starting fresh with the original goal and progress from the previous step. This prevents context drift when agents work on complex, multi-step tasks. - Extract build_tool_registry() from SubagentManager for reuse - Add run_step() for synchronous subagent execution (no bus announcement) - Add HandoffTool and CompleteTool as signal mechanisms via shared dict - Add LongTaskTool orchestrator with simplified prompt (8 iterations/step) - Register LongTaskTool in main agent loop - Add _extract_handoff_from_messages fallback for robustness * fix(long-task): add debug logging for step-level observability * feat(long-task): major overhaul with structured handoffs, validation, and observability - Structured HandoffState: HandoffTool now accepts files_created, files_modified, next_step_hint, and verification fields instead of a plain string. Progress is passed between steps as structured data. - Completion validation round: After complete() is called, a dedicated validator step runs to verify the claim against the original goal. If validation fails, the task continues rather than returning a false completion. - Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle, step_final) selected based on step number. Final steps get tighter budget and stronger "wrap up" guidance. - Automatic file change tracking: Extracts write_file/edit_file events from tool_events and injects them into the next step's context if the subagent forgot to report them explicitly. - Budget tracking & adaptive strategy: Cumulative token usage is tracked across steps. Per-step tool budget drops from 8 to 4 in the last two steps to force handoff/completion. - Crash retry with graceful degradation: A step that crashes is retried once. 
Persistent crashes terminate the task and return partial progress. - Full observability hooks for future WebUI integration: - set_hooks() with on_step_start, on_step_complete, on_handoff, on_validation_started, on_validation_passed, on_validation_failed, on_task_complete, on_task_error, and catch-all on_event. - Readable state properties: current_step, total_steps, status, last_handoff, cumulative_usage, goal. - inject_correction() allows external code to send user corrections that are injected into the next step's prompt. - run_step() accepts optional max_iterations for dynamic budget control. All 27 long-task tests and 11 subagent tests pass. * test(long-task): add boundary tests and fix race conditions - Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries - Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format - Remove asyncio timing hack from test_state_exposure - Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation - All 34 tests passing * fix(long-task): address code review findings - Declare _scopes = {"core"} explicitly to prevent recursive nesting in subagent scope - Document fragile coupling in _extract_file_changes: path extraction depends on write_file/edit_file detail format; add debug log for unexpected formats - Align final-template threshold (max_steps - 2) with budget switch threshold - Eliminate hasattr(self, "_state") in _reset_state by initializing in __init__ * fix(long-task): honor final signal and file tracking Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long-task): improve prompt structure and agent contract - Expand LongTaskTool.description to instruct parent agent on goal construction, return value semantics, and 
how to handle results. - Expand CompleteTool.description to emphasize that the summary IS the final answer returned to the parent agent. - Prefix validated return value with an explicit "final answer" directive to stop parent agent from re-running work. - Redesign step_start.md: Step 1 is now explicitly for exploration, planning, and skeleton-building. complete() is discouraged. - Remove bulky payload debug logging from _emit(); add targeted info/warning/error logs at key state transitions instead. - Add signal_type to HandoffState for cleaner signal detection. * test(long-task): expect wrapped completion message after validation Align assertions with LongTaskTool final return shape on main. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): turn timing strip, latency, and session-switch restore - Agent loop: publish goal_status run/idle for WebSocket turns; attach wall-clock latency_ms on turn_end and persisted assistant metadata. - WebSocket channel: forward goal_status and latency fields to clients. - NanobotClient: track goal_status started_at per chat without requiring onChat; useNanobotStream restores run strip when returning to a chat. - Thread UI: composer/shell viewport hooks for run duration and latency; format helpers and i18n strings. - MessageBubble: drop trailing StreamCursor (layout artifact vs block markdown). - Builtin / tests: model command coverage, websocket and loop tests. Covers multi-session UX and round-trip timing visibility for the WebUI. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: keep message-tool file attachments after canonical history hydrate - MessageTool records per-turn media paths delivered to the active chat. - nanobot.utils.session_attachments stages out-of-media-root files and merges into the last assistant message before save (loop stays a thin call). - WebUI MediaCell: use a signed URL as a real download link when present. 
Fixes attachments flashing then vanishing on turn_end when paths lived outside get_media_dir (e.g. workspace files). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): agent activity cluster, stable keys, LTR sheen labels - Group reasoning and tool traces in AgentActivityCluster with i18n summaries - Stabilize React list keys for activity clusters (first message id anchor) - Replace background-clip shimmer with overlay sheen for streaming labels - ThreadMessages/MessageList integration and locale strings Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): render assistant reasoning with Markdown + deferred stream - Use MarkdownText for ReasoningBubble body (same GFM/KaTeX path as replies) - Apply muted/italic prose tokens so thinking stays visually subordinate - useDeferredValue while reasoningStreaming to ease parser work during deltas - Preload markdown chunk when trace opens; add regression test with preloaded renderer Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): default-collapse agent activity cluster while Working Outer fold no longer auto-expands during isTurnStreaming; user opens to see traces. Header sheen and live summary unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long_task): cumulative run history, file union, and prompt tuning Inject cross-step summaries and merged file paths into middle/final step templates so chains do not lose early context. Strip the last run-history block when it duplicates Previous Progress to save tokens. Add optional cumulative_prompt_max_chars and cumulative_step_body_max_chars parameters with clamped defaults. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): session switch keeps in-flight thread and replays buffered WS Save the prior chat message list to the per-chat cache in a layout effect when chatId changes (before stale writes could corrupt another chat). Skip one post-switch layout cache tick so we do not snapshot the wrong tab. 
Buffer inbound events per chat_id when no onChat subscriber is registered (e.g. user focused another session) and drain on resubscribe up to a cap, so streaming deltas are not lost while off-tab. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): snap thread scroll to bottom on session open (no smooth glide) Use scroll-behavior auto on the viewport, instant programmatic scroll when following new messages and on scrollToBottomSignal. Keep smooth only for the explicit scroll-to-bottom button. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): respect manual scroll-up after opening a session Track when the user leaves the bottom with a ref and skip ResizeObserver and deferred bottom snaps until they return or the conversation is reset. Remove the time-based force-bottom window that overrode atBottom. Multi-frame scrollToBottom honours the same guard unless force (scroll button). Co-authored-by: Cursor <cursoragent@cursor.com> * Publish long_task UI snapshots on outbound metadata - Add OUTBOUND_META_AGENT_UI (_agent_ui) for channel-agnostic structured state - LongTaskTool publishes {kind: long_task, data: snapshot} on the bus with _progress - WebSocket send forwards metadata as agent_ui for WebUI clients - Tests for bus payload, WS frame, and progress assertions - Fix loop progress tests: ignore _goal_status in streaming final filter and avoid brittle outbound[-1] ordering after goal status idle messages Co-authored-by: Cursor <cursoragent@cursor.com> * feat: WebUI long_task activity card and resilient history merge Add optional ui_summary to the long_task tool for one-line UI labels. Stream long_task agent_ui into a dedicated message row with timeline, markdown peek, and a right sheet for details. Merge canonical history after turn_end while re-inserting long_task rows before the final assistant reply. Collapse duplicate task_start/step_start steps in the timeline and extend i18n. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor: align long_task with thread_goal and drop orchestrator UI - Persist sustained objectives via session metadata (long_task / complete_goal); no subagent wiring or tool-driven agent_ui payloads. - Remove WebUI long-task activity UI, types, and translations; history merge preserves trace replay only, with legacy long_task rows normalized to traces. - Drop long_task prompt templates and get_long_task_run_dir; add webui thread disk helper for gateway persistence tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(agent): thread goal runtime context, tools, and skill - Add thread_goal_state helper and mirror active objectives into Runtime Context - Wire loop/context/memory/events as needed for goal metadata in turns - Expand long_task / complete_goal semantics (pivot/cancel/honest recap) - Add always-on thread-goal SKILL.md; align /goal command prompt - Tests for context builder and thread goal state - Remove unused webui ChatPane component Co-authored-by: Cursor <cursoragent@cursor.com> * feat(thread-goal): add websocket snapshot helper and publish goal updates from long_task Introduce thread_goal_ws_blob for bounded JSON snapshots, attach snapshots to websocket turn_end metadata in AgentLoop, and let long_task fan-out dedicated thread_goal frames on the websocket channel after persisting session metadata. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): websocket thread_goal frames, turn_end replay, and session API scrub for subagent inject Emit thread_goal events and optional thread_goal on turn_end; scrub persisted subagent announce blobs on GET /api/sessions/.../messages and shorten session list previews so WebUI does not surface full Task/Summarize scaffolding. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): merge ephemeral traces per user turn when reconciling canonical history Preserve disk/live trace rows inside the matching user–assistant segment instead of stacking every trace before the final assistant reply (fixes inflated tool counts after refresh or session switch). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): show assistant reply copy only on the last slice before the next user turn Avoid duplicate copy affordances on intermediate assistant bubbles that precede more agent activity in the same turn (tools or further assistant text). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): thread_goal stream plumbing, composer goal strip, sky glow, and client-side subagent scrub projection Track thread_goal and turn_goal snapshots in NanobotClient, hydrate React state from thread_goal frames and turn_end, surface objective/elapsed in the composer, add breathing sky halo CSS while goals are active, mirror server scrub logic on history hydration and webui_thread snapshots, and extend tests/client mocks. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): add Slack Socket Mode connect timeout with actionable timeout errors Abort hung websockets.connect handshakes after a bounded wait, log REST-vs-WSS guidance, surface RuntimeError to channel startup, and log successful WSS setup. Co-authored-by: Cursor <cursoragent@cursor.com> * webui: expand thread goal in composer bottom sheet Add ChevronUp control on the run/goal strip that opens a bottom Sheet with full ui_summary and objective. Inline preview logic in RunElapsedStrip, add i18n strings across locales, and a composer unit test. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): widen dedupeToolCallsForUi input for session API typing fetchSessionMessages types tool_calls as unknown; accept unknown so tsc build passes when passing message.tool_calls through. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(agent): extract WebSocket turn run status to webui_turn_helpers * refactor(skills): rename thread-goal to long-task and document idempotent goals * feat(skills): rename sustained-goal skill to long-goal and tighten long_task guidance * chore: remove unused subagent/context/router helpers * feat(session): rename sustained goal to goal_state and align WS/WebUI - Move helpers from agent/thread_goal_state to session/goal_state: GOAL_STATE_KEY, goal_state_runtime_lines, goal_state_ws_blob, parse_goal_state. - Session metadata now uses "goal_state"; still read legacy "thread_goal"; long_task writes drop the legacy key after save. - WebSocket: event/field goal_state, _goal_state_sync; turn_end carries goal_state; accept legacy _thread_goal_sync/thread_goal inbound metadata for dispatch. - WebUI: GoalStateWsPayload, goalState hook/client props, i18n keys goalState. - Runtime Context copy uses "Goal (active):" instead of "Thread goal". feat(agent): stream Anthropic thinking deltas and fix stream idle timeout * refactor(webui): transcript jsonl as sole timeline source * fix(agent): reject mismatched WS message chat_id and stream reasoning deltas * feat(webui): hydrate sustained goal and run timer after websocket subscribe * chore(webui,websocket): remove unused fetch helpers and legacy thread_goal WS paths * Raise default max_tokens and context window in agent schema. Align AgentDefaults and ModelPresetConfig with typical Claude-scale usage (32k completion budget, 256k context window) and update migration tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(gateway): bootstrap prefers in-memory model; clarify websocket naming * fix(websocket): websocket _handle_message passes is_dm; refresh /status test expectations --------- Co-authored-by: chengyongru <2755839590@qq.com> Co-authored-by: chengyongru <chengyongru.ai@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 01:14:11 +08:00
hanyuanling	b2ac609bb5	fix(web): back off Brave search rate limits	2026-05-16 00:12:50 +08:00
hinotoi-agent	164614ccf2	fix(message): share workspace path resolver	2026-05-15 17:19:20 +08:00
hinotoi-agent	57d7847dc8	fix(message): confine local media attachments	2026-05-15 17:19:20 +08:00
chengyongru	fe90edd71f	refactor(tools): remove GlobTool GlobTool is redundant — GrepTool already supports glob-based file filtering via its `glob` parameter, making a standalone glob-only tool unnecessary. Removing it simplifies the tool surface and reduces LLM confusion between glob and grep.	2026-05-15 17:19:00 +08:00
Jiajun Xie	6a25d8042d	fix(shell): support UNC paths in Windows path extraction - Update regex in _extract_absolute_paths to match both drive paths (C:\...) and UNC paths (\\server\share) - Add comprehensive test cases for UNC paths, mixed paths, and edge cases	2026-05-15 15:47:15 +08:00
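The commit does not quote the updated regex, so the following is only an illustrative sketch of matching both path shapes. The pattern `WINDOWS_ABS_PATH` and the helper `extract_windows_paths` are hypothetical and are not the code in `_extract_absolute_paths`.

```python
import re

# Hypothetical pattern: either a drive prefix (C:\) or a UNC prefix (\\server\),
# followed by the rest of the path up to the next whitespace.
WINDOWS_ABS_PATH = re.compile(r"(?:[A-Za-z]:\\|\\\\[^\\\s]+\\)\S+")


def extract_windows_paths(command: str) -> list[str]:
    """Return every drive or UNC path found in a command string."""
    return WINDOWS_ABS_PATH.findall(command)


print(extract_windows_paths(r"copy \\server\share\f.txt D:\out"))
```

This sketch stops at whitespace, so quoted paths containing spaces would need extra handling.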
chengyongru	6a4ed255de	fix(mcp): probe HTTP port before connecting to prevent event-loop crash When an MCP server configured as streamableHttp or SSE is unreachable, streamable_http_client's anyio task group cleanup raises RuntimeError / ExceptionGroup that escapes the caller's try/except and crashes the event loop with "Unhandled exception in event loop". Fix: add a lightweight TCP probe (_probe_http_url) before entering the MCP SDK transport. If the port is closed, the server is skipped with a warning instead of crashing. stdio transport is not probed (local process). Closes #3739	2026-05-13 23:39:07 +08:00
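A rough sketch of what a lightweight TCP probe before an HTTP transport might look like. The function name `probe_http_url`, its signature, the default timeout, and the default ports are assumptions for illustration; the commit only states that `_probe_http_url` exists and that unreachable servers are skipped with a warning.

```python
import asyncio
from urllib.parse import urlparse


async def probe_http_url(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False
    try:
        # Fall back to the scheme's conventional port when none is given.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
    except ValueError:
        return False  # malformed port in the URL
    try:
        _reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False  # closed port, unreachable host, or handshake timed out
    writer.close()
    try:
        await writer.wait_closed()
    except OSError:
        pass
    return True
```

A caller could run `await probe_http_url(server_url)` and skip the server when it returns `False`, so the SDK transport is never entered for a dead endpoint.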
chengyongru	9e15925cf4	refactor(agent): remove ask_user tool The ask_user tool used AskUserInterrupt(BaseException) for mid-turn blocking, creating heavy coupling across runner, loop, and session management. The model now asks questions naturally in response text, the turn ends normally, and the user's next message starts a new turn with session history providing continuity. Removed: - nanobot/agent/tools/ask.py (tool, interrupt, helpers) - tests/agent/test_ask_user.py - webui/src/components/thread/AskUserPrompt.tsx - AskUserInterrupt handling in runner.py - Dual-path message building in loop.py - Pending ask detection via history scanning - button_prompt/buttons emission in WebSocket channel - ask_user references in Slack channel docstrings Preserved (MessageTool uses these independently): - OutboundMessage.buttons field - Channel button rendering (Telegram, Slack, WebSocket)	2026-05-12 22:48:26 +08:00
Xubin Ren	23312d683e	fix(tools): isolate plugin runtime state Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 11:28:20 +08:00
chengyongru	043f0e67f7	feat(tools): introduce plugin-based tool discovery and runtime context protocol This commit implements a progressive refactoring of the tool system to support plugin discovery, scoped loading, and protocol-driven runtime context injection. Key changes: - Add Tool ABC metadata (tool_name, _scopes) and ToolContext dataclass for dependency injection. - Introduce ToolLoader with pkgutil-based builtin discovery and entry_points-based third-party plugin loading. - Add scope filtering (core/subagent/memory) so different contexts load appropriate tool sets. - Introduce ContextAware protocol and RequestContext dataclass to replace hardcoded per-tool context injection in AgentLoop. - Add RuntimeState / MutableRuntimeState protocols to decouple MyTool from AgentLoop. - Migrate all built-in tools to declare scopes and implement create()/enabled() hooks. - Migrate MessageTool, SpawnTool, CronTool, and MyTool to ContextAware. - Refactor AgentLoop to use ToolLoader and protocol-driven context injection. - Refactor SubagentManager to use ToolLoader(scope="subagent") with per-run FileStates isolation. - Register all built-in tools via pyproject.toml entry_points. - Add comprehensive tests for loader scopes, entry_points, ContextAware, subagent tools, and runtime state sync.	2026-05-12 11:28:20 +08:00
Xubin Ren	56eee06736	feat(webui): add BYOK web search settings Let WebUI users configure the single web search provider credential from BYOK while keeping saved secrets masked and hot-reloaded for new searches. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-09 14:52:48 +08:00
Xubin Ren	3231aaf9ee	fix(image): prevent duplicate delivery and replay artifacts	2026-05-09 05:45:13 +00:00
Xubin Ren	e936ed48bd	feat: add image generation tool and WebUI mode Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:06:23 +08:00
chengyongru	05e0106592	refactor(logging): preserve tracebacks and add channel context - Preserve tracebacks: logger.error in except blocks → logger.exception - Channel context: BaseChannel injects self.logger = logger.bind(channel=name) - Third-party bridge: redirect_lib_logging() replaces ad-hoc stdlib-to-loguru bridges - Log levels: network timeouts downgraded from ERROR → WARNING - Fix --verbose flag to actually work with loguru (set handler to DEBUG)	2026-05-06 21:17:45 +08:00
Xubin Ren	614b21368f	fix(agent): tighten safety guard edge cases Keep the /dev workspace guard exception scoped to the known benign device paths already handled by ExecTool, and add coverage that non-benign /dev targets still get blocked. Also add a streaming regression for tool_error responses so fatal tool failures are delivered by channels instead of being marked as already streamed. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:25:52 +08:00
chengyongru	d3689d143c	fix(agent): prevent safety guard false positives and streamed message drop Three independent fixes for issues exposed by PR #3493: 1. shell.py: allow /dev/* paths in workspace guard Commands like `rm file.txt 2>/dev/null` were blocked because _extract_absolute_paths captured /dev/null as a path outside the workspace. Allow /dev the same way media_path is already allowed. 2. shell.py: remove | from home_paths regex prefix Loki query operator `|~` was misinterpreted as pipe + home directory, causing false workspace violation errors. 3. loop.py: change _streamed from blacklist to whitelist stop_reason "tool_error" was not in the exclusion set {"ask_user", "error"}, so _streamed=True was set on fatal errors. The channel manager then skipped channel.send() because it assumed the content was already streamed -- but it never was. Whitelist to only {"stop", "end_turn", "max_tokens"}. Also fixes a pre-existing Windows bug in _spawn where create_subprocess_exec + list2cmdline breaks commands with paths containing spaces (e.g. D:\Program Files\python.exe). Closes: #3599, #3605	2026-05-04 01:25:52 +08:00
Xubin Ren	2a7433b7ec	chore(runner): tighten workspace guard comments and Windows tests Keep the workspace-boundary changes easier to review by trimming long explanatory comments down to short local notes. Also make the #3599 POSIX command regression skip on Windows and normalize workspace violation signatures to POSIX separators so the throttle tests are platform-stable. Tests: - uv run pytest tests/tools/test_exec_security.py tests/utils/test_workspace_violation_throttle.py -q - uv run pytest -q Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:18:39 +08:00
Xubin Ren	b8406be215	fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605 ) Replaces PR #3493's blanket fatal abort with a "tell the model + throttle the bypass loop" policy. Workspace-bound rejections are now ordinary recoverable tool errors enriched with a structured "this is a hard policy boundary" instruction; SSRF stays the only marker that aborts the turn. Why the fatal-abort approach broke ---------------------------------- PR #3493 promoted every shell `_guard_command` and filesystem path-resolution rejection to a turn-fatal RuntimeError. Two of those messages (`path outside working dir` and `path traversal detected`) are heuristic substring scans on the raw command, so legitimate commands like `rm <ws>/x.txt 2>/dev/null` or `find . -type f` killed the user's turn (#3599). On channels with outbound dedupe (Telegram) the user just saw silence (#3605), and the noise polluted the LLM's context until it started hallucinating guard rejections on plain relative paths (#3597). Why we still need some throttle --------------------------------- The original #3493 pain point was real: the LLM, refused once, would swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c -> ln -sf -> python -c open(...). Just removing the fatal escape lets that loop run wild until max_iterations. What this commit does --------------------- - `nanobot/utils/runtime.py`: add `workspace_violation_signature` and `repeated_workspace_violation_error`. The signature normalizes filesystem `path` arguments and the first absolute path inside an exec command, so swapping tools against the same outside target hits the same throttle bucket. Two soft attempts are allowed; the third attempt's tool result is replaced with a hard "stop trying to bypass" message that quotes the target path and tells the model to ask the user for help. - `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation` (still fatal) and `_is_workspace_violation` (now soft). 
All three failure branches in `_run_tool` (prep_error / exception / Error result) route through a shared `_classify_violation` that bumps the per-turn workspace_violation_counts dict and either keeps the tool's own message or substitutes the throttle escalation. `_execute_tools` now threads that dict alongside the existing external_lookup_counts. - `nanobot/agent/tools/shell.py`: append a structured boundary note to every workspace-bound guard rejection (`working_dir could not be resolved`, `working_dir is outside`, `path outside working dir`, `path traversal detected`). SSRF errors stay short and direct so the model doesn't try to "phrase around" them. Existing `2>/dev/null` allow-list and benign device passthrough from the previous commit remain. - `nanobot/agent/tools/filesystem.py`: append the same boundary note to the `outside allowed directory` PermissionError so read_file / write_file / list_dir errors give the LLM the same explicit hint. Tests ----- - `tests/utils/test_workspace_violation_throttle.py` (new): signature collapses across read_file/exec/python -c against the same path, different paths get independent budgets, escalation only fires after the third attempt. - `tests/agent/test_runner.py`: - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2 contract: filesystem PermissionError is now soft, runner moves to the next iteration and finalizes cleanly. - `test_is_ssrf_violation_remains_fatal` + the existing `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the first attempt. - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end to end recovery from `path outside working dir`. - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four bypass attempts against the same outside target produce at least one `workspace_violation_escalated` event and the run completes naturally without aborting the turn. - The two `_execute_tools` direct-call tests now pass the new workspace_violation_counts dict. 
- `tests/tools/test_tool_validation.py`: relax three `==` assertions to `startswith` + "hard policy boundary" substring check to match the new structured error messages. - `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null` regression and the `> /etc/issue` negative case from the previous commit on this branch -- they still pass under the new policy. Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2 on origin/main). Ruff is clean for every file touched in this commit. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-04 01:18:39 +08:00
Xubin Ren	7742f8fbdc	fix(runner): narrow workspace_violation fatal classification (#3599 , helps #3605 #3597 ) PR #3493 promoted every shell `_guard_command` rejection to a turn-fatal RuntimeError. The two heuristic outputs in that list -- `path outside working dir` and `path traversal detected` -- routinely false-positive on benign constructs (e.g. `2>/dev/null`, quoted `..` arguments to sed/find, absolute paths inside inline scripts), so legitimate workspace commands silently kill the user's turn (#3599) and the agent never gets a chance to retry with a different approach (#3605). Two changes, both narrowly scoped: - `ExecTool._guard_command` now skips a small allow-list of kernel device files (`/dev/null`, the standard streams, `/dev/random`, `/dev/fd/N`, ...) before the workspace path check, matched against the pre-resolve string so symlinks like `/dev/stderr -> /proc/self/fd/2` still hit the allow-list. Real outside writes such as `> /etc/issue` remain blocked. - `AgentRunner._WORKSPACE_BLOCK_MARKERS` keeps only the four hard path-resolution errors from filesystem.py / shell.py and the SSRF marker. The two heuristic substrings move out of the fatal list, so the LLM sees them as ordinary tool errors and can self-correct in the next iteration. SSRF stays fatal because retrying an internal URL with a different phrasing would defeat the safety boundary. Tests: - `tests/tools/test_exec_security.py`: parametrized regression for the exact #3599 command sample plus other stdio redirects and device reads; explicit negative case asserts `> /etc/issue` is still blocked. - `tests/agent/test_runner.py`: `_is_workspace_violation` no longer fatals on the two heuristic markers, plus an end-to-end case proving the runner hands the guard error back to the LLM and finalizes the next turn cleanly.	2026-05-04 01:18:39 +08:00
chengyongru	5853d5dfda	fix: allow_patterns take priority over deny_patterns in ExecTool (#3594 ) * fix: allow_patterns take priority over deny_patterns in ExecTool Previously deny_patterns were checked first with no bypass, meaning allow_patterns could never exempt commands from the built-in deny list. This made it impossible to whitelist destructive commands for specific directories (e.g. build/cleanup tasks). Changes: - shell.py: check allow_patterns first; if matched, skip deny check - shell.py: deny_patterns now appends to built-in list (not replaces) - schema.py: add allow_patterns/deny_patterns to ExecToolConfig - loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool - Add test_exec_allow_patterns.py covering priority semantics * fix: separate deny pattern errors from workspace violation detection The deny pattern error message "Command blocked by safety guard" was included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be misclassified as fatal workspace violations. This meant LLMs had no chance to retry with a different command — the turn was aborted immediately. Changes: - shell.py: deny/allowlist error messages now use distinct phrasing ("blocked by deny pattern filter" / "blocked by allowlist filter") - runner.py: remove "blocked by safety guard" from _WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal tool errors (LLM can retry) instead of fatal violations - workspace path errors still use "blocked by safety guard" and remain fatal as intended * fix: update test assertions to match new deny pattern error message * fix: indentation error in test file * fix: restore SSRF fatal classification and tidy exec pattern plumbing Address review feedback on the deny/allow_patterns rework: - runner.py: re-add "internal/private url detected" to _WORKSPACE_BLOCK_MARKERS. 
The earlier marker removal also stripped fatal classification from SSRF / internal-URL rejections (whose message still says "blocked by safety guard"), turning a hard security boundary into something the LLM could retry. - loop.py / subagent.py: drop `or None` between ExecToolConfig and ExecTool. The schema default is an empty list and ExecTool already normalizes None back to [], so the indirection was a no-op. - shell.py: extract `explicitly_allowed` flag in _guard_command so allow_patterns are scanned once instead of twice and the control flow no longer relies on a no-op `pass + else` branch. - tests/agent/test_runner.py: add a regression test asserting that the SSRF block message is treated as fatal, while deny/allowlist filter messages are deliberately non-fatal. * fix: remove unused exec allow-pattern test import Keep the new ExecTool allow-pattern coverage clean under ruff. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Xubin Ren <xubinrencs@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-03 00:27:17 +08:00
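A simplified sketch of the allow-before-deny ordering this commit describes. The built-in deny entry, the function name `guard_command`, the exact error strings, and the rule that a non-empty allow list blocks unmatched commands are assumptions for illustration, not the contents of shell.py.

```python
import re

# Hypothetical stand-in for the built-in deny list.
BUILTIN_DENY = [r"\brm\s+-[rf]{1,2}\b"]


def guard_command(command, allow_patterns=None, deny_patterns=None):
    """Return an error string if the command is blocked, else None."""
    allow = allow_patterns or []
    # User-supplied deny patterns extend the built-in list instead of replacing it.
    deny = BUILTIN_DENY + (deny_patterns or [])
    lowered = command.strip().lower()

    # Scan allow_patterns once; a match exempts the command from the deny check.
    explicitly_allowed = any(re.search(p, lowered) for p in allow)
    if not explicitly_allowed:
        if any(re.search(p, lowered) for p in deny):
            return "Error: Command blocked by deny pattern filter"
        if allow:
            return "Error: Command blocked by allowlist filter"
    return None


print(guard_command("rm -rf build"))
print(guard_command("rm -rf build", allow_patterns=[r"rm\s+-rf\s+build\b"]))
```

Keeping the deny and allowlist messages distinct from workspace-boundary messages is what lets a runner treat them as ordinary, retryable tool errors.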
Xubin Ren	aea5948b11	fix(tools): tighten web fetch URL cleaning Made-with: Cursor	2026-05-01 19:58:19 +08:00
彭星杰	5dc96505e8	fix(web_fetch): sanitize URL to strip markdown backticks and quotes before validation LLM-generated tool calls may wrap URLs in markdown backticks or quotes (e.g. `https://example.com`), causing urlparse to produce empty scheme and netloc, which leads to all fetch attempts failing silently. Add URL cleaning at the top of WebFetchTool.execute to strip whitespace, backticks, double quotes, and single quotes, plus an early rejection guard for non-http(s) URLs after cleaning.	2026-05-01 19:58:19 +08:00
Xubin Ren	fae38319ca	fix(tools): scope file state by session Made-with: Cursor	2026-05-01 19:15:07 +08:00
LZDQ	58ae2d5b7e	Claude: replace module-level file read states with per-loop per-session state class. fixes #3571	2026-05-01 19:15:07 +08:00
chengyongru	28f9bbff31	feat(web_search): add olostep provider Adds Olostep (https://www.olostep.com) as an optional web_search backend using the official olostep Python SDK (client.answers.create()). Changes: - pyproject.toml: adds olostep>=0.1.0 optional dependency - schema.py: adds olostep to provider comment in WebSearchConfig - web.py: adds _search_olostep() with lazy import and provider branching - docs/configuration.md: documents Olostep setup under web search config - tests: unit tests for the new provider Backward compatible: existing users see no behavior change unless they opt into provider: "olostep". No hard dependency at runtime path. Co-authored-by: umerkay <umerkk164@gmail.com>	2026-04-28 19:09:38 +08:00
Xubin Ren	f4d8783f5e	test(web): cover configurable fetch behavior Ensure custom user agents are applied to direct web requests and disabling Jina Reader forces the local readability path. Made-with: Cursor	2026-04-28 07:25:47 +00:00
Xubin Ren	9b6f3d7abc	fix(agent): resolve message media against active workspace Made-with: Cursor	2026-04-27 14:31:39 +08:00
chengyongru	9b3e2524ac	fix(agent): resolve relative media paths in MessageTool When deployed with Docker and workspace mounted as a volume, sending media files failed because relative paths (e.g. output/image.png) were not resolved against the workspace directory. The process CWD differs from the workspace in containerized environments, causing os.path.isfile checks to fail in channel handlers. Normalize relative media paths at the MessageTool entry point using get_workspace_path().	2026-04-27 14:31:39 +08:00
chengyongru	6eb178113e	fix(mcp): sanitize MCP capability names for model API compatibility MCP resource/prompt/tool names containing spaces or special characters (e.g. "PostgreSQL System Information") were forwarded verbatim to model provider APIs, causing validation errors from both Anthropic and OpenAI which require names matching ^[a-zA-Z0-9_-]{1,128}$. Add _sanitize_name() that replaces invalid characters with underscores and collapses consecutive underscores. Applied in MCPToolWrapper, MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools filtering logic. Closes #3468	2026-04-27 11:49:50 +08:00
Xubin Ren	038a140ad3	fix(slack): preserve thread context for proactive replies Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context. Made-with: Cursor	2026-04-27 02:10:38 +08:00
Xubin Ren	3b82e14f85	fix(shell): preserve login PATH for path append Made-with: Cursor	2026-04-26 20:32:38 +08:00
yorkhellen	814345dd78	fix: update tests for path_append env dict change	2026-04-26 20:32:38 +08:00
yorkhellen	2f2ac96ac7	fix: update tests for path_append env dict change	2026-04-26 20:32:38 +08:00
yorkhellen	23dde7b84c	fix: prevent shell injection via path_append in ExecTool	2026-04-26 20:32:38 +08:00
Xubin Ren	6036355ac5	fix(message): limit session recording to proactive sends Only mark message-tool deliveries for channel-session recording while cron jobs are running, avoiding duplicate session writes during normal user turns. Made-with: Cursor	2026-04-26 20:08:21 +08:00
Pablo Cabeza	c23d719780	feat(agent): emit structured _tool_events progress metadata Extend the existing on_progress callback to carry structured tool-event payloads alongside the plain-text hint, so channels can render rich tool execution state (start/finish/error, arguments, results, file attachments) rather than only the pre-formatted hint string. Changes ------- - AgentLoop._tool_event_start_payload() — builds a version-1 start payload from a ToolCallRequest - AgentLoop._tool_event_result_extras() — extracts files/embeds from a tool result dict - AgentLoop._tool_event_finish_payloads() — maps tool_calls + tool_results + tool_events from AgentHookContext into finish payloads - _LoopHook.before_execute_tools() — passes tool_events=[...] to on_progress together with the existing tool_hint flag - _LoopHook.after_iteration() — emits a second on_progress call with the finish payloads once tool results are available - _bus_progress() — forwards tool_events as _tool_events in OutboundMessage metadata so channel implementations can read them - on_progress type widened to Callable[..., Awaitable[None]] on all public entry points; _cli_progress updated to accept and ignore tool_events The contract is additive: callers that only accept (content, *, tool_hint) continue to work unchanged. Callers that also accept tool_events receive the structured data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 20:06:11 +08:00
Xubin Ren	b9b81d9301	test(telegram): pin inline-keyboards flag gate and buttons validation Two kill-switch tests for the new inline-keyboards path. Neither is flashy — they just make sure the next unrelated refactor can't quietly regress two narrow contracts the PR relies on. 1. TelegramChannel._build_keyboard returns None whenever TelegramConfig.inline_keyboards is False, even if buttons are supplied. The flag defaults off; if someone ever flips that default the change should fail this test before it reaches prod bots. 2. MessageTool rejects malformed `buttons` payloads (non-list, mixed list/str row, non-str label, None label) up front instead of letting them slip into the channel layer where Telegram would silently 400 the send. Parametrized over four shapes the guard needs to reject. No production code touched. Made-with: Cursor	2026-04-23 13:26:06 +08:00
k	03ec28dd49	fix(mcp): avoid WinError 193 for Windows stdio launchers	2026-04-22 14:50:55 +09:00
aiguozhi123456	53ba410e49	feat(read_file): add DOCX, XLSX, PPTX support via document.extract_text() Wire up the existing office document extractors in document.py to ReadFileTool by adding an extension guard and _read_office_doc() method that follows the established PDF pattern. Handles missing libraries, corrupt files, empty documents, and 128K truncation consistently.	2026-04-21 22:12:19 +08:00
Xubin Ren	5badb75f6c	review: tighten scope and add regression tests Follow-ups from review of #3194: - ci.yml: drop unconditional --ignore=tests/channels/test_matrix_channel.py. That test file already calls pytest.importorskip("nio") at module top, so it self-skips on Windows (where nio isn't installed) without also hiding 62 tests from Linux CI. - filesystem.py: hoist `import os` to the module top and drop the duplicate inline import in ReadFileTool.execute. Document the CRLF->LF normalization as intentional (primarily a Windows UX fix so downstream StrReplace/Grep match consistently regardless of where the file was written). - test_read_enhancements.py: lock down two new behaviors * TestFileStateHashFallback: check_read warns when content changes but mtime is unchanged (coarse-mtime filesystems on Windows). * TestReadFileLineEndingNormalization: ReadFileTool strips CRLF and preserves LF-only files untouched. - test_tool_validation.py: restore list2cmdline/shlex.quote in test_exec_head_tail_truncation. The temp_path-based form was correct, but dropping the quoting broke on any Windows path containing spaces (e.g. C:\Users\John Doe\...). CI runners happen not to have spaces so this slipped through. Tests: 1993 passed locally. Made-with: Cursor	2026-04-17 16:11:37 +08:00
Jiajun Xie	3db2eb66e4	ci: add Windows and Python 3.14 support	2026-04-17 16:11:37 +08:00
chengyongru	b51da93cbb	feat(agent): add SelfTool for runtime self-inspection and configuration Add a built-in tool that lets the agent inspect and modify its own runtime state (model, iterations, context window, etc.). Key features: - inspect: view current config, usage stats, and subagent status - modify: adjust parameters at runtime (protected by type/range validation) - Subagent observability: inspect running subagent tasks (phase, iteration, tool events, errors) — subagents are no longer a black box - Watchdog corrects out-of-bounds values on each iteration - Enabled by default in read-only mode (self_modify: false) - All changes are in-memory only; restart restores defaults - Comprehensive test suite (90 tests) Includes a self-awareness skill (always-on) with progressive disclosure: SKILL.md for core rules, references/examples.md for detailed scenarios.	2026-04-16 23:44:26 +08:00
Mohamed Elkholy	1304ff78cc	perf(tools): cache ToolRegistry.get_definitions() between mutations get_definitions() sorts tools on every LLM iteration for prompt cache stability. Cache the sorted result and invalidate on register/unregister so the sort only runs when the tool set actually changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 21:52:36 +08:00
yeyitech	ee061f0595	fix(web): serialize duckduckgo search calls	2026-04-14 14:10:06 +08:00
Xubin Ren	49355b2bd6	test(tools): lock non-object parameter validation Add focused registry coverage so the new read_file/read_write parameter guard stays actionable without changing generic validation behavior for other tools. Made-with: Cursor	2026-04-13 09:55:05 +08:00
haosenwang1018	92ef594b6a	fix(mcp): hint on stdio protocol pollution	2026-04-13 09:41:55 +08:00
Xubin Ren	5dc238c7ef	fix(shell): allow read-only copies from internal state files Keep the new exec guard focused on writes to history.jsonl and .dream_cursor while still allowing read-only copy operations out of those files. Made-with: Cursor	2026-04-12 16:38:55 +08:00
04cb	3f59bd1443	fix(shell): reject LLM-supplied working_dir outside workspace (#2826)	2026-04-12 16:38:55 +08:00
04cb	00fb491bc9	fix(shell): block exec writes to history.jsonl and cursor files (#2989)	2026-04-12 16:38:55 +08:00
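
The name sanitization described in commit 6eb178113e (replace characters outside `^[a-zA-Z0-9_-]{1,128}$` with underscores, then collapse consecutive underscores) could look roughly like the sketch below. This is an illustration only, not the code from the repository: the function name `sanitize_name`, the helper constants, and the truncation to 128 characters are assumptions made here for the example.

```python
import re

# Hypothetical helpers; names are illustrative, not taken from the repository.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")
_REPEATED_UNDERSCORES = re.compile(r"_{2,}")
MAX_NAME_LENGTH = 128  # upper bound from the pattern quoted in the commit message


def sanitize_name(name: str) -> str:
    """Return a name that uses only characters allowed by ^[a-zA-Z0-9_-]{1,128}$."""
    # Replace every disallowed character (spaces, dots, slashes, ...) with "_".
    cleaned = _INVALID_CHARS.sub("_", name)
    # Collapse runs of underscores into a single underscore.
    cleaned = _REPEATED_UNDERSCORES.sub("_", cleaned)
    # Assumption: over-long names are truncated; the commit does not say how
    # the length limit is handled.
    return cleaned[:MAX_NAME_LENGTH]


if __name__ == "__main__":
    print(sanitize_name("PostgreSQL System Information"))
```

An empty input stays empty in this sketch, so a caller would still need to reject or replace it before sending the name to a model provider.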


87 Commits