nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-19 16:12:30 +00:00

Author	SHA1	Message	Date
Xubin Ren	1c2ea1aad2	feat(goal): /goal command & long-running tasks (long_task) * feat(long-task): add LongTaskTool for multi-step agent tasks Implements a meta-ReAct loop where long-running tasks are broken into sequential subagent steps, each starting fresh with the original goal and progress from the previous step. This prevents context drift when agents work on complex, multi-step tasks. - Extract build_tool_registry() from SubagentManager for reuse - Add run_step() for synchronous subagent execution (no bus announcement) - Add HandoffTool and CompleteTool as signal mechanisms via shared dict - Add LongTaskTool orchestrator with simplified prompt (8 iterations/step) - Register LongTaskTool in main agent loop - Add _extract_handoff_from_messages fallback for robustness * fix(long-task): add debug logging for step-level observability * feat(long-task): major overhaul with structured handoffs, validation, and observability - Structured HandoffState: HandoffTool now accepts files_created, files_modified, next_step_hint, and verification fields instead of a plain string. Progress is passed between steps as structured data. - Completion validation round: After complete() is called, a dedicated validator step runs to verify the claim against the original goal. If validation fails, the task continues rather than returning a false completion. - Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle, step_final) selected based on step number. Final steps get tighter budget and stronger "wrap up" guidance. - Automatic file change tracking: Extracts write_file/edit_file events from tool_events and injects them into the next step's context if the subagent forgot to report them explicitly. - Budget tracking & adaptive strategy: Cumulative token usage is tracked across steps. Per-step tool budget drops from 8 to 4 in the last two steps to force handoff/completion. - Crash retry with graceful degradation: A step that crashes is retried once. Persistent crashes terminate the task and return partial progress. - Full observability hooks for future WebUI integration: - set_hooks() with on_step_start, on_step_complete, on_handoff, on_validation_started, on_validation_passed, on_validation_failed, on_task_complete, on_task_error, and catch-all on_event. - Readable state properties: current_step, total_steps, status, last_handoff, cumulative_usage, goal. - inject_correction() allows external code to send user corrections that are injected into the next step's prompt. - run_step() accepts optional max_iterations for dynamic budget control. All 27 long-task tests and 11 subagent tests pass. * test(long-task): add boundary tests and fix race conditions - Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries - Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format - Remove asyncio timing hack from test_state_exposure - Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation - All 34 tests passing * fix(long-task): address code review findings - Declare _scopes = {"core"} explicitly to prevent recursive nesting in subagent scope - Document fragile coupling in _extract_file_changes: path extraction depends on write_file/edit_file detail format; add debug log for unexpected formats - Align final-template threshold (max_steps - 2) with budget switch threshold - Eliminate hasattr(self, "_state") in _reset_state by initializing in __init__ * fix(long-task): honor final signal and file tracking Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long-task): improve prompt structure and agent contract - Expand LongTaskTool.description to instruct parent agent on goal construction, return value semantics, and how to handle results. - Expand CompleteTool.description to emphasize that the summary IS the final answer returned to the parent agent. - Prefix validated return value with an explicit "final answer" directive to stop parent agent from re-running work. - Redesign step_start.md: Step 1 is now explicitly for exploration, planning, and skeleton-building. complete() is discouraged. - Remove bulky payload debug logging from _emit(); add targeted info/warning/error logs at key state transitions instead. - Add signal_type to HandoffState for cleaner signal detection. * test(long-task): expect wrapped completion message after validation Align assertions with LongTaskTool final return shape on main. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): turn timing strip, latency, and session-switch restore - Agent loop: publish goal_status run/idle for WebSocket turns; attach wall-clock latency_ms on turn_end and persisted assistant metadata. - WebSocket channel: forward goal_status and latency fields to clients. - NanobotClient: track goal_status started_at per chat without requiring onChat; useNanobotStream restores run strip when returning to a chat. - Thread UI: composer/shell viewport hooks for run duration and latency; format helpers and i18n strings. - MessageBubble: drop trailing StreamCursor (layout artifact vs block markdown). - Builtin / tests: model command coverage, websocket and loop tests. Covers multi-session UX and round-trip timing visibility for the WebUI. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: keep message-tool file attachments after canonical history hydrate - MessageTool records per-turn media paths delivered to the active chat. - nanobot.utils.session_attachments stages out-of-media-root files and merges into the last assistant message before save (loop stays a thin call). - WebUI MediaCell: use a signed URL as a real download link when present. Fixes attachments flashing then vanishing on turn_end when paths lived outside get_media_dir (e.g. workspace files). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): agent activity cluster, stable keys, LTR sheen labels - Group reasoning and tool traces in AgentActivityCluster with i18n summaries - Stabilize React list keys for activity clusters (first message id anchor) - Replace background-clip shimmer with overlay sheen for streaming labels - ThreadMessages/MessageList integration and locale strings Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): render assistant reasoning with Markdown + deferred stream - Use MarkdownText for ReasoningBubble body (same GFM/KaTeX path as replies) - Apply muted/italic prose tokens so thinking stays visually subordinate - useDeferredValue while reasoningStreaming to ease parser work during deltas - Preload markdown chunk when trace opens; add regression test with preloaded renderer Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): default-collapse agent activity cluster while Working Outer fold no longer auto-expands during isTurnStreaming; user opens to see traces. Header sheen and live summary unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long_task): cumulative run history, file union, and prompt tuning Inject cross-step summaries and merged file paths into middle/final step templates so chains do not lose early context. Strip the last run-history block when it duplicates Previous Progress to save tokens. Add optional cumulative_prompt_max_chars and cumulative_step_body_max_chars parameters with clamped defaults. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): session switch keeps in-flight thread and replays buffered WS Save the prior chat message list to the per-chat cache in a layout effect when chatId changes (before stale writes could corrupt another chat). Skip one post-switch layout cache tick so we do not snapshot the wrong tab. Buffer inbound events per chat_id when no onChat subscriber is registered (e.g. user focused another session) and drain on resubscribe up to a cap, so streaming deltas are not lost while off-tab. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): snap thread scroll to bottom on session open (no smooth glide) Use scroll-behavior auto on the viewport, instant programmatic scroll when following new messages and on scrollToBottomSignal. Keep smooth only for the explicit scroll-to-bottom button. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): respect manual scroll-up after opening a session Track when the user leaves the bottom with a ref and skip ResizeObserver and deferred bottom snaps until they return or the conversation is reset. Remove the time-based force-bottom window that overrode atBottom. Multi-frame scrollToBottom honours the same guard unless force (scroll button). Co-authored-by: Cursor <cursoragent@cursor.com> * Publish long_task UI snapshots on outbound metadata - Add OUTBOUND_META_AGENT_UI (_agent_ui) for channel-agnostic structured state - LongTaskTool publishes {kind: long_task, data: snapshot} on the bus with _progress - WebSocket send forwards metadata as agent_ui for WebUI clients - Tests for bus payload, WS frame, and progress assertions - Fix loop progress tests: ignore _goal_status in streaming final filter and avoid brittle outbound[-1] ordering after goal status idle messages Co-authored-by: Cursor <cursoragent@cursor.com> * feat: WebUI long_task activity card and resilient history merge Add optional ui_summary to the long_task tool for one-line UI labels. Stream long_task agent_ui into a dedicated message row with timeline, markdown peek, and a right sheet for details. Merge canonical history after turn_end while re-inserting long_task rows before the final assistant reply. Collapse duplicate task_start/step_start steps in the timeline and extend i18n. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor: align long_task with thread_goal and drop orchestrator UI - Persist sustained objectives via session metadata (long_task / complete_goal); no subagent wiring or tool-driven agent_ui payloads.\n- Remove WebUI long-task activity UI, types, and translations; history merge preserves trace replay only, with legacy long_task rows normalized to traces.\n- Drop long_task prompt templates and get_long_task_run_dir; add webui thread disk helper for gateway persistence tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(agent): thread goal runtime context, tools, and skill - Add thread_goal_state helper and mirror active objectives into Runtime Context - Wire loop/context/memory/events as needed for goal metadata in turns - Expand long_task / complete_goal semantics (pivot/cancel/honest recap) - Add always-on thread-goal SKILL.md; align /goal command prompt - Tests for context builder and thread goal state - Remove unused webui ChatPane component Co-authored-by: Cursor <cursoragent@cursor.com> * feat(thread-goal): add websocket snapshot helper and publish goal updates from long_task Introduce thread_goal_ws_blob for bounded JSON snapshots, attach snapshots to websocket turn_end metadata in AgentLoop, and let long_task fan-out dedicated thread_goal frames on the websocket channel after persisting session metadata. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): websocket thread_goal frames, turn_end replay, and session API scrub for subagent inject Emit thread_goal events and optional thread_goal on turn_end; scrub persisted subagent announce blobs on GET /api/sessions/.../messages and shorten session list previews so WebUI does not surface full Task/Summarize scaffolding. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): merge ephemeral traces per user turn when reconciling canonical history Preserve disk/live trace rows inside the matching user–assistant segment instead of stacking every trace before the final assistant reply (fixes inflated tool counts after refresh or session switch). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): show assistant reply copy only on the last slice before the next user turn Avoid duplicate copy affordances on intermediate assistant bubbles that precede more agent activity in the same turn (tools or further assistant text). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): thread_goal stream plumbing, composer goal strip, sky glow, and client-side subagent scrub projection Track thread_goal and turn_goal snapshots in NanobotClient, hydrate React state from thread_goal frames and turn_end, surface objective/elapsed in the composer, add breathing sky halo CSS while goals are active, mirror server scrub logic on history hydration and webui_thread snapshots, and extend tests/client mocks. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): add Slack Socket Mode connect timeout with actionable timeout errors Abort hung websockets.connect handshakes after a bounded wait, log REST-vs-WSS guidance, surface RuntimeError to channel startup, and log successful WSS setup. Co-authored-by: Cursor <cursoragent@cursor.com> * webui: expand thread goal in composer bottom sheet Add ChevronUp control on the run/goal strip that opens a bottom Sheet with full ui_summary and objective. Inline preview logic in RunElapsedStrip, add i18n strings across locales, and a composer unit test. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): widen dedupeToolCallsForUi input for session API typing fetchSessionMessages types tool_calls as unknown; accept unknown so tsc build passes when passing message.tool_calls through. Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(agent): extract WebSocket turn run status to webui_turn_helpers * refactor(skills): rename thread-goal to long-task and document idempotent goals * feat(skills): rename sustained-goal skill to long-goal and tighten long_task guidance * chore: remove unused subagent/context/router helpers * feat(session): rename sustained goal to goal_state and align WS/WebUI - Move helpers from agent/thread_goal_state to session/goal_state: GOAL_STATE_KEY, goal_state_runtime_lines, goal_state_ws_blob, parse_goal_state. - Session metadata now uses "goal_state"; still read legacy "thread_goal"; long_task writes drop the legacy key after save. - WebSocket: event/field goal_state, _goal_state_sync; turn_end carries goal_state; accept legacy _thread_goal_sync/thread_goal inbound metadata for dispatch. - WebUI: GoalStateWsPayload, goalState hook/client props, i18n keys goalState. - Runtime Context copy uses "Goal (active):" instead of "Thread goal". feat(agent): stream Anthropic thinking deltas and fix stream idle timeout * refactor(webui): transcript jsonl as sole timeline source * fix(agent): reject mismatched WS message chat_id and stream reasoning deltas * feat(webui): hydrate sustained goal and run timer after websocket subscribe * chore(webui,websocket): remove unused fetch helpers and legacy thread_goal WS paths * Raise default max_tokens and context window in agent schema. Align AgentDefaults and ModelPresetConfig with typical Claude-scale usage (32k completion budget, 256k context window) and update migration tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(gateway): bootstrap prefers in-memory model; clarify websocket naming * fix(websocket): websocket _handle_message passes is_dm; refresh /status test expectations --------- Co-authored-by: chengyongru <2755839590@qq.com> Co-authored-by: chengyongru <chengyongru.ai@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 01:14:11 +08:00
hanyuanling	2d17a095dc	fix(codex): stabilize prompt cache key	2026-05-16 00:13:10 +08:00
Xubin Ren	07f9ab580a	fix(provider): preserve Bedrock tool config for history Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 20:59:01 +08:00
Xubin Ren	fd6887c274	test(providers): cover VolcEngine token parameter Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-12 11:35:52 +08:00
Alfredo Arenas	c6b7a9524c	fix(providers): wire MiMo to thinking_type to allow disabling reasoning (#3585 ) The hosted Xiaomi MiMo API accepts {"thinking": {"type": "enabled"\|"disabled"}} to toggle reasoning, which is exactly the shape produced by the existing thinking_type style. The xiaomi_mimo ProviderSpec just needed to opt in. Before this fix, setting reasoning_effort="none" had no effect on MiMo because no thinking_style was configured, so the disable signal never reached the server. Default-on models (mimo-v2.5-pro and friends) kept reasoning regardless of user configuration. Source: https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api Co-authored with Claude Opus 4.7. Strategy and review via Claude Desktop, implementation via Claude Code.	2026-05-11 14:38:28 +08:00
Xubin Ren	e936ed48bd	feat: add image generation tool and WebUI mode Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:06:23 +08:00
chengyongru	3437ff273f	fix(transcription): address review nits on PR #3253 - Correct api_key type hint to str \| None in _post_transcription_with_retry - Remove unreachable final return "" - Fix test_openai_missing_api_key_short_circuits to actually test missing-key path (use audio_file fixture so file exists) - Fix PermissionError patch for Windows (patch class method instead of instance attribute)	2026-05-06 15:52:29 +08:00
mohamed-elkholy95	7ebf611be8	fix(transcription): retry Whisper calls and guard malformed responses A single transient failure between the agent and an OpenAI/Groq Whisper endpoint currently vanishes as `return ""` in transcribe(). The voice message arrives as the empty string and there is no way to tell real silence apart from a failed upload. A malformed but successful response body is even worse: the JSON-decode error escapes the helper unhandled. Add a shared `_post_transcription_with_retry` used by both providers. Retry behaviour: - exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts) - retryable HTTP statuses: 408, 429, 500, 502, 503, 504 - retryable exceptions: TimeoutException, ConnectError, ReadError, WriteError, RemoteProtocolError Non-transient failures short-circuit to "" on the first attempt -- retrying a misconfigured key or a broken upload only burns rate-limit quota. Branches that short-circuit: - missing API key, missing audio file - file-read errors (PermissionError, OSError) on the audio path, preserving the nightly contract for direct provider callers - HTTP auth/4xx body issues via raise_for_status() - response.json() parse failures - non-dict JSON payloads Sharing one helper means OpenAI and Groq cannot drift apart silently. Thread `language` through the helper. The multipart files dict is rebuilt inside the per-attempt loop, so when a caller sets self.language the `language` field is sent on every attempt -- not just the first. Tests cover: - every advertised retryable status and exception, parameterized - language present on attempts 1 and 2 of a 503->200 sequence - language absent when unset; present when set (both providers) - malformed JSON body and non-dict JSON body short-circuit to "" - PermissionError on file read short-circuits with no HTTP attempt - max-attempts give-up, exponential-backoff schedule, auth no-retry, missing-key / missing-file short-circuit Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py declared no status_code, which the new helper reads for retry classification. Set status_code = 200 so the stub advertises the successful response that those tests already simulate. Also moved the two transcription-provider imports to the top of that file (previously placed mid-file) so the file is ruff-clean (E402).	2026-05-06 15:52:25 +08:00
04cb	9d6afd86b5	fix(provider): backfill DeepSeek reasoning_content instead of dropping history (#3554 , #3584 )	2026-05-04 12:14:38 +08:00
Xubin Ren	861fbb0dde	fix(provider): correct LongCat OpenAI base URL Use the SDK-ready /v1 base so LongCat chat completions hit the documented endpoint. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-02 01:52:04 +08:00
moranfong	051037ff08	feat(provider): add LongCat via OpenAI-compatible backend	2026-05-02 01:52:04 +08:00
Xubin Ren	fd1a5a6267	test(provider): tidy Anthropic fallback imports Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-01 23:59:24 +08:00
coldxiangyu	4c54a2b153	fix(anthropic): auto-fallback to stream on long-request error The Anthropic SDK raises a client-side ValueError when a non-streaming `messages.create` call could exceed the 10-minute server timeout (e.g. high `max_tokens` combined with extended thinking budget). The error text "Streaming is required for operations that may take longer than 10 minutes" was bubbling up to the user as an opaque LLM error in channels that use the non-stream path (e.g. wecom in #2709). Detect this specific ValueError in `chat()` and transparently retry through `chat_stream()` (without `on_content_delta` so behavior matches the non-stream contract). Other ValueErrors continue to flow through `_handle_error` unchanged. Closes #2709 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 23:59:24 +08:00
Xubin Ren	43a58335f6	fix(provider): narrow DeepSeek reasoning history cleanup Made-with: Cursor	2026-05-01 19:52:38 +08:00
Xubin Ren	306958d6e6	add native Bedrock Converse provider Made-with: Cursor	2026-05-01 18:52:03 +08:00
masterlyj	2b9b41f9c3	test(providers): cover reasoning_effort="none" and gemma auto-routing - Anthropic: "none" must not enable extended thinking - Azure: "none" must not suppress temperature or inject reasoning body - DeepSeek/DashScope/Kimi: "none" sends thinking disabled, skips reasoning_effort field - Gemini: gemma keyword enables auto-routing for gemma models	2026-04-29 15:41:11 +08:00
hussein1362	415e617398	feat(providers): add extra_body config for OpenAI-compatible endpoints Add an `extra_body` field to `ProviderConfig` that merges arbitrary key-value pairs into every OpenAI-compatible request body. This is the escape hatch for provider-specific features that nanobot does not have first-class fields for. Real-world use cases this unblocks via config alone (no code changes): - vLLM/TGI `chat_template_kwargs` (e.g. `enable_thinking: false`) - vLLM guided decoding (`guided_json`, `guided_regex`) - Local model sampling params (`repetition_penalty`, `top_k`, `min_p`) - Any future provider-specific param without a new PR each time The config extra_body is applied last via recursive deep-merge, so it can extend or override provider-specific defaults (e.g. thinking params) without clobbering sibling keys set by internal logic. Changes: - Add `extra_body: dict[str, Any] \| None` to `ProviderConfig` - Pass it through `factory.py` to `OpenAICompatProvider.__init__` - Deep-merge into `_build_kwargs` after all internal extra_body entries - Add `_deep_merge` helper (recursive dict merge, does not mutate inputs) - 21 tests: deep-merge semantics, provider init, _build_kwargs integration, thinking coexistence, real-world patterns (guided_json, repetition_penalty), and schema validation	2026-04-28 15:56:13 +08:00
Xubin Ren	fdfecd3ba6	refactor(codex): name progress delta capability semantically Use a provider capability name that describes user-visible progress delta support instead of the runner implementation detail. Made-with: Cursor	2026-04-27 18:48:05 +08:00
hanyuanling	ae14142a87	fix(codex): stream progress deltas to channels	2026-04-27 18:48:05 +08:00
hanyuanling	9dc99d1b34	fix(provider): bound OpenAI-compatible request timeouts	2026-04-27 17:47:31 +08:00
hanyuanling	8e0ce59c0e	fix(provider): normalize DeepSeek non-string message content	2026-04-27 15:43:41 +08:00
Xubin Ren	82b8a3af7e	fix(provider): handle incomplete DeepSeek reasoning history	2026-04-26 20:47:55 +08:00
chengyongru	3de843a229	fix(provider): gate reasoning-to-content fallback behind spec flag The non-streaming parse path unconditionally promoted the `reasoning` response field to `content` when content was empty. This was intended for StepFun (whose API returns the actual answer in `reasoning`), but it applied to every OpenAI-compatible provider — causing internal thinking chains from models like Xiaomi MIMO to be leaked as formal replies. Add `reasoning_as_content: bool` to ProviderSpec (default False) and set it only for StepFun. The fallback now requires this flag rather than running globally. Fixes #3443	2026-04-26 20:11:08 +08:00
Xubin Ren	1e11b35b45	fix(providers): tighten local endpoint detection Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior. Made-with: Cursor	2026-04-26 16:14:24 +08:00
hussein1362	5943ab386d	fix(providers): disable HTTP keepalive for local/LAN endpoints Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP connections before the client-side keepalive timer expires. When two LLM calls happen seconds apart — for example the heartbeat _decide() phase followed immediately by process_direct() — the second call grabs a now-dead pooled connection, causing a transient APIConnectionError on every first attempt. The fix detects local endpoints via: - ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS) - Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x, 172.16-31.x, host.docker.internal, [::1]) For these endpoints, the AsyncOpenAI client is created with a custom httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP connection for each request. This is cheap on LAN (sub-5ms connect) and eliminates the stale-connection retry tax entirely. Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default 5-second keepalive, which is fine for high-frequency API usage. The private-network heuristic also covers the common case where users configure provider='openai' but point apiBase at a LAN IP running llama.cpp — the spec says is_local=False, but the URL clearly is.	2026-04-26 16:14:24 +08:00
Xubin Ren	3441d5f89c	test(anthropic): cover remaining opus-4-7 temperature branches The existing test only verified the adaptive path. Add two more cases: - enabled thinking (high): temperature must also be omitted - no thinking (None): temperature must still be omitted Made-with: Cursor	2026-04-24 15:33:59 +08:00
04cb	9239429a00	fix(anthropic): omit temperature for opus-4-7 (#3417 )	2026-04-24 15:33:59 +08:00
Xubin Ren	7f1913f619	fix(provider): add DeepSeek thinking toggle; backfill reasoning_content on legacy messages Two issues with DeepSeek V4 thinking mode support: 1. Missing thinking parameter injection. DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}` — identical to VolcEngine/BytePlus. The code had this for volcengine, byteplus, dashscope, minimax, and kimi but not DeepSeek. This means `reasoning_effort=minimal` (thinking off) silently has no effect. Root cause: the thinking-style→wire-format mapping was an if/elif chain on provider names. DeepSeek was forgotten. Fix: make the mapping declarative via `ProviderSpec.thinking_style`: - "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus) - "enable_thinking" → {"enable_thinking": bool} (DashScope) - "reasoning_split" → {"reasoning_split": bool} (MiniMax) `_build_kwargs` now does a single dict lookup. Adding a new provider with an existing wire format requires zero changes to the function. 2. Legacy session messages crash thinking-mode requests. When a session was started without thinking mode (or with a different model), assistant messages lack reasoning_content. DeepSeek V4 in thinking mode rejects these with 400: "The reasoning_content in the thinking mode must be passed back to the API." This affects ALL assistant messages, not just those with tool_calls (despite the docs only mentioning the tool_calls case). Fix: `_build_kwargs` backfills `reasoning_content: ""` on every assistant message missing it, but only when thinking mode is active. This is semantically neutral — the model treats empty reasoning_content as "no thinking happened on that turn". The backfill only touches the in-memory request copy; session files on disk are untouched. Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed. Made-with: Cursor	2026-04-24 15:06:39 +08:00
Xubin Ren	239e91a4d6	test(anthropic): pin tool_result image_url conversion regression Adds a focused regression test so the fix for tool_result image handling cannot silently revert. Two cases: - list content with an image_url + text block -> image_url is translated to a native Anthropic image block, sibling text passes through unchanged - plain string content passes through untouched (the new list branch must not alter the string path) These cover the exact symptom surface (silent image drop with a "Non-transient LLM error with image content" warning) and the only two content shapes tool results actually take today. Made-with: Cursor	2026-04-22 22:10:53 +08:00
Xubin Ren	427deb4a70	test(providers): add regression tests for GitHub Copilot /responses routing Locks in the four behaviors introduced by the fix so they can't silently revert: - _should_use_responses_api accepts github_copilot on its non-OpenAI base - _build_responses_body strips the 'github_copilot/' routing prefix - /responses failures on github_copilot do not fall back to /chat/completions Made-with: Cursor	2026-04-22 06:53:37 +00:00
Xubin Ren	88c619901e	review(providers): tighten comments in reasoning_effort normalize path Made-with: Cursor	2026-04-22 12:49:55 +08:00
hlg	28c42628b0	fix: normalize DashScope reasoning_effort (minimal vs minimum) DashScope rejects the OpenAI-style value "minimal" with `'reasoning_effort.effort' must be one of: 'none', 'minimum', 'low', 'medium', 'high', 'xhigh'`, but nanobot was passing the string through verbatim. Users who tried the documented "minimal" to disable thinking got a 400; users who tried the DashScope-native "minimum" to work around it got `enable_thinking=True` because the internal comparison was a hard string match on "minimal". Introduce a semantic/wire split in `_build_kwargs`: - `semantic_effort` is the internal canonical form (OpenAI vocabulary). "minimum" on the way in is normalized to "minimal" here so both spellings share one meaning. - `wire_effort` is what we actually serialize. For DashScope with semantic_effort == "minimal" we translate to "minimum" on the way out; other providers are unchanged. - `thinking_enabled` and the Kimi thinking branch now compare on `semantic_effort`, so either user spelling correctly disables provider-side thinking. Tests: - Strengthen `test_dashscope_thinking_disabled_for_minimal` to assert the wire value is "minimum" in addition to the extra_body signal; the original version only checked extra_body and let the invalid-value bug slip through. - Add `test_dashscope_thinking_disabled_for_minimum_alias` so a user who read the DashScope docs and configured "minimum" still gets thinking off. - Add `test_non_dashscope_minimal_not_retranslated` to pin down that the DashScope-specific translation does not leak to OpenAI et al.	2026-04-22 12:49:55 +08:00
k	e5b288c6eb	fix: map MiniMax reasoning_effort to reasoning_split	2026-04-22 00:52:56 +08:00
chengyongru	37ea8b8f5b	fix(retry): recognize ZhiPu 1302 rate-limit error for retry ZhiPu API returns code 1302 with Chinese text "速率限制" instead of standard HTTP 429 + "rate limit", causing the retry engine to treat it as non-transient and fail immediately.	2026-04-21 21:23:20 +08:00
Xubin Ren	6c24f24e9e	feat(models): add support for kimi-k2.6 with temperature override and update documentation	2026-04-20 18:18:06 +00:00
Xubin Ren	009cce78ad	fix(anthropic): also enforce leading-user + empty-array recovery Extend `_merge_consecutive` so the three invariants from `LLMProvider._enforce_role_alternation` all hold for Anthropic: 1. collapse consecutive same-role turns (unchanged) 2. no trailing assistant — Anthropic rejects prefill (unchanged) 3. no leading assistant — Anthropic requires the first turn be user 4. non-empty messages array — recover the last stripped assistant as a user turn when every turn got stripped, so callers don't hit a secondary "messages array empty" 400 Anthropic-specific wrinkle: `tool_use` blocks live inside `content` (not a separate `tool_calls` field) and are illegal inside user turns, so both recovery paths skip any message carrying them rather than silently producing a malformed request. Adds 4 unit tests covering the new branches, including the tool_use opt-outs, and updates the existing `test_single_assistant_stripped` to reflect the new rerouting contract. Made-with: Cursor	2026-04-21 01:32:32 +08:00
hussein1362	2f02342083	fix(anthropic): strip trailing assistant messages to prevent prefill error Anthropic does not support assistant-message prefill and returns a 400 error when the conversation ends with an assistant turn. This commonly happens when heartbeat/system messages accumulate trailing assistant replies in the session history. The _merge_consecutive method already handles same-role merging but did not strip trailing assistant messages. The base provider's _enforce_role_alternation (used by OpenAI-compat) does strip them, but AnthropicProvider uses its own _merge_consecutive instead. Add a trailing-assistant stripping loop to _merge_consecutive, matching the behavior already present in _enforce_role_alternation. Includes 7 new tests covering merge + strip behavior.	2026-04-21 01:32:32 +08:00
Xubin Ren	b6d63fb1ec	fix: normalize responses circuit breaker keys Made-with: Cursor	2026-04-19 20:16:25 +08:00
Mohamed Elkholy	baba3b2160	fix(providers): add circuit breaker for Responses API fallback When the Responses API fails repeatedly (3 consecutive compatibility errors), skip it and fall back directly to Chat Completions. Unlike a permanent disable, the circuit re-probes after 5 minutes so recovery is automatic when the API comes back. Success resets the counter. Keyed per (model, reasoning_effort) so a failure with one model does not affect others.	2026-04-19 20:16:25 +08:00
Xubin Ren	b8d327dc41	test + docs: lock should_execute_tools guard semantics (#3220 ) Two small follow-ups to the guard: 1. Fix the should_execute_tools docstring so it matches the actual code. The previous version said "Only execute when finish_reason explicitly signals tool intent" but the code also accepts finish_reason == "stop". Explain why (some compliant providers emit "stop" with legitimate tool calls — openai_compat_provider.py already mirrors this at lines ~633 / ~678 where ("tool_calls", "stop") are both treated as the terminal tool-call state). Without this, a strict "tool_calls"-only guard would regress 15 existing runner tests that construct LLMResponse with tool_calls but no explicit finish_reason (default = "stop"). 2. Add tests/providers/test_llm_response.py. This locks the three cases: - no tool calls -> never executes - tool calls + "tool_calls"/stop -> executes - tool calls + refusal / content_filter / error / length / ... -> blocked These are exactly the boundary cases the #3220 fix is about; without a test here a future refactor could silently revert the guard. Body + tests only, no behavior change beyond the existing PR's intent. Made-with: Cursor	2026-04-17 20:39:46 +08:00
chengyongru	8c0c4e5b31	refactor(agent): tighten comments, extract constant, strengthen edge case test - Extract synthetic user message string to module-level constant - Tighten comments in _snip_history recovery branch - Strengthen no-user edge case test to verify safety net interaction	2026-04-17 16:20:53 +08:00
chengyongru	44b526c4ee	fix(agent): preserve user message in _snip_history to prevent GLM error 1214 When _snip_history truncates the message history and the only user message ends up outside the kept window, providers like GLM reject the resulting system→assistant sequence with error 1214 ("messages 参数非法"). Two-layer fix: 1. _snip_history now walks backwards through non_system messages to recover the nearest user message when none exists in the kept window. 2. _enforce_role_alternation inserts a synthetic user message "(conversation continued)" when the first non-system message is a bare assistant (no tool_calls), serving as a safety net for any edge cases that slip through. Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>	2026-04-17 16:20:53 +08:00
Xubin Ren	a6ea06e6bf	docs(providers): explain MiniMax thinking endpoint Document why MiniMax thinking mode uses a separate Anthropic-compatible provider and list the matching base URLs. Add a small registry test so the new provider stays wired to the expected backend and API key. Made-with: Cursor	2026-04-16 01:00:45 +08:00
04cb	eacc9fbb5f	refactor(providers): drop unreachable GenerationSettings fallback	2026-04-15 23:52:38 +08:00
04cb	54f7ad3752	fix(providers): guard chat_with_retry against explicit None max_tokens (#3102 )	2026-04-15 23:52:38 +08:00
razzh	9e2278826f	feat(provider): enable Kimi thinking via extra_body for k2.5 and k2.6 - Inject `thinking={"type": "enabled\|disabled"}` via extra_body for Kimi thinking-capable models (kimi-k2.5, k2.6-code-preview). - Add _is_kimi_thinking_model helper to handle both bare slugs and OpenRouter-style prefixed names (e.g. moonshotai/kimi-k2.5). - reasoning_effort="minimal" maps to disabled; any other value enables it. - Add tests for enabled/disabled states and OpenRouter prefix handling.	2026-04-15 01:59:32 +08:00
Xubin Ren	a0812ad60e	test: cover retry termination notifications Lock the new interaction-channel retry termination hints so both exhausted standard retries and persistent identical-error stops keep emitting the final progress message. Made-with: Cursor	2026-04-15 01:55:57 +08:00
Xubin Ren	b60e8dc0ba	test: cover missing tool-call arguments normalization Lock the strict-provider sanitization path so assistant tool calls without function.arguments are normalized to {} instead of being forwarded as missing values. Made-with: Cursor	2026-04-15 01:37:41 +08:00
Michael-lhh	f293ff7f18	fix: normalize tool-call arguments for strict providers Ensure assistant tool-call function.arguments is always emitted as valid JSON text so strict OpenAI-compatible backends (including Alibaba code models) do not reject requests. Add regressions for dict and malformed-string argument payloads in message sanitization. Made-with: Cursor	2026-04-15 01:37:41 +08:00
chengyongru	ac714803f6	fix(provider): recover trailing assistant message as user to prevent empty request When a subagent result is injected with current_role="assistant", _enforce_role_alternation drops the trailing assistant message, leaving only the system prompt. Providers like Zhipu/GLM reject such requests with error 1214 ("messages parameter invalid"). Now the last popped assistant message is recovered as a user message when no user/tool messages remain.	2026-04-13 12:54:39 +08:00

1 2

98 Commits