89 Commits

Author SHA1 Message Date
Xubin Ren
861fbb0dde fix(provider): correct LongCat OpenAI base URL
Use the SDK-ready /v1 base so LongCat chat completions hit the documented endpoint.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-02 01:52:04 +08:00
moranfong
051037ff08 feat(provider): add LongCat via OpenAI-compatible backend 2026-05-02 01:52:04 +08:00
Xubin Ren
fd1a5a6267 test(provider): tidy Anthropic fallback imports
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-01 23:59:24 +08:00
coldxiangyu
4c54a2b153 fix(anthropic): auto-fallback to stream on long-request error
The Anthropic SDK raises a client-side ValueError when a non-streaming
`messages.create` call could exceed the 10-minute server timeout (e.g.
high `max_tokens` combined with extended thinking budget). The error
text "Streaming is required for operations that may take longer than
10 minutes" was bubbling up to the user as an opaque LLM error in
channels that use the non-stream path (e.g. wecom in #2709).

Detect this specific ValueError in `chat()` and transparently retry
through `chat_stream()` (without `on_content_delta` so behavior matches
the non-stream contract). Other ValueErrors continue to flow through
`_handle_error` unchanged.

Closes #2709

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 23:59:24 +08:00
Xubin Ren
43a58335f6 fix(provider): narrow DeepSeek reasoning history cleanup
Made-with: Cursor
2026-05-01 19:52:38 +08:00
Xubin Ren
306958d6e6 add native Bedrock Converse provider
Made-with: Cursor
2026-05-01 18:52:03 +08:00
masterlyj
2b9b41f9c3 test(providers): cover reasoning_effort="none" and gemma auto-routing
- Anthropic: "none" must not enable extended thinking
- Azure: "none" must not suppress temperature or inject reasoning body
- DeepSeek/DashScope/Kimi: "none" sends thinking disabled, skips reasoning_effort field
- Gemini: gemma keyword enables auto-routing for gemma models
2026-04-29 15:41:11 +08:00
hussein1362
415e617398 feat(providers): add extra_body config for OpenAI-compatible endpoints
Add an `extra_body` field to `ProviderConfig` that merges arbitrary
key-value pairs into every OpenAI-compatible request body. This is the
escape hatch for provider-specific features that nanobot does not have
first-class fields for.

Real-world use cases this unblocks via config alone (no code changes):
- vLLM/TGI `chat_template_kwargs` (e.g. `enable_thinking: false`)
- vLLM guided decoding (`guided_json`, `guided_regex`)
- Local model sampling params (`repetition_penalty`, `top_k`, `min_p`)
- Any future provider-specific param without a new PR each time

The config extra_body is applied last via recursive deep-merge, so it
can extend or override provider-specific defaults (e.g. thinking
params) without clobbering sibling keys set by internal logic.

Changes:
- Add `extra_body: dict[str, Any] | None` to `ProviderConfig`
- Pass it through `factory.py` to `OpenAICompatProvider.__init__`
- Deep-merge into `_build_kwargs` after all internal extra_body entries
- Add `_deep_merge` helper (recursive dict merge, does not mutate inputs)
- 21 tests: deep-merge semantics, provider init, _build_kwargs
  integration, thinking coexistence, real-world patterns (guided_json,
  repetition_penalty), and schema validation
2026-04-28 15:56:13 +08:00
Xubin Ren
fdfecd3ba6 refactor(codex): name progress delta capability semantically
Use a provider capability name that describes user-visible progress delta support instead of the runner implementation detail.

Made-with: Cursor
2026-04-27 18:48:05 +08:00
hanyuanling
ae14142a87 fix(codex): stream progress deltas to channels 2026-04-27 18:48:05 +08:00
hanyuanling
9dc99d1b34 fix(provider): bound OpenAI-compatible request timeouts 2026-04-27 17:47:31 +08:00
hanyuanling
8e0ce59c0e fix(provider): normalize DeepSeek non-string message content 2026-04-27 15:43:41 +08:00
Xubin Ren
82b8a3af7e fix(provider): handle incomplete DeepSeek reasoning history 2026-04-26 20:47:55 +08:00
chengyongru
3de843a229 fix(provider): gate reasoning-to-content fallback behind spec flag
The non-streaming parse path unconditionally promoted the `reasoning`
response field to `content` when content was empty. This was intended
for StepFun (whose API returns the actual answer in `reasoning`), but
it applied to every OpenAI-compatible provider — causing internal
thinking chains from models like Xiaomi MIMO to be leaked as formal
replies.

Add `reasoning_as_content: bool` to ProviderSpec (default False) and
set it only for StepFun. The fallback now requires this flag rather
than running globally.

Fixes #3443
2026-04-26 20:11:08 +08:00
Xubin Ren
1e11b35b45 fix(providers): tighten local endpoint detection
Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior.

Made-with: Cursor
2026-04-26 16:14:24 +08:00
hussein1362
5943ab386d fix(providers): disable HTTP keepalive for local/LAN endpoints
Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP
connections before the client-side keepalive timer expires.  When two
LLM calls happen seconds apart — for example the heartbeat _decide()
phase followed immediately by process_direct() — the second call grabs
a now-dead pooled connection, causing a transient APIConnectionError
on every first attempt.

The fix detects local endpoints via:
- ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS)
- Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x,
  172.16-31.x, host.docker.internal, [::1])

For these endpoints, the AsyncOpenAI client is created with a custom
httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP
connection for each request.  This is cheap on LAN (sub-5ms connect)
and eliminates the stale-connection retry tax entirely.

Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default
5-second keepalive, which is fine for high-frequency API usage.

The private-network heuristic also covers the common case where users
configure provider='openai' but point apiBase at a LAN IP running
llama.cpp — the spec says is_local=False, but the URL clearly is.
2026-04-26 16:14:24 +08:00
Xubin Ren
3441d5f89c test(anthropic): cover remaining opus-4-7 temperature branches
The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted

Made-with: Cursor
2026-04-24 15:33:59 +08:00
04cb
9239429a00 fix(anthropic): omit temperature for opus-4-7 (#3417) 2026-04-24 15:33:59 +08:00
Xubin Ren
7f1913f619 fix(provider): add DeepSeek thinking toggle; backfill reasoning_content on legacy messages
Two issues with DeepSeek V4 thinking mode support:

1. Missing thinking parameter injection.
   DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
   — identical to VolcEngine/BytePlus. The code had this for volcengine,
   byteplus, dashscope, minimax, and kimi but not DeepSeek. This means
   `reasoning_effort=minimal` (thinking off) silently has no effect.

   Root cause: the thinking-style→wire-format mapping was an if/elif chain
   on provider *names*. DeepSeek was forgotten.

   Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
   - "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
   - "enable_thinking" → {"enable_thinking": bool} (DashScope)
   - "reasoning_split" → {"reasoning_split": bool} (MiniMax)
   `_build_kwargs` now does a single dict lookup. Adding a new provider
   with an existing wire format requires zero changes to the function.

2. Legacy session messages crash thinking-mode requests.
   When a session was started without thinking mode (or with a different
   model), assistant messages lack reasoning_content. DeepSeek V4 in
   thinking mode rejects these with 400:
   "The reasoning_content in the thinking mode must be passed back to the API."
   This affects ALL assistant messages, not just those with tool_calls
   (despite the docs only mentioning the tool_calls case).

   Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
   assistant message missing it, but only when thinking mode is active.
   This is semantically neutral — the model treats empty reasoning_content
   as "no thinking happened on that turn". The backfill only touches the
   in-memory request copy; session files on disk are untouched.

Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
2026-04-24 15:06:39 +08:00
Xubin Ren
239e91a4d6 test(anthropic): pin tool_result image_url conversion regression
Adds a focused regression test so the fix for tool_result image
handling cannot silently revert. Two cases:

- list content with an image_url + text block -> image_url is
  translated to a native Anthropic image block, sibling text passes
  through unchanged
- plain string content passes through untouched (the new list branch
  must not alter the string path)

These cover the exact symptom surface (silent image drop with a
"Non-transient LLM error with image content" warning) and the only
two content shapes tool results actually take today.

Made-with: Cursor
2026-04-22 22:10:53 +08:00
Xubin Ren
427deb4a70 test(providers): add regression tests for GitHub Copilot /responses routing
Locks in the four behaviors introduced by the fix so they can't silently
revert:
- _should_use_responses_api accepts github_copilot on its non-OpenAI base
- _build_responses_body strips the 'github_copilot/' routing prefix
- /responses failures on github_copilot do not fall back to /chat/completions

Made-with: Cursor
2026-04-22 06:53:37 +00:00
Xubin Ren
88c619901e review(providers): tighten comments in reasoning_effort normalize path
Made-with: Cursor
2026-04-22 12:49:55 +08:00
hlg
28c42628b0 fix: normalize DashScope reasoning_effort (minimal vs minimum)
DashScope rejects the OpenAI-style value "minimal" with
`'reasoning_effort.effort' must be one of: 'none', 'minimum', 'low',
'medium', 'high', 'xhigh'`, but nanobot was passing the string through
verbatim. Users who tried the documented "minimal" to disable thinking
got a 400; users who tried the DashScope-native "minimum" to work
around it got `enable_thinking=True` because the internal comparison
was a hard string match on "minimal".

Introduce a semantic/wire split in `_build_kwargs`:

- `semantic_effort` is the internal canonical form (OpenAI vocabulary).
  "minimum" on the way in is normalized to "minimal" here so both
  spellings share one meaning.
- `wire_effort` is what we actually serialize. For DashScope with
  semantic_effort == "minimal" we translate to "minimum" on the way
  out; other providers are unchanged.
- `thinking_enabled` and the Kimi thinking branch now compare on
  `semantic_effort`, so either user spelling correctly disables
  provider-side thinking.

Tests:

- Strengthen `test_dashscope_thinking_disabled_for_minimal` to assert
  the wire value is "minimum" in addition to the extra_body signal;
  the original version only checked extra_body and let the
  invalid-value bug slip through.
- Add `test_dashscope_thinking_disabled_for_minimum_alias` so a user
  who read the DashScope docs and configured "minimum" still gets
  thinking off.
- Add `test_non_dashscope_minimal_not_retranslated` to pin down that
  the DashScope-specific translation does not leak to OpenAI et al.
2026-04-22 12:49:55 +08:00
k
e5b288c6eb fix: map MiniMax reasoning_effort to reasoning_split 2026-04-22 00:52:56 +08:00
chengyongru
37ea8b8f5b fix(retry): recognize ZhiPu 1302 rate-limit error for retry
ZhiPu API returns code 1302 with Chinese text "速率限制" instead of
standard HTTP 429 + "rate limit", causing the retry engine to treat
it as non-transient and fail immediately.
2026-04-21 21:23:20 +08:00
Xubin Ren
6c24f24e9e feat(models): add support for kimi-k2.6 with temperature override and update documentation 2026-04-20 18:18:06 +00:00
Xubin Ren
009cce78ad fix(anthropic): also enforce leading-user + empty-array recovery
Extend `_merge_consecutive` so the three invariants from
`LLMProvider._enforce_role_alternation` all hold for Anthropic:

1. collapse consecutive same-role turns (unchanged)
2. no trailing assistant — Anthropic rejects prefill (unchanged)
3. no leading assistant — Anthropic requires the first turn be user
4. non-empty messages array — recover the last stripped assistant as a
   user turn when every turn got stripped, so callers don't hit a
   secondary "messages array empty" 400

Anthropic-specific wrinkle: `tool_use` blocks live inside `content` (not
a separate `tool_calls` field) and are illegal inside user turns, so
both recovery paths skip any message carrying them rather than silently
producing a malformed request.

Adds 4 unit tests covering the new branches, including the tool_use
opt-outs, and updates the existing `test_single_assistant_stripped` to
reflect the new rerouting contract.

Made-with: Cursor
2026-04-21 01:32:32 +08:00
hussein1362
2f02342083 fix(anthropic): strip trailing assistant messages to prevent prefill error
Anthropic does not support assistant-message prefill and returns a 400
error when the conversation ends with an assistant turn. This commonly
happens when heartbeat/system messages accumulate trailing assistant
replies in the session history.

The _merge_consecutive method already handles same-role merging but did
not strip trailing assistant messages. The base provider's
_enforce_role_alternation (used by OpenAI-compat) does strip them, but
AnthropicProvider uses its own _merge_consecutive instead.

Add a trailing-assistant stripping loop to _merge_consecutive, matching
the behavior already present in _enforce_role_alternation.

Includes 7 new tests covering merge + strip behavior.
2026-04-21 01:32:32 +08:00
Xubin Ren
b6d63fb1ec fix: normalize responses circuit breaker keys
Made-with: Cursor
2026-04-19 20:16:25 +08:00
Mohamed Elkholy
baba3b2160 fix(providers): add circuit breaker for Responses API fallback
When the Responses API fails repeatedly (3 consecutive compatibility
errors), skip it and fall back directly to Chat Completions.  Unlike a
permanent disable, the circuit re-probes after 5 minutes so recovery
is automatic when the API comes back.  Success resets the counter.

Keyed per (model, reasoning_effort) so a failure with one model does
not affect others.
2026-04-19 20:16:25 +08:00
Xubin Ren
b8d327dc41 test + docs: lock should_execute_tools guard semantics (#3220)
Two small follow-ups to the guard:

1. Fix the should_execute_tools docstring so it matches the actual code.
   The previous version said "Only execute when finish_reason explicitly
   signals tool intent" but the code also accepts finish_reason == "stop".
   Explain why (some compliant providers emit "stop" with legitimate tool
   calls — openai_compat_provider.py already mirrors this at lines ~633 /
   ~678 where ("tool_calls", "stop") are both treated as the terminal
   tool-call state). Without this, a strict "tool_calls"-only guard would
   regress 15 existing runner tests that construct LLMResponse with
   tool_calls but no explicit finish_reason (default = "stop").

2. Add tests/providers/test_llm_response.py. This locks the three cases:
   - no tool calls                  -> never executes
   - tool calls + "tool_calls"/stop -> executes
   - tool calls + refusal / content_filter / error / length / ... -> blocked

   These are exactly the boundary cases the #3220 fix is about; without a
   test here a future refactor could silently revert the guard.

Body + tests only, no behavior change beyond the existing PR's intent.

Made-with: Cursor
2026-04-17 20:39:46 +08:00
chengyongru
8c0c4e5b31 refactor(agent): tighten comments, extract constant, strengthen edge case test
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
2026-04-17 16:20:53 +08:00
chengyongru
44b526c4ee fix(agent): preserve user message in _snip_history to prevent GLM error 1214
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法").

Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
   the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
   "(conversation continued)" when the first non-system message is a bare
   assistant (no tool_calls), serving as a safety net for any edge cases
   that slip through.

Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
2026-04-17 16:20:53 +08:00
Xubin Ren
a6ea06e6bf docs(providers): explain MiniMax thinking endpoint
Document why MiniMax thinking mode uses a separate Anthropic-compatible provider and list the matching base URLs. Add a small registry test so the new provider stays wired to the expected backend and API key.

Made-with: Cursor
2026-04-16 01:00:45 +08:00
04cb
eacc9fbb5f refactor(providers): drop unreachable GenerationSettings fallback 2026-04-15 23:52:38 +08:00
04cb
54f7ad3752 fix(providers): guard chat_with_retry against explicit None max_tokens (#3102) 2026-04-15 23:52:38 +08:00
razzh
9e2278826f feat(provider): enable Kimi thinking via extra_body for k2.5 and k2.6
- Inject `thinking={"type": "enabled|disabled"}` via extra_body for
  Kimi thinking-capable models (kimi-k2.5, k2.6-code-preview).
- Add _is_kimi_thinking_model helper to handle both bare slugs and
  OpenRouter-style prefixed names (e.g. moonshotai/kimi-k2.5).
- reasoning_effort="minimal" maps to disabled; any other value enables it.
- Add tests for enabled/disabled states and OpenRouter prefix handling.
2026-04-15 01:59:32 +08:00
Xubin Ren
a0812ad60e test: cover retry termination notifications
Lock the new interaction-channel retry termination hints so both exhausted standard retries and persistent identical-error stops keep emitting the final progress message.

Made-with: Cursor
2026-04-15 01:55:57 +08:00
Xubin Ren
b60e8dc0ba test: cover missing tool-call arguments normalization
Lock the strict-provider sanitization path so assistant tool calls without function.arguments are normalized to {} instead of being forwarded as missing values.

Made-with: Cursor
2026-04-15 01:37:41 +08:00
Michael-lhh
f293ff7f18 fix: normalize tool-call arguments for strict providers
Ensure assistant tool-call function.arguments is always emitted as valid JSON text so strict OpenAI-compatible backends (including Alibaba code models) do not reject requests. Add regressions for dict and malformed-string argument payloads in message sanitization.

Made-with: Cursor
2026-04-15 01:37:41 +08:00
chengyongru
ac714803f6 fix(provider): recover trailing assistant message as user to prevent empty request
When a subagent result is injected with current_role="assistant",
_enforce_role_alternation drops the trailing assistant message, leaving
only the system prompt. Providers like Zhipu/GLM reject such requests
with error 1214 ("messages parameter invalid"). Now the last popped
assistant message is recovered as a user message when no user/tool
messages remain.
2026-04-13 12:54:39 +08:00
haosenwang1018
3573109408 fix(provider): preserve static error helper compatibility 2026-04-13 09:37:31 +08:00
haosenwang1018
c68b3edb9d fix(provider): clarify local 502 recovery hints 2026-04-13 09:37:31 +08:00
Xubin Ren
217e1fc957 test(retry): lock in-place image fallback behavior
Add a focused regression test for the successful no-image retry path so the original message history stays stripped after fallback and the repeated retry loop cannot silently return.

Made-with: Cursor
2026-04-12 20:10:06 +08:00
yanghan-cyber
b261201985 fix(retry): strip images in-place to prevent repeated error-retry cycles
When a non-transient LLM error occurs with image content, the retry
mechanism strips images from a copy but never updates the original
conversation history. Subsequent iterations rebuild context from the
unmodified history, causing the same error-retry cycle to repeat
every iteration until max_iterations is reached.

Add _strip_image_content_inplace() that mutates the original message
content lists in-place after a successful no-image retry, so callers
sharing those references (e.g. the runner's conversation history)
also see the stripped version.
2026-04-12 20:10:06 +08:00
Xubin Ren
2bef9cb650 fix(agent): preserve interrupted tool-call turns
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
2026-04-10 05:37:25 +00:00
Xubin Ren
dadf453097 Merge origin/main into fix/sanitize-messages-non-claude
Resolved conflict in azure_openai_provider.py by keeping main's
Responses API implementation (role alternation not needed for the
Responses API input format).

Made-with: Cursor
2026-04-09 04:45:45 +00:00
Xubin Ren
d084d10dc2 feat(openai): auto-route direct reasoning requests with responses fallback 2026-04-08 15:21:08 +00:00
Xubin Ren
63acfc4f2f test: fix trailing-space mismatch and add regression tests for normal models
- Fix assertion in streaming dict fallback test (trailing space in data
  not reflected in expected value).
- Add two regression tests proving that models with reasoning_content
  (e.g. DeepSeek-R1) and standard models (no reasoning fields) are
  completely unaffected by the reasoning fallback.

Made-with: Cursor
2026-04-08 00:59:39 +08:00
moranfong
9e7c07ac89 test(provider): add StepFun reasoning field fallback tests
Add comprehensive tests for the StepFun Plan API compatibility fix:
- _parse dict branch: content and reasoning_content fallback to reasoning
- _parse SDK object branch: same fallback for pydantic response objects
- _parse_chunks dict branch: reasoning field handled in streaming mode
- _parse_chunks SDK branch: reasoning fallback for SDK delta objects
- Precedence tests: reasoning_content field takes priority over reasoning

Refs: fix(provider): support StepFun Plan API reasoning field fallback
2026-04-08 00:59:39 +08:00