77 Commits

Author SHA1 Message Date
Xubin Ren
82b8a3af7e fix(provider): handle incomplete DeepSeek reasoning history 2026-04-26 20:47:55 +08:00
chengyongru
3de843a229 fix(provider): gate reasoning-to-content fallback behind spec flag
The non-streaming parse path unconditionally promoted the `reasoning`
response field to `content` when content was empty. This was intended
for StepFun (whose API returns the actual answer in `reasoning`), but
it applied to every OpenAI-compatible provider — causing internal
thinking chains from models like Xiaomi MIMO to be leaked as formal
replies.

Add `reasoning_as_content: bool` to ProviderSpec (default False) and
set it only for StepFun. The fallback now requires this flag rather
than running globally.

Fixes #3443
2026-04-26 20:11:08 +08:00
Xubin Ren
1e11b35b45 fix(providers): tighten local endpoint detection
Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior.

Made-with: Cursor
2026-04-26 16:14:24 +08:00
hussein1362
5943ab386d fix(providers): disable HTTP keepalive for local/LAN endpoints
Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP
connections before the client-side keepalive timer expires.  When two
LLM calls happen seconds apart — for example the heartbeat _decide()
phase followed immediately by process_direct() — the second call grabs
a now-dead pooled connection, causing a transient APIConnectionError
on every first attempt.

The fix detects local endpoints via:
- ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS)
- Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x,
  172.16-31.x, host.docker.internal, [::1])

For these endpoints, the AsyncOpenAI client is created with a custom
httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP
connection for each request.  This is cheap on LAN (sub-5ms connect)
and eliminates the stale-connection retry tax entirely.

Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default
5-second keepalive, which is fine for high-frequency API usage.

The private-network heuristic also covers the common case where users
configure provider='openai' but point apiBase at a LAN IP running
llama.cpp — the spec says is_local=False, but the URL clearly is.
2026-04-26 16:14:24 +08:00
Xubin Ren
3441d5f89c test(anthropic): cover remaining opus-4-7 temperature branches
The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted

Made-with: Cursor
2026-04-24 15:33:59 +08:00
04cb
9239429a00 fix(anthropic): omit temperature for opus-4-7 (#3417) 2026-04-24 15:33:59 +08:00
Xubin Ren
7f1913f619 fix(provider): add DeepSeek thinking toggle; backfill reasoning_content on legacy messages
Two issues with DeepSeek V4 thinking mode support:

1. Missing thinking parameter injection.
   DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
   — identical to VolcEngine/BytePlus. The code had this for volcengine,
   byteplus, dashscope, minimax, and kimi but not DeepSeek. This means
   `reasoning_effort=minimal` (thinking off) silently has no effect.

   Root cause: the thinking-style→wire-format mapping was an if/elif chain
   on provider *names*. DeepSeek was forgotten.

   Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
   - "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
   - "enable_thinking" → {"enable_thinking": bool} (DashScope)
   - "reasoning_split" → {"reasoning_split": bool} (MiniMax)
   `_build_kwargs` now does a single dict lookup. Adding a new provider
   with an existing wire format requires zero changes to the function.

2. Legacy session messages crash thinking-mode requests.
   When a session was started without thinking mode (or with a different
   model), assistant messages lack reasoning_content. DeepSeek V4 in
   thinking mode rejects these with 400:
   "The reasoning_content in the thinking mode must be passed back to the API."
   This affects ALL assistant messages, not just those with tool_calls
   (despite the docs only mentioning the tool_calls case).

   Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
   assistant message missing it, but only when thinking mode is active.
   This is semantically neutral — the model treats empty reasoning_content
   as "no thinking happened on that turn". The backfill only touches the
   in-memory request copy; session files on disk are untouched.

Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
2026-04-24 15:06:39 +08:00
Xubin Ren
239e91a4d6 test(anthropic): pin tool_result image_url conversion regression
Adds a focused regression test so the fix for tool_result image
handling cannot silently revert. Two cases:

- list content with an image_url + text block -> image_url is
  translated to a native Anthropic image block, sibling text passes
  through unchanged
- plain string content passes through untouched (the new list branch
  must not alter the string path)

These cover the exact symptom surface (silent image drop with a
"Non-transient LLM error with image content" warning) and the only
two content shapes tool results actually take today.

Made-with: Cursor
2026-04-22 22:10:53 +08:00
Xubin Ren
427deb4a70 test(providers): add regression tests for GitHub Copilot /responses routing
Locks in the four behaviors introduced by the fix so they can't silently
revert:
- _should_use_responses_api accepts github_copilot on its non-OpenAI base
- _build_responses_body strips the 'github_copilot/' routing prefix
- /responses failures on github_copilot do not fall back to /chat/completions

Made-with: Cursor
2026-04-22 06:53:37 +00:00
Xubin Ren
88c619901e review(providers): tighten comments in reasoning_effort normalize path
Made-with: Cursor
2026-04-22 12:49:55 +08:00
hlg
28c42628b0 fix: normalize DashScope reasoning_effort (minimal vs minimum)
DashScope rejects the OpenAI-style value "minimal" with
`'reasoning_effort.effort' must be one of: 'none', 'minimum', 'low',
'medium', 'high', 'xhigh'`, but nanobot was passing the string through
verbatim. Users who tried the documented "minimal" to disable thinking
got a 400; users who tried the DashScope-native "minimum" to work
around it got `enable_thinking=True` because the internal comparison
was a hard string match on "minimal".

Introduce a semantic/wire split in `_build_kwargs`:

- `semantic_effort` is the internal canonical form (OpenAI vocabulary).
  "minimum" on the way in is normalized to "minimal" here so both
  spellings share one meaning.
- `wire_effort` is what we actually serialize. For DashScope with
  semantic_effort == "minimal" we translate to "minimum" on the way
  out; other providers are unchanged.
- `thinking_enabled` and the Kimi thinking branch now compare on
  `semantic_effort`, so either user spelling correctly disables
  provider-side thinking.

Tests:

- Strengthen `test_dashscope_thinking_disabled_for_minimal` to assert
  the wire value is "minimum" in addition to the extra_body signal;
  the original version only checked extra_body and let the
  invalid-value bug slip through.
- Add `test_dashscope_thinking_disabled_for_minimum_alias` so a user
  who read the DashScope docs and configured "minimum" still gets
  thinking off.
- Add `test_non_dashscope_minimal_not_retranslated` to pin down that
  the DashScope-specific translation does not leak to OpenAI et al.
2026-04-22 12:49:55 +08:00
k
e5b288c6eb fix: map MiniMax reasoning_effort to reasoning_split 2026-04-22 00:52:56 +08:00
chengyongru
37ea8b8f5b fix(retry): recognize ZhiPu 1302 rate-limit error for retry
ZhiPu API returns code 1302 with Chinese text "速率限制" instead of
standard HTTP 429 + "rate limit", causing the retry engine to treat
it as non-transient and fail immediately.
2026-04-21 21:23:20 +08:00
Xubin Ren
6c24f24e9e feat(models): add support for kimi-k2.6 with temperature override and update documentation 2026-04-20 18:18:06 +00:00
Xubin Ren
009cce78ad fix(anthropic): also enforce leading-user + empty-array recovery
Extend `_merge_consecutive` so the three invariants from
`LLMProvider._enforce_role_alternation` all hold for Anthropic:

1. collapse consecutive same-role turns (unchanged)
2. no trailing assistant — Anthropic rejects prefill (unchanged)
3. no leading assistant — Anthropic requires the first turn be user
4. non-empty messages array — recover the last stripped assistant as a
   user turn when every turn got stripped, so callers don't hit a
   secondary "messages array empty" 400

Anthropic-specific wrinkle: `tool_use` blocks live inside `content` (not
a separate `tool_calls` field) and are illegal inside user turns, so
both recovery paths skip any message carrying them rather than silently
producing a malformed request.

Adds 4 unit tests covering the new branches, including the tool_use
opt-outs, and updates the existing `test_single_assistant_stripped` to
reflect the new rerouting contract.

Made-with: Cursor
2026-04-21 01:32:32 +08:00
hussein1362
2f02342083 fix(anthropic): strip trailing assistant messages to prevent prefill error
Anthropic does not support assistant-message prefill and returns a 400
error when the conversation ends with an assistant turn. This commonly
happens when heartbeat/system messages accumulate trailing assistant
replies in the session history.

The _merge_consecutive method already handles same-role merging but did
not strip trailing assistant messages. The base provider's
_enforce_role_alternation (used by OpenAI-compat) does strip them, but
AnthropicProvider uses its own _merge_consecutive instead.

Add a trailing-assistant stripping loop to _merge_consecutive, matching
the behavior already present in _enforce_role_alternation.

Includes 7 new tests covering merge + strip behavior.
2026-04-21 01:32:32 +08:00
Xubin Ren
b6d63fb1ec fix: normalize responses circuit breaker keys
Made-with: Cursor
2026-04-19 20:16:25 +08:00
Mohamed Elkholy
baba3b2160 fix(providers): add circuit breaker for Responses API fallback
When the Responses API fails repeatedly (3 consecutive compatibility
errors), skip it and fall back directly to Chat Completions.  Unlike a
permanent disable, the circuit re-probes after 5 minutes so recovery
is automatic when the API comes back.  Success resets the counter.

Keyed per (model, reasoning_effort) so a failure with one model does
not affect others.
2026-04-19 20:16:25 +08:00
Xubin Ren
b8d327dc41 test + docs: lock should_execute_tools guard semantics (#3220)
Two small follow-ups to the guard:

1. Fix the should_execute_tools docstring so it matches the actual code.
   The previous version said "Only execute when finish_reason explicitly
   signals tool intent" but the code also accepts finish_reason == "stop".
   Explain why (some compliant providers emit "stop" with legitimate tool
   calls — openai_compat_provider.py already mirrors this at lines ~633 /
   ~678 where ("tool_calls", "stop") are both treated as the terminal
   tool-call state). Without this, a strict "tool_calls"-only guard would
   regress 15 existing runner tests that construct LLMResponse with
   tool_calls but no explicit finish_reason (default = "stop").

2. Add tests/providers/test_llm_response.py. This locks the three cases:
   - no tool calls                  -> never executes
   - tool calls + "tool_calls"/stop -> executes
   - tool calls + refusal / content_filter / error / length / ... -> blocked

   These are exactly the boundary cases the #3220 fix is about; without a
   test here a future refactor could silently revert the guard.

Body + tests only, no behavior change beyond the existing PR's intent.

Made-with: Cursor
2026-04-17 20:39:46 +08:00
chengyongru
8c0c4e5b31 refactor(agent): tighten comments, extract constant, strengthen edge case test
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
2026-04-17 16:20:53 +08:00
chengyongru
44b526c4ee fix(agent): preserve user message in _snip_history to prevent GLM error 1214
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法").

Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
   the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
   "(conversation continued)" when the first non-system message is a bare
   assistant (no tool_calls), serving as a safety net for any edge cases
   that slip through.

Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
2026-04-17 16:20:53 +08:00
Xubin Ren
a6ea06e6bf docs(providers): explain MiniMax thinking endpoint
Document why MiniMax thinking mode uses a separate Anthropic-compatible provider and list the matching base URLs. Add a small registry test so the new provider stays wired to the expected backend and API key.

Made-with: Cursor
2026-04-16 01:00:45 +08:00
04cb
eacc9fbb5f refactor(providers): drop unreachable GenerationSettings fallback 2026-04-15 23:52:38 +08:00
04cb
54f7ad3752 fix(providers): guard chat_with_retry against explicit None max_tokens (#3102) 2026-04-15 23:52:38 +08:00
razzh
9e2278826f feat(provider): enable Kimi thinking via extra_body for k2.5 and k2.6
- Inject `thinking={"type": "enabled|disabled"}` via extra_body for
  Kimi thinking-capable models (kimi-k2.5, k2.6-code-preview).
- Add _is_kimi_thinking_model helper to handle both bare slugs and
  OpenRouter-style prefixed names (e.g. moonshotai/kimi-k2.5).
- reasoning_effort="minimal" maps to disabled; any other value enables it.
- Add tests for enabled/disabled states and OpenRouter prefix handling.
2026-04-15 01:59:32 +08:00
Xubin Ren
a0812ad60e test: cover retry termination notifications
Lock the new interaction-channel retry termination hints so both exhausted standard retries and persistent identical-error stops keep emitting the final progress message.

Made-with: Cursor
2026-04-15 01:55:57 +08:00
Xubin Ren
b60e8dc0ba test: cover missing tool-call arguments normalization
Lock the strict-provider sanitization path so assistant tool calls without function.arguments are normalized to {} instead of being forwarded as missing values.

Made-with: Cursor
2026-04-15 01:37:41 +08:00
Michael-lhh
f293ff7f18 fix: normalize tool-call arguments for strict providers
Ensure assistant tool-call function.arguments is always emitted as valid JSON text so strict OpenAI-compatible backends (including Alibaba code models) do not reject requests. Add regressions for dict and malformed-string argument payloads in message sanitization.

Made-with: Cursor
2026-04-15 01:37:41 +08:00
chengyongru
ac714803f6 fix(provider): recover trailing assistant message as user to prevent empty request
When a subagent result is injected with current_role="assistant",
_enforce_role_alternation drops the trailing assistant message, leaving
only the system prompt. Providers like Zhipu/GLM reject such requests
with error 1214 ("messages parameter invalid"). Now the last popped
assistant message is recovered as a user message when no user/tool
messages remain.
2026-04-13 12:54:39 +08:00
haosenwang1018
3573109408 fix(provider): preserve static error helper compatibility 2026-04-13 09:37:31 +08:00
haosenwang1018
c68b3edb9d fix(provider): clarify local 502 recovery hints 2026-04-13 09:37:31 +08:00
Xubin Ren
217e1fc957 test(retry): lock in-place image fallback behavior
Add a focused regression test for the successful no-image retry path so the original message history stays stripped after fallback and the repeated retry loop cannot silently return.

Made-with: Cursor
2026-04-12 20:10:06 +08:00
yanghan-cyber
b261201985 fix(retry): strip images in-place to prevent repeated error-retry cycles
When a non-transient LLM error occurs with image content, the retry
mechanism strips images from a copy but never updates the original
conversation history. Subsequent iterations rebuild context from the
unmodified history, causing the same error-retry cycle to repeat
every iteration until max_iterations is reached.

Add _strip_image_content_inplace() that mutates the original message
content lists in-place after a successful no-image retry, so callers
sharing those references (e.g. the runner's conversation history)
also see the stripped version.
2026-04-12 20:10:06 +08:00
Xubin Ren
2bef9cb650 fix(agent): preserve interrupted tool-call turns
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
2026-04-10 05:37:25 +00:00
Xubin Ren
dadf453097 Merge origin/main into fix/sanitize-messages-non-claude
Resolved conflict in azure_openai_provider.py by keeping main's
Responses API implementation (role alternation not needed for the
Responses API input format).

Made-with: Cursor
2026-04-09 04:45:45 +00:00
Xubin Ren
d084d10dc2 feat(openai): auto-route direct reasoning requests with responses fallback 2026-04-08 15:21:08 +00:00
Xubin Ren
63acfc4f2f test: fix trailing-space mismatch and add regression tests for normal models
- Fix assertion in streaming dict fallback test (trailing space in data
  not reflected in expected value).
- Add two regression tests proving that models with reasoning_content
  (e.g. DeepSeek-R1) and standard models (no reasoning fields) are
  completely unaffected by the reasoning fallback.

Made-with: Cursor
2026-04-08 00:59:39 +08:00
moranfong
9e7c07ac89 test(provider): add StepFun reasoning field fallback tests
Add comprehensive tests for the StepFun Plan API compatibility fix:
- _parse dict branch: content and reasoning_content fallback to reasoning
- _parse SDK object branch: same fallback for pydantic response objects
- _parse_chunks dict branch: reasoning field handled in streaming mode
- _parse_chunks SDK branch: reasoning fallback for SDK delta objects
- Precedence tests: reasoning_content field takes priority over reasoning

Refs: fix(provider): support StepFun Plan API reasoning field fallback
2026-04-08 00:59:39 +08:00
Xubin Ren
1e8a6663ca test(anthropic): add regression tests for thinking modes incl. adaptive
Also update schema comment to mention 'adaptive' as a valid value.

Made-with: Cursor
2026-04-07 22:53:43 +08:00
Xubin Ren
35f53a721d refactor: consolidate _parse_retry_after_headers into base class
Merge the three retry-after header parsers (base, OpenAI, Anthropic)
into a single _extract_retry_after_from_headers on LLMProvider that
handles retry-after-ms, case-insensitive lookup, and HTTP date.

Remove the per-provider _parse_retry_after_headers duplicates and
their now-unused email.utils / time imports. Add test for retry-after-ms.

Made-with: Cursor
2026-04-06 08:44:52 +00:00
Xubin Ren
aeba9a23e6 refactor: remove dead _error_response wrapper in Anthropic provider
Fold _error_response back into _handle_error to match OpenAI/Azure
convention. Update all call sites and tests accordingly.

Made-with: Cursor
2026-04-06 08:35:02 +00:00
Xubin Ren
b575aed20e Merge origin/main into fix/structured-retry-classification-main
Made-with: Cursor
2026-04-06 08:28:20 +00:00
Xubin Ren
ebf29d87ae fix: include byteplus providers, guard None reasoning_effort, merge extra_body
- Add byteplus and byteplus_coding_plan to thinking param providers
- Only send extra_body when reasoning_effort is explicitly set
- Use setdefault().update() to avoid clobbering existing extra_body
- Add 7 regression tests for thinking params

Made-with: Cursor
2026-04-06 16:12:08 +08:00
Xubin Ren
77a88446fb Merge remote-tracking branch 'origin/main' into pr-2722 2026-04-04 13:51:59 +00:00
Xubin Ren
17d9d74ccc fix(provider): omit temperature for GPT-5 models 2026-04-04 20:18:22 +08:00
Lingao Meng
519911456a test(provider): fix incorrect assertion in reasoning_content sanitize test
The test test_openai_compat_strips_message_level_reasoning_fields was
added in fbedf7a and incorrectly asserted that reasoning_content and
extra_content should be stripped from messages. This contradicts the
intent of b5302b6 which explicitly added these fields to _ALLOWED_MSG_KEYS
to preserve them through sanitization.

Rename the test and fix assertions to match the original design intent:
reasoning_content and extra_content at message level should be preserved,
and extra_content inside tool_calls should also be preserved.

Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
2026-04-04 20:08:44 +08:00
pikaxinge
31d3061a0a fix(retry): classify 429 as WAIT vs STOP using semantic signals 2026-04-04 05:23:21 +00:00
pikaxinge
cabf093915 Merge remote-tracking branch 'origin/main' into fix/structured-retry-classification-main
# Conflicts:
#	nanobot/providers/anthropic_provider.py
#	nanobot/providers/base.py
#	nanobot/providers/openai_compat_provider.py
2026-04-04 05:04:43 +00:00
Xubin Ren
7229a81594 fix(providers): disable Azure SDK retries by default
Made-with: Cursor
2026-04-04 12:36:45 +08:00
pikaxinge
dbdf7e5955 fix: prevent retry amplification by disabling SDK retries 2026-04-04 12:36:45 +08:00