When the Responses API fails repeatedly (3 consecutive compatibility
errors), skip it and fall back directly to Chat Completions. Unlike a
permanent disable, the circuit re-probes after 5 minutes so recovery
is automatic when the API comes back. Success resets the counter.
Keyed per (model, reasoning_effort) so a failure with one model does
not affect others.
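A minimal sketch of the breaker, assuming an illustrative class name and call sites; only the thresholds (3 consecutive failures, 5-minute re-probe), the reset-on-success behavior, and the (model, reasoning_effort) key come from this change:

```python
import time

# Thresholds mirror the description above; everything else is illustrative.
FAILURE_THRESHOLD = 3          # consecutive compatibility errors before opening
REPROBE_AFTER_SECONDS = 300    # re-probe the Responses API after 5 minutes


class _ResponsesApiCircuit:
    def __init__(self) -> None:
        # (model, reasoning_effort) -> (consecutive_failures, opened_at)
        self._state: dict[tuple[str, str | None], tuple[int, float | None]] = {}

    def allow(self, model: str, reasoning_effort: str | None) -> bool:
        failures, opened_at = self._state.get((model, reasoning_effort), (0, None))
        if failures < FAILURE_THRESHOLD:
            return True
        # Circuit is open: only re-probe once the cool-down has elapsed.
        return opened_at is not None and time.monotonic() - opened_at >= REPROBE_AFTER_SECONDS

    def record_failure(self, model: str, reasoning_effort: str | None) -> None:
        key = (model, reasoning_effort)
        failures = self._state.get(key, (0, None))[0] + 1
        opened_at = time.monotonic() if failures >= FAILURE_THRESHOLD else None
        self._state[key] = (failures, opened_at)

    def record_success(self, model: str, reasoning_effort: str | None) -> None:
        # Any success fully resets the counter for that key.
        self._state.pop((model, reasoning_effort), None)
```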
Two small follow-ups to the guard:
1. Fix the should_execute_tools docstring so it matches the actual code.
The previous version said "Only execute when finish_reason explicitly
signals tool intent" but the code also accepts finish_reason == "stop".
Explain why: some compliant providers emit "stop" with legitimate tool
calls, and openai_compat_provider.py already mirrors this at lines ~633 /
~678, where ("tool_calls", "stop") are both treated as the terminal
tool-call state (sketched below). Without this, a strict "tool_calls"-only
guard would
regress 15 existing runner tests that construct LLMResponse with
tool_calls but no explicit finish_reason (default = "stop").
2. Add tests/providers/test_llm_response.py. This locks the three cases:
- no tool calls -> never executes
- tool calls + "tool_calls"/stop -> executes
- tool calls + refusal / content_filter / error / length / ... -> blocked
These are exactly the boundary cases the #3220 fix is about; without a
test here a future refactor could silently revert the guard.
Body + tests only, no behavior change beyond the existing PR's intent.
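For reference, a rough sketch of the guard the docstring now describes; the function body here is illustrative, only the accepted finish_reason values and the "stop" default come from the notes above:

```python
# Illustrative only; the real should_execute_tools lives on LLMResponse.
_TOOL_EXEC_FINISH_REASONS = {"tool_calls", "stop"}  # "stop" kept for the providers above

def should_execute_tools(response) -> bool:
    """Execute tools only when tool calls are present and finish_reason is a
    terminal tool-call state ("tool_calls" or "stop"); refusal, content_filter,
    error, length, etc. block execution."""
    if not getattr(response, "tool_calls", None):
        return False
    finish_reason = getattr(response, "finish_reason", None) or "stop"  # default = "stop"
    return finish_reason in _TOOL_EXEC_FINISH_REASONS
```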
Made-with: Cursor
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages parameter invalid").
Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
"(conversation continued)" when the first non-system message is a bare
assistant (no tool_calls), serving as a safety net for any edge cases
that slip through.
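Roughly, the two layers look like this; helper signatures and message handling are simplified sketches of the behavior described above, not the actual implementation:

```python
SYNTHETIC_USER_MESSAGE = "(conversation continued)"  # module-level constant

def _recover_user_message(kept: list[dict], non_system: list[dict]) -> list[dict]:
    # Layer 1: if truncation dropped every user message, walk backwards
    # through the full non-system history and pull the nearest user
    # message back into the kept window.
    if any(m.get("role") == "user" for m in kept):
        return kept
    for msg in reversed(non_system):
        if msg.get("role") == "user":
            return [msg] + kept
    return kept

def _prepend_synthetic_user(messages: list[dict]) -> list[dict]:
    # Layer 2 (safety net): if the first non-system message is a bare
    # assistant turn (no tool_calls), insert a synthetic user message so
    # the request never starts with system -> assistant.
    idx = next((i for i, m in enumerate(messages) if m.get("role") != "system"), None)
    if idx is None:
        return messages
    first = messages[idx]
    if first.get("role") == "assistant" and not first.get("tool_calls"):
        return messages[:idx] + [{"role": "user", "content": SYNTHETIC_USER_MESSAGE}] + messages[idx:]
    return messages
```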
Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
Document why MiniMax thinking mode uses a separate Anthropic-compatible provider and list the matching base URLs. Add a small registry test so the new provider stays wired to the expected backend and API key.
Made-with: Cursor
- Inject `thinking={"type": "enabled|disabled"}` via extra_body for
Kimi thinking-capable models (kimi-k2.5, k2.6-code-preview).
- Add _is_kimi_thinking_model helper to handle both bare slugs and
OpenRouter-style prefixed names (e.g. moonshotai/kimi-k2.5).
- reasoning_effort="minimal" maps to disabled; any other value enables it.
- Add tests for enabled/disabled states and OpenRouter prefix handling.
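Approximately what the injection looks like; the helper name is from the bullets, while the model set literal and the handling of an unset reasoning_effort are assumptions:

```python
KIMI_THINKING_MODELS = {"kimi-k2.5", "k2.6-code-preview"}  # thinking-capable slugs

def _is_kimi_thinking_model(model: str) -> bool:
    # Accept both bare slugs and OpenRouter-style prefixed names,
    # e.g. "moonshotai/kimi-k2.5".
    return model.split("/", 1)[-1] in KIMI_THINKING_MODELS

def _kimi_thinking_extra_body(model: str, reasoning_effort: str | None) -> dict:
    if not _is_kimi_thinking_model(model) or reasoning_effort is None:
        return {}
    # "minimal" maps to disabled; any other value enables thinking.
    mode = "disabled" if reasoning_effort == "minimal" else "enabled"
    return {"thinking": {"type": mode}}
```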
Lock the new interaction-channel retry termination hints so both exhausted standard retries and persistent identical-error stops keep emitting the final progress message.
Made-with: Cursor
Lock the strict-provider sanitization path so assistant tool calls without function.arguments are normalized to {} instead of being forwarded as missing values.
Made-with: Cursor
Ensure assistant tool-call function.arguments is always emitted as valid JSON text so strict OpenAI-compatible backends (including Alibaba code models) do not reject requests. Add regressions for dict and malformed-string argument payloads in message sanitization.
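Taken together with the sanitization commit above, the normalization behaves roughly like this; the helper name and the "{}" fallback for malformed strings are assumptions:

```python
import json

def _normalize_tool_call_arguments(arguments) -> str:
    # Strict OpenAI-compatible backends require function.arguments to be a
    # JSON-encoded string.
    if arguments is None:
        return "{}"                                       # missing -> empty object
    if isinstance(arguments, dict):
        return json.dumps(arguments, ensure_ascii=False)  # dict -> JSON text
    if isinstance(arguments, str):
        try:
            json.loads(arguments)
            return arguments                              # already valid JSON text
        except (TypeError, ValueError):
            return "{}"                                   # malformed string -> fallback
    return "{}"
```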
Made-with: Cursor
When a subagent result is injected with current_role="assistant",
_enforce_role_alternation drops the trailing assistant message, leaving
only the system prompt. Providers like Zhipu/GLM reject such requests
with error 1214 ("messages parameter invalid"). Now the last popped
assistant message is recovered as a user message when no user/tool
messages remain.
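Sketched, the recovery branch behaves roughly as follows; names and the exact re-insertion point are assumptions:

```python
def _recover_trailing_assistant(result: list[dict], popped: dict | None) -> list[dict]:
    # If alternation enforcement left only the system prompt, re-add the last
    # popped assistant turn as a user message so GLM-style providers do not
    # see a system-only request.
    if popped is not None and not any(m.get("role") in ("user", "tool") for m in result):
        result.append({"role": "user", "content": popped.get("content") or ""})
    return result
```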
Add a focused regression test for the successful no-image retry path so the original message history stays stripped after fallback and the repeated error-retry loop cannot silently come back.
Made-with: Cursor
When a non-transient LLM error occurs with image content, the retry
mechanism strips images from a copy but never updates the original
conversation history. Subsequent iterations rebuild context from the
unmodified history, causing the same error-retry cycle to repeat
every iteration until max_iterations is reached.
Add _strip_image_content_inplace() that mutates the original message
content lists in-place after a successful no-image retry, so callers
sharing those references (e.g. the runner's conversation history)
also see the stripped version.
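A rough sketch of the in-place strip, assuming OpenAI-style list-of-parts message content and an "image_url" part type:

```python
def _strip_image_content_inplace(messages: list[dict]) -> None:
    # Mutate the original content lists so every holder of these references
    # (e.g. the runner's conversation history) sees the stripped version.
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content has no image parts
        content[:] = [
            part for part in content
            if not (isinstance(part, dict) and part.get("type") == "image_url")
        ]
```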
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
Resolved conflict in azure_openai_provider.py by keeping main's
Responses API implementation (role alternation not needed for the
Responses API input format).
Made-with: Cursor
- Fix assertion in streaming dict fallback test (trailing space in data
not reflected in expected value).
- Add two regression tests proving that models with reasoning_content
(e.g. DeepSeek-R1) and standard models (no reasoning fields) are
completely unaffected by the reasoning fallback.
Made-with: Cursor
Add comprehensive tests for the StepFun Plan API compatibility fix:
- _parse dict branch: content and reasoning_content fallback to reasoning
- _parse SDK object branch: same fallback for pydantic response objects
- _parse_chunks dict branch: reasoning field handled in streaming mode
- _parse_chunks SDK branch: reasoning fallback for SDK delta objects
- Precedence tests: reasoning_content field takes priority over reasoning
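The fallback these tests lock looks roughly like this; field names and precedence come from the list above, the helper itself is a sketch:

```python
def _extract_reasoning(payload: dict):
    # reasoning_content takes priority; StepFun Plan API's "reasoning" is
    # only used as a fallback when reasoning_content is absent.
    reasoning = payload.get("reasoning_content")
    if reasoning is None:
        reasoning = payload.get("reasoning")
    return reasoning
```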
Refs: fix(provider): support StepFun Plan API reasoning field fallback
Merge the three retry-after header parsers (base, OpenAI, Anthropic)
into a single _extract_retry_after_from_headers on LLMProvider that
handles retry-after-ms, case-insensitive lookup, and HTTP date.
Remove the per-provider _parse_retry_after_headers duplicates and
their now-unused email.utils / time imports. Add a test for retry-after-ms.
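Roughly the consolidated helper; header names and precedence follow the description, the internals are a sketch rather than the actual method body:

```python
import email.utils
import time

def _extract_retry_after_from_headers(headers) -> float | None:
    lowered = {str(k).lower(): v for k, v in dict(headers or {}).items()}
    # Millisecond variant first (retry-after-ms).
    ms = lowered.get("retry-after-ms")
    if ms is not None:
        try:
            return float(ms) / 1000.0
        except (TypeError, ValueError):
            pass
    value = lowered.get("retry-after")
    if value is None:
        return None
    # Plain seconds.
    try:
        return float(value)
    except (TypeError, ValueError):
        pass
    # HTTP-date fallback.
    try:
        parsed = email.utils.parsedate_to_datetime(str(value))
    except (TypeError, ValueError):
        return None
    return max(0.0, parsed.timestamp() - time.time())
```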
Made-with: Cursor
- Add byteplus and byteplus_coding_plan to thinking param providers
- Only send extra_body when reasoning_effort is explicitly set
- Use setdefault().update() to avoid clobbering existing extra_body
- Add 7 regression tests for thinking params
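Roughly the merge pattern the bullets describe; the provider set comes from the list, while the exact thinking payload and kwargs shape are assumptions:

```python
THINKING_PARAM_PROVIDERS = {"byteplus", "byteplus_coding_plan"}

def _apply_thinking_params(kwargs: dict, provider: str, reasoning_effort: str | None) -> None:
    # Only send extra_body when reasoning_effort is explicitly set.
    if provider not in THINKING_PARAM_PROVIDERS or reasoning_effort is None:
        return
    thinking = {"type": "disabled" if reasoning_effort == "minimal" else "enabled"}
    # setdefault().update() merges into any caller-supplied extra_body
    # instead of clobbering it.
    kwargs.setdefault("extra_body", {}).update({"thinking": thinking})
```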
Made-with: Cursor
The test test_openai_compat_strips_message_level_reasoning_fields was
added in fbedf7a and incorrectly asserted that reasoning_content and
extra_content should be stripped from messages. This contradicts the
intent of b5302b6, which explicitly added these fields to _ALLOWED_MSG_KEYS
to preserve them through sanitization.
Rename the test and fix assertions to match the original design intent:
reasoning_content and extra_content at message level should be preserved,
and extra_content inside tool_calls should also be preserved.
Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
Add regression tests for the non-streaming (_parse dict branch) and
streaming (_parse_chunks dict and SDK-object branches) paths that extract
reasoning_content, ensuring the field is populated when present and None
when absent.
Signed-off-by: Lingao Meng <menglingao@xiaomi.com>