Resolve conflict in context.py: keep main's build_messages, which already
merges runtime context into the user message (achieving the same cache goal).
The real value-add from this PR is the second cache breakpoint in
litellm_provider.py.
Made-with: Cursor
Replace the static provider-level supports_vision check with a
reactive fallback: when a model returns an image-unsupported error,
strip image_url blocks from messages and retry once. This avoids
maintaining an inaccurate vision capability table and correctly
handles gateway/unknown model scenarios.
Also extract _safe_chat() to deduplicate try/except boilerplate
in chat_with_retry().
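A rough sketch of the flow (only _safe_chat is named in this change; the other helpers, the error-matching heuristic, and the assumption of OpenAI-style list-of-dict content blocks are illustrative):

```python
from copy import deepcopy


def _strip_image_blocks(messages: list[dict]) -> list[dict]:
    """Drop image_url content blocks, leaving text blocks untouched."""
    cleaned = deepcopy(messages)
    for msg in cleaned:
        content = msg.get("content")
        if isinstance(content, list):
            msg["content"] = [b for b in content if b.get("type") != "image_url"]
    return cleaned


def _looks_like_image_unsupported(err: Exception) -> bool:
    # Providers word this error differently, so match loosely.
    text = str(err).lower()
    return "image" in text and ("unsupported" in text or "not support" in text)


async def _safe_chat(chat_call, messages, **kwargs):
    """Run one chat call; on an image-unsupported error, strip images and retry once."""
    try:
        return await chat_call(messages, **kwargs)
    except Exception as err:
        if not _looks_like_image_unsupported(err):
            raise
        return await chat_call(_strip_image_blocks(messages), **kwargs)
```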
- Add a supports_vision field to ProviderSpec (default True)
- Add the corresponding methods in LiteLLMProvider
- Filter image_url content blocks from messages before sending to non-vision models
- Revert the session-layer filtering from the original PR (wrong layer)
This fixes the issue where switching from Claude (vision-capable) to
non-vision models (e.g., Baidu Qianfan) causes API errors due to
unsupported image_url content blocks.
The provider layer is the correct place for this filtering because:
1. It has access to model/provider capabilities
2. It only affects non-vision models
3. It preserves session layer purity (storage should not know about model capabilities)
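For reference, the provider-level filter amounts to something like this (a sketch: only the supports_vision flag comes from the description above; the helper name and message shapes are illustrative):

```python
def filter_vision_content(messages: list[dict], supports_vision: bool) -> list[dict]:
    """Remove image_url blocks for non-vision models; vision-capable models pass through."""
    if supports_vision:
        return messages
    filtered = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            msg = {**msg, "content": [b for b in content if b.get("type") != "image_url"]}
        filtered.append(msg)
    return filtered
```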
GitHub Copilot and some other providers have a 64-character limit on
tool_call_id. When switching from providers that generate longer IDs
(such as OpenAI Codex), this caused validation errors.
This fix truncates tool_call_id to 64 characters by preserving the first
32 and last 32 characters to maintain uniqueness while respecting the
provider's limit.
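In essence (the helper name is illustrative):

```python
_MAX_TOOL_CALL_ID_LEN = 64  # limit enforced by GitHub Copilot and similar providers


def truncate_tool_call_id(tool_call_id: str) -> str:
    """Keep the first 32 and last 32 characters of over-long IDs (32 + 32 = 64)."""
    if len(tool_call_id) <= _MAX_TOOL_CALL_ID_LEN:
        return tool_call_id
    return tool_call_id[:32] + tool_call_id[-32:]
```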
Fixes #1554
GitHub Copilot's API returns tool_calls split across multiple choices:
- choices[0]: content only (tool_calls=null)
- choices[1]: tool_calls only (content=null)
The existing _parse_response only inspected choices[0], so tool_calls
were silently lost, causing the agent to never execute tools when using
github_copilot/ models.
This fix scans all choices and merges tool_calls + content, so
providers that return multi-choice responses work correctly.
Single-choice providers (OpenAI, Anthropic, etc.) are unaffected since
the loop over one choice is equivalent to the original code.
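The merge boils down to something like this (a sketch assuming litellm/OpenAI-style response objects; the helper name is illustrative):

```python
def merge_choices(response):
    """Collect content and tool_calls from every choice, not just choices[0]."""
    content_parts, tool_calls = [], []
    for choice in response.choices:
        message = choice.message
        if getattr(message, "content", None):
            content_parts.append(message.content)
        if getattr(message, "tool_calls", None):
            tool_calls.extend(message.tool_calls)
    return ("".join(content_parts) or None), tool_calls
```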
Part 1: Make system prompt static
- Move Current Time from system prompt to user message prefix
- System prompt now only changes when config/skills change, not every minute
- Timestamp injected as [YYYY-MM-DD HH:MM (Day) (TZ)] prefix on each user message
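A minimal sketch of the prefix formatting (the exact strftime pattern is an assumption based on the format above):

```python
from datetime import datetime, timezone


def timestamp_prefix(now: datetime) -> str:
    """E.g. '[2024-05-01 14:30 (Wed) (UTC)]' for a timezone-aware datetime."""
    return now.strftime("[%Y-%m-%d %H:%M (%a) (%Z)]")


# timestamp_prefix(datetime.now(timezone.utc)) is prepended to each user message.
```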
Part 2: Add second cache_control breakpoint
- Existing: system message breakpoint (caches static system prompt)
- New: second-to-last message breakpoint (caches conversation history prefix)
- Refactored _apply_cache_control with shared _mark() helper
Before: 0% cache hit rate (system prompt changed every minute)
After: ~90% savings on cached input tokens for multi-turn conversations
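Roughly, the refactored layout looks like this (a sketch assuming Anthropic-style content blocks; the string-to-block conversion details are illustrative):

```python
def _mark(message: dict) -> None:
    """Attach an ephemeral cache_control marker to the message's last content block."""
    content = message.get("content")
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
        message["content"] = content
    if isinstance(content, list) and content:
        content[-1]["cache_control"] = {"type": "ephemeral"}


def _apply_cache_control(messages: list[dict]) -> None:
    for msg in messages:
        if msg.get("role") == "system":
            _mark(msg)              # breakpoint 1: static system prompt
            break
    if len(messages) >= 2:
        _mark(messages[-2])         # breakpoint 2: conversation history prefix
```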
Closes #981
- Remove trailing whitespace and normalize blank lines
- Unify string quotes and line breaks for long lines
- Sort imports alphabetically across modules
_sanitize_messages strips all non-standard keys from messages, including
reasoning_content. Thinking-enabled models such as Moonshot Kimi k2.5
require reasoning_content to be present on assistant tool-call messages
when thinking mode is on, so stripping it causes a BadRequestError (#1014).
Add reasoning_content to _ALLOWED_MSG_KEYS so it passes through
sanitization when present.
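Conceptually (the other allow-list entries shown here are illustrative):

```python
_ALLOWED_MSG_KEYS = {
    "role", "content", "name",
    "tool_calls", "tool_call_id",
    "reasoning_content",  # required by thinking-enabled models such as Kimi k2.5
}


def _sanitize_message(message: dict) -> dict:
    return {k: v for k, v in message.items() if k in _ALLOWED_MSG_KEYS}
```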
Fixes #1014
PR #947 fixed the consumer side (context.py), but the root cause is at
the provider level: getattr returns "" (empty string) instead of None
when reasoning_content is empty. This causes the DeepSeek API to reject
the request with a "Missing reasoning_content field" error.
`"" or None` evaluates to None, preventing empty strings from
propagating downstream.
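The normalization is a one-liner at the point where the provider reads the field (the surrounding function is hypothetical):

```python
def _extract_reasoning(message) -> str | None:
    reasoning = getattr(message, "reasoning_content", None)
    return reasoning or None  # "" or None -> None, so empty strings never leave the provider
```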
Fixes #946
Inject cache_control: {"type": "ephemeral"} on the system message and
the last tool definition for providers that support prompt caching. Add a
supports_prompt_caching flag to ProviderSpec (enabled for Anthropic only)
and skip caching when routing through a gateway.
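A condensed sketch of the gating (the supports_prompt_caching flag and gateway check follow the description above; everything else is illustrative):

```python
def maybe_apply_prompt_caching(spec, messages: list[dict], tools: list[dict], is_gateway: bool) -> None:
    if not getattr(spec, "supports_prompt_caching", False) or is_gateway:
        return  # skip gateways and providers without prompt-caching support
    for msg in messages:
        if msg.get("role") == "system":
            content = msg.get("content")
            if isinstance(content, str):
                content = [{"type": "text", "text": content}]
                msg["content"] = content
            if isinstance(content, list) and content:
                content[-1]["cache_control"] = {"type": "ephemeral"}
            break
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}
```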
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Strip non-standard keys like 'reasoning_content' before sending to LLM
- Always include 'content' key in assistant messages (required by StepFun)
- Add _sanitize_messages to LiteLLMProvider to prevent 400 BadRequest errors
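A sketch of the sanitization pass (the allow-list contents are illustrative):

```python
_ALLOWED_MSG_KEYS = {"role", "content", "name", "tool_calls", "tool_call_id"}


def _sanitize_messages(messages: list[dict]) -> list[dict]:
    sanitized = []
    for msg in messages:
        clean = {k: v for k, v in msg.items() if k in _ALLOWED_MSG_KEYS}
        if clean.get("role") == "assistant":
            clean.setdefault("content", "")  # StepFun rejects assistant messages without a content key
        sanitized.append(clean)
    return sanitized
```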