- Add StepFunTranscriptionProvider class in nanobot/providers/transcription.py
- New _post_stepfun_asr_with_retry() function handling SSE stream parsing
(transcript.text.delta → transcript.text.done event sequence)
- Register 'stepfun' in transcription_registry.py with default model stepaudio-2.5-asr
- Reuse existing stepfun provider config (apiBase can point to Plan endpoint)
- Add 17 tests covering SSE parsing, retry contract, empty-text edge case, and registry integration
- Update docs/configuration.md with stepfun ASR documentation
StepFun ASR uses a dedicated SSE endpoint (/v1/audio/asr/sse) rather
than the chat-completions or Whisper multipart formats used by other
providers. Users on Step Plan can set apiBase to the Plan endpoint.
Maintainer edit: keep the GPT-5/o-series fallback on slug-boundary matching so unrelated model names are not caught by substring checks, and include o1 alongside o3/o4 because it is also an o-series chat model.
Add AssemblyAI as a third transcription provider option alongside
OpenAI and Groq. AssemblyAI offers better accuracy for certain
audio types (distant voices, noisy environments) and serves as a
reliable fallback when other providers struggle.
Changes:
- Add AssemblyAITranscriptionProvider class in providers/transcription.py
- Add 'assemblyai' option in base channel's transcribe_audio()
- Per-channel configuration via transcriptionProvider in config
Usage:
Set transcriptionProvider: 'assemblyai' and provide an AssemblyAI
API key via transcriptionApiKey in the channel config.
Add support for Xiaomi MiMo ASR as a third transcription backend alongside
Groq and OpenAI Whisper. Xiaomi ASR uses the /v1/chat/completions endpoint
with base64-encoded audio input, rather than the standard Whisper multipart
upload format.
Co-Authored-By:连 <lian@tangping.homes>
Add a `transcriptionModel` channel setting and an OpenRouter transcription
backend so voice messages can be transcribed through OpenRouter's
speech-to-text endpoint (e.g. nvidia/parakeet-tdt-0.6b-v3, openai/whisper-1),
alongside the existing Groq/OpenAI Whisper providers.
- schema: add channels.transcriptionModel (None = provider default)
- providers/transcription: extract a shared POST/retry skeleton; add a
JSON+base64 OpenRouterTranscriptionProvider; make the STT model a
constructor param on all providers instead of hardcoding it
- channels: route transcriptionProvider="openrouter" and thread the model
through the manager to each channel
- docs + tests
Only dedicated STT models work on OpenRouter's transcription endpoint;
chat LLMs (e.g. google/gemini-3.5-flash) are rejected there.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ProviderConfig.extra_query, threaded into AsyncOpenAI(default_query)
so that Azure-style gateways requiring query params like api-version can
be configured without URL hacks.
Also updates provider_signature to track extra_query changes so per-turn
refresh rebuilds the provider when the value changes.
Addresses the extra_query portion of #4204. The max_completion_tokens
model-awareness enhancement is intentionally left separate.
Maintainer edit: make the unsafe redirect regression go through connect_mcp_servers so both SSE and streamable HTTP prove that the request hook is attached to the MCP clients before redirects are followed.
maintainer edit: add SDK-object and tool-call history regressions so the empty-string reasoning_content fix is covered across both parse branches and the sanitized request path.
Custom providers (e.g. DeepSeek) may return reasoning_content as an
empty string "" to explicitly indicate no reasoning occurred. The
previous truthiness checks (, ) treated "" as falsy
and converted it to None, which caused the field to be dropped from
the message history entirely. Providers that require reasoning_content
on all assistant messages then rejected subsequent requests.
Replace truthiness checks with identity checks () so that
empty-string reasoning_content is preserved as-is. The streaming path
is unchanged since an empty join genuinely means no chunks received.
Fixes#4105
The SDK opened MCP connections through AgentLoop.process_direct but
never called close_mcp, leaving stdio MCP generators to be finalized
during asyncio shutdown from a different task, producing a RuntimeError
about exiting a cancel scope in a different task.
Add aclose() that delegates to AgentLoop.close_mcp (which already
drains background tasks and closes MCP stacks), plus __aenter__ and
__aexit__ so the SDK works as an async context manager.
Fixes#4211
Maintainer edit: preserve provider-specific size hints for custom image generation endpoints while keeping the default 1K mapping compatible. Clarify the custom provider contract in docs and cover response_format/size overrides in tests.
Maintainer edit: require providers.custom.apiBase before making custom image requests and allow unauthenticated local endpoints by omitting Authorization when no apiKey is configured.
maintainer edit: Keep the shared-prefix guard for Feishu numbered mention keys while still resolving placeholders followed by punctuation, matching the previous user-visible mention behavior.
maintainer edit: keep cancellation out of on_error so shutdown paths do not look like run failures, and let the SDK capture hook use the authoritative after_run snapshot.