Add support for Xiaomi MiMo ASR as a third transcription backend alongside
Groq and OpenAI Whisper. Xiaomi ASR uses the /v1/chat/completions endpoint
with base64-encoded audio input, rather than the standard Whisper multipart
upload format.
Co-Authored-By:连 <lian@tangping.homes>
Add a `transcriptionModel` channel setting and an OpenRouter transcription
backend so voice messages can be transcribed through OpenRouter's
speech-to-text endpoint (e.g. nvidia/parakeet-tdt-0.6b-v3, openai/whisper-1),
alongside the existing Groq/OpenAI Whisper providers.
- schema: add channels.transcriptionModel (None = provider default)
- providers/transcription: extract a shared POST/retry skeleton; add a
JSON+base64 OpenRouterTranscriptionProvider; make the STT model a
constructor param on all providers instead of hardcoding it
- channels: route transcriptionProvider="openrouter" and thread the model
through the manager to each channel
- docs + tests
Only dedicated STT models work on OpenRouter's transcription endpoint;
chat LLMs (e.g. google/gemini-3.5-flash) are rejected there.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Correct api_key type hint to str | None in _post_transcription_with_retry
- Remove unreachable final return ""
- Fix test_openai_missing_api_key_short_circuits to actually test
missing-key path (use audio_file fixture so file exists)
- Fix PermissionError patch for Windows (patch class method instead
of instance attribute)
A single transient failure between the agent and an OpenAI/Groq Whisper
endpoint currently vanishes as `return ""` in transcribe(). The voice
message arrives as the empty string and there is no way to tell real
silence apart from a failed upload. A malformed but successful response
body is even worse: the JSON-decode error escapes the helper unhandled.
Add a shared `_post_transcription_with_retry` used by both providers.
Retry behaviour:
- exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts)
- retryable HTTP statuses: 408, 429, 500, 502, 503, 504
- retryable exceptions: TimeoutException, ConnectError, ReadError,
WriteError, RemoteProtocolError
Non-transient failures short-circuit to "" on the first attempt --
retrying a misconfigured key or a broken upload only burns rate-limit
quota. Branches that short-circuit:
- missing API key, missing audio file
- file-read errors (PermissionError, OSError) on the audio path,
preserving the nightly contract for direct provider callers
- HTTP auth/4xx body issues via raise_for_status()
- response.json() parse failures
- non-dict JSON payloads
Sharing one helper means OpenAI and Groq cannot drift apart silently.
Thread `language` through the helper. The multipart files dict is rebuilt
inside the per-attempt loop, so when a caller sets self.language the
`language` field is sent on every attempt -- not just the first.
Tests cover:
- every advertised retryable status and exception, parameterized
- language present on attempts 1 and 2 of a 503->200 sequence
- language absent when unset; present when set (both providers)
- malformed JSON body and non-dict JSON body short-circuit to ""
- PermissionError on file read short-circuits with no HTTP attempt
- max-attempts give-up, exponential-backoff schedule, auth no-retry,
missing-key / missing-file short-circuit
Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py
declared no status_code, which the new helper reads for retry classification.
Set status_code = 200 so the stub advertises the successful response that
those tests already simulate. Also moved the two transcription-provider
imports to the top of that file (previously placed mid-file) so the file
is ruff-clean (E402).