nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-06-15 15:24:06 +00:00

Author	SHA1	Message	Date
NanoBot	c20ecc52d7	feat(transcription): add Xiaomi MiMo ASR provider (mimo-v2.5-asr) Add support for Xiaomi MiMo ASR as a third transcription backend alongside Groq and OpenAI Whisper. Xiaomi ASR uses the /v1/chat/completions endpoint with base64-encoded audio input, rather than the standard Whisper multipart upload format. Co-Authored-By: 连 <lian@tangping.homes>	2026-06-09 04:29:09 +08:00
Ilia Breitburg	0eb3010e40	feat(transcription): configurable STT model + OpenRouter provider Add a `transcriptionModel` channel setting and an OpenRouter transcription backend so voice messages can be transcribed through OpenRouter's speech-to-text endpoint (e.g. nvidia/parakeet-tdt-0.6b-v3, openai/whisper-1), alongside the existing Groq/OpenAI Whisper providers. - schema: add channels.transcriptionModel (None = provider default) - providers/transcription: extract a shared POST/retry skeleton; add a JSON+base64 OpenRouterTranscriptionProvider; make the STT model a constructor param on all providers instead of hardcoding it - channels: route transcriptionProvider="openrouter" and thread the model through the manager to each channel - docs + tests Only dedicated STT models work on OpenRouter's transcription endpoint; chat LLMs (e.g. google/gemini-3.5-flash) are rejected there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 04:01:37 +08:00
Xubin Ren	9c81280300	feat(transcription): add shared voice input support (#4232) * feat(webui): add voice transcription input * feat(webui): render ANSI output in code blocks * refactor(webui): isolate voice recorder logic * refactor(transcription): keep websocket ingress thin * refactor(transcription): resolve channel audio settings on demand * style(webui): neutralize voice waveform color * feat(webui): add voice input tooltip * feat(webui): add voice input keyboard shortcut * fix(webui): distinguish voice shortcut platforms * fix(webui): place voice button after model selector * refactor(webui): share voice hold recording helpers * fix(desktop): allow microphone voice input * fix(webui): stabilize token usage month labels * feat(webui): show voice input on settings overview * fix(webui): label voice capability as recognition * fix(webui): align capability overview status * refactor(webui): isolate transcription socket handling * fix(webui): soften silent voice waveform * refactor(audio): clarify transcription service location * docs(transcription): clarify audio and provider boundaries * fix(exec): reduce session output polling flake	2026-06-09 01:08:49 +08:00
04cb	ef2ef4f789	fix(transcription): normalize chat-style apiBase to audio endpoint (#3637)	2026-05-23 17:32:59 +08:00
chengyongru	3437ff273f	fix(transcription): address review nits on PR #3253 - Correct api_key type hint to str | None in _post_transcription_with_retry - Remove unreachable final return "" - Fix test_openai_missing_api_key_short_circuits to actually test missing-key path (use audio_file fixture so file exists) - Fix PermissionError patch for Windows (patch class method instead of instance attribute)	2026-05-06 15:52:29 +08:00
mohamed-elkholy95	7ebf611be8	fix(transcription): retry Whisper calls and guard malformed responses A single transient failure between the agent and an OpenAI/Groq Whisper endpoint currently vanishes as `return ""` in transcribe(). The voice message arrives as the empty string and there is no way to tell real silence apart from a failed upload. A malformed but successful response body is even worse: the JSON-decode error escapes the helper unhandled. Add a shared `_post_transcription_with_retry` used by both providers. Retry behaviour: - exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts) - retryable HTTP statuses: 408, 429, 500, 502, 503, 504 - retryable exceptions: TimeoutException, ConnectError, ReadError, WriteError, RemoteProtocolError Non-transient failures short-circuit to "" on the first attempt -- retrying a misconfigured key or a broken upload only burns rate-limit quota. Branches that short-circuit: - missing API key, missing audio file - file-read errors (PermissionError, OSError) on the audio path, preserving the nightly contract for direct provider callers - HTTP auth/4xx body issues via raise_for_status() - response.json() parse failures - non-dict JSON payloads Sharing one helper means OpenAI and Groq cannot drift apart silently. Thread `language` through the helper. The multipart files dict is rebuilt inside the per-attempt loop, so when a caller sets self.language the `language` field is sent on every attempt -- not just the first. Tests cover: - every advertised retryable status and exception, parameterized - language present on attempts 1 and 2 of a 503->200 sequence - language absent when unset; present when set (both providers) - malformed JSON body and non-dict JSON body short-circuit to "" - PermissionError on file read short-circuits with no HTTP attempt - max-attempts give-up, exponential-backoff schedule, auth no-retry, missing-key / missing-file short-circuit Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py declared no status_code, which the new helper reads for retry classification. Set status_code = 200 so the stub advertises the successful response that those tests already simulate. Also moved the two transcription-provider imports to the top of that file (previously placed mid-file) so the file is ruff-clean (E402).	2026-05-06 15:52:25 +08:00

6 Commits
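
The retry policy described in commit 7ebf611be8 (backoff 1s -> 2s -> 4s, up to 3 retries, a fixed set of retryable statuses, and an immediate "" for non-transient failures) can be illustrated with a minimal sketch. This is not the repository's `_post_transcription_with_retry`: the names `post_with_retry`, `TransientError`, and the injected `send` and `sleep` callables are hypothetical, and the `"text"` response key is an assumption about the payload shape. Transport is injected so the sketch runs without a network or an HTTP library.

```python
import time
from typing import Any, Callable, Optional, Tuple

# Values taken from the commit message.
RETRYABLE_STATUSES = frozenset({408, 429, 500, 502, 503, 504})
MAX_RETRIES = 3          # 3 retries, so 4 attempts in total
BASE_DELAY_S = 1.0       # delays of 1s, 2s, 4s


class TransientError(Exception):
    """Hypothetical stand-in for timeout/connect/read/write/protocol errors."""


def post_with_retry(
    send: Callable[[], Tuple[int, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds, fails permanently, or retries run out.

    send() is a hypothetical callable returning (status_code, parsed_body).
    Returns the transcript text, or "" on any failure.
    """
    for attempt in range(MAX_RETRIES + 1):
        status: Optional[int] = None
        payload: Any = None
        try:
            status, payload = send()
        except TransientError:
            pass  # treated as retryable, like a retryable status
        else:
            if status not in RETRYABLE_STATUSES:
                # Non-transient outcome: decide now, never retry.
                if not 200 <= status < 300 or not isinstance(payload, dict):
                    return ""
                text = payload.get("text", "")  # assumed response key
                return text if isinstance(text, str) else ""
        if attempt < MAX_RETRIES:
            sleep(BASE_DELAY_S * 2 ** attempt)
    return ""  # every attempt hit a transient failure
```

Passing `sleep` as a parameter lets a test record the backoff schedule instead of waiting, which mirrors the "exponential-backoff schedule" case listed in the commit's test notes.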