The channel manager coalesces consecutive _stream_delta messages and
forwards a single merged message with _stream_end=True. In that path
no individual delta events ever reach the WebUI client, so the
stream_end frame is the only carrier of the text. The previous guard
only attached text when media-URL rewriting changed the string, which
silently dropped entire turns of plain-text output whenever the
agent generated tokens faster than the queue drained.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add StepFunTranscriptionProvider class in nanobot/providers/transcription.py
- New _post_stepfun_asr_with_retry() function handling SSE stream parsing
(transcript.text.delta → transcript.text.done event sequence)
- Register 'stepfun' in transcription_registry.py with default model stepaudio-2.5-asr
- Reuse existing stepfun provider config (apiBase can point to Plan endpoint)
- Add 17 tests covering SSE parsing, retry contract, empty-text edge case, and registry integration
- Update docs/configuration.md with stepfun ASR documentation
StepFun ASR uses a dedicated SSE endpoint (/v1/audio/asr/sse) rather
than the chat-completions or Whisper multipart formats used by other
providers. Users on Step Plan can set apiBase to the Plan endpoint.
Maintainer edit: keep the GPT-5/o-series fallback on slug-boundary matching so unrelated model names are not caught by substring checks, and include o1 alongside o3/o4 because it is also an o-series chat model.
* docs: make onboarding friendlier for beginners
* docs: build clearer documentation paths
Maintainer edit: turn the onboarding follow-up into a layered docs structure for first-time setup, provider selection, troubleshooting, CLI reference, and source-level architecture. This keeps quick start focused while giving advanced users precise reference paths.
* docs: render architecture flow with mermaid
Maintainer edit: replace the ASCII architecture sketch with a GitHub-rendered Mermaid flowchart so the core runtime path is easier to scan in the PR and README docs.
* docs: recommend model presets for model config
Maintainer edit: make named modelPresets the primary model configuration path and expand fallback preset examples so string fallbacks are clearly preset names, not raw model IDs.
* docs: document api base urls and langfuse setup
Maintainer edit: explain when users need apiBase/base URL in quick start and provider docs, and add Langfuse tracing setup with troubleshooting links.
* docs: use python module pip consistently
Maintainer edit: keep install commands tied to the active Python interpreter by using python -m pip in the Azure optional dependency notes too.
* docs: add non-technical getting started path
Maintainer edit: add a wizard-first guide for users without terminal or JSON background, including a text TUI menu example and links from the main docs entrypoints.
* docs: avoid hard-wrapped prose in user docs
Maintainer edit: unwrap ordinary prose across user-facing documentation while preserving markdown structure, code blocks, tables, lists, and prompt/template files.
* docs: keep desktop list continuations nested
Maintainer edit: preserve list nesting after unwrapping prose in the desktop WebUI sync guide.
* docs: add one-command installer
Maintainer edit: add auditable macOS/Linux and Windows install scripts that install nanobot-ai and start the onboarding wizard, then document the commands in the main onboarding entrypoints.
* docs: add installer dry run mode
Maintainer edit: add --dry-run to the one-command installer scripts so users can preview Python detection, install source, pip command, and wizard behavior without changing their environment.
* docs: clean installer error output
Maintainer edit: make PowerShell installer failures print a concise Error: message instead of Write-Error call-site details.
* docs: add provider setup cookbook
Maintainer edit: add pasteable provider recipes for common hosted, local, fallback, runtime switching, and Langfuse setups, then link the cookbook from onboarding and troubleshooting entrypoints.
* docs: address review feedback
* docs: clarify reader paths
* docs: explain terminal basics for beginners
* docs: clarify wizard navigation
* docs: avoid duplicate onboarding steps
* docs: add setup status check
* docs: explain status output
* docs: remove provider recommendation wording
* docs: explain status diagnostics
* docs: reduce hard-wrapped guidance
* docs: migrate config examples to presets
* docs: clarify python command fallbacks
* docs: improve installer failure recovery
* docs: expand install troubleshooting
* docs: cover installer download failures
* docs: put stable install paths first
* docs: add bundled webui quick path
* docs: clarify provider-neutral setup
* docs: clarify gateway setup for chat surfaces
* docs: improve docs navigation paths
* docs: add configuration quick jump
* docs: clarify provider secret variables
* chore: request PR review acknowledgement
Empty commit: please read the PR review comments and reply on the PR to confirm that you have received them.
This commit intentionally changes no files; it exists only to notify the remote Codex run so it can end its active goal.
* docs: add README start here guide
* docs: avoid provider recommendation wording
* docs: guide next steps after first reply
* docs: explain merging JSON snippets
* docs: add CLI command chooser
* docs: add configuration task map
* docs: add deployment readiness guide
* docs: simplify WebUI entry paths
* docs: add provider recipe chooser
* docs: fix provider factual references
Update OpenRouter and LongCat model examples, align Bedrock guidance, and make fallback snippets schema-valid.
Also correct group policy wording and image-generation provider lists to match the current code.
* fix: keep PowerShell installer from closing caller shell
* docs: mention self-guided configuration
Add AssemblyAI as a third transcription provider option alongside
OpenAI and Groq. AssemblyAI offers better accuracy for certain
audio types (distant voices, noisy environments) and serves as a
reliable fallback when other providers struggle.
Changes:
- Add AssemblyAITranscriptionProvider class in providers/transcription.py
- Add 'assemblyai' option in base channel's transcribe_audio()
- Per-channel configuration via transcriptionProvider in config
Usage:
Set transcriptionProvider: 'assemblyai' and provide an AssemblyAI
API key via transcriptionApiKey in the channel config.
Add support for Xiaomi MiMo ASR as a third transcription backend alongside
Groq and OpenAI Whisper. Xiaomi ASR uses the /v1/chat/completions endpoint
with base64-encoded audio input, rather than the standard Whisper multipart
upload format.
Co-Authored-By:连 <lian@tangping.homes>
Add a `transcriptionModel` channel setting and an OpenRouter transcription
backend so voice messages can be transcribed through OpenRouter's
speech-to-text endpoint (e.g. nvidia/parakeet-tdt-0.6b-v3, openai/whisper-1),
alongside the existing Groq/OpenAI Whisper providers.
- schema: add channels.transcriptionModel (None = provider default)
- providers/transcription: extract a shared POST/retry skeleton; add a
JSON+base64 OpenRouterTranscriptionProvider; make the STT model a
constructor param on all providers instead of hardcoding it
- channels: route transcriptionProvider="openrouter" and thread the model
through the manager to each channel
- docs + tests
Only dedicated STT models work on OpenRouter's transcription endpoint;
chat LLMs (e.g. google/gemini-3.5-flash) are rejected there.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ProviderConfig.extra_query, threaded into AsyncOpenAI(default_query)
so that Azure-style gateways requiring query params like api-version can
be configured without URL hacks.
Also updates provider_signature to track extra_query changes so per-turn
refresh rebuilds the provider when the value changes.
Addresses the extra_query portion of #4204. The max_completion_tokens
model-awareness enhancement is intentionally left separate.
Maintainer edit: make the unsafe redirect regression go through connect_mcp_servers so both SSE and streamable HTTP prove that the request hook is attached to the MCP clients before redirects are followed.
Maintainer edit: explain that HTTP/SSE MCP now uses the shared SSRF guard before connecting and before following redirects, so local or private HTTP MCP endpoints require an explicit tools.ssrfWhitelist entry.
maintainer edit: add SDK-object and tool-call history regressions so the empty-string reasoning_content fix is covered across both parse branches and the sanitized request path.
Custom providers (e.g. DeepSeek) may return reasoning_content as an
empty string "" to explicitly indicate no reasoning occurred. The
previous truthiness checks (, ) treated "" as falsy
and converted it to None, which caused the field to be dropped from
the message history entirely. Providers that require reasoning_content
on all assistant messages then rejected subsequent requests.
Replace truthiness checks with identity checks () so that
empty-string reasoning_content is preserved as-is. The streaming path
is unchanged since an empty join genuinely means no chunks received.
Fixes#4105