docs: clarify streamed timeout fallback behavior

maintainer edit: update fallback docs and provider docstring to describe the new stream-stall timeout recovery exception.
2026-06-15 07:14:08 +00:00 · 2026-06-10 16:21:52 +08:00 · 2026-06-10 16:21:52 +08:00 · c00371c761
commit c00371c761
parent bc4bb508a1
2 changed files with 9 additions and 6 deletions
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1268,7 +1268,7 @@ Inline fallback object:

 Use inline objects only when a fallback is not worth naming as a reusable preset. `fallbackModels` belongs under `agents.defaults`, not inside individual `modelPresets` entries.

-Failover only runs when the primary provider returns a retryable model/provider error before any answer text has been streamed. Typical fallback cases include timeouts, connection errors, 5xx server errors, 429 rate limits, overloads, and quota/balance exhaustion. It does not run for malformed requests, authentication/permission errors, content filtering/refusals, or context-length/message-format errors.
+Failover normally runs when the primary provider returns a retryable model/provider error before any answer text has been streamed. Stream-stall timeouts are the recovery exception: if the provider already emitted partial answer text and then stalls, nanobot closes the current stream segment and retries/fails over in a new segment. Typical fallback cases include timeouts, connection errors, 5xx server errors, 429 rate limits, overloads, and quota/balance exhaustion. It does not run for malformed requests, authentication/permission errors, content filtering/refusals, or context-length/message-format errors.

 If fallback candidates use smaller `contextWindowTokens` values, nanobot builds context using the smallest window in the active chain so every candidate can receive the same prompt.

--- a/nanobot/providers/fallback_provider.py
+++ b/nanobot/providers/fallback_provider.py
@@ -58,14 +58,17 @@ _FALLBACK_ERROR_TOKENS = (
 class FallbackProvider(LLMProvider):
    """Wrap a primary provider and transparently failover to fallback models.

-    When the primary model returns an error and no content has been streamed yet,
-    the wrapper tries each fallback model in order.  Each fallback model may
-    reside on a different provider — a factory callable creates the underlying
-    provider on-the-fly.
+    When the primary model returns a fallbackable error before content has been
+    streamed, the wrapper tries each fallback model in order. Streamed timeout
+    errors are the recovery exception: the caller may close the current stream
+    segment, then the wrapper continues failover with later deltas in a new
+    segment. Each fallback model may reside on a different provider — a factory
+    callable creates the underlying provider on-the-fly.

    Key design:
    - Failover is request-scoped (the wrapper itself is stateless between turns).
-    - Skipped when content was already streamed to avoid duplicate output.
+    - Skipped when content was already streamed to avoid duplicate output,
+      except timeout recovery can resume in a new stream segment.
    - Recursive failover is prevented by the factory returning plain providers.
    - Primary provider is circuit-broken after repeated failures to avoid
      wasting requests on a known-bad endpoint.
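
Reviewer note: the rule described in both hunks can be read as a small decision procedure. The sketch below is not the code in `fallback_provider.py`; it only illustrates the documented behavior under stated assumptions. `RetryableError`, `StreamStallTimeout`, `StreamLog`, and `run_with_failover` are hypothetical names, and the sketch assumes that "content already streamed" is tracked per stream segment.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable provider error (5xx, 429, ...)."""


class StreamStallTimeout(RetryableError):
    """Hypothetical stand-in for a timeout raised when a stream stalls."""


@dataclass
class StreamLog:
    """Collects streamed deltas, grouped into segments (illustrative only)."""

    segments: List[List[str]] = field(default_factory=lambda: [[]])

    def emit(self, delta: str) -> None:
        self.segments[-1].append(delta)

    def close_segment(self) -> None:
        # Open a fresh segment only if the current one holds any text.
        if self.segments[-1]:
            self.segments.append([])


def run_with_failover(
    candidates: List[Callable[[], Iterator[str]]], log: StreamLog
) -> None:
    """Try each candidate in order, following the documented failover rule."""
    last_error: Optional[Exception] = None
    for make_stream in candidates:
        streamed = False  # assumption: tracked per segment
        try:
            for delta in make_stream():
                log.emit(delta)
                streamed = True
            return  # candidate finished cleanly
        except StreamStallTimeout as exc:
            # Recovery exception: partial text is kept, the segment is
            # closed, and the next candidate starts in a new segment.
            if streamed:
                log.close_segment()
            last_error = exc
        except RetryableError as exc:
            # Other retryable errors only fail over before any text
            # was streamed, to avoid duplicate output.
            if streamed:
                raise
            last_error = exc
    raise last_error or RuntimeError("no candidates configured")
```

With a primary that emits `"Hel"` and then stalls, followed by a healthy fallback, the log ends up with two segments: the partial primary output and the full fallback answer. A non-timeout error after partial output is re-raised instead of triggering failover.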