diff --git a/docs/configuration.md b/docs/configuration.md
index dd11eb3aa..0e4ab2bca 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1268,7 +1268,7 @@ Inline fallback object:
 
 Use inline objects only when a fallback is not worth naming as a reusable preset. `fallbackModels` belongs under `agents.defaults`, not inside individual `modelPresets` entries.
 
-Failover only runs when the primary provider returns a retryable model/provider error before any answer text has been streamed. Typical fallback cases include timeouts, connection errors, 5xx server errors, 429 rate limits, overloads, and quota/balance exhaustion. It does not run for malformed requests, authentication/permission errors, content filtering/refusals, or context-length/message-format errors.
+Failover normally runs when the primary provider returns a retryable model/provider error before any answer text has been streamed. Stream-stall timeouts are the recovery exception: if the provider already emitted partial answer text and then stalls, nanobot closes the current stream segment and retries/fails over in a new segment. Typical fallback cases include timeouts, connection errors, 5xx server errors, 429 rate limits, overloads, and quota/balance exhaustion. It does not run for malformed requests, authentication/permission errors, content filtering/refusals, or context-length/message-format errors.
 
 If fallback candidates use smaller `contextWindowTokens` values, nanobot builds context using the smallest window in the active chain so every candidate can receive the same prompt.
 
diff --git a/nanobot/providers/fallback_provider.py b/nanobot/providers/fallback_provider.py
index 2381d6175..b0c01afae 100644
--- a/nanobot/providers/fallback_provider.py
+++ b/nanobot/providers/fallback_provider.py
@@ -58,14 +58,17 @@ _FALLBACK_ERROR_TOKENS = (
 class FallbackProvider(LLMProvider):
     """Wrap a primary provider and transparently failover to fallback models.
 
-    When the primary model returns an error and no content has been streamed yet,
-    the wrapper tries each fallback model in order.  Each fallback model may
-    reside on a different provider — a factory callable creates the underlying
-    provider on-the-fly.
+    When the primary model returns a fallbackable error before content has been
+    streamed, the wrapper tries each fallback model in order. Streamed timeout
+    errors are the recovery exception: the caller may close the current stream
+    segment, then the wrapper continues failover with later deltas in a new
+    segment. Each fallback model may reside on a different provider — a factory
+    callable creates the underlying provider on-the-fly.
 
     Key design:
     - Failover is request-scoped (the wrapper itself is stateless between turns).
-    - Skipped when content was already streamed to avoid duplicate output.
+    - Skipped when content was already streamed to avoid duplicate output,
+      except timeout recovery can resume in a new stream segment.
     - Recursive failover is prevented by the factory returning plain providers.
     - Primary provider is circuit-broken after repeated failures to avoid
       wasting requests on a known-bad endpoint.
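
Reviewer note: the rule this change describes can be read as a small decision table. The sketch below is only an illustration of that reading, not the code in `fallback_provider.py`. The names `RETRYABLE`, `Decision`, and `decide_failover` are hypothetical, as are the error-kind strings, which loosely mirror the categories listed in `docs/configuration.md`.

```python
from dataclasses import dataclass

# Hypothetical error kinds, loosely mirroring the documented fallback cases.
# Anything outside this set (malformed request, auth, content filter,
# context-length error) is treated as non-retryable in this sketch.
RETRYABLE = {"timeout", "connection", "server_error", "rate_limit", "overloaded", "quota"}


@dataclass(frozen=True)
class Decision:
    failover: bool     # try the next candidate model
    new_segment: bool  # close the current stream segment before retrying


def decide_failover(error_kind: str, content_streamed: bool) -> Decision:
    """Return whether to fail over, under the assumptions stated above."""
    if error_kind not in RETRYABLE:
        return Decision(False, False)
    if not content_streamed:
        # Normal case: nothing reached the user yet, so a retry cannot duplicate output.
        return Decision(True, False)
    if error_kind == "timeout":
        # Recovery exception: a stalled stream resumes in a fresh segment.
        return Decision(True, True)
    # Other retryable errors after partial output are skipped to avoid duplicates.
    return Decision(False, False)
```

Under this reading, `decide_failover("timeout", True)` is the only call with `content_streamed=True` that still fails over, and it is also the only one that asks the caller to open a new segment.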