mirror of
https://github.com/HKUDS/nanobot.git
synced 2026-05-03 16:25:53 +00:00
When an MCP server restarts or a network connection drops between tool calls, the existing session throws ClosedResourceError, BrokenPipeError, ConnectionResetError, etc. Currently these are caught as generic exceptions and returned as permanent failures to the LLM, which then tells the user 'my tools are broken.'

This change adds a single automatic retry with a 1-second backoff for transient connection-class errors in MCPToolWrapper, MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors (ValueError, RuntimeError, McpError, etc.) are not retried.

The retry is conservative:

- Only 1 retry (not configurable, to keep the change minimal)
- Only for a specific set of connection-class exceptions
- Matched by exception class name to avoid importing anyio/etc.
- 1s sleep between attempts to allow the server to recover
- Clear logging distinguishes retried vs permanent failures

In production this eliminates most 'MCP tool call failed: ClosedResourceError' noise when MCP bridge processes restart (e.g. after config changes or OOM kills).

Tests: 22 new tests covering retry, exhaustion, non-transient bypass, timeout bypass, and all three wrapper types.
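
For illustration only, the retry policy described above could be sketched roughly as follows. This is not the code in this change: the names TRANSIENT_ERROR_NAMES, is_transient, and call_with_retry are hypothetical, and the name set holds only the three exceptions named in this message (the actual set may be larger). The sleep parameter is an assumption added so the sketch can be exercised without waiting.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# Hypothetical set: only the three names given in the commit message.
# Matching on the class name avoids importing anyio just for its types.
TRANSIENT_ERROR_NAMES = frozenset(
    {"ClosedResourceError", "BrokenPipeError", "ConnectionResetError"}
)


def is_transient(exc: BaseException) -> bool:
    """Return True if the exception's class name marks a connection-class error."""
    return type(exc).__name__ in TRANSIENT_ERROR_NAMES


async def call_with_retry(call, *, delay: float = 1.0, sleep=asyncio.sleep):
    """Await call(); on a transient error, sleep once and try exactly one more time.

    Non-transient errors (including timeouts) propagate from the first attempt.
    A failure on the second attempt propagates unchanged.
    """
    try:
        return await call()
    except Exception as exc:
        if not is_transient(exc):
            raise
        logger.warning(
            "transient %s, retrying once in %.1fs", type(exc).__name__, delay
        )
    await sleep(delay)
    return await call()
```

A wrapper would then pass its session call as a zero-argument coroutine function, e.g. `await call_with_retry(lambda: do_call())`, where do_call stands in for whatever the wrapper invokes on the MCP session. Keeping the retry count fixed at one means a server that is still down after the backoff surfaces as a normal failure on the second attempt.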