2 Commits

Author SHA1 Message Date
Xubin Ren
82aa9efc02 test(mcp): pin CancelledError short-circuits the retry loop
The retry branch is only reachable via `except Exception`, and
`CancelledError` inherits from `BaseException`, so today it naturally
bypasses the retry path and /stop still works.  Add one focused
regression test so any future refactor that widens the retry catch to
`BaseException`, re-orders the handlers, or adds `CancelledError` to
`_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop.

Made-with: Cursor
2026-04-21 13:24:40 +08:00
hussein1362
368752e707 fix(mcp): retry once on transient connection errors
When an MCP server restarts or a network connection drops between
tool calls, the existing session throws ClosedResourceError,
BrokenPipeError, ConnectionResetError, etc. Currently these are
caught as generic exceptions and returned as permanent failures
to the LLM, which then tells the user 'my tools are broken.'

This change adds a single automatic retry with a 1-second backoff
for transient connection-class errors in MCPToolWrapper,
MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors
(ValueError, RuntimeError, McpError, etc.) are not retried.

The retry is conservative:
- Only 1 retry (not configurable, to keep the change minimal)
- Only for a specific set of connection-class exceptions
- Matched by exception class name to avoid importing anyio/etc.
- 1s sleep between attempts to allow the server to recover
- Clear logging distinguishes retried vs permanent failures

In production this eliminates most 'MCP tool call failed:
ClosedResourceError' noise when MCP bridge processes restart
(e.g. after config changes or OOM kills).

Tests: 22 new tests covering retry, exhaustion, non-transient
bypass, timeout bypass, and all three wrapper types.
2026-04-21 13:24:40 +08:00