30 Commits

Author SHA1 Message Date
chengyongru
e2955a2e12 fix(weixin): prevent silent message drops from poll exceptions and expired tokens
- Remove suppress(Exception) from poll loop and message processing; add
  logger.exception so inbound errors are visible.
- Check both ret and errcode on send to avoid silent drops when iLink
  returns ret != 0 with errcode == 0.
- Proactively refresh context_token via getconfig before sending if the
  cached token is older than 60s. This prevents message loss on long
  agent turns and cron pushes without relying on complex retry logic.

Refs: openclaw/openclaw#61174, NousResearch/hermes-agent#21011
2026-05-19 14:09:41 +08:00
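A minimal sketch of the 60-second proactive refresh from e2955a2e12. `ContextTokenCache`, `fetch_token` (a stand-in for the getconfig request) and the injectable clock are hypothetical names, not the channel's real API. The missing-token error mirrors the RuntimeError behavior introduced in 98c2f7cc27.

```python
import time
from typing import Callable, Dict, Tuple

TOKEN_MAX_AGE_S = 60.0  # threshold from the commit message


class ContextTokenCache:
    """Hypothetical per-chat context_token cache with proactive refresh."""

    def __init__(
        self,
        fetch_token: Callable[[str], str],  # stand-in for a getconfig round-trip
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def remember(self, chat_id: str, token: str) -> None:
        """Store a token delivered by an inbound message, stamped with the current time."""
        self._tokens[chat_id] = (token, self._clock())

    def token_for_send(self, chat_id: str) -> str:
        """Return a token for an outbound send, refreshing it first if it is stale."""
        if chat_id not in self._tokens:
            raise RuntimeError(f"WeChat: no context_token for chat {chat_id}")
        token, stamped_at = self._tokens[chat_id]
        now = self._clock()
        if now - stamped_at > TOKEN_MAX_AGE_S:
            token = self._fetch_token(chat_id)
            self._tokens[chat_id] = (token, now)
        return token
```

Checking the age before sending is what lets the commit avoid retry logic: a stale token is replaced up front instead of being discovered through a failed request.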
chengyongru
1672f20d6e feat(weixin): buffer and coalesce tool hints inside WeixinChannel
WeChat iLink has a strict ~7 msgs / 5 min rate limit. A busy agent turn
can trigger 8+ tool-call hints, each sent as a separate message, quickly
burning the quota and causing silent message drops.

Implement buffering entirely inside WeixinChannel (no global changes):

- Tool hints are appended to a per-chat_id buffer instead of being
  sent immediately.
- A non-tool-hint message arriving for the same chat flushes pending
  hints first (joined with newlines, sent as a single message).
- stop() clears any remaining buffered hints.
- send_tool_hints=False still drops hints as before.

- Add 6 tests covering: single hint, multiple hints coalesced,
  different chats isolated, non-tool-hint flush, disabled dropping,
  and stop clearing buffers.
2026-05-08 18:56:40 +08:00
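The buffering rules in 1672f20d6e can be sketched as a small per-chat buffer. `ToolHintBuffer`, `on_message` and the injected `send` coroutine are illustrative names; the real logic lives inside WeixinChannel and its signatures may differ.

```python
from typing import Awaitable, Callable, Dict, List

SendFn = Callable[[str, str], Awaitable[None]]


class ToolHintBuffer:
    """Hypothetical per-chat buffer that coalesces tool hints into one message."""

    def __init__(self, send: SendFn, send_tool_hints: bool = True) -> None:
        self._send = send
        self._send_tool_hints = send_tool_hints
        self._pending: Dict[str, List[str]] = {}

    async def on_message(self, chat_id: str, text: str, is_tool_hint: bool) -> None:
        if is_tool_hint:
            if self._send_tool_hints:
                # Hold the hint instead of spending one of the ~7 sends per 5 min.
                self._pending.setdefault(chat_id, []).append(text)
            return  # hints are dropped entirely when send_tool_hints=False
        # A regular message flushes this chat's pending hints first, as one message.
        hints = self._pending.pop(chat_id, [])
        if hints:
            await self._send(chat_id, "\n".join(hints))
        await self._send(chat_id, text)

    def stop(self) -> None:
        """Discard any hints still buffered, mirroring stop() in the commit."""
        self._pending.clear()
```

A regular message for one chat never flushes another chat's buffer, which is the isolation the commit's tests check.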
chengyongru
2e56fb95b6 fix(weixin): treat ret=-2 as rate limit with 60s backoff
Reference wxclawbot-cli docs: ret=-2 is a rate limit (~7 msgs / 5 min
per bot), NOT a stale session signal.  Empty/missing errmsg is the
normal rate-limit response; only 'unknown error' correlates with stale
session per hermes-agent.

Changes:
- _is_stale_session_ret: only match 'unknown error', not empty errmsg
- _send_text/_send_media_file: on ret=-2 wait 60s then retry once
  instead of retrying without context_token
- Remove stale-session retry for empty errmsg (was burning quota)
- Update tests to cover rate-limit backoff path
2026-05-08 10:09:12 +08:00
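A sketch of the classification helper and the single 60-second backoff described in 2e56fb95b6. `post` and `sleep` are injected so the path can be tested without waiting. Raising after the one retry follows the "raise on persistent failure" behavior from 9665a0bb1a and is an assumption about the current code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Response = Dict[str, Any]
RATE_LIMIT_BACKOFF_S = 60.0  # from the commit: wait 60 s, then retry once


def is_stale_session_ret(resp: Response) -> bool:
    """Only errmsg == 'unknown error' suggests a stale session; empty errmsg is a rate limit."""
    return resp.get("ret") == -2 and resp.get("errmsg") == "unknown error"


async def send_with_rate_limit_backoff(
    post: Callable[[], Awaitable[Response]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Response:
    resp = await post()
    if resp.get("ret") == -2:
        # Rate limited (~7 msgs / 5 min per bot): back off once instead of
        # retrying immediately, which would only burn more quota.
        await sleep(RATE_LIMIT_BACKOFF_S)
        resp = await post()
    if (resp.get("ret") or 0) != 0 or (resp.get("errcode") or 0) != 0:
        raise RuntimeError(f"WeChat send failed: {resp}")
    return resp
```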
chengyongru
9665a0bb1a fix(weixin): distinguish stale session from rate limit on ret=-2
Reference hermes-agent#17228 / #18100 / PR#18105.

iLink returns ret=-2 / errcode=-2 for two different reasons:
- stale context_token: errmsg is empty/None or "unknown error"
- genuine rate limit: errmsg is populated (e.g. "frequency limit")

Previously we swallowed all ret=-2 responses, which caused silent
message drops when the context_token was stale.

Changes:
- Add _is_stale_session_ret() to detect empty/"unknown error" errmsg
- _send_text/_send_media_file retry once without context_token on stale
  session signal, then raise on persistent failure so ChannelManager
  can retry with backoff
- Remove error-swallowing behavior
- Update tests to expect raises and add TestIsStaleSessionRet coverage
2026-05-08 09:40:53 +08:00
chengyongru
639b4bae32 fix(weixin): treat ret=-2 as non-fatal on sendmessage and align client_id format
The iLink sendmessage API frequently returns ret=-2 (parameter error / rate
limit / expired token) even when HTTP status is 200.  The openclaw reference
plugin ignores the JSON body for sendmessage entirely and only checks HTTP
status.  Our previous strict ret checking turned ret=-2 into RuntimeError,
causing ChannelManager retries which only made things worse.

Changes:
- _send_text: swallow ret=-2 after one retry without context_token.
  Log request body + response at warning level for diagnostics.
- _send_media_file: same ret=-2 swallowing.
- _generate_client_id: change format to ``nanobot:{timestamp}-{hex}`` to
  match openclaw-weixin ``{prefix}:{Date.now()}-{hex}``.
- Update tests to expect swallowing instead of raising for ret=-2.
2026-05-07 18:07:32 +08:00
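The `{prefix}:{Date.now()}-{hex}` format can be reproduced in Python with a millisecond timestamp, since JavaScript's `Date.now()` returns milliseconds since the epoch. The function name and the 8-character hex suffix length are assumptions; the commit does not state the suffix length.

```python
import secrets
import time


def generate_client_id(prefix: str = "nanobot") -> str:
    """Build '{prefix}:{epoch_ms}-{hex}' (hex length of 8 is an assumption)."""
    epoch_ms = int(time.time() * 1000)  # same unit as JavaScript Date.now()
    return f"{prefix}:{epoch_ms}-{secrets.token_hex(4)}"  # 4 bytes -> 8 hex chars
```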
chengyongru
9b4df02651 fix(weixin): retry send without expired context_token on ret=-2
When the iLink API returns ret=-2 (parameter error), it is often caused
by an expired context_token rather than a malformed payload. After a
gateway restart, the cached token can become stale within ~90 seconds if
no new inbound message refreshes it, causing all outbound replies to fail
silently.

Changes:
- _send_text: retry once without context_token when ret=-2 and a token
  was present; if the retry succeeds, clear the expired token from cache.
- Remove leftover @staticmethod on _check_response_error so self.logger
  and the body parameter work correctly.
- Bump WEIXIN_CHANNEL_VERSION from 2.1.1 to 2.1.7 to match the reference
  openclaw-weixin plugin.
- Add tests covering the ret=-2 retry path, failure path, and no-token
  path.

References:
- openclaw/openclaw#61174 (context_token expiry after long agent turns)
- hermes-agent#21011 (ret=-2 rate limiting / parameter error)
2026-05-07 17:38:11 +08:00
chengyongru
2b57766743 fix(weixin): check both ret and errcode on send to avoid silent drops
The iLink API signals failures through either `ret` or `errcode`.
`_poll_once` already checked both, but `_send_text` and `_send_media_file`
only checked `errcode`. When the API returned `ret != 0` with
`errcode == 0`, the send appeared successful but the message was never
delivered, causing the "still losing messages" issue.

- Add `_check_response_error` helper that validates both fields
- Use it in `_send_text` and `_send_media_file`
- Add debug log after successful text send for observability
- Add test for nonzero ret with zero errcode

Refs: previous inbound fix (suppress -> explicit try/except)
2026-05-07 16:20:08 +08:00
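A minimal version of the `_check_response_error` idea: treat the response as failed when either field is non-zero. The function name, the `action` parameter and the message format are illustrative; the real helper is a method (it uses `self.logger`, per 9b4df02651).

```python
from typing import Any, Mapping


def check_response_error(body: Mapping[str, Any], action: str = "sendmessage") -> None:
    """Raise if either ret or errcode is non-zero; a missing field counts as 0."""
    ret = body.get("ret") or 0
    errcode = body.get("errcode") or 0
    if ret != 0 or errcode != 0:
        raise RuntimeError(
            f"WeChat {action} failed: ret={ret} errcode={errcode} "
            f"errmsg={body.get('errmsg')!r}"
        )
```

The `ret != 0` with `errcode == 0` case is exactly the one that previously looked like a successful send.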
chengyongru
563fcaf002 fix(weixin): log exceptions instead of silently dropping messages in poll loop
Replace `with suppress(Exception)` in `_poll_once` message processing
and the `start()` poll loop with explicit `try/except` blocks that
log errors via `logger.exception`. Previously, any exception during
message processing (e.g. in `_handle_message`) was swallowed silently,
causing inbound messages to disappear without a trace.

Also add tests verifying that:
- `_poll_once` logs and continues when `_process_message` fails
- the poll loop logs and continues when `_poll_once` fails
2026-05-07 15:15:53 +08:00
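The practical difference between `suppress(Exception)` and an explicit handler is that the latter leaves a traceback in the logs. A sketch of the loop shape, with `poll_loop`, `poll_once`, `running` and `retry_delay` as hypothetical names and an assumed pause after each failure:

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("weixin.sketch")  # hypothetical logger name


async def poll_loop(
    poll_once: Callable[[], Awaitable[None]],
    running: Callable[[], bool],
    retry_delay: float = 1.0,  # assumed pause after a failure
) -> None:
    while running():
        try:
            await poll_once()
        except Exception:
            # Unlike suppress(Exception), this keeps the traceback in the logs.
            # asyncio.CancelledError derives from BaseException (Python 3.8+),
            # so task cancellation still propagates.
            logger.exception("WeChat poll failed; continuing")
            await asyncio.sleep(retry_delay)
```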
chengyongru
49c07aa45a style: address code review feedback
- Consistent "WeChat" prefix in context_token error message
- Use object() instead of httpx.AsyncClient() in new tests to avoid
  resource leak warnings
2026-05-06 23:52:50 +08:00
chengyongru
98c2f7cc27 fix(weixin): raise exceptions instead of silently dropping messages
_send_text() swallowed API errors (non-zero errcode) with just a
warning log, and send() had three silent return paths (no client,
session paused, no context_token). Neither triggered ChannelManager's
retry logic, causing persistent message loss until a new inbound
message refreshed the context_token.

Now all failure paths raise RuntimeError, matching BaseChannel's
contract and enabling proper retry behavior.
2026-05-06 23:52:50 +08:00
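Why raising matters: a retry wrapper can only react to an exception, so a `send()` that logs and returns looks like a success. The wrapper below is a hypothetical stand-in for `ChannelManager._send_with_retry()` (named in fa98524944); the attempt count and exponential backoff are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def send_with_retry(
    send: Callable[[], Awaitable[None]],
    attempts: int = 3,  # assumed
    base_delay: float = 1.0,  # assumed
) -> None:
    """Hypothetical retry wrapper: only a raised exception triggers another attempt."""
    for attempt in range(1, attempts + 1):
        try:
            await send()
            return  # a normal return is treated as delivered
        except Exception:
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

If `send` logged the error and returned instead of raising, the first iteration would return and no retry would happen, which is the silent-drop pattern this commit removes.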
Xubin Ren
4db50f2e32 fix(channels): reject unauthorized inbound before side effects
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 23:16:36 +08:00
bahtya
fa98524944 fix(channels): prevent retry amplification and silent message loss across channels
Audited all channel implementations for overly broad exception handling
that causes retry amplification or silent message loss during network
errors. This is the same class of bug as #3050 (Telegram _send_text).

Fixes by channel:

Telegram (send_delta):
- _stream_end path used except Exception for HTML edit fallback
- Network errors (TimedOut, NetworkError) triggered redundant plain
  text edit, doubling connection demand during pool exhaustion
- Changed to except BadRequest, matching the _send_text fix

Discord:
- send() caught all exceptions without re-raising
- ChannelManager._send_with_retry() saw successful return, never retried
- Messages silently dropped on any send failure
- Added raise after error logging

DingTalk:
- _send_batch_message() returned False on all exceptions, including
  network errors: no retry, and fallback text sent unnecessarily
- _read_media_bytes() and _upload_media() swallowed transport errors,
  causing _send_media_ref() to cascade through doomed fallback attempts
- Added except httpx.TransportError handlers that re-raise immediately

WeChat:
- Media send failure triggered text fallback even for network errors
- During network issues: 3×(media + text) = 6 API calls per message
- Added specific catches: TimeoutException/TransportError re-raise,
  5xx HTTPStatusError re-raises, 4xx falls back to text

QQ:
- _send_media() returned False on all exceptions
- Network errors triggered fallback text instead of retry
- Added except (aiohttp.ClientError, OSError) that re-raises

Tests: 331 passed (283 existing + 48 new across 5 channel test files)

Fixes: #3054
Related: #3050, #3053
2026-04-13 00:30:45 +08:00
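The WeChat media rule above (re-raise transport errors and 5xx, fall back to text only on 4xx) as a sketch. `TransportError` and `HTTPStatusError` here are local stand-ins named after the httpx exceptions so the example runs without httpx. In httpx, `TimeoutException` is a subclass of `TransportError`, so one clause covers both. The function name is hypothetical.

```python
from typing import Awaitable, Callable

Action = Callable[[], Awaitable[None]]


class TransportError(Exception):
    """Stand-in for httpx.TransportError (connection and timeout failures)."""


class HTTPStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError carrying the response status."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


async def send_media_or_fallback(send_media: Action, send_text: Action) -> str:
    try:
        await send_media()
        return "media"
    except TransportError:
        raise  # network trouble: let the outer retry handle it, no text fallback
    except HTTPStatusError as exc:
        if exc.status_code >= 500:
            raise  # server error: also left to the outer retry
        # 4xx: the media itself was rejected, so fall back to text below
    await send_text()
    return "text"
```

Re-raising network errors is what caps the worst case: without it, each of the 3 retries would pair a failed media call with a text call.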
cypggs
ca68a89ce6 merge: resolve conflicts with upstream/main, preserve typing indicator 2026-04-02 14:28:23 +08:00
Xubin Ren
949a10f536 fix(weixin): reset QR poll host after refresh 2026-03-31 19:40:13 +08:00
xcosmosbox
2a6c616080 fix(WeiXin): fix full_url download error 2026-03-31 19:40:13 +08:00
xcosmosbox
1bcd5f9742 fix(weixin): fix test file version reader 2026-03-31 19:40:13 +08:00
xcosmosbox
26947db479 feat(weixin): add voice message, typing keepalive, getConfig cache, and QR polling resilience 2026-03-31 19:40:13 +08:00
xcosmosbox
0514233217 fix(weixin): align full_url AES key handling and quoted media fallback logic with reference
1. Fix full_url path for non-image media to require AES key and skip download when missing,
   instead of persisting encrypted bytes as valid media.
2. Restrict quoted media fallback trigger to only when no top-level media item exists,
   not when top-level media download/decryption fails.
2026-03-31 19:40:13 +08:00
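The two rules in 0514233217 can be sketched as two small predicates. The item dictionaries, the `type` and `aes_key` field names, and the media type labels are all hypothetical; the real iLink message schema may differ. Images are treated as always downloadable here only because the commit restricts non-image media.

```python
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]

# Hypothetical media type labels; the real message schema may differ.
MEDIA_TYPES = {"image", "voice", "file", "video"}


def pick_media_item(items: List[Item], quoted_items: List[Item]) -> Optional[Item]:
    """Prefer top-level media; fall back to quoted media only if none exists."""
    for item in items:
        if item.get("type") in MEDIA_TYPES:
            # Returned even if its later download/decryption fails: a failed
            # top-level item no longer triggers the quoted-media fallback.
            return item
    for item in quoted_items:
        if item.get("type") in MEDIA_TYPES:
            return item
    return None


def can_download_full_url(item: Item) -> bool:
    """Non-image media fetched via full_url is skipped when no AES key is present."""
    if item.get("type") == "image":
        return True
    return bool(item.get("aes_key"))
```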
xcosmosbox
345c393e53 feat(weixin): implement getConfig and sendTyping 2026-03-31 19:40:13 +08:00
xcosmosbox
faf2b07923 feat(weixin): add fallback logic for referenced media download 2026-03-31 19:40:13 +08:00
xcosmosbox
efd42cc236 feat(weixin): implement QR redirect handling 2026-03-31 19:40:13 +08:00
xcosmosbox
3823042290 fix(weixin): correct PKCS7 unpadding for AES-ECB; support full_url for media download 2026-03-31 19:40:13 +08:00
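The commit message does not say what was wrong with the old unpadding, so this is only the standard check it has to satisfy after AES-ECB decryption (decryption itself needs a third-party crypto library and is omitted). The last byte gives the pad length n, which must lie between 1 and the 16-byte AES block size, and all n trailing bytes must equal n. The function name is illustrative.

```python
AES_BLOCK_SIZE = 16  # AES always uses 16-byte blocks


def pkcs7_unpad(data: bytes, block_size: int = AES_BLOCK_SIZE) -> bytes:
    """Strip PKCS#7 padding, validating every pad byte rather than only the last."""
    if not data or len(data) % block_size:
        raise ValueError("data length is not a positive multiple of the block size")
    pad = data[-1]
    if not 1 <= pad <= block_size or data[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad]
```

Plaintext that is already block-aligned still carries a full block of padding (16 bytes of 0x10), so a pad length equal to the block size is valid.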
xcosmosbox
5bdb7a90b1 feat(weixin):
1. Align protocol headers with package.json metadata
2. Support upload_full_url with fallback to upload_param
2026-03-31 19:40:13 +08:00
qcypggs
0340f81cfd fix: restore Weixin typing indicator
Fetch and cache typing tickets so the Weixin channel shows typing while nanobot is processing and clears it after the final reply.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
2026-03-30 19:25:55 +08:00
xcosmosbox
0dad6124a2 chore(WeiXin): version migration and compatibility update 2026-03-25 02:58:19 +08:00
xcosmosbox
48902ae95a fix(WeiXin): auto-refresh expired QR code during login to improve success rate 2026-03-25 02:58:19 +08:00
xcosmosbox
1f5492ea9e fix(WeiXin): persist _context_tokens with account.json to restore conversations after restart 2026-03-25 02:58:19 +08:00
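A sketch of persisting context tokens alongside the account data so conversations survive a restart. The `context_tokens` key, the function names and the write-then-rename step are assumptions, not the plugin's actual file layout.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_account(path: Path, account: Dict[str, Any], context_tokens: Dict[str, str]) -> None:
    """Write account data plus tokens; hypothetical 'context_tokens' key."""
    data = dict(account, context_tokens=context_tokens)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file so readers never see a partial write


def load_context_tokens(path: Path) -> Dict[str, str]:
    """Restore tokens after a restart; an absent file means no cached tokens."""
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    return dict(data.get("context_tokens") or {})
```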
xcosmosbox
9c872c3458 fix(WeiXin): resolve polling issues in WeiXin plugin
- Prevent repeated retries on expired sessions in the polling thread
- Stop sending messages to invalid agent sessions to eliminate noise logs and unnecessary requests
2026-03-25 02:58:19 +08:00
xcosmosbox
3a9d6ea536 feat(WeiXin): add route_tag property to adapt to WeChat official ilinkai 1.0.3 requirements 2026-03-25 02:58:19 +08:00
chengyongru
72acba5d27 refactor(tests): optimize unit test structure 2026-03-24 15:12:22 +08:00