Reference hermes-agent#17228 / #18100 / PR#18105.
iLink returns ret=-2 / errcode=-2 for two different reasons:
- stale context_token: errmsg is empty/None or "unknown error"
- genuine rate limit: errmsg is populated (e.g. "frequency limit")
Previously we swallowed all ret=-2 responses, which caused silent
message drops when the context_token was stale.
Changes:
- Add _is_stale_session_ret() to detect empty/"unknown error" errmsg
- _send_text/_send_media_file retry once without context_token on stale
session signal, then raise on persistent failure so ChannelManager
can retry with backoff
- Remove error-swallowing behavior
- Update tests to expect raises and add TestIsStaleSessionRet coverage
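The distinction above could be sketched roughly as follows. This is a standalone illustration, not the actual `_is_stale_session_ret()` body: the function name, the dict-shaped response, and the case-insensitive comparison are assumptions made for the example.

```python
from typing import Any, Mapping


def is_stale_session_ret(data: Mapping[str, Any]) -> bool:
    """Return True when a -2 response looks like a stale context_token.

    Assumption: the response is a parsed JSON dict that may carry the code
    in either ``ret`` or ``errcode``.
    """
    code = data.get("ret") or data.get("errcode") or 0
    if code != -2:
        return False
    # Empty/None or "unknown error" means stale session; anything else
    # (e.g. "frequency limit") is treated as a genuine rate limit.
    errmsg = str(data.get("errmsg") or "").strip().lower()
    return errmsg in ("", "unknown error")
```

A populated errmsg such as "frequency limit" returns False, so the caller can skip the token-less retry and raise straight away.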
The iLink sendmessage API frequently returns ret=-2 (parameter error / rate
limit / expired token) even when HTTP status is 200. The openclaw reference
plugin ignores the JSON body for sendmessage entirely and only checks HTTP
status. Our previous strict ret checking turned ret=-2 into RuntimeError,
which triggered ChannelManager retries that only made things worse.
Changes:
- _send_text: swallow ret=-2 after one retry without context_token.
Log request body + response at warning level for diagnostics.
- _send_media_file: same ret=-2 swallowing.
- _generate_client_id: change format to ``nanobot:{timestamp}-{hex}`` to
match openclaw-weixin ``{prefix}:{Date.now()}-{hex}``.
- Update tests to expect swallowing instead of raising for ret=-2.
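A minimal sketch of the client id format described above, assuming a millisecond timestamp (the Python counterpart of ``Date.now()``) and an 8-character hex suffix. The suffix length is not stated in this message and is a guess.

```python
import secrets
import time


def generate_client_id(prefix: str = "nanobot") -> str:
    """Build an id shaped like ``{prefix}:{timestamp}-{hex}``."""
    timestamp_ms = int(time.time() * 1000)  # milliseconds, like Date.now()
    suffix = secrets.token_hex(4)  # 8 hex chars; the length is an assumption
    return f"{prefix}:{timestamp_ms}-{suffix}"
```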
When the iLink API returns ret=-2 (parameter error), it is often caused
by an expired context_token rather than a malformed payload. After a
gateway restart, the cached token can become stale within ~90 seconds if
no new inbound message refreshes it, causing all outbound replies to fail
silently.
Changes:
- _send_text: retry once without context_token when ret=-2 and a token
was present; if the retry succeeds, clear the expired token from cache.
- Remove leftover @staticmethod on _check_response_error so self.logger
and the body parameter work correctly.
- Bump WEIXIN_CHANNEL_VERSION from 2.1.1 to 2.1.7 to match the reference
openclaw-weixin plugin.
- Add tests covering the ret=-2 retry path, failure path, and no-token
path.
References:
- openclaw/openclaw#61174 (context_token expiry after long agent turns)
- hermes-agent#21011 (ret=-2 rate limiting / parameter error)
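The retry path could look roughly like the sketch below. It is synchronous and takes an injected `post` callable so it runs on its own. The payload keys, the cache shape, and the function name are hypothetical and do not reflect the real `_send_text` signature.

```python
from typing import Any, Callable, Dict, MutableMapping

Payload = Dict[str, Any]


def send_with_token_retry(
    post: Callable[[Payload], Payload],
    token_cache: MutableMapping[str, str],
    user_id: str,
    text: str,
) -> Payload:
    """Send once; on ret=-2 with a cached token, retry once without it."""
    payload: Payload = {"to": user_id, "text": text}  # hypothetical fields
    token = token_cache.get(user_id)
    if token:
        payload["context_token"] = token
    resp = post(payload)
    if resp.get("ret") == -2 and token:
        # Retry with the token omitted entirely.
        resp = post({"to": user_id, "text": text})
        if resp.get("ret", 0) == 0:
            # The retry succeeded, so the cached token was the problem.
            token_cache.pop(user_id, None)
    return resp
```

When no token was cached, the first response is returned as-is and no retry is attempted.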
The iLink API signals failures through either `ret` or `errcode`.
`_poll_once` already checked both, but `_send_text` and `_send_media_file`
only checked `errcode`. When the API returned `ret != 0` with
`errcode == 0`, the send appeared successful but the message was never
delivered, causing the "still losing messages" issue.
- Add `_check_response_error` helper that validates both fields
- Use it in `_send_text` and `_send_media_file`
- Add debug log after successful text send for observability
- Add test for nonzero ret with zero errcode
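A helper that validates both fields might look like this sketch. The name mirrors `_check_response_error`, but the signature and error message here are assumptions.

```python
from typing import Any, Mapping


def check_response_error(data: Mapping[str, Any], action: str) -> None:
    """Raise if either ``ret`` or ``errcode`` signals a failure."""
    ret = data.get("ret") or 0
    errcode = data.get("errcode") or 0
    if ret != 0 or errcode != 0:
        errmsg = data.get("errmsg") or ""
        raise RuntimeError(
            f"{action} failed: ret={ret} errcode={errcode} errmsg={errmsg!r}"
        )
```

Checking only `errcode` would let `{"ret": 1, "errcode": 0}` pass as a success, which is the case described above.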
Refs: previous inbound fix (suppress -> explicit try/except)
Replace `with suppress(Exception)` in `_poll_once` message processing
and the `start()` poll loop with explicit `try/except` blocks that
log errors via `logger.exception`. Previously, any exception during
message processing (e.g. in `_handle_message`) was swallowed silently,
causing inbound messages to disappear without a trace.
Also add tests verifying that:
- `_poll_once` logs and continues when `_process_message` fails
- the poll loop logs and continues when `_poll_once` fails
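The pattern is roughly the one below, shown as a synchronous sketch with the standard `logging` module. The function and logger names are illustrative and are not the channel's real ones.

```python
import logging
from typing import Any, Callable, Iterable

logger = logging.getLogger("weixin_sketch")  # hypothetical logger name


def process_all(messages: Iterable[Any], handle: Callable[[Any], None]) -> int:
    """Process every message; log failures instead of hiding them."""
    processed = 0
    for msg in messages:
        try:
            handle(msg)
            processed += 1
        except Exception:
            # Unlike `with suppress(Exception)`, this records the traceback
            # and then moves on to the next message.
            logger.exception("Failed to process message: %r", msg)
    return processed
```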
_send_text() swallowed API errors (non-zero errcode) with just a
warning log, and send() had three silent return paths (no client,
session paused, no context_token). Neither triggered ChannelManager's
retry logic, causing persistent message loss until a new inbound
message refreshed the context_token.
Now all failure paths raise RuntimeError, matching BaseChannel's
contract and enabling proper retry behavior.
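To show why raising matters, here is a minimal retry wrapper in the spirit of ChannelManager's retry logic. The exponential backoff schedule, parameter names, and function name are assumptions; the real manager's policy is not described in this message.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def send_with_retry(
    send: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``send`` until it stops raising RuntimeError or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up and surface the last failure
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    raise AssertionError("unreachable")
```

A `send` that logs a warning and returns normally never reaches the `except` branch, so the wrapper treats the failed send as a success.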
Audited all channel implementations for overly broad exception handling
that causes retry amplification or silent message loss during network
errors. This is the same class of bug as #3050 (Telegram _send_text).
Fixes by channel:
Telegram (send_delta):
- _stream_end path used except Exception for HTML edit fallback
- Network errors (TimedOut, NetworkError) triggered redundant plain
text edit, doubling connection demand during pool exhaustion
- Changed to except BadRequest, matching the _send_text fix
Discord:
- send() caught all exceptions without re-raising
- ChannelManager._send_with_retry() saw successful return, never retried
- Messages silently dropped on any send failure
- Added raise after error logging
DingTalk:
- _send_batch_message() returned False on all exceptions including
network errors — no retry, fallback text sent unnecessarily
- _read_media_bytes() and _upload_media() swallowed transport errors,
causing _send_media_ref() to cascade through doomed fallback attempts
- Added except httpx.TransportError handlers that re-raise immediately
WeChat:
- Media send failure triggered text fallback even for network errors
- During network issues: 3×(media + text) = 6 API calls per message
- Added specific catches: TimeoutException/TransportError re-raise,
5xx HTTPStatusError re-raises, 4xx falls back to text
QQ:
- _send_media() returned False on all exceptions
- Network errors triggered fallback text instead of retry
- Added except (aiohttp.ClientError, OSError) that re-raises
Tests: 331 passed (283 existing + 48 new across 5 channel test files)
Fixes: #3054
Related: #3050, #3053
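The WeChat rule (transport errors and 5xx re-raise, 4xx falls back to text) can be sketched without third-party dependencies. In this illustration the built-in `TimeoutError` and `ConnectionError` stand in for httpx's `TimeoutException`/`TransportError`, and `HttpStatusError` is a hypothetical stand-in for `httpx.HTTPStatusError`. None of this is the channel's real code.

```python
from typing import Callable


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def send_media_with_fallback(
    send_media: Callable[[], None],
    send_text_fallback: Callable[[], None],
) -> str:
    """Fall back to text only when retrying the media send cannot help."""
    try:
        send_media()
        return "media"
    except (TimeoutError, ConnectionError):
        raise  # network trouble: let the caller retry, no fallback call
    except HttpStatusError as exc:
        if exc.status_code >= 500:
            raise  # server-side failure: also worth retrying
        send_text_fallback()  # 4xx: the request itself was rejected
        return "text-fallback"
```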
Add CI step to detect unused imports (F401) and unused variables (F841)
with ruff. Clean up existing violations:
- Remove unused Consolidator import in agent/__init__.py
- Remove unused re import in agent/loop.py
- Remove unused Path import in channels/feishu.py
- Remove unused ContentRepositoryConfigError import in channels/matrix.py
- Remove unused field and CommandHandler imports in channels/telegram.py
- Remove unused exception variable in channels/weixin.py
1. Fix full_url path for non-image media to require AES key and skip download when missing,
instead of persisting encrypted bytes as valid media.
2. Restrict quoted media fallback trigger to only when no top-level media item exists,
not when top-level media download/decryption fails.
Fetch and cache typing tickets so the Weixin channel shows typing while nanobot is processing and clears it after the final reply.
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Make channel delivery failures raise consistently so retry policy lives in ChannelManager rather than being split across individual channels. Tighten Telegram stream finalization, clarify sendMaxRetries semantics, and align the docs with the behavior the system actually guarantees.
- Prevent repeated retries on expired sessions in the polling thread
- Stop sending messages to invalid agent sessions to eliminate noisy logs and unnecessary requests
Move channel-specific login logic from CLI into each channel class via a
new `login(force=False)` method on BaseChannel. The `channels login <name>`
command now dynamically loads the channel and calls its login() method.
- WeixinChannel.login(): calls existing _qr_login(); force=True clears the saved token first
- WhatsAppChannel.login(): sets up bridge and spawns npm process for QR login
- CLI no longer contains duplicate login logic per channel
- Update CHANNEL_PLUGIN_GUIDE to document the login() hook
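The hook could be shaped roughly as in this sketch. Only `login(force=False)` and `_qr_login()` come from this message. The boolean return value, the synchronous signature, the `DemoQrChannel` class, skipping the QR flow when a token is already saved, and the dict registry standing in for dynamic loading are all assumptions.

```python
from typing import Dict, Optional, Type


class BaseChannel:
    name = "base"

    def login(self, force: bool = False) -> bool:
        """Default hook: channels without interactive login have nothing to do."""
        return True


class DemoQrChannel(BaseChannel):
    """Hypothetical channel with a QR login and a saved token."""

    name = "demo"

    def __init__(self) -> None:
        self.saved_token: Optional[str] = "cached-token"
        self.qr_logins = 0

    def _qr_login(self) -> bool:
        self.qr_logins += 1
        self.saved_token = "fresh-token"
        return True

    def login(self, force: bool = False) -> bool:
        if force:
            self.saved_token = None  # discard the saved token first
        if self.saved_token:
            return True  # assumption: a saved token means already logged in
        return self._qr_login()


def run_login(registry: Dict[str, Type[BaseChannel]], name: str, force: bool = False) -> bool:
    """What a `channels login <name>` command might do after loading the class."""
    return registry[name]().login(force=force)
```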
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously the WeChat channel's send() method only handled text messages,
completely ignoring msg.media. When the agent called message(media=[...]),
the file was never delivered to the user.
Implement the full WeChat CDN upload protocol following the reference
@tencent-weixin/openclaw-weixin v1.0.2:
1. Generate a client-side AES-128 key (16 random bytes)
2. Call getuploadurl with file metadata + hex-encoded AES key
3. AES-128-ECB encrypt the file and POST to CDN with filekey param
4. Read x-encrypted-param from CDN response header as download param
5. Send message with the media item (image/video/file) referencing
the CDN upload
Also adds:
- _encrypt_aes_ecb() for AES-128-ECB encryption (inverse of existing
_decrypt_aes_ecb)
- Media type detection from file extension (image/video/file)
- Graceful error handling: failed media sends notify the user via text
without blocking subsequent text delivery
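Step 1 and the extension-based type detection can be sketched with the standard library alone. The extension sets and function names below are illustrative guesses, not the lists used by the channel, and the encryption and upload steps are deliberately left out.

```python
import secrets
from pathlib import Path
from typing import Tuple

# Assumed extension sets; the real mapping may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def detect_media_type(path: str) -> str:
    """Classify a file as image, video, or generic file by its extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    return "file"


def new_aes_key() -> Tuple[bytes, str]:
    """Generate a 16-byte AES-128 key and its hex encoding for getuploadurl."""
    key = secrets.token_bytes(16)
    return key, key.hex()
```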
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a new WeChat (微信) channel that connects to personal WeChat using
the ilinkai.weixin.qq.com HTTP long-poll API. Protocol reverse-engineered
from @tencent-weixin/openclaw-weixin v1.0.2.
Features:
- QR code login flow (nanobot weixin login)
- HTTP long-poll message receiving (getupdates)
- Text message sending with proper WeixinMessage format
- Media download with AES-128-ECB decryption (image/voice/file/video)
- Voice-to-text from WeChat, with Groq Whisper as fallback
- Quoted message (ref_msg) support
- Session expiry detection and auto-pause
- Server-suggested poll timeout adaptation
- Context token caching for replies
- Auto-discovery via channel registry
No WebSocket, no Node.js bridge, no local WeChat client needed — pure
HTTP with a bot token obtained via QR code scan.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>