The iLink sendmessage API frequently returns ret=-2 (parameter error / rate
limit / expired token) even when HTTP status is 200. The openclaw reference
plugin ignores the JSON body for sendmessage entirely and only checks HTTP
status. Our previous strict ret checking turned ret=-2 into RuntimeError,
causing ChannelManager retries which only made things worse.
Changes:
- _send_text: swallow ret=-2 after one retry without context_token.
Log request body + response at warning level for diagnostics.
- _send_media_file: same ret=-2 swallowing.
- _generate_client_id: change format to ``nanobot:{timestamp}-{hex}`` to
match openclaw-weixin ``{prefix}:{Date.now()}-{hex}``.
- Update tests to expect swallowing instead of raising for ret=-2.
When the iLink API returns ret=-2 (parameter error), it is often caused
by an expired context_token rather than a malformed payload. After a
gateway restart, the cached token can become stale within ~90 seconds if
no new inbound message refreshes it, causing all outbound replies to fail
silently.
Changes:
- _send_text: retry once without context_token when ret=-2 and a token
was present; if the retry succeeds, clear the expired token from cache.
- Remove leftover @staticmethod on _check_response_error so self.logger
and the body parameter work correctly.
- Bump WEIXIN_CHANNEL_VERSION from 2.1.1 -> 2.1.7 to match the reference
openclaw-weixin plugin.
- Add tests covering the ret=-2 retry path, failure path, and no-token
path.
References:
- openclaw/openclaw#61174 (context_token expiry after long agent turns)
- hermes-agent#21011 (ret=-2 rate limiting / parameter error)
The iLink API signals failures through either `ret` or `errcode`.
`_poll_once` already checked both, but `_send_text` and `_send_media_file`
only checked `errcode`. When the API returned `ret != 0` with
`errcode == 0`, the send appeared successful but the message was never
delivered, causing the "still losing messages" issue.
- Add `_check_response_error` helper that validates both fields
- Use it in `_send_text` and `_send_media_file`
- Add debug log after successful text send for observability
- Add test for nonzero ret with zero errcode
Refs: previous inbound fix (suppress -> explicit try/except)
Replace `with suppress(Exception)` in `_poll_once` message processing
and the `start()` poll loop with explicit `try/except` blocks that
log errors via `logger.exception`. Previously, any exception during
message processing (e.g. in `_handle_message`) was swallowed silently,
causing inbound messages to disappear without a trace.
Also add tests verifying that:
- `_poll_once` logs and continues when `_process_message` fails
- the poll loop logs and continues when `_poll_once` fails
Add a session-scoped slash command palette sourced from backend command metadata, and keep welcome-page quick actions localized across all WebUI languages.
Co-authored-by: Cursor <cursoragent@cursor.com>
_send_text() swallowed API errors (non-zero errcode) with just a
warning log, and send() had three silent return paths (no client,
session paused, no context_token). Neither triggered ChannelManager's
retry logic, causing persistent message loss until a new inbound
message refreshed the context_token.
Now all failure paths raise RuntimeError, matching BaseChannel's
contract and enabling proper retry behavior.
When host is set to 0.0.0.0, the gateway now enforces that either token
or token_issue_secret must be configured — it refuses to start otherwise.
Bootstrap endpoint behavior:
- token_issue_secret configured: always validate regardless of source IP
(handles reverse-proxy scenarios where all connections appear as localhost)
- No secret: only localhost can bootstrap (local dev mode)
The frontend shows an authentication form when bootstrap returns 401/403,
persists the secret in localStorage, and retries automatically on reload.
The previous LAN-access fix (PR #3656) relaxed the bootstrap localhost
check when host was 0.0.0.0, but did not require any authentication —
any device on the network could obtain a token without credentials.
New behavior:
- token_issue_secret configured: always validate, regardless of source
IP (handles reverse-proxy scenarios where all connections appear as
localhost).
- No secret configured: only localhost can bootstrap (local dev mode).
This supersedes the host-based check from PR #3656.
The webui bootstrap endpoint (/webui/bootstrap) rejected all non-localhost
connections with HTTP 403, preventing the embedded webui from working when
accessed from another device on the LAN — even when host was set to 0.0.0.0.
Skip the localhost check when the server is explicitly bound to 0.0.0.0 or ::,
since that signals intent to accept external connections.
Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns.
Co-authored-by: Cursor <cursoragent@cursor.com>
A single transient failure between the agent and an OpenAI/Groq Whisper
endpoint currently vanishes as `return ""` in transcribe(). The voice
message arrives as the empty string and there is no way to tell real
silence apart from a failed upload. A malformed but successful response
body is even worse: the JSON-decode error escapes the helper unhandled.
Add a shared `_post_transcription_with_retry` used by both providers.
Retry behaviour:
- exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts)
- retryable HTTP statuses: 408, 429, 500, 502, 503, 504
- retryable exceptions: TimeoutException, ConnectError, ReadError,
WriteError, RemoteProtocolError
Non-transient failures short-circuit to "" on the first attempt --
retrying a misconfigured key or a broken upload only burns rate-limit
quota. Branches that short-circuit:
- missing API key, missing audio file
- file-read errors (PermissionError, OSError) on the audio path,
preserving the nightly contract for direct provider callers
- HTTP auth/4xx body issues via raise_for_status()
- response.json() parse failures
- non-dict JSON payloads
Sharing one helper means OpenAI and Groq cannot drift apart silently.
Thread `language` through the helper. The multipart files dict is rebuilt
inside the per-attempt loop, so when a caller sets self.language the
`language` field is sent on every attempt -- not just the first.
Tests cover:
- every advertised retryable status and exception, parameterized
- language present on attempts 1 and 2 of a 503->200 sequence
- language absent when unset; present when set (both providers)
- malformed JSON body and non-dict JSON body short-circuit to ""
- PermissionError on file read short-circuits with no HTTP attempt
- max-attempts give-up, exponential-backoff schedule, auth no-retry,
missing-key / missing-file short-circuit
Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py
declared no status_code, which the new helper reads for retry classification.
Set status_code = 200 so the stub advertises the successful response that
those tests already simulate. Also moved the two transcription-provider
imports to the top of that file (previously placed mid-file) so the file
is ruff-clean (E402).
When the Matrix homeserver returns M_UNKNOWN_TOKEN / M_FORBIDDEN /
M_UNAUTHORIZED (or soft_logout), the previous _sync_loop kept retrying
sync_forever every 2 seconds forever, spamming the homeserver and
filling logs (#1851). The auth state cannot recover by retrying, so
this is pure noise and a soft DoS on the homeserver.
- Extract `_is_fatal_auth_response()` helper
- In `_on_sync_error`, on fatal auth: set `_running=False` and call
`stop_sync_forever()` so the loop exits cleanly
- Add exponential backoff (2s → 60s cap) to the generic exception path
in `_sync_loop` so transient network blips also stop hammering
Closes#1851
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matrix sync replays the room timeline on each startup or `/restart`,
causing already-handled messages to be reprocessed (#3553). Even with
`store_sync_tokens=True`, the sync token isn't reliably re-injected
when restoring a session via access_token + load_store(), so the
client re-reads recent timeline entries.
Filter `event.server_timestamp` against the process start time so old
events are dropped at the `_on_message` / `_on_media_message` entry
points. Trade-off: messages received during downtime won't be
processed, which matches the issue reporter's expectation.
Closes#3553
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stream-end events are emitted at the end of every assistant turn. When
the agent has more tool-call rounds queued, the runner sets
`_resuming=True` on the metadata. Without a guard, every intermediate
stream end removed the OnIt reaction (the first one wins, since
`_reaction_ids.pop` empties the slot) and re-added `done_emoji`,
producing a DONE reaction after every tool call instead of only at
final completion.
Wrap the OnIt removal and `done_emoji` add in a `not _resuming` guard
so the OnIt indicator persists across tool-call rounds and DONE fires
exactly once when the agent's final response lands.
`_resuming` already flows through outbound metadata
(`nanobot/agent/loop.py:747`) and survives `_coalesce_stream_deltas`
because pure `_stream_end` messages without `_stream_delta` skip the
merge branch.
Tests:
- test_no_removal_when_resuming
- test_done_emoji_only_on_final_stream_end
Slack inbound events with subtype=file_share were silently dropped, so
nanobot never saw messages that included attachments. Allow file_share
through, download Slack-private files using the bot token into the
local media dir, and pass them to the agent as media paths plus a
"[file: name]" / "[image: name]" placeholder in the content. Reject
responses that look like Slack's login HTML so an auth page is never
saved as if it were the user's file. Document the required files:read
scope alongside files:write so installs that read attachments are not
quietly missing the permission.
Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context.
Made-with: Cursor
Align with deer-flow: group top-level messages (no root_id) now get
their own session keyed by message_id instead of sharing a single
group-wide session. Topic replies continue to share session via
root_id.
The stream-end reaction cleanup now reads from _reaction_ids instead
of metadata, so pre-populate the dict in the test instead of passing
reaction_id via metadata.
Align reply targeting with deer-flow: always reply to the inbound
message_id (not root_id). The Feishu Reply API keeps responses in
the same topic automatically when the target message is inside a topic.
Also fix run_in_executor calls that passed reply_in_thread as a
positional arg to a keyword-only parameter, and route standalone
tool hints through the reply API for group chats.
When reply_to_message config is enabled, the bot's first reply now
uses reply_in_thread=True to create a visual topic/thread in the
Feishu client. Subsequent chunks fall back to regular create.
The reply_to_message default remains False for backward compatibility.
Failed replies still fall back to regular send — messages are never
silently dropped.
Thread replies (messages with root_id != message_id) in group chats
now get their own session key: feishu:{chat_id}:{root_id}. This
means each Feishu thread has an independent conversation context.
Top-level group messages and all private chat messages keep the
default session key (no override), consistent with Telegram and
Slack channel behavior.
Co-authored-by: shenchengtsi <228445050+shenchengtsi@users.noreply.github.com>