Fixes two related input-handling bugs in the onboard wizard:
1. _input_text treated "" as None, preventing users from clearing
optional string fields or entering empty strings intentionally.
2. _input_model_with_autocomplete used `if value else None`, which
discarded falsy values such as empty strings or 0.
To support clearing optional string fields, add _is_str_or_none() and
normalize empty strings to None inside _configure_pydantic_model only
when the field annotation is `str | None`. Required str fields keep
"" as a valid value.
Also included:
- Remember last selected item in provider/channel/model menus for
better UX when configuring multiple items.
- Rename _SIMPLE_TYPES and _MENU_DISPATCH to lowercase to follow
Python naming conventions (they are local variables, not constants).
- Remove unused imports in test file.
Extracted from PR #3358.
Reference hermes-agent#17228 / #18100 / PR#18105.
iLink returns ret=-2 / errcode=-2 for two different reasons:
- stale context_token: errmsg is empty/None or "unknown error"
- genuine rate limit: errmsg is populated (e.g. "frequency limit")
Previously we swallowed all ret=-2 responses, which caused silent
message drops when the context_token was stale.
Changes:
- Add _is_stale_session_ret() to detect empty/"unknown error" errmsg
- _send_text/_send_media_file retry once without context_token on stale
session signal, then raise on persistent failure so ChannelManager
can retry with backoff
- Remove error-swallowing behavior
- Update tests to expect raises and add TestIsStaleSessionRet coverage
The iLink sendmessage API frequently returns ret=-2 (parameter error / rate
limit / expired token) even when HTTP status is 200. The openclaw reference
plugin ignores the JSON body for sendmessage entirely and only checks HTTP
status. Our previous strict ret checking turned ret=-2 into RuntimeError,
causing ChannelManager retries which only made things worse.
Changes:
- _send_text: swallow ret=-2 after one retry without context_token.
Log request body + response at warning level for diagnostics.
- _send_media_file: same ret=-2 swallowing.
- _generate_client_id: change format to ``nanobot:{timestamp}-{hex}`` to
match openclaw-weixin ``{prefix}:{Date.now()}-{hex}``.
- Update tests to expect swallowing instead of raising for ret=-2.
When the iLink API returns ret=-2 (parameter error), it is often caused
by an expired context_token rather than a malformed payload. After a
gateway restart, the cached token can become stale within ~90 seconds if
no new inbound message refreshes it, causing all outbound replies to fail
silently.
Changes:
- _send_text: retry once without context_token when ret=-2 and a token
was present; if the retry succeeds, clear the expired token from cache.
- Remove leftover @staticmethod on _check_response_error so self.logger
and the body parameter work correctly.
- Bump WEIXIN_CHANNEL_VERSION from 2.1.1 -> 2.1.7 to match the reference
openclaw-weixin plugin.
- Add tests covering the ret=-2 retry path, failure path, and no-token
path.
References:
- openclaw/openclaw#61174 (context_token expiry after long agent turns)
- hermes-agent#21011 (ret=-2 rate limiting / parameter error)
The iLink API signals failures through either `ret` or `errcode`.
`_poll_once` already checked both, but `_send_text` and `_send_media_file`
only checked `errcode`. When the API returned `ret != 0` with
`errcode == 0`, the send appeared successful but the message was never
delivered, causing the "still losing messages" issue.
- Add `_check_response_error` helper that validates both fields
- Use it in `_send_text` and `_send_media_file`
- Add debug log after successful text send for observability
- Add test for nonzero ret with zero errcode
Refs: previous inbound fix (suppress -> explicit try/except)
Replace `with suppress(Exception)` in `_poll_once` message processing
and the `start()` poll loop with explicit `try/except` blocks that
log errors via `logger.exception`. Previously, any exception during
message processing (e.g. in `_handle_message`) was swallowed silently,
causing inbound messages to disappear without a trace.
Also add tests verifying that:
- `_poll_once` logs and continues when `_process_message` fails
- the poll loop logs and continues when `_poll_once` fails
Add a session-scoped slash command palette sourced from backend command metadata, and keep welcome-page quick actions localized across all WebUI languages.
Co-authored-by: Cursor <cursoragent@cursor.com>
_send_text() swallowed API errors (non-zero errcode) with just a
warning log, and send() had three silent return paths (no client,
session paused, no context_token). Neither triggered ChannelManager's
retry logic, causing persistent message loss until a new inbound
message refreshed the context_token.
Now all failure paths raise RuntimeError, matching BaseChannel's
contract and enabling proper retry behavior.
When host is set to 0.0.0.0, the gateway now enforces that either token
or token_issue_secret must be configured — it refuses to start otherwise.
Bootstrap endpoint behavior:
- token_issue_secret configured: always validate regardless of source IP
(handles reverse-proxy scenarios where all connections appear as localhost)
- No secret: only localhost can bootstrap (local dev mode)
The frontend shows an authentication form when bootstrap returns 401/403,
persists the secret in localStorage, and retries automatically on reload.
The previous LAN-access fix (PR #3656) relaxed the bootstrap localhost
check when host was 0.0.0.0, but did not require any authentication —
any device on the network could obtain a token without credentials.
New behavior:
- token_issue_secret configured: always validate, regardless of source
IP (handles reverse-proxy scenarios where all connections appear as
localhost).
- No secret configured: only localhost can bootstrap (local dev mode).
This supersedes the host-based check from PR #3656.
The webui bootstrap endpoint (/webui/bootstrap) rejected all non-localhost
connections with HTTP 403, preventing the embedded webui from working when
accessed from another device on the LAN — even when host was set to 0.0.0.0.
Skip the localhost check when the server is explicitly bound to 0.0.0.0 or ::,
since that signals intent to accept external connections.
Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns.
Co-authored-by: Cursor <cursoragent@cursor.com>
asyncio.create_task in connect_mcp_servers creates child tasks for
each MCP server, but close_mcp calls stack.aclose() from the main
task. anyio CancelScope requires enter/exit in the same task, so the
cross-task exit raises RuntimeError which gets silently caught. The
orphaned cancel scope keeps retrying via call_soon on every event
loop tick, consuming 100% CPU.
Fix: remove create_task/gather and connect servers sequentially in the
caller task. MCP servers are typically 1-2, so parallel connection
provides negligible benefit while introducing the cancel scope hazard.
Closes#3638
The is_path branch in _fmt_known was not passing max_length to
abbreviate_path, so read_file, write_file, edit, list_dir, and
web_fetch always truncated paths at 40 chars regardless of config.
Now all three branches (is_path, is_command, fallback) honor the
configured toolHintMaxLength.
The config field was added but never passed from config to AgentLoop.
The value was always falling back to the default (40) regardless of
what was set in config.json.
Now passes tool_hint_max_length through all AgentLoop() call sites:
- nanobot/nanobot.py (main bot)
- nanobot/cli/commands.py (CLI agent, dev, webui commands)
Also adds documentation in docs/configuration.md.
Add to config (default: 40, range: 20-500).
Controls how many characters of tool hints are shown in progress updates
(e.g. '$ cd …/project && npm test').
Set to 120+ to see full commands instead of truncated hints:
```json
{
"agents": {
"defaults": {
"toolHintMaxLength": 120
}
}
}
```
- Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback
- Make path abbreviation in _abbreviate_command proportional to max_length
- Add TestToolHintMaxLength test class with 5 tests
- All 41 existing tests pass
- Correct api_key type hint to str | None in _post_transcription_with_retry
- Remove unreachable final return ""
- Fix test_openai_missing_api_key_short_circuits to actually test
missing-key path (use audio_file fixture so file exists)
- Fix PermissionError patch for Windows (patch class method instead
of instance attribute)
A single transient failure between the agent and an OpenAI/Groq Whisper
endpoint currently vanishes as `return ""` in transcribe(). The voice
message arrives as the empty string and there is no way to tell real
silence apart from a failed upload. A malformed but successful response
body is even worse: the JSON-decode error escapes the helper unhandled.
Add a shared `_post_transcription_with_retry` used by both providers.
Retry behaviour:
- exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts)
- retryable HTTP statuses: 408, 429, 500, 502, 503, 504
- retryable exceptions: TimeoutException, ConnectError, ReadError,
WriteError, RemoteProtocolError
Non-transient failures short-circuit to "" on the first attempt --
retrying a misconfigured key or a broken upload only burns rate-limit
quota. Branches that short-circuit:
- missing API key, missing audio file
- file-read errors (PermissionError, OSError) on the audio path,
preserving the nightly contract for direct provider callers
- HTTP auth/4xx body issues via raise_for_status()
- response.json() parse failures
- non-dict JSON payloads
Sharing one helper means OpenAI and Groq cannot drift apart silently.
Thread `language` through the helper. The multipart files dict is rebuilt
inside the per-attempt loop, so when a caller sets self.language the
`language` field is sent on every attempt -- not just the first.
Tests cover:
- every advertised retryable status and exception, parameterized
- language present on attempts 1 and 2 of a 503->200 sequence
- language absent when unset; present when set (both providers)
- malformed JSON body and non-dict JSON body short-circuit to ""
- PermissionError on file read short-circuits with no HTTP attempt
- max-attempts give-up, exponential-backoff schedule, auth no-retry,
missing-key / missing-file short-circuit
Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py
declared no status_code, which the new helper reads for retry classification.
Set status_code = 200 so the stub advertises the successful response that
those tests already simulate. Also moved the two transcription-provider
imports to the top of that file (previously placed mid-file) so the file
is ruff-clean (E402).
- Correct api_key type hint to str | None in _post_transcription_with_retry
- Remove unreachable final return ""
- Fix test_openai_missing_api_key_short_circuits to actually test
missing-key path (use audio_file fixture so file exists)
- Fix PermissionError patch for Windows (patch class method instead
of instance attribute)
A single transient failure between the agent and an OpenAI/Groq Whisper
endpoint currently vanishes as `return ""` in transcribe(). The voice
message arrives as the empty string and there is no way to tell real
silence apart from a failed upload. A malformed but successful response
body is even worse: the JSON-decode error escapes the helper unhandled.
Add a shared `_post_transcription_with_retry` used by both providers.
Retry behaviour:
- exponential backoff 1s -> 2s -> 4s, up to 3 retries (4 attempts)
- retryable HTTP statuses: 408, 429, 500, 502, 503, 504
- retryable exceptions: TimeoutException, ConnectError, ReadError,
WriteError, RemoteProtocolError
Non-transient failures short-circuit to "" on the first attempt --
retrying a misconfigured key or a broken upload only burns rate-limit
quota. Branches that short-circuit:
- missing API key, missing audio file
- file-read errors (PermissionError, OSError) on the audio path,
preserving the nightly contract for direct provider callers
- HTTP auth/4xx body issues via raise_for_status()
- response.json() parse failures
- non-dict JSON payloads
Sharing one helper means OpenAI and Groq cannot drift apart silently.
Thread `language` through the helper. The multipart files dict is rebuilt
inside the per-attempt loop, so when a caller sets self.language the
`language` field is sent on every attempt -- not just the first.
Tests cover:
- every advertised retryable status and exception, parameterized
- language present on attempts 1 and 2 of a 503->200 sequence
- language absent when unset; present when set (both providers)
- malformed JSON body and non-dict JSON body short-circuit to ""
- PermissionError on file read short-circuits with no HTTP attempt
- max-attempts give-up, exponential-backoff schedule, auth no-retry,
missing-key / missing-file short-circuit
Test stub fix: the _StubResponse in tests/channels/test_channel_plugins.py
declared no status_code, which the new helper reads for retry classification.
Set status_code = 200 so the stub advertises the successful response that
those tests already simulate. Also moved the two transcription-provider
imports to the top of that file (previously placed mid-file) so the file
is ruff-clean (E402).
The is_path branch in _fmt_known was not passing max_length to
abbreviate_path, so read_file, write_file, edit, list_dir, and
web_fetch always truncated paths at 40 chars regardless of config.
Now all three branches (is_path, is_command, fallback) honor the
configured toolHintMaxLength.
The config field was added but never passed from config to AgentLoop.
The value was always falling back to the default (40) regardless of
what was set in config.json.
Now passes tool_hint_max_length through all AgentLoop() call sites:
- nanobot/nanobot.py (main bot)
- nanobot/cli/commands.py (CLI agent, dev, webui commands)
Also adds documentation in docs/configuration.md.
Add to config (default: 40, range: 20-500).
Controls how many characters of tool hints are shown in progress updates
(e.g. '$ cd …/project && npm test').
Set to 120+ to see full commands instead of truncated hints:
```json
{
"agents": {
"defaults": {
"toolHintMaxLength": 120
}
}
}
```
- Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback
- Make path abbreviation in _abbreviate_command proportional to max_length
- Add TestToolHintMaxLength test class with 5 tests
- All 41 existing tests pass
Disable the externally updated cron job before yielding to the event loop so slow Windows CI cannot run the short-interval job before the test writes the update.
Keep private URL access blocked at the tool boundary, but return a clear non-retryable hint so the agent can recover conversationally instead of aborting the turn.
Co-authored-by: Cursor <cursoragent@cursor.com>
``Nanobot.run()`` has always documented ``RunResult.tools_used`` and
``RunResult.messages`` but actually returned ``[]`` for both, so SDK
consumers could never inspect which tools fired or what the final
message list looked like — the only useful field was ``content``.
This threads the data out via a tiny ``_SDKCaptureHook`` that installs
alongside any user-supplied hooks. The capture hook accumulates tool
names across iterations and snapshots the message list on each
``after_iteration`` call; the last snapshot reflects end-of-turn state.
Only the SDK facade is touched: ``AgentLoop.process_direct`` and
``AgentRunner`` signatures are unchanged, so channels / CLI / API paths
are unaffected.
Replace the asyncio.Semaphore queueing approach with a simple count
check in SpawnTool.execute(). When the concurrency limit is reached,
the tool returns an error string so the agent can perceive the reason
and adjust its behavior instead of silently queueing.
- Remove max_concurrent_subagents parameter threading through
AgentLoop, commands.py, and nanobot.py
- SubagentManager reads the limit directly from AgentDefaults
- SpawnTool checks get_running_count() before calling spawn()
- Simplify tests to verify rejection behavior
Replace the asyncio.Semaphore queueing approach with a simple count
check in SpawnTool.execute(). When the concurrency limit is reached,
the tool returns an error string so the agent can perceive the reason
and adjust its behavior instead of silently queueing.
- Remove max_concurrent_subagents parameter threading through
AgentLoop, commands.py, and nanobot.py
- SubagentManager reads the limit directly from AgentDefaults
- SpawnTool checks get_running_count() before calling spawn()
- Simplify tests to verify rejection behavior
``Nanobot.run()`` has always documented ``RunResult.tools_used`` and
``RunResult.messages`` but actually returned ``[]`` for both, so SDK
consumers could never inspect which tools fired or what the final
message list looked like — the only useful field was ``content``.
This threads the data out via a tiny ``_SDKCaptureHook`` that installs
alongside any user-supplied hooks. The capture hook accumulates tool
names across iterations and snapshots the message list on each
``after_iteration`` call; the last snapshot reflects end-of-turn state.
Only the SDK facade is touched: ``AgentLoop.process_direct`` and
``AgentRunner`` signatures are unchanged, so channels / CLI / API paths
are unaffected.
Logout previously claimed to support github-copilot in --help text but had
no registered handler, so `provider logout github-copilot` failed with
"Logout not implemented". Add the handler, sharing token deletion with the
codex flow via `_delete_oauth_files`. Tighten handler-table types, fix the
codex test fixture filename, and cover github-copilot plus the unknown
provider path.
- Implement \
anobot provider logout <provider>\ to clear OAuth credentials.
- Add \_LOGOUT_HANDLERS\ registration mechanism mirroring login.
- Implement logout for \openai-codex\ by deleting local \oauth-cli-kit\ token and lock files.
- Fallback gracefully when attempting to logout from providers lacking local credentials or implementations.
- Fixes#2665
Keep the /dev workspace guard exception scoped to the known benign device paths already handled by ExecTool, and add coverage that non-benign /dev targets still get blocked. Also add a streaming regression for tool_error responses so fatal tool failures are delivered by channels instead of being marked as already streamed.
Co-authored-by: Cursor <cursoragent@cursor.com>
Three independent fixes for issues exposed by PR #3493:
1. shell.py: allow /dev/* paths in workspace guard
Commands like `rm file.txt 2>/dev/null` were blocked because
_extract_absolute_paths captured /dev/null as a path outside
the workspace. Allow /dev like media_path is already allowed.
2. shell.py: remove | from home_paths regex prefix
Loki query operator `|~` was misinterpreted as pipe + home
directory, causing false workspace violation errors.
3. loop.py: change _streamed from blacklist to whitelist
stop_reason "tool_error" was not in the exclusion set
{"ask_user", "error"}, so _streamed=True was set on fatal
errors. channel manager then skipped channel.send() because
it assumed the content was already streamed — but it never
was. Whitelist to only {"stop", "end_turn", "max_tokens"}.
Also fixes a pre-existing Windows bug in _spawn where
create_subprocess_exec + list2cmdline breaks commands with
paths containing spaces (e.g. D:\Program Files\python.exe).
Closes: #3599, #3605