The Anthropic SDK raises a client-side ValueError when a non-streaming
`messages.create` call could exceed the 10-minute server timeout (e.g. a
high `max_tokens` combined with an extended-thinking budget). The error
text "Streaming is required for operations that may take longer than
10 minutes" was bubbling up to the user as an opaque LLM error in
channels that use the non-stream path (e.g. wecom in #2709).
Detect this specific ValueError in `chat()` and transparently retry
through `chat_stream()` (without `on_content_delta` so behavior matches
the non-stream contract). Other ValueErrors continue to flow through
`_handle_error` unchanged.
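A minimal sketch of the fallback, shown synchronously with stub methods: the `Provider` class, its `_create` call, and the return values are stand-ins; only the marker check and the retry-through-streaming shape come from this change.

```python
_LONG_REQUEST_MARKER = (
    "Streaming is required for operations that may take longer than 10 minutes"
)

class Provider:
    """Stand-in for the real provider class; only the fallback logic matters."""

    def _create(self, messages, **kwargs):
        # Placeholder for the non-streaming SDK call; here it always trips
        # the SDK's long-request guard so the fallback path is exercised.
        raise ValueError(_LONG_REQUEST_MARKER)

    def chat_stream(self, messages, on_content_delta=None, **kwargs):
        # Placeholder for the streaming path.
        return {"role": "assistant", "content": "streamed"}

    def _handle_error(self, exc):
        raise exc

    def chat(self, messages, **kwargs):
        try:
            return self._create(messages, **kwargs)
        except ValueError as e:
            # Any other ValueError keeps the original error handling.
            if _LONG_REQUEST_MARKER not in str(e):
                return self._handle_error(e)
            # Transparent retry via streaming; no on_content_delta callback,
            # so the result matches the non-stream contract.
            return self.chat_stream(messages, on_content_delta=None, **kwargs)

print(Provider().chat([{"role": "user", "content": "hi"}])["content"])  # streamed
```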
Closes #2709
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the Matrix homeserver returns M_UNKNOWN_TOKEN / M_FORBIDDEN /
M_UNAUTHORIZED (or soft_logout), the previous _sync_loop retried
sync_forever every 2 seconds indefinitely, spamming the homeserver and
filling logs (#1851). The auth state cannot recover by retrying, so
this is pure noise and a soft DoS on the homeserver.
- Extract `_is_fatal_auth_response()` helper
- In `_on_sync_error`, on fatal auth: set `_running=False` and call
`stop_sync_forever()` so the loop exits cleanly
- Add exponential backoff (2s → 60s cap) to the generic exception path
in `_sync_loop` so transient network blips also stop hammering
Closes #1851
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LLM-generated tool calls may wrap URLs in markdown backticks or quotes
(e.g. `https://example.com`), causing urlparse to produce an empty scheme
and netloc, which makes every fetch attempt fail silently.
Add URL cleaning at the top of WebFetchTool.execute to strip whitespace,
backticks, double quotes, and single quotes, plus an early rejection guard
for non-http(s) URLs after cleaning.
Matrix sync replays the room timeline on each startup or `/restart`,
causing already-handled messages to be reprocessed (#3553). Even with
`store_sync_tokens=True`, the sync token isn't reliably re-injected
when restoring a session via access_token + load_store(), so the
client re-reads recent timeline entries.
Filter `event.server_timestamp` against the process start time so old
events are dropped at the `_on_message` / `_on_media_message` entry
points. Trade-off: messages received during downtime won't be
processed, which matches the issue reporter's expectation.
Closes #3553
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Anthropic: "none" must not enable extended thinking
- Azure: "none" must not suppress temperature or inject reasoning body
- DeepSeek/DashScope/Kimi: "none" sends thinking as disabled and omits the reasoning_effort field
- Gemini: the "gemma" keyword enables auto-routing for Gemma models
Adds Olostep (https://www.olostep.com) as an optional web_search backend
using the official olostep Python SDK (client.answers.create()).
Changes:
- pyproject.toml: adds olostep>=0.1.0 optional dependency
- schema.py: adds olostep to provider comment in WebSearchConfig
- web.py: adds _search_olostep() with lazy import and provider branching
- docs/configuration.md: documents Olostep setup under web search config
- tests: unit tests for the new provider
Backward compatible: existing users see no behavior change unless they
opt into provider: "olostep". No hard dependency is added to the default
runtime path.
Co-authored-by: umerkay <umerkk164@gmail.com>
Stream-end events are emitted at the end of every assistant turn. When
the agent has more tool-call rounds queued, the runner sets
`_resuming=True` on the metadata. Without a guard, every intermediate
stream end removed the OnIt reaction (the first one wins, since
`_reaction_ids.pop` empties the slot) and re-added `done_emoji`,
producing a DONE reaction after every tool call instead of only at
final completion.
Wrap the OnIt removal and `done_emoji` add in a `not _resuming` guard
so the OnIt indicator persists across tool-call rounds and DONE fires
exactly once when the agent's final response lands.
`_resuming` already flows through outbound metadata
(`nanobot/agent/loop.py:747`) and survives `_coalesce_stream_deltas`
because pure `_stream_end` messages without `_stream_delta` skip the
merge branch.
Tests:
- test_no_removal_when_resuming
- test_done_emoji_only_on_final_stream_end
Add an `extra_body` field to `ProviderConfig` that merges arbitrary
key-value pairs into every OpenAI-compatible request body. This is the
escape hatch for provider-specific features that nanobot does not have
first-class fields for.
Real-world use cases this unblocks via config alone (no code changes):
- vLLM/TGI `chat_template_kwargs` (e.g. `enable_thinking: false`)
- vLLM guided decoding (`guided_json`, `guided_regex`)
- Local model sampling params (`repetition_penalty`, `top_k`, `min_p`)
- Any future provider-specific param without a new PR each time
The config extra_body is applied last via recursive deep-merge, so it
can extend or override provider-specific defaults (e.g. thinking
params) without clobbering sibling keys set by internal logic.
Changes:
- Add `extra_body: dict[str, Any] | None` to `ProviderConfig`
- Pass it through `factory.py` to `OpenAICompatProvider.__init__`
- Deep-merge into `_build_kwargs` after all internal extra_body entries
- Add `_deep_merge` helper (recursive dict merge, does not mutate inputs)
- 21 tests: deep-merge semantics, provider init, _build_kwargs
integration, thinking coexistence, real-world patterns (guided_json,
repetition_penalty), and schema validation
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.
Made-with: Cursor
The max_messages config field in AgentDefaults was accepted by the
schema but never threaded through to the actual get_history() calls
in the agent loop. Both call sites in _process_message hardcoded the
default, so sessions with slow or local models accumulated unbounded
history that inflated prompt tokens and caused LLM timeouts.
Changes:
- Add max_messages field to AgentDefaults (default 0 = use built-in
constant, any positive value caps history replay)
- Store the value on AgentLoop and pass it to get_history() when
non-zero
- Wire the config through all three AgentLoop construction sites in
commands.py (gateway, API server, CLI chat)
- 14 focused tests covering schema validation, init storage, history
slicing, boundary alignment, integration wiring, and the
zero/default path
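The zero/default semantics can be sketched as below; the constant name and value are stand-ins for the built-in default.

```python
DEFAULT_MAX_MESSAGES = 50  # stand-in for the built-in constant

def history_slice(messages: list, max_messages: int = 0) -> list:
    """0 keeps the built-in default; any positive value caps replayed history."""
    limit = max_messages if max_messages > 0 else DEFAULT_MAX_MESSAGES
    return messages[-limit:]

msgs = [f"m{i}" for i in range(120)]
print(len(history_slice(msgs)))  # 50
print(history_slice(msgs, 3))    # ['m117', 'm118', 'm119']
```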
Adds /history [n] to display the last N user/assistant messages from
the current session (default 10, max 50).
- Tool and system messages are filtered out for readability
- Long messages are truncated to 200 characters with an ellipsis
- Multimodal content (image blocks) is collapsed to its text parts
- Invalid count argument returns a usage hint
- /history n uses prefix routing; /history uses exact routing
Also registers /history in build_help_text().
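The formatting rules above can be sketched as one pass; the message shape is illustrative.

```python
def format_history(messages, n: int = 10, max_len: int = 200) -> str:
    """Keep only user/assistant turns, collapse multimodal content to its
    text parts, and truncate long messages with an ellipsis."""
    visible = [m for m in messages if m["role"] in ("user", "assistant")]
    lines = []
    for m in visible[-n:]:
        content = m["content"]
        if isinstance(content, list):  # multimodal: keep text blocks only
            content = " ".join(
                b.get("text", "") for b in content if b.get("type") == "text"
            )
        if len(content) > max_len:
            content = content[:max_len] + "…"
        lines.append(f"{m['role']}: {content}")
    return "\n".join(lines)

history = [
    {"role": "system", "content": "you are a bot"},
    {"role": "user", "content": "hi"},
    {"role": "tool", "content": "{...}"},
    {"role": "assistant", "content": [{"type": "text", "text": "hello"}]},
]
print(format_history(history))
# user: hi
# assistant: hello
```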
Three failure modes addressed:
1. Model reflects HEARTBEAT.md instructions back as output instead of
executing them ("HEARTBEAT.md has active tasks listed...")
2. Model narrates decision logic ("Best judgment call: stay quiet")
3. Model produces empty output for silence, runner treats it as failure,
finalization retry generates "couldn't produce a final answer" which
gets delivered to the user
Changes:
- Add _is_deliverable() pre-filter in HeartbeatService._tick() that catches
finalization fallback messages and leaked reasoning patterns before they
reach the evaluator
- Wrap Phase 2 task input with a delivery-awareness preamble telling the
model its output goes directly to the user's messaging app
- Add meta-reasoning suppression criterion to evaluator template
No changes to agent/loop.py, runner.py, providers, or config schema.