- Replace ':' with '_' in store_name to avoid WinError 123
- Pass sanitized store_name via AsyncClientConfig
- Fixes issue #3506 where the Matrix channel fails on Windows because the
colon in the user_id produces invalid file paths in matrix-nio's DefaultStore
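A minimal sketch of the fix, assuming the channel constructs the nio AsyncClient itself (function names are illustrative):

```python
from nio import AsyncClient, AsyncClientConfig

def _safe_store_name(user_id: str) -> str:
    # "@bot:example.org" -> "@bot_example.org"; Windows rejects ':' in filenames (WinError 123)
    return user_id.replace(":", "_")

def build_matrix_client(homeserver: str, user_id: str, store_path: str) -> AsyncClient:
    # DefaultStore derives its on-disk filename from store_name, so sanitize it up front
    config = AsyncClientConfig(store_name=_safe_store_name(user_id))
    return AsyncClient(homeserver, user_id, store_path=store_path, config=config)
```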
- Anthropic: "none" must not enable extended thinking
- Azure: "none" must not suppress temperature or inject reasoning body
- DeepSeek/DashScope/Kimi: "none" sends thinking disabled, skips reasoning_effort field
- Gemini: gemma keyword enables auto-routing for gemma models
- Do not send reasoning_effort="none" to APIs (prevents 400 on gemma/Gemini)
- Treat "none" as thinking disabled in thinking_style, Kimi, and reasoning_content backfill paths
- Fix Anthropic extended thinking not respecting "none"
- Fix Azure OpenAI temperature suppression and reasoning body for "none"
- Fix Codex reasoning body for "none"
- Add "gemma" keyword to Gemini ProviderSpec for correct auto routing
Adds Olostep (https://www.olostep.com) as an optional web_search backend
using the official olostep Python SDK (client.answers.create()).
Changes:
- pyproject.toml: adds olostep>=0.1.0 optional dependency
- schema.py: adds olostep to provider comment in WebSearchConfig
- web.py: adds _search_olostep() with lazy import and provider branching
- docs/configuration.md: documents Olostep setup under web search config
- tests: unit tests for the new provider
Backward compatible: existing users see no behavior change unless they
opt into provider: "olostep". No hard dependency is added to the runtime path.
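A rough sketch of the new provider branch; the Olostep client construction and the answers.create() arguments are assumptions, only the lazy import and method name come from the SDK description above:

```python
def _search_olostep(query: str, api_key: str) -> str:
    try:
        # Lazy import: only required when config selects provider: "olostep"
        from olostep import Olostep  # hypothetical client class name
    except ImportError as exc:
        raise RuntimeError("install the optional 'olostep' dependency to use this provider") from exc

    client = Olostep(api_key=api_key)
    result = client.answers.create(task=query)  # argument name is an assumption
    return str(result)
```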
Co-authored-by: umerkay <umerkk164@gmail.com>
Stream-end events are emitted at the end of every assistant turn. When
the agent has more tool-call rounds queued, the runner sets
`_resuming=True` on the metadata. Without a guard, every intermediate
stream end removed the OnIt reaction (the first one wins, since
`_reaction_ids.pop` empties the slot) and re-added `done_emoji`,
producing a DONE reaction after every tool call instead of only at
final completion.
Wrap the OnIt removal and `done_emoji` add in a `not _resuming` guard
so the OnIt indicator persists across tool-call rounds and DONE fires
exactly once when the agent's final response lands.
`_resuming` already flows through outbound metadata
(`nanobot/agent/loop.py:747`) and survives `_coalesce_stream_deltas`
because pure `_stream_end` messages without `_stream_delta` skip the
merge branch.
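A minimal sketch of the guard, with illustrative channel-side names:

```python
async def _on_stream_end(self, msg) -> None:
    if not msg.metadata.get("_resuming"):
        # Final turn: swap the OnIt indicator for DONE exactly once
        reaction_id = self._reaction_ids.pop(msg.chat_id, None)
        if reaction_id is not None:
            await self._remove_reaction(msg.chat_id, reaction_id)
        await self._add_reaction(msg.chat_id, self.done_emoji)
    # Intermediate tool-call rounds leave the OnIt reaction in place
```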
Tests:
- test_no_removal_when_resuming
- test_done_emoji_only_on_final_stream_end
Add an `extra_body` field to `ProviderConfig` that merges arbitrary
key-value pairs into every OpenAI-compatible request body. This is the
escape hatch for provider-specific features that nanobot does not have
first-class fields for.
Real-world use cases this unblocks via config alone (no code changes):
- vLLM/TGI `chat_template_kwargs` (e.g. `enable_thinking: false`)
- vLLM guided decoding (`guided_json`, `guided_regex`)
- Local model sampling params (`repetition_penalty`, `top_k`, `min_p`)
- Any future provider-specific param without a new PR each time
The config extra_body is applied last via recursive deep-merge, so it
can extend or override provider-specific defaults (e.g. thinking
params) without clobbering sibling keys set by internal logic.
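A sketch of the merge semantics described above (the helper name matches the changes list below; the body is illustrative):

```python
from typing import Any

def _deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    # Recursively merge override into a copy of base; neither input is mutated
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Sibling keys survive: a user-supplied chat_template_kwargs entry extends,
# rather than replaces, one set by internal thinking logic.
assert _deep_merge(
    {"chat_template_kwargs": {"enable_thinking": True}},
    {"chat_template_kwargs": {"add_generation_prompt": True}},
) == {"chat_template_kwargs": {"enable_thinking": True, "add_generation_prompt": True}}
```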
Changes:
- Add `extra_body: dict[str, Any] | None` to `ProviderConfig`
- Pass it through `factory.py` to `OpenAICompatProvider.__init__`
- Deep-merge into `_build_kwargs` after all internal extra_body entries
- Add `_deep_merge` helper (recursive dict merge, does not mutate inputs)
- 21 tests: deep-merge semantics, provider init, _build_kwargs
integration, thinking coexistence, real-world patterns (guided_json,
repetition_penalty), and schema validation
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.
Made-with: Cursor
The max_messages config field in AgentDefaults was accepted by the
schema but never threaded through to the actual get_history() calls
in the agent loop. Both call sites in _process_message hardcoded the
default, so sessions with slow or local models accumulated unbounded
history that inflated prompt tokens and caused LLM timeouts.
Changes:
- Add max_messages field to AgentDefaults (default 0 = use built-in
constant, any positive value caps history replay)
- Store the value on AgentLoop and pass it to get_history() when
non-zero
- Wire the config through all three AgentLoop construction sites in
commands.py (gateway, API server, CLI chat)
- 14 focused tests covering schema validation, init storage, history
slicing, boundary alignment, integration wiring, and the
zero/default path
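A hedged sketch of the wiring; the get_history() keyword and class shape are assumptions matching the description above:

```python
class AgentLoop:
    def __init__(self, session, max_messages: int = 0):
        self._session = session
        self._max_messages = max_messages  # 0 = keep the built-in constant

    def _load_history(self, key: str):
        # Both former call sites in _process_message route through this cap
        if self._max_messages > 0:
            return self._session.get_history(key, max_messages=self._max_messages)
        return self._session.get_history(key)
```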
Adds /history [n] to display the last N user/assistant messages from
the current session (default 10, max 50).
- Tool and system messages are filtered out for readability
- Long messages are truncated to 200 characters with an ellipsis
- Multimodal content (image blocks) is collapsed to its text parts
- Invalid count argument returns a usage hint
- /history n uses prefix routing; /history uses exact routing
Also registers /history in build_help_text().
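A sketch of those rendering rules (message shape and helper name are assumptions, not the actual handler):

```python
def render_history(messages: list[dict], count: int = 10) -> str:
    count = max(1, min(count, 50))
    visible = [m for m in messages if m.get("role") in ("user", "assistant")]
    lines = []
    for msg in visible[-count:]:
        content = msg.get("content", "")
        if isinstance(content, list):
            # Multimodal content: keep only the text parts, drop image blocks
            content = " ".join(p.get("text", "") for p in content if p.get("type") == "text")
        if len(content) > 200:
            content = content[:200] + "…"
        lines.append(f"{msg['role']}: {content}")
    return "\n".join(lines)
```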
Three failure modes addressed:
1. Model reflects HEARTBEAT.md instructions back as output instead of
executing them ("HEARTBEAT.md has active tasks listed...")
2. Model narrates decision logic ("Best judgment call: stay quiet")
3. Model produces empty output for silence, the runner treats it as a failure,
and the finalization retry generates "couldn't produce a final answer",
which gets delivered to the user
Changes:
- Add _is_deliverable() pre-filter in HeartbeatService._tick() that catches
finalization fallback messages and leaked reasoning patterns before they
reach the evaluator
- Wrap Phase 2 task input with a delivery-awareness preamble telling the
model its output goes directly to the user's messaging app
- Add meta-reasoning suppression criterion to evaluator template
No changes to agent/loop.py, runner.py, providers, or config schema.
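An illustrative version of the pre-filter; the marker strings are examples of the failure modes above, not the exact patterns shipped in HeartbeatService:

```python
_NON_DELIVERABLE_MARKERS = (
    "couldn't produce a final answer",   # finalization fallback leak
    "heartbeat.md has active tasks",     # instructions reflected back as output
    "best judgment call",                # narrated decision logic
)

def _is_deliverable(text: str) -> bool:
    stripped = text.strip()
    if not stripped:
        return False  # empty output means "stay silent", never a user-visible failure
    lowered = stripped.lower()
    return not any(marker in lowered for marker in _NON_DELIVERABLE_MARKERS)
```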
Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and
sessionFileMaxMessages out of user-facing config into internal
constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000).
- Remove 3 fields from AgentDefaults and config pipeline
- Sink enforce_file_cap into Session (was AgentLoop)
- Auto-derive token budget from context window (was configurable)
- Net -113 lines across 7 files; 723 tests green
Made-with: Cursor
The PR stores ref freshness in the metadata sidecar, so the merged main test should assert updated_at there instead of in the refs payload.
Made-with: Cursor
Resolve the MSTeams stale-reference cleanup conflict by keeping the PR's locked, atomic sidecar-meta implementation and aligning the merged test expectation locally.
Made-with: Cursor
Past assistant turns in history were prefixed with "[Message Time: ...]"
just like user turns. The model treated these as in-context demos and
started prefixing its own replies with the same marker, leaking
metadata to the user. Prompt-level warnings could not beat dozens of
prior assistant samples.
Annotate only user turns and proactive deliveries
(_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is
the whole point and which are too infrequent to act as demos). Adjacent
user-side timestamps still pin every normal assistant reply for
relative-time reasoning. The now-redundant identity.md warning is
removed along with the demonstration source.
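The resulting rule, as a minimal sketch (field names follow the description above):

```python
def should_annotate_timestamp(role: str, metadata: dict) -> bool:
    # Annotate user turns and proactive deliveries (cron / heartbeat pushes);
    # never annotate past assistant turns, so there is no marker for the model to imitate
    return role == "user" or bool(metadata.get("_channel_delivery"))
```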
Slack inbound events with subtype=file_share were silently dropped, so
nanobot never saw messages that included attachments. Allow file_share
through, download Slack-private files using the bot token into the
local media dir, and pass them to the agent as media paths plus a
"[file: name]" / "[image: name]" placeholder in the content. Reject
responses that look like Slack's login HTML so an auth page is never
saved as if it were the user's file. Document the required files:read
scope alongside files:write so installs that read attachments are not
quietly missing the permission.
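A hedged sketch of the download path; the placeholder format and login-page rejection follow the description, while the helper name, the HTML heuristic, and the use of httpx are assumptions for illustration:

```python
import httpx

async def _download_slack_file(url_private: str, name: str, mimetype: str,
                               bot_token: str, media_dir: str) -> tuple[str, str]:
    async with httpx.AsyncClient() as client:
        resp = await client.get(url_private, headers={"Authorization": f"Bearer {bot_token}"})
    body = resp.content
    # Without files:read, Slack serves its login page instead of the file bytes
    if b"<html" in body[:512].lower():
        raise RuntimeError(f"got Slack HTML instead of file content for {name}")
    path = f"{media_dir}/{name}"
    with open(path, "wb") as fh:
        fh.write(body)
    label = "image" if mimetype.startswith("image/") else "file"
    return path, f"[{label}: {name}]"
```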
Builds on PR #3463 (commit 038a140), which introduced metadata and
session_key parameters through _LoopHook and _set_tool_context for the
cron and message tools. Three downstream gaps remained:
1. _set_tool_context's body still computes effective_key from
channel:chat_id and passes that to spawn, even when the caller
provides a thread-scoped session_key. The new parameter is wired in
for cron/message but spawn dispatch ignores it. Result: subagent
announces from threaded callers carry a channel-only
session_key_override, dropping thread_ts.
2. _process_message's system-channel branch loads the session via
key = f"{channel}:{chat_id}", ignoring msg.session_key_override.
So even when the announce InboundMessage carries the right override
(after fix 1), the consumer side discards it and routes to the
channel-level session.
3. The OutboundMessage returned from the system-channel branch has no
metadata, so slack's outbound dispatcher has no thread_ts to use and
posts the LLM's reply to the channel top-level rather than the
originating thread.
This change closes all three gaps with three small edits in loop.py.
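Condensed, the three edits reduce to the key-selection logic below (names are illustrative; the real loop.py code differs in shape):

```python
def pick_effective_key(channel: str, chat_id: str, session_key: str | None) -> str:
    # Edits 1 and 2: prefer the caller-provided thread-scoped key over channel:chat_id
    return session_key or f"{channel}:{chat_id}"

def thread_ts_from_key(session_key: str) -> str | None:
    # Edit 3: recover thread_ts from slack's "slack:<channel>:<thread_ts>" key shape
    # so the system-channel OutboundMessage can carry metadata={"thread_ts": ...}
    parts = session_key.split(":")
    return parts[2] if len(parts) >= 3 else None
```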
Behavior change:
- Slack channels with reply_in_thread: true: subagent announces and
follow-up replies now arrive in the originating thread session
instead of leaking into the channel-level session.
- Other channels constructing thread-scoped session keys (matrix
threads, telegram thread mode, etc.): the session-loading and
effective-key fixes apply identically since they're platform-agnostic.
The outbound thread_ts reconstruction is slack-specific by virtue of
the session-key format slack uses; other channels would benefit from
the same pattern but are out of scope for this PR.
- Unified session mode: no change. Falls back to UNIFIED_SESSION_KEY
when session_key is not provided.
- CLI / non-channel callers: no change. They don't pass session_key
and the fallback to f"{channel}:{chat_id}" matches prior behavior.
Reproducer (slack with reply_in_thread: true):
1. From a slack thread, send a message that triggers a subagent spawn.
2. Before fix: announce lands in slack:<channel>.jsonl session,
parent agent in the thread never sees the completion event,
eventual reply (if any) posts to the channel top-level, not the
thread.
3. After fix: announce lands in slack:<channel>:<thread_ts>.jsonl,
parent agent in the thread responds within seconds, reply posts in
the thread.