cmd_restart only persisted channel + chat_id across the os.execv boundary, so
when the new process announced "Restart completed" the OutboundMessage had
no Slack thread_ts and the reply fell back to the channel root.
Serialize msg.metadata into NANOBOT_RESTART_NOTIFY_METADATA, restore it on the
RestartNotice, and forward it to OutboundMessage so the completion message
follows the same routing as the original /restart invocation.
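A minimal sketch of the round trip (helper names here are illustrative; the
env var, RestartNotice, and OutboundMessage are from this change):

```python
import json
import os

def stash_restart_metadata(msg) -> None:
    # Before os.execv: persist the full metadata dict, not just
    # channel/chat_id, so Slack's thread_ts survives the process boundary.
    os.environ["NANOBOT_RESTART_NOTIFY_METADATA"] = json.dumps(msg.metadata or {})

def restore_restart_metadata() -> dict:
    # In the new process: rebuild metadata for the RestartNotice so the
    # "Restart completed" OutboundMessage routes like the original /restart.
    raw = os.environ.pop("NANOBOT_RESTART_NOTIFY_METADATA", "")
    return json.loads(raw) if raw else {}
```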
Made-with: Cursor
The old prompt framed cron firing as a "task triggered" status report,
which led the agent to reply with things like "Done ✅ 已提醒
U0AV8BJPV8D 喝水" ("Done ✅ reminded U0AV8BJPV8D to drink water") —
exposing the user ID and reading like a system log
instead of a friendly reminder. Reword it to instruct the agent to
speak directly to the user and forbid status-style language.
Made-with: Cursor
Without writing thread_ts and the original session_key into jobs.json,
cron jobs created in a Slack thread lost them after a service reload,
so reminders fired into the channel root.
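A sketch of the persisted record (field layout is assumed; thread_ts and
session_key are the fields this change writes):

```python
import json

def save_jobs(path: str, jobs: list[dict]) -> None:
    # Persist delivery routing alongside the schedule so a reload can
    # still deliver reminders into the originating Slack thread.
    records = [
        {
            "id": job["id"],
            "schedule": job["schedule"],
            "message": job["message"],
            "thread_ts": job.get("thread_ts"),      # Slack thread anchor
            "session_key": job.get("session_key"),  # original conversation
        }
        for job in jobs
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```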
Made-with: Cursor
MCP resource/prompt/tool names containing spaces or special characters
(e.g. "PostgreSQL System Information") were forwarded verbatim to model
provider APIs, causing validation errors from both Anthropic and OpenAI
which require names matching ^[a-zA-Z0-9_-]{1,128}$.
Add _sanitize_name(), which replaces invalid characters with underscores
and collapses consecutive underscores. It is applied in the MCPToolWrapper,
MCPResourceWrapper, and MCPPromptWrapper constructors and in the
enabled_tools filtering logic.
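A sketch of the sanitizer (the 128-character truncation is an assumption
drawn from the providers' pattern, not stated above):

```python
import re

def _sanitize_name(name: str) -> str:
    # "PostgreSQL System Information" -> "PostgreSQL_System_Information"
    sanitized = re.sub(r"[^a-zA-Z0-9_-]", "_", name)  # replace invalid chars
    sanitized = re.sub(r"_+", "_", sanitized)         # collapse runs of "_"
    return sanitized[:128]  # assumed cap to satisfy ^[a-zA-Z0-9_-]{1,128}$
```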
Closes #3468
Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context.
Made-with: Cursor
Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors.
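A minimal sketch of the idea, with assumed turn fields:

```python
from datetime import datetime, timezone

def render_turn(turn: dict) -> str:
    # Prefix each persisted turn with its stored timestamp so the model can
    # resolve "yesterday" and "today" against concrete dates.
    ts = datetime.fromtimestamp(turn["timestamp"], tz=timezone.utc)
    return f"[{ts:%Y-%m-%d %H:%M} UTC] {turn['role']}: {turn['content']}"
```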
Made-with: Cursor
The non-streaming parse path unconditionally promoted the `reasoning`
response field to `content` when content was empty. This was intended
for StepFun (whose API returns the actual answer in `reasoning`), but
it applied to every OpenAI-compatible provider — causing internal
thinking chains from models like Xiaomi MIMO to be leaked as formal
replies.
Add `reasoning_as_content: bool` to ProviderSpec (default False) and
set it only for StepFun. The fallback now requires this flag rather
than running globally.
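A sketch of the gate (the flag name is from this change; the surrounding
parse helper is illustrative):

```python
from dataclasses import dataclass

@dataclass
class ProviderSpec:
    name: str
    reasoning_as_content: bool = False  # opt-in; set only for StepFun

def extract_content(spec: ProviderSpec, message: dict) -> str:
    content = message.get("content") or ""
    # The fallback now requires the provider flag instead of running
    # globally, so other models' thinking chains are never promoted.
    if not content and spec.reasoning_as_content:
        content = message.get("reasoning") or ""
    return content
```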
Fixes #3443
Only mark message-tool deliveries for channel-session recording while cron jobs are running, avoiding duplicate session writes during normal user turns.
Made-with: Cursor
Route heartbeat, cron, and message-tool deliveries through one gateway helper so user-visible proactive messages are available as conversational context when the user replies in the channel.
Made-with: Cursor
When heartbeat delivers output to a channel (e.g. Telegram), the message
is a raw OutboundMessage that bypasses the channel's session. If the user
replies, their reply enters a different session with no context about the
heartbeat message, so the agent cannot follow through.
This change injects the delivered heartbeat message as an assistant turn
into the target channel's session before publishing the outbound. When
the user replies, the channel session has conversational context.
Handles unified_session mode by resolving to UNIFIED_SESSION_KEY when
enabled, matching the agent loop's own session routing.
No changes to agent/loop.py, session/manager.py, channels, providers,
or config schema — uses existing add_message() and save() APIs.
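A minimal sketch of the injection, assuming session-manager call shapes
around the existing add_message()/save() APIs (UNIFIED_SESSION_KEY is from
this change; the constant below is a placeholder):

```python
UNIFIED_SESSION_KEY = "unified"  # placeholder; the real constant lives in the session module

def inject_heartbeat_turn(sessions, channel: str, chat_id: str,
                          text: str, unified_session: bool) -> None:
    # Resolve to the same key the agent loop would use for a user reply.
    key = UNIFIED_SESSION_KEY if unified_session else f"{channel}:{chat_id}"
    session = sessions.get(key)
    # Record the delivered heartbeat as an assistant turn before publishing
    # the outbound, so the user's reply lands in a primed session.
    session.add_message(role="assistant", content=text)
    session.save()
```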
Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior.
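A sketch of the host-based check (function name assumed; ipaddress covers
the private ranges this codebase matches):

```python
import ipaddress
from urllib.parse import urlparse

def is_private_endpoint(api_base: str) -> bool:
    # Decide on the parsed hostname, not the raw URL string, so a public
    # name that merely contains a private-looking substring (for example
    # "10." inside "v10.models.example") keeps the default pool.
    host = urlparse(api_base).hostname or ""
    if host in ("localhost", "host.docker.internal"):
        return True
    try:
        # Covers 127.x, 10.x, 192.168.x, 172.16-31.x, and ::1.
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # ordinary DNS name
```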
Made-with: Cursor
Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP
connections before the client-side keepalive timer expires. When two
LLM calls happen seconds apart — for example the heartbeat _decide()
phase followed immediately by process_direct() — the second call grabs
a now-dead pooled connection, causing a transient APIConnectionError
on every first attempt.
The fix detects local endpoints via:
- ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS)
- Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x,
172.16-31.x, host.docker.internal, [::1])
For these endpoints, the AsyncOpenAI client is created with a custom
httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP
connection for each request. This is cheap on LAN (sub-5ms connect)
and eliminates the stale-connection retry tax entirely.
Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default
5-second keepalive, which is fine for high-frequency API usage.
The private-network heuristic also covers the common case where users
configure provider='openai' but point apiBase at a LAN IP running
llama.cpp — the spec says is_local=False, but the URL clearly is.
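A sketch of the client construction (timeout handling omitted;
keepalive_expiry=0 and the http_client override are the mechanism described
above, and is_private_endpoint is the host check sketched earlier):

```python
import httpx
from openai import AsyncOpenAI

def make_llm_client(api_base: str, api_key: str, is_local: bool) -> AsyncOpenAI:
    if not (is_local or is_private_endpoint(api_base)):
        # Cloud providers keep httpx's default ~5s keepalive.
        return AsyncOpenAI(base_url=api_base, api_key=api_key)
    # Local servers drop idle sockets early; expiring pooled connections
    # immediately forces a fresh TCP connect (sub-5ms on LAN) per request.
    http_client = httpx.AsyncClient(limits=httpx.Limits(keepalive_expiry=0))
    return AsyncOpenAI(base_url=api_base, api_key=api_key, http_client=http_client)
```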
Align with deer-flow: group top-level messages (no root_id) now get
their own session keyed by message_id instead of sharing a single
group-wide session. Topic replies continue to share a session via
root_id.
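A sketch of the resulting key routing (function shape assumed; the
feishu:{chat_id}:{root_id} format appears in the thread-session change
below):

```python
def session_key_override(chat_id: str, message_id: str,
                         root_id: str | None, is_group: bool) -> str | None:
    if not is_group:
        return None  # private chats keep the channel default key
    if root_id and root_id != message_id:
        return f"feishu:{chat_id}:{root_id}"   # topic replies share a session
    return f"feishu:{chat_id}:{message_id}"    # fresh session per top-level message
```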
The stream-end reaction cleanup now reads from _reaction_ids instead
of metadata, so pre-populate the dict in the test instead of passing
reaction_id via metadata.
Align reply targeting with deer-flow: always reply to the inbound
message_id (not root_id). The Feishu Reply API keeps responses in
the same topic automatically when the target message is inside a topic.
Also fix run_in_executor calls that passed reply_in_thread as a
positional arg to a keyword-only parameter, and route standalone
tool hints through the reply API for group chats.
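A sketch of the executor fix (client.reply is a stand-in for the Feishu SDK
call):

```python
import asyncio
import functools

async def send_reply(client, message_id: str, content: str) -> None:
    loop = asyncio.get_running_loop()
    # run_in_executor forwards only positional args; bind the keyword-only
    # reply_in_thread via functools.partial instead of passing it positionally.
    await loop.run_in_executor(
        None,
        functools.partial(client.reply, message_id, content, reply_in_thread=True),
    )
```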
When reply_to_message config is enabled, the bot's first reply now
uses reply_in_thread=True to create a visual topic/thread in the
Feishu client. Subsequent chunks fall back to regular create.
The reply_to_message default remains False for backward compatibility.
Failed replies still fall back to regular send — messages are never
silently dropped.
Thread replies (messages with root_id != message_id) in group chats
now get their own session key: feishu:{chat_id}:{root_id}. This
means each Feishu thread has an independent conversation context.
Top-level group messages and all private chat messages keep the
default session key (no override), consistent with Telegram and
Slack channel behavior.
Co-authored-by: shenchengtsi <228445050+shenchengtsi@users.noreply.github.com>
Add signed media URLs to live WebSocket replies and teach the WebUI to classify and render video attachments, so bot-sent videos can play inline in both live chats and session history.
Made-with: Cursor
Telegram previously sent all video files as documents via send_document,
so users saw a file icon instead of an inline player. WebSocket only
accepted image MIME types, rejecting video uploads entirely.
Telegram:
- Recognize video extensions (mp4/mov/avi/mkv/webm/3gp) in _get_media_type
- Route videos through send_video with supports_streaming=True
- Add VIDEO/VIDEO_NOTE/ANIMATION to inbound message filters
- Add video MIME mappings to _get_extension
- Fix: local file sends now use _call_with_retry (previously no retry)
WebSocket:
- Expand upload MIME whitelist with video/mp4, video/webm, video/quicktime
- Add per-type size limits (_MAX_VIDEO_BYTES=20MB, _MAX_VIDEOS_PER_MESSAGE=1)
- Expand media serving endpoint to serve video with correct Content-Type
Agent:
- Add "video" to message tool media parameter description
- Add .mp4 example to identity.md system prompt
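A sketch of the Telegram routing (send-method signatures follow
python-telegram-bot; _get_media_type and _call_with_retry are names from
this change, with the retry helper simplified here):

```python
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}

def _get_media_type(path: str) -> str:
    suffix = "." + path.rsplit(".", 1)[-1].lower()
    return "video" if suffix in VIDEO_EXTENSIONS else "document"

async def _call_with_retry(fn, *args, retries: int = 3, **kwargs):
    # Simplified stand-in for the channel's retry helper.
    for attempt in range(retries):
        try:
            return await fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise

async def send_media(bot, chat_id: int, path: str) -> None:
    with open(path, "rb") as fh:
        if _get_media_type(path) == "video":
            # send_video gives an inline player; supports_streaming lets
            # playback start before the download finishes.
            await _call_with_retry(bot.send_video, chat_id=chat_id,
                                   video=fh, supports_streaming=True)
        else:
            await _call_with_retry(bot.send_document, chat_id=chat_id,
                                   document=fh)
```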
Made-with: Cursor
The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted
Made-with: Cursor
Two issues with DeepSeek V4 thinking mode support:
1. Missing thinking parameter injection.
DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
— identical to VolcEngine/BytePlus. The code had this for volcengine,
byteplus, dashscope, minimax, and kimi but not DeepSeek. This means
`reasoning_effort=minimal` (thinking off) silently has no effect.
Root cause: the thinking-style→wire-format mapping was an if/elif chain
on provider *names*. DeepSeek was forgotten.
Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
- "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
- "enable_thinking" → {"enable_thinking": bool} (DashScope)
- "reasoning_split" → {"reasoning_split": bool} (MiniMax)
`_build_kwargs` now does a single dict lookup. Adding a new provider
with an existing wire format requires zero changes to the function.
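A sketch of the declarative mapping (helper name assumed; the three wire
formats are the ones listed above):

```python
from dataclasses import dataclass

@dataclass
class ProviderSpec:
    name: str
    thinking_style: str | None = None  # "thinking_type" | "enable_thinking" | "reasoning_split"

def thinking_extra_body(spec: ProviderSpec, enabled: bool) -> dict:
    # One dict lookup replaces the if/elif chain over provider names; a new
    # provider with an existing wire format needs only a spec entry.
    wire = {
        "thinking_type": {"thinking": {"type": "enabled" if enabled else "disabled"}},
        "enable_thinking": {"enable_thinking": enabled},
        "reasoning_split": {"reasoning_split": enabled},
    }
    return wire.get(spec.thinking_style, {})
```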
2. Legacy session messages crash thinking-mode requests.
When a session was started without thinking mode (or with a different
model), assistant messages lack reasoning_content. DeepSeek V4 in
thinking mode rejects these with 400:
"The reasoning_content in the thinking mode must be passed back to the API."
This affects ALL assistant messages, not just those with tool_calls
(despite the docs only mentioning the tool_calls case).
Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
assistant message missing it, but only when thinking mode is active.
This is semantically neutral — the model treats empty reasoning_content
as "no thinking happened on that turn". The backfill only touches the
in-memory request copy; session files on disk are untouched.
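A sketch of the backfill (helper name assumed):

```python
def backfill_reasoning(messages: list[dict], thinking_enabled: bool) -> list[dict]:
    # Patch only the in-memory request copy; session files stay untouched.
    if not thinking_enabled:
        return messages
    patched = []
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            # Empty string reads as "no thinking happened on that turn",
            # which DeepSeek V4 accepts instead of returning a 400.
            msg = {**msg, "reasoning_content": ""}
        patched.append(msg)
    return patched
```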
Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl,
so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
archive() fell back to raw_archive, causing duplicate [RAW] dumps of
the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
each history entry without caps, so any legacy oversized record (or an
unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving any future caller
one forgotten cap away from the same vector.
Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
it summarized or raw-archived — both outcomes materialize the chunk;
still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in
append_history with a once-per-store warning, so any new caller that
forgets its own tighter cap is caught and made observable.
Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
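A sketch of the default cap (the write path and warning flag are assumed;
the constants and once-per-store behavior are from this change):

```python
import logging

logger = logging.getLogger(__name__)

_ARCHIVE_SUMMARY_MAX_CHARS = 8_000   # tight: expected LLM summary size
_RAW_ARCHIVE_MAX_CHARS = 16_000      # tight: raw fallback dumps
_HISTORY_ENTRY_HARD_CAP = 64_000     # wide: regression catcher only

def append_history(store, entry: str, max_chars: int = _HISTORY_ENTRY_HARD_CAP) -> None:
    if len(entry) > max_chars:
        # Warn once per store: some caller forgot its own tighter cap.
        if not getattr(store, "_cap_warned", False):
            logger.warning("history entry truncated from %d chars", len(entry))
            store._cap_warned = True
        entry = entry[:max_chars]
    store.write_line(entry)
```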
Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt
Made-with: Cursor
Root cause: when the consolidation LLM failed, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.
Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.
Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
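A sketch of the input-budget truncation (the function name matches
_truncate_to_token_budget referenced in the test commit above; the model
argument and encoding fallback are assumptions):

```python
import tiktoken

def _truncate_to_token_budget(text: str, budget: int, model: str = "gpt-4o") -> str:
    # Cap the formatted chunk to the consolidation model's input budget
    # before the LLM call, so long tool chains cannot blow the context window.
    if budget <= 0:
        return ""
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")  # unknown model name
    tokens = enc.encode(text)
    return text if len(tokens) <= budget else enc.decode(tokens[:budget])
```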