2157 Commits

Author SHA1 Message Date
Xubin Ren
82b8a3af7e fix(provider): handle incomplete DeepSeek reasoning history 2026-04-26 20:47:55 +08:00
Xubin Ren
3b82e14f85 fix(shell): preserve login PATH for path append
Made-with: Cursor
2026-04-26 20:32:38 +08:00
yorkhellen
814345dd78 fix: update tests for path_append env dict change 2026-04-26 20:32:38 +08:00
yorkhellen
2f2ac96ac7 fix: update tests for path_append env dict change 2026-04-26 20:32:38 +08:00
yorkhellen
23dde7b84c fix: prevent shell injection via path_append in ExecTool 2026-04-26 20:32:38 +08:00
Xubin Ren
727086ddac test: tighten consolidation ratio coverage
Made-with: Cursor
2026-04-26 20:24:42 +08:00
chengyongru
fca56d324a test: add unit tests for configurable consolidation_ratio
Cover ratio propagation, schema validation, and consolidation
behavior with different ratio values (0.1, 0.5, 0.9).
2026-04-26 20:24:42 +08:00
Subal
80ee4483f8 feat: make consolidation ratio configurable 2026-04-26 20:24:42 +08:00
chengyongru
3de843a229 fix(provider): gate reasoning-to-content fallback behind spec flag
The non-streaming parse path unconditionally promoted the `reasoning`
response field to `content` when content was empty. This was intended
for StepFun (whose API returns the actual answer in `reasoning`), but
it applied to every OpenAI-compatible provider — causing internal
thinking chains from models like Xiaomi MIMO to be leaked as formal
replies.

Add `reasoning_as_content: bool` to ProviderSpec (default False) and
set it only for StepFun. The fallback now requires this flag rather
than running globally.

Fixes #3443
2026-04-26 20:11:08 +08:00
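Illustrative sketch of the gating described in 3de843a229 (not the code from the commit). `ProviderSpec` is reduced to two fields here, and `pick_content` is a hypothetical helper name; only the `reasoning_as_content` flag and its default of False come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderSpec:
    # Hypothetical minimal spec; the real ProviderSpec has more fields.
    name: str
    reasoning_as_content: bool = False  # default False, per the commit message


def pick_content(
    spec: ProviderSpec, content: Optional[str], reasoning: Optional[str]
) -> Optional[str]:
    """Choose the user-visible text of a non-streaming response."""
    if content:
        return content
    # Promote `reasoning` to content only for providers that opt in.
    if spec.reasoning_as_content and reasoning:
        return reasoning
    return content


stepfun = ProviderSpec(name="stepfun", reasoning_as_content=True)
other = ProviderSpec(name="other")
```

With this shape, an empty `content` plus a populated `reasoning` yields the reasoning text for `stepfun` only; every other provider returns the empty content unchanged, so thinking chains are not surfaced as replies.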
Xubin Ren
6036355ac5 fix(message): limit session recording to proactive sends
Only mark message-tool deliveries for channel-session recording while cron jobs are running, avoiding duplicate session writes during normal user turns.

Made-with: Cursor
2026-04-26 20:08:21 +08:00
Xubin Ren
799db33517 fix(heartbeat): record proactive deliveries in channel sessions
Route heartbeat, cron, and message-tool deliveries through one gateway helper so user-visible proactive messages are available when the channel replies.

Made-with: Cursor
2026-04-26 20:08:21 +08:00
hussein1362
1572626100 fix(heartbeat): inject delivered messages into channel session for reply continuity
When heartbeat delivers output to a channel (e.g. Telegram), the message
is a raw OutboundMessage that bypasses the channel's session. If the user
replies, their reply enters a different session with no context about the
heartbeat message, so the agent cannot follow through.

This change injects the delivered heartbeat message as an assistant turn
into the target channel's session before publishing the outbound. When
the user replies, the channel session has conversational context.

Handles unified_session mode by resolving to UNIFIED_SESSION_KEY when
enabled, matching the agent loop's own session routing.

No changes to agent/loop.py, session/manager.py, channels, providers,
or config schema — uses existing add_message() and save() APIs.
2026-04-26 20:08:21 +08:00
Xubin Ren
1e11b35b45 fix(providers): tighten local endpoint detection
Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior.

Made-with: Cursor
2026-04-26 16:14:24 +08:00
hussein1362
5943ab386d fix(providers): disable HTTP keepalive for local/LAN endpoints
Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP
connections before the client-side keepalive timer expires.  When two
LLM calls happen seconds apart — for example the heartbeat _decide()
phase followed immediately by process_direct() — the second call grabs
a now-dead pooled connection, causing a transient APIConnectionError
on every first attempt.

The fix detects local endpoints via:
- ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS)
- Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x,
  172.16-31.x, host.docker.internal, [::1])

For these endpoints, the AsyncOpenAI client is created with a custom
httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP
connection for each request.  This is cheap on LAN (sub-5ms connect)
and eliminates the stale-connection retry tax entirely.

Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default
5-second keepalive, which is fine for high-frequency API usage.

The private-network heuristic also covers the common case where users
configure provider='openai' but point apiBase at a LAN IP running
llama.cpp — the spec says is_local=False, but the URL clearly is.
2026-04-26 16:14:24 +08:00
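Illustrative sketch of host-based local endpoint detection, combining the URL heuristic from 5943ab386d with the host parsing from 1e11b35b45. The function name `is_local_endpoint` and the use of the standard `ipaddress` module are assumptions; the commits may match patterns differently.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames named in the commit message that are not IP literals.
_LOCAL_HOSTNAMES = {"localhost", "host.docker.internal"}


def is_local_endpoint(api_base: str) -> bool:
    """Heuristic: True when the URL's host is loopback or a private address."""
    # .hostname strips the port, any userinfo, and IPv6 brackets.
    host = urlparse(api_base).hostname
    if not host:
        return False  # e.g. a URL without a scheme; treated as non-local here
    host = host.lower()
    if host in _LOCAL_HOSTNAMES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name, even one that merely contains "10." or "192.168.".
        return False
    return addr.is_loopback or addr.is_private
```

Parsing the host first is what keeps a public name such as `my192.168.example.com` on the default connection pool, while `http://192.168.1.5:11434/v1` or `http://[::1]:8080/v1` would take the no-keepalive path.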
Xubin Ren
d0e1b1393a fix(feishu): scope streaming buffers by message
Keep concurrent Feishu group replies from sharing one streaming card buffer when sessions are split by topic or top-level message.

Made-with: Cursor
2026-04-26 16:09:31 +08:00
chengyongru
39eea1b762 feat(feishu): per-message session for group top-level messages
Align with deer-flow: group top-level messages (no root_id) now get
their own session keyed by message_id instead of sharing a single
group-wide session. Topic replies continue to share session via
root_id.
2026-04-26 16:09:31 +08:00
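Illustrative sketch of the session routing after 39eea1b762. The `feishu:{chat_id}:{root_id}` format comes from 13bb31c789; the function name, the `"group"` chat-type value, and the key format for top-level messages are assumptions.

```python
from typing import Optional


def feishu_session_key(
    chat_type: str, chat_id: str, message_id: str, root_id: Optional[str]
) -> Optional[str]:
    """Return a session-key override, or None to keep the channel default."""
    if chat_type != "group":
        return None  # private chats keep the default session key
    if root_id:
        # Topic reply: every message in the topic shares one session.
        return f"feishu:{chat_id}:{root_id}"
    # Top-level group message: its own session, keyed by message_id
    # (key format assumed to mirror the topic format).
    return f"feishu:{chat_id}:{message_id}"
```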
chengyongru
0e92936cf3 chore(test): remove stale reaction_id from test metadata
The production code no longer reads reaction_id from metadata, so
remove the leftover key from the test_no_removal_when_message_id_missing
test case.
2026-04-26 16:09:31 +08:00
chengyongru
3eb8838dd9 fix(test): update reaction cleanup test for _reaction_ids dict
The stream-end reaction cleanup now reads from _reaction_ids instead
of metadata, so pre-populate the dict in the test instead of passing
reaction_id via metadata.
2026-04-26 16:09:31 +08:00
chengyongru
2a9fc9392b fix(feishu): use message_id as reply target and fix keyword-only arg
Align reply targeting with deer-flow: always reply to the inbound
message_id (not root_id). The Feishu Reply API keeps responses in
the same topic automatically when the target message is inside a topic.

Also fix run_in_executor calls that passed reply_in_thread as a
positional arg to a keyword-only parameter, and route standalone
tool hints through the reply API for group chats.
2026-04-26 16:09:31 +08:00
chengyongru
8717832771 perf(feishu): make reaction non-blocking to speed up inbound dispatch
Reaction emoji is now added as a fire-and-forget background task
instead of blocking the inbound message pipeline. This removes
one API round-trip from the critical path before the agent starts
processing.
2026-04-26 16:09:31 +08:00
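Illustrative sketch of a fire-and-forget reaction with asyncio, as described in 8717832771. All names here (`fire_and_forget`, `add_reaction`, `handle_inbound`) are hypothetical, and the `sleep` stands in for the Feishu API round-trip.

```python
import asyncio

# Strong references, so pending tasks are not garbage-collected mid-flight.
_background_tasks: set = set()


def fire_and_forget(coro) -> asyncio.Task:
    """Schedule a coroutine without awaiting it on the critical path."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def add_reaction(message_id: str, log: list) -> None:
    await asyncio.sleep(0.01)  # stand-in for the reaction API call
    log.append(f"reaction:{message_id}")


async def handle_inbound(message_id: str, log: list) -> None:
    fire_and_forget(add_reaction(message_id, log))
    log.append(f"dispatch:{message_id}")  # dispatch no longer waits


async def demo():
    log: list = []
    await handle_inbound("m1", log)
    at_dispatch = list(log)
    await asyncio.gather(*_background_tasks)  # drain before exiting the demo
    return at_dispatch, log
```

In this sketch the dispatch entry is logged before the reaction completes, which is the ordering the commit is after.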
chengyongru
d36fba8bf5 feat(feishu): add reply_in_thread for visual topic grouping
When reply_to_message config is enabled, the bot's first reply now
uses reply_in_thread=True to create a visual topic/thread in the
Feishu client. Subsequent chunks fall back to regular create.

The reply_to_message default remains False for backward compatibility.
Failed replies still fall back to regular send — messages are never
silently dropped.
2026-04-26 16:09:31 +08:00
chengyongru
13bb31c789 feat(feishu): add thread-scoped session isolation for group chats
Thread replies (messages with root_id != message_id) in group chats
now get their own session key: feishu:{chat_id}:{root_id}. This
means each Feishu thread has an independent conversation context.

Top-level group messages and all private chat messages keep the
default session key (no override), consistent with Telegram and
Slack channel behavior.

Co-authored-by: shenchengtsi <228445050+shenchengtsi@users.noreply.github.com>
2026-04-26 16:09:31 +08:00
T3chC0wb0y
fd3d7ea752 fix(msteams): normalize nbsp in inbound text 2026-04-26 00:56:06 +08:00
T3chC0wb0y
722d935d37 fix(msteams): prune bad notify refs 2026-04-26 00:56:06 +08:00
T3chC0wb0y
7e65884acb fix(msteams): send threaded replies via replyToId 2026-04-26 00:56:06 +08:00
Xubin Ren
403ce23d22 fix(agent): tighten ask_user CLI handling
Made-with: Cursor
2026-04-25 22:10:19 +08:00
Xubin Ren
3b1ea99ee1 fix(agent): render ask_user options without buttons
Made-with: Cursor
2026-04-25 22:10:19 +08:00
Xubin Ren
cfc76ffbbf feat(agent): add ask_user tool
Made-with: Cursor
2026-04-25 22:10:19 +08:00
Xubin Ren
830211b5d4 docs: simplify macOS launchd setup
Made-with: Cursor
2026-04-25 19:36:20 +08:00
Xubin Ren
8a4c338a01 docs: tighten macOS launchd setup
Made-with: Cursor
2026-04-25 19:36:20 +08:00
choiking
41f7eae7b4 docs: add macOS launchd gateway setup 2026-04-25 19:36:20 +08:00
Xubin Ren
39a5a77874 fix(feishu): send videos with media message type 2026-04-24 20:00:56 +00:00
yorkhellen
076e4166d7 fix(agent): add LLM request timeout to prevent session lock starvation 2026-04-25 03:40:34 +08:00
Xubin Ren
e52fe2a8e2 feat(webui): render video media attachments
Add signed media URLs to live WebSocket replies and teach the WebUI to classify and render video attachments, so bot-sent videos can play inline in both live chats and session history.

Made-with: Cursor
2026-04-25 03:20:40 +08:00
Xubin Ren
be05189f39 feat(channels): add video support for Telegram and WebSocket
Telegram previously sent all video files as documents via send_document,
so users saw a file icon instead of an inline player. WebSocket only
accepted image MIME types, rejecting video uploads entirely.

Telegram:
- Recognize video extensions (mp4/mov/avi/mkv/webm/3gp) in _get_media_type
- Route videos through send_video with supports_streaming=True
- Add VIDEO/VIDEO_NOTE/ANIMATION to inbound message filters
- Add video MIME mappings to _get_extension
- Fix: local file sends now use _call_with_retry (previously no retry)

WebSocket:
- Expand upload MIME whitelist with video/mp4, video/webm, video/quicktime
- Add per-type size limits (_MAX_VIDEO_BYTES=20MB, _MAX_VIDEOS_PER_MESSAGE=1)
- Expand media serving endpoint to serve video with correct Content-Type

Agent:
- Add "video" to message tool media parameter description
- Add .mp4 example to identity.md system prompt

Made-with: Cursor
2026-04-25 02:20:13 +08:00
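Illustrative sketch of extension-based media classification for the Telegram path in be05189f39. The video extensions are the ones listed in the commit message; the image extensions, the returned labels, and the function name are assumptions.

```python
from pathlib import Path

_VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}  # from the commit
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}  # assumed for illustration


def get_media_type(path: str) -> str:
    """Classify a file by extension; unknown types fall back to 'document'."""
    ext = Path(path).suffix.lower()
    if ext in _VIDEO_EXTS:
        return "video"
    if ext in _IMAGE_EXTS:
        return "photo"
    return "document"
```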
Matt Van Horn
ee14e2df56 perf(document): lazy-import heavy document parsers
Move pypdf, python-docx, openpyxl, and python-pptx imports from module
level into the _extract_pdf / _extract_docx / _extract_xlsx /
_extract_pptx functions that actually use them. These four libraries
became core dependencies in v0.1.5.post2 (~25 MB combined) and were
paying the import cost on every nanobot startup even when no document
parsing was needed for the session.

The module-level SUPPORTED_EXTENSIONS set and the extract_text()
dispatch stay as-is; the "[error: <lib> not installed]" branches move
from the old module-level None sentinels into the corresponding
extractor's try/except ImportError block. Behavior for the error
message and for successful parses is identical.

All 20 tests in tests/test_document_parsing.py pass unchanged.

Fixes #3422
2026-04-25 02:10:30 +08:00
Xubin Ren
3441d5f89c test(anthropic): cover remaining opus-4-7 temperature branches
The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted

Made-with: Cursor
2026-04-24 15:33:59 +08:00
04cb
9239429a00 fix(anthropic): omit temperature for opus-4-7 (#3417) 2026-04-24 15:33:59 +08:00
Xubin Ren
7f1913f619 fix(provider): add DeepSeek thinking toggle; backfill reasoning_content on legacy messages
Two issues with DeepSeek V4 thinking mode support:

1. Missing thinking parameter injection.
   DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
   — identical to VolcEngine/BytePlus. The code had this for volcengine,
   byteplus, dashscope, minimax, and kimi but not for DeepSeek, so
   `reasoning_effort=minimal` (thinking off) silently had no effect there.

   Root cause: the thinking-style→wire-format mapping was an if/elif chain
   on provider *names*. DeepSeek was forgotten.

   Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
   - "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
   - "enable_thinking" → {"enable_thinking": bool} (DashScope)
   - "reasoning_split" → {"reasoning_split": bool} (MiniMax)
   `_build_kwargs` now does a single dict lookup. Adding a new provider
   with an existing wire format requires zero changes to the function.

2. Legacy session messages crash thinking-mode requests.
   When a session was started without thinking mode (or with a different
   model), assistant messages lack reasoning_content. DeepSeek V4 in
   thinking mode rejects these with 400:
   "The reasoning_content in the thinking mode must be passed back to the API."
   This affects ALL assistant messages, not just those with tool_calls
   (despite the docs only mentioning the tool_calls case).

   Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
   assistant message missing it, but only when thinking mode is active.
   This is semantically neutral — the model treats empty reasoning_content
   as "no thinking happened on that turn". The backfill only touches the
   in-memory request copy; session files on disk are untouched.

Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
2026-04-24 15:06:39 +08:00
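Illustrative sketch of the two fixes in 7f1913f619. The three style names and wire formats are taken from the commit message; the helper names `thinking_extra_body` and `backfill_reasoning` are hypothetical, and the real `_build_kwargs` may be structured differently.

```python
from typing import Any, Callable, Optional

# thinking_style -> builder for the provider-specific extra_body fragment.
_THINKING_BUILDERS: dict[str, Callable[[bool], dict[str, Any]]] = {
    "thinking_type": lambda on: {"thinking": {"type": "enabled" if on else "disabled"}},
    "enable_thinking": lambda on: {"enable_thinking": on},
    "reasoning_split": lambda on: {"reasoning_split": on},
}


def thinking_extra_body(thinking_style: Optional[str], thinking_on: bool) -> dict[str, Any]:
    """Single dict lookup instead of an if/elif chain on provider names."""
    builder = _THINKING_BUILDERS.get(thinking_style or "")
    return builder(thinking_on) if builder else {}


def backfill_reasoning(messages: list[dict[str, Any]], thinking_on: bool) -> list[dict[str, Any]]:
    """Return a request copy where assistant messages carry reasoning_content."""
    if not thinking_on:
        return messages
    out = []
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            msg = {**msg, "reasoning_content": ""}  # copy; stored message untouched
        out.append(msg)
    return out
```

A provider that reuses an existing wire format only needs its `thinking_style` set; the lookup table itself stays unchanged.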
Xubin Ren
4531167c12 fix(agent): bound remaining memory/history pollution paths from #3412
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:

- archive() success path appended uncapped LLM summaries to history.jsonl,
  so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
  archive() fell back to raw_archive, causing duplicate [RAW] dumps of
  the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
  each history entry without caps, so any legacy oversized record (or an
  unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
  one forgotten-cap-away from the same vector.

Changes:

- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
  before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
  it summarized or raw-archived — both outcomes materialize the chunk;
  still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
  Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in
  append_history with a once-per-store warning, so any new caller that
  forgets its own tighter cap gets caught and observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
2026-04-24 04:17:19 +08:00
Xubin Ren
81a5af2352 test(consolidation): add regression tests for tiktoken truncation path and history char cap
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
2026-04-24 03:57:59 +08:00
chengyongru
4a1b9053ac fix(agent): cap recent history section in system prompt
Truncate the "Recent History" section injected by build_system_prompt()
to 32K chars. Without this, many accumulated history.jsonl entries could
still bloat the system prompt even with per-entry truncation in place.
2026-04-24 03:57:59 +08:00
chengyongru
2848f69897 fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation
Root cause: when consolidation LLM fails, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.

Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.

Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
  chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
  input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
2026-04-24 03:57:59 +08:00
Xubin Ren
52855d463e refactor(agent): move progress event helpers out of loop
Made-with: Cursor
2026-04-23 20:06:11 +08:00
Xubin Ren
469fc90fe6 fix(agent): on_progress tool_events only when callback accepts; align progress tests with main
Made-with: Cursor
2026-04-23 20:06:11 +08:00
Pablo Cabeza
c23d719780 feat(agent): emit structured _tool_events progress metadata
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.

Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
  payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
  tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
  tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
  on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
  the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
  metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
  public entry points; _cli_progress updated to accept and ignore
  tool_events

The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 20:06:11 +08:00
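Illustrative sketch of the additive `on_progress` contract: structured events are passed only to callbacks whose signature can take them. The signature check via `inspect` and the names `accepts_tool_events` and `emit_progress` are assumptions, as is the example payload shape beyond its version number.

```python
import asyncio
import inspect


def accepts_tool_events(callback) -> bool:
    """True if the callback declares tool_events or takes **kwargs."""
    try:
        params = inspect.signature(callback).parameters.values()
    except (TypeError, ValueError):
        return False
    return any(
        p.name == "tool_events" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


async def emit_progress(on_progress, content, *, tool_hint=False, tool_events=None):
    kwargs = {"tool_hint": tool_hint}
    if tool_events is not None and accepts_tool_events(on_progress):
        kwargs["tool_events"] = tool_events
    await on_progress(content, **kwargs)


async def demo():
    seen = []

    async def legacy(content, *, tool_hint=False):
        seen.append(("legacy", content, None))

    async def rich(content, *, tool_hint=False, tool_events=None):
        seen.append(("rich", content, tool_events))

    events = [{"version": 1, "phase": "start", "name": "exec"}]  # hypothetical payload
    await emit_progress(legacy, "running exec", tool_hint=True, tool_events=events)
    await emit_progress(rich, "running exec", tool_hint=True, tool_events=events)
    return seen
```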
Xubin Ren
185a8fd34d fix(webui): opaque composer, equal-width message area, cleaner user pill 2026-04-23 07:48:32 +00:00
Xubin Ren
06503cd0fc fix(telegram): keep callback_data under Telegram's 64-byte cap
``InlineKeyboardButton(label, callback_data=label)`` fails Telegram's
API when the label exceeds 64 bytes UTF-8. An LLM-generated long
option (realistic in multilingual flows) used to 400 the ``send_message``
call silently: the user got nothing, and the agent saw what looked like a successful send after a retry-then-drop.

Decouple display from wire: button text keeps the full label, callback_data
gets truncated at a UTF-8 char boundary. Tap echoes the prefix back as the
user message; the LLM understands a prefix of its own option just fine,
and the display the user saw was always the full string.

Locks: helper boundary behavior (ASCII, CJK, short labels pass through)
and end-to-end ``_build_keyboard`` integration with an over-cap label.

Made-with: Cursor
2026-04-23 13:26:06 +08:00
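Illustrative sketch of truncating `callback_data` at a UTF-8 character boundary under the 64-byte cap described in 06503cd0fc. The helper name is hypothetical and the commit's own helper may differ in detail.

```python
_CALLBACK_DATA_MAX_BYTES = 64  # Telegram's cap, per the commit message


def truncate_callback_data(label: str, max_bytes: int = _CALLBACK_DATA_MAX_BYTES) -> str:
    """Return the longest prefix of label that fits in max_bytes of UTF-8."""
    raw = label.encode("utf-8")
    if len(raw) <= max_bytes:
        return label  # short labels pass through unchanged
    # Cutting bytes may split a multi-byte character; errors="ignore"
    # drops that trailing partial sequence so the result stays valid.
    return raw[:max_bytes].decode("utf-8", errors="ignore")
```

The button text keeps the full label; only the value sent as `callback_data` goes through this helper.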
Xubin Ren
6bc2983ab1 fix(telegram): fall back buttons to inline text when keyboard disabled
Buttons are semantic options, not a separate channel protocol: a user
who taps "Yes" and a user who types "yes" arrive at the agent as the
same string. Dropping ``msg.buttons`` when ``inline_keyboards=False``
was the worst of both worlds — the agent got told "Message sent with
N button(s)" while the user saw a question with no options.

Splice the labels into the message text instead. The LLM produces the
same ``message(buttons=...)`` call regardless of channel; the channel
layer picks the richest rendering it can afford — native keyboard when
enabled, bracketed inline text otherwise. Layout is preserved (one row
per line). Other channels can adopt the same helper incrementally.

Locks: canonical ``_buttons_as_text`` format, flag-off send-path
splices labels, flag-on send-path keeps content clean and rides
``reply_markup``.

Made-with: Cursor
2026-04-23 13:26:06 +08:00
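Illustrative sketch of the text fallback from 6bc2983ab1. The commit says labels are bracketed and laid out one row per line; the exact separator, the blank line before the options, and the names `buttons_as_text` and `render_message` are assumptions.

```python
from typing import Optional


def buttons_as_text(buttons: list[list[str]]) -> str:
    """Render button rows as bracketed labels, one row per line."""
    return "\n".join(" ".join(f"[{label}]" for label in row) for row in buttons)


def render_message(
    content: str, buttons: Optional[list[list[str]]], inline_keyboards: bool
) -> tuple[str, Optional[list[list[str]]]]:
    """Return (text, keyboard_rows); keyboard_rows is None when not used."""
    if not buttons:
        return content, None
    if inline_keyboards:
        return content, buttons  # native keyboard; text stays clean
    return f"{content}\n\n{buttons_as_text(buttons)}", None
```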
Xubin Ren
b9b81d9301 test(telegram): pin inline-keyboards flag gate and buttons validation
Two kill-switch tests for the new inline-keyboards path. Neither is
flashy — they just make sure the next unrelated refactor can't quietly
regress two narrow contracts the PR relies on.

  1. TelegramChannel._build_keyboard returns None whenever
     TelegramConfig.inline_keyboards is False, even if buttons are
     supplied. The flag defaults off; if someone ever flips that default
     the change should fail this test before it reaches prod bots.

  2. MessageTool rejects malformed `buttons` payloads (non-list, mixed
     list/str row, non-str label, None label) up front instead of
     letting them slip into the channel layer where Telegram would
     silently 400 the send. Parametrized over four shapes the guard
     needs to reject.

No production code touched.

Made-with: Cursor
2026-04-23 13:26:06 +08:00
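Illustrative sketch of an up-front `buttons` guard covering the four malformed shapes that b9b81d9301 pins. The function name and error strings are hypothetical; the real MessageTool validation may report errors differently.

```python
from typing import Any, Optional


def validate_buttons(buttons: Any) -> Optional[str]:
    """Return an error message, or None when buttons is a list of rows of str."""
    if not isinstance(buttons, list):
        return "buttons must be a list of rows"
    for row in buttons:
        if not isinstance(row, list):
            return "each buttons row must be a list of labels"
        for label in row:
            if not isinstance(label, str):
                return "each button label must be a string"
    return None
```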