The non-streaming parse path unconditionally promoted the `reasoning`
response field to `content` when content was empty. This was intended
for StepFun (whose API returns the actual answer in `reasoning`), but
it applied to every OpenAI-compatible provider — causing internal
thinking chains from models like Xiaomi MIMO to be leaked as final
replies.
Add `reasoning_as_content: bool` to ProviderSpec (default False) and
set it only for StepFun. The fallback now requires this flag rather
than running globally.
Fixes #3443
Only mark message-tool deliveries for channel-session recording while cron jobs are running, avoiding duplicate session writes during normal user turns.
Made-with: Cursor
Route heartbeat, cron, and message-tool deliveries through one gateway helper so user-visible proactive messages are available when the channel replies.
Made-with: Cursor
When heartbeat delivers output to a channel (e.g. Telegram), the message
is a raw OutboundMessage that bypasses the channel's session. If the user
replies, their reply enters a different session with no context about the
heartbeat message, so the agent cannot follow through.
This change injects the delivered heartbeat message as an assistant turn
into the target channel's session before publishing the outbound. When
the user replies, the channel session has conversational context.
Handles unified_session mode by resolving to UNIFIED_SESSION_KEY when
enabled, matching the agent loop's own session routing.
No changes to agent/loop.py, session/manager.py, channels, providers,
or config schema — uses existing add_message() and save() APIs.
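A sketch of the injection step under those constraints;
`inject_and_publish` and the key format are illustrative, while
`add_message()`/`save()` are the existing APIs named above:

```python
UNIFIED_SESSION_KEY = "unified"  # assumption: the real constant lives in session code

async def inject_and_publish(outbound, sessions, publish, unified_session: bool) -> None:
    # Resolve the same session key the agent loop itself would use.
    key = UNIFIED_SESSION_KEY if unified_session else f"{outbound.channel}:{outbound.chat_id}"
    session = sessions.get_or_create(key)
    # Record the heartbeat output as an assistant turn BEFORE publishing,
    # so the user's reply lands in a session that already has the context.
    session.add_message({"role": "assistant", "content": outbound.content})
    session.save()
    await publish(outbound)
```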
Parse the endpoint host before disabling keepalive so public hostnames that merely contain private-network substrings keep the default connection pool behavior.
Made-with: Cursor
Local model servers (Ollama, llama.cpp, vLLM) often close idle HTTP
connections before the client-side keepalive timer expires. When two
LLM calls happen seconds apart — for example the heartbeat _decide()
phase followed immediately by process_direct() — the second call grabs
a now-dead pooled connection, causing a transient APIConnectionError
on every first attempt.
The fix detects local endpoints via:
- ProviderSpec.is_local (Ollama, LM Studio, vLLM, OVMS)
- Private-network URL patterns (localhost, 127.x, 192.168.x, 10.x,
172.16-31.x, host.docker.internal, [::1])
For these endpoints, the AsyncOpenAI client is created with a custom
httpx.AsyncClient that sets keepalive_expiry=0, forcing a fresh TCP
connection for each request. This is cheap on LAN (sub-5ms connect)
and eliminates the stale-connection retry tax entirely.
Cloud providers (OpenAI, Anthropic, OpenRouter, etc.) keep the default
5-second keepalive, which is fine for high-frequency API usage.
The private-network heuristic also covers the common case where users
configure provider='openai' but point apiBase at a LAN IP running
llama.cpp — the spec says is_local=False, but the URL clearly is.
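A sketch of both pieces: parsing the hostname (so public hosts that
merely contain private substrings keep pooling, per the commit above)
and the zero-keepalive client. `_is_private_host` and `make_client`
are illustrative names:

```python
import ipaddress
from urllib.parse import urlparse

import httpx
from openai import AsyncOpenAI

def _is_private_host(api_base: str) -> bool:
    host = urlparse(api_base).hostname or ""
    if host in ("localhost", "host.docker.internal"):
        return True
    try:
        # Covers 127.x, 10.x, 192.168.x, 172.16-31.x, and [::1].
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # public hostname: keep default connection pooling

def make_client(spec, api_base: str, api_key: str) -> AsyncOpenAI:
    if spec.is_local or _is_private_host(api_base):
        # keepalive_expiry=0 forces a fresh TCP connection per request,
        # sidestepping pooled connections the local server already closed.
        local_http = httpx.AsyncClient(limits=httpx.Limits(keepalive_expiry=0))
        return AsyncOpenAI(base_url=api_base, api_key=api_key, http_client=local_http)
    return AsyncOpenAI(base_url=api_base, api_key=api_key)
```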
Align with deer-flow: group top-level messages (no root_id) now get
their own session keyed by message_id instead of sharing a single
group-wide session. Topic replies continue to share session via
root_id.
The stream-end reaction cleanup now reads from _reaction_ids instead
of metadata, so pre-populate the dict in the test instead of passing
reaction_id via metadata.
Align reply targeting with deer-flow: always reply to the inbound
message_id (not root_id). The Feishu Reply API keeps responses in
the same topic automatically when the target message is inside a topic.
Also fix run_in_executor calls that passed reply_in_thread as a
positional arg to a keyword-only parameter, and route standalone
tool hints through the reply API for group chats.
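The executor fix is the standard `functools.partial` binding, since
`run_in_executor` forwards only positional arguments (call shape
illustrative):

```python
import asyncio
import functools

async def reply_via_executor(send_reply, message_id: str, content: str) -> None:
    loop = asyncio.get_running_loop()
    # Passing reply_in_thread positionally raised TypeError because the
    # parameter is keyword-only; partial binds it by name instead.
    await loop.run_in_executor(
        None,
        functools.partial(send_reply, message_id, content, reply_in_thread=True),
    )
```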
Reaction emoji is now added as a fire-and-forget background task
instead of blocking the inbound message pipeline. This removes
one API round-trip from the critical path before the agent starts
processing.
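A sketch of the fire-and-forget scheduling, keeping a strong reference
so the task is not garbage-collected mid-flight (names illustrative):

```python
import asyncio

_pending: set[asyncio.Task] = set()

def add_reaction_nonblocking(client, message_id: str, emoji: str) -> None:
    # Schedule the reaction API call without awaiting it, so the inbound
    # pipeline hands off to the agent immediately.
    task = asyncio.create_task(client.add_reaction(message_id, emoji))
    _pending.add(task)
    task.add_done_callback(_pending.discard)
```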
When reply_to_message config is enabled, the bot's first reply now
uses reply_in_thread=True to create a visual topic/thread in the
Feishu client. Subsequent chunks fall back to regular create.
The reply_to_message default remains False for backward compatibility.
Failed replies still fall back to regular send — messages are never
silently dropped.
Thread replies (messages with root_id != message_id) in group chats
now get their own session key: feishu:{chat_id}:{root_id}. This
means each Feishu thread has an independent conversation context.
Top-level group messages and all private chat messages keep the
default session key (no override), consistent with Telegram and
Slack channel behavior.
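The key derivation, as described; the function name and signature are
illustrative:

```python
def session_key_override(chat_id: str, message_id: str, root_id: str | None,
                         is_group: bool) -> str | None:
    # Thread replies in group chats get a per-thread session.
    if is_group and root_id and root_id != message_id:
        return f"feishu:{chat_id}:{root_id}"
    # Top-level group messages and private chats: no override, matching
    # Telegram and Slack behavior.
    return None
```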
Co-authored-by: shenchengtsi <228445050+shenchengtsi@users.noreply.github.com>
Add signed media URLs to live WebSocket replies and teach the WebUI to classify and render video attachments, so bot-sent videos can play inline in both live chats and session history.
Made-with: Cursor
Telegram previously sent all video files as documents via send_document,
so users saw a file icon instead of an inline player. WebSocket only
accepted image MIME types, rejecting video uploads entirely.
Telegram:
- Recognize video extensions (mp4/mov/avi/mkv/webm/3gp) in _get_media_type
- Route videos through send_video with supports_streaming=True
- Add VIDEO/VIDEO_NOTE/ANIMATION to inbound message filters
- Add video MIME mappings to _get_extension
- Fix: local file sends now use _call_with_retry (previously no retry)
WebSocket:
- Expand upload MIME whitelist with video/mp4, video/webm, video/quicktime
- Add per-type size limits (_MAX_VIDEO_BYTES=20MB, _MAX_VIDEOS_PER_MESSAGE=1)
- Expand media serving endpoint to serve video with correct Content-Type
Agent:
- Add "video" to message tool media parameter description
- Add .mp4 example to identity.md system prompt
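A sketch of the Telegram routing, using python-telegram-bot's real
`send_video`/`send_document` methods; the helper and extension set
mirror the list above:

```python
import os

VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}

async def send_media(bot, chat_id: int, path: str) -> None:
    ext = os.path.splitext(path)[1].lower()
    with open(path, "rb") as fh:
        if ext in VIDEO_EXTENSIONS:
            # Inline player in the client instead of a file icon.
            await bot.send_video(chat_id=chat_id, video=fh, supports_streaming=True)
        else:
            await bot.send_document(chat_id=chat_id, document=fh)
```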
Made-with: Cursor
Move pypdf, python-docx, openpyxl, and python-pptx imports from module
level into the _extract_pdf / _extract_docx / _extract_xlsx /
_extract_pptx functions that actually use them. These four libraries
became core dependencies in v0.1.5.post2 (~25 MB combined) and were
paying the import cost on every nanobot startup even when no document
parsing was needed for the session.
The module-level SUPPORTED_EXTENSIONS set and the extract_text()
dispatch stay as-is; the "[error: <lib> not installed]" branches move
from the old module-level None sentinels into the corresponding
extractor's try/except ImportError block. Behavior for the error
message and for successful parses is identical.
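The resulting shape of one extractor (pypdf's `PdfReader` is the real
API; the error string matches the convention above):

```python
def _extract_pdf(path: str) -> str:
    try:
        from pypdf import PdfReader  # deferred: loads only when a PDF is parsed
    except ImportError:
        return "[error: pypdf not installed]"
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```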
All 20 tests in tests/test_document_parsing.py pass unchanged.
Fixes #3422
The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted
Made-with: Cursor
Two issues with DeepSeek V4 thinking mode support:
1. Missing thinking parameter injection.
DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
— identical to VolcEngine/BytePlus. The code had this for volcengine,
byteplus, dashscope, minimax, and kimi but not DeepSeek. This means
`reasoning_effort=minimal` (thinking off) silently has no effect.
Root cause: the thinking-style→wire-format mapping was an if/elif chain
on provider *names*. DeepSeek was forgotten.
Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
- "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
- "enable_thinking" → {"enable_thinking": bool} (DashScope)
- "reasoning_split" → {"reasoning_split": bool} (MiniMax)
`_build_kwargs` now does a single dict lookup. Adding a new provider
with an existing wire format requires zero changes to the function.
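A sketch of the declarative lookup; the dict name is illustrative, the
three styles are the ones listed above:

```python
_THINKING_WIRE_FORMATS = {
    "thinking_type": lambda on: {"thinking": {"type": "enabled" if on else "disabled"}},
    "enable_thinking": lambda on: {"enable_thinking": on},
    "reasoning_split": lambda on: {"reasoning_split": on},
}

def thinking_extra_body(spec, thinking_enabled: bool) -> dict:
    # One dict lookup replaces the if/elif chain on provider names; a new
    # provider reusing an existing wire format needs no change here.
    fmt = _THINKING_WIRE_FORMATS.get(spec.thinking_style)
    return fmt(thinking_enabled) if fmt else {}
```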
2. Legacy session messages crash thinking-mode requests.
When a session was started without thinking mode (or with a different
model), assistant messages lack reasoning_content. DeepSeek V4 in
thinking mode rejects these with 400:
"The reasoning_content in the thinking mode must be passed back to the API."
This affects ALL assistant messages, not just those with tool_calls
(despite the docs only mentioning the tool_calls case).
Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
assistant message missing it, but only when thinking mode is active.
This is semantically neutral — the model treats empty reasoning_content
as "no thinking happened on that turn". The backfill only touches the
in-memory request copy; session files on disk are untouched.
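A sketch of the backfill as applied to the in-memory request copy:

```python
def _backfill_reasoning(messages: list[dict], thinking_enabled: bool) -> list[dict]:
    if not thinking_enabled:
        return messages
    patched = []
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            # Empty string means "no thinking happened on that turn";
            # copying leaves session files on disk untouched.
            msg = {**msg, "reasoning_content": ""}
        patched.append(msg)
    return patched
```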
Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl,
so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
archive() fell back to raw_archive, causing duplicate [RAW] dumps of
the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
each history entry without caps, so any legacy oversized record (or an
unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
one forgotten-cap-away from the same vector.
Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
it summarized or raw-archived — both outcomes materialize the chunk;
still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in
append_history with a once-per-store warning, so any new caller that
forgets its own tighter cap is caught and made observable.
Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
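A sketch of the wide default and its warn-once behavior (constant
values from this commit; the store class and persistence hook are
illustrative):

```python
import logging

logger = logging.getLogger(__name__)

_HISTORY_ENTRY_HARD_CAP = 64_000  # chars; catches callers that forgot a tighter cap

class MemoryStore:
    def __init__(self) -> None:
        self._cap_warned = False

    def append_history(self, entry: str, max_chars: int = _HISTORY_ENTRY_HARD_CAP) -> None:
        if len(entry) > max_chars:
            if not self._cap_warned:
                logger.warning("history entry truncated to %d chars", max_chars)
                self._cap_warned = True  # once per store
            entry = entry[:max_chars]
        self._write(entry)

    def _write(self, entry: str) -> None:
        ...  # persistence stub; the real store appends to history.jsonl
```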
Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt
Made-with: Cursor
Truncate the "Recent History" section injected by build_system_prompt()
to 32K chars. Without this, many accumulated history.jsonl entries could
still bloat the system prompt even with per-entry truncation in place.
Root cause: when the consolidation LLM failed, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.
Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.
Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
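A sketch of the tiktoken cap (the encoding fallback is an assumption;
the real code presumably derives the encoding from the configured
model):

```python
import tiktoken

def _truncate_to_token_budget(text: str, budget: int, model: str = "gpt-4o") -> str:
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    if len(tokens) <= budget:
        return text
    # Decode the truncated prefix back to text before handing it to the
    # consolidation LLM.
    return enc.decode(tokens[:budget])
```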
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.
Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
public entry points; _cli_progress updated to accept and ignore
tool_events
The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.
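One way to honor that additive contract is signature inspection; the
dispatch below is an assumption about the mechanism, not a quote of it:

```python
import inspect
from typing import Any, Awaitable, Callable

async def emit_progress(
    on_progress: Callable[..., Awaitable[None]],
    content: str,
    *,
    tool_hint: bool = False,
    tool_events: list[dict[str, Any]] | None = None,
) -> None:
    params = inspect.signature(on_progress).parameters
    accepts_events = "tool_events" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_events:
        await on_progress(content, tool_hint=tool_hint, tool_events=tool_events)
    else:
        # Legacy callback shape (content, *, tool_hint) keeps working.
        await on_progress(content, tool_hint=tool_hint)
```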
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
``InlineKeyboardButton(label, callback_data=label)`` fails Telegram's
API when the label exceeds 64 bytes of UTF-8. An LLM-generated long
option (realistic in multilingual flows) used to 400 the ``send_message``
call silently — the user got nothing, while the agent was told the
retry-then-drop had succeeded.
Decouple display from wire: button text keeps the full label, callback_data
gets truncated at a UTF-8 char boundary. Tap echoes the prefix back as the
user message; the LLM understands a prefix of its own option just fine,
and the display the user saw was always the full string.
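The boundary-safe truncation is the standard encode-slice-decode idiom
(helper name illustrative; 64 bytes is Telegram's callback_data limit):

```python
_CALLBACK_DATA_MAX_BYTES = 64  # Telegram Bot API limit

def truncate_utf8(label: str, limit: int = _CALLBACK_DATA_MAX_BYTES) -> str:
    data = label.encode("utf-8")
    if len(data) <= limit:
        return label  # short labels pass through untouched
    # A byte slice can split a multi-byte char; errors="ignore" drops the
    # dangling partial sequence, landing exactly on a char boundary.
    return data[:limit].decode("utf-8", errors="ignore")
```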
Locks: helper boundary behavior (ASCII, CJK, short labels pass through)
and end-to-end ``_build_keyboard`` integration with an over-cap label.
Made-with: Cursor
Buttons are semantic options, not a separate channel protocol: a user
who taps "Yes" and a user who types "yes" arrive at the agent as the
same string. Dropping ``msg.buttons`` when ``inline_keyboards=False``
was the worst of both worlds — the agent got told "Message sent with
N button(s)" while the user saw a question with no options.
Splice the labels into the message text instead. The LLM produces the
same ``message(buttons=...)`` call regardless of channel; the channel
layer picks the richest rendering it can afford — native keyboard when
enabled, bracketed inline text otherwise. Layout is preserved (one row
per line). Other channels can adopt the same helper incrementally.
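A sketch of the splice, assuming one bracketed label per button and one
line per row; the exact canonical format lives in ``_buttons_as_text``:

```python
def _buttons_as_text(rows: list[list[str]]) -> str:
    # One line per keyboard row, e.g.
    #   [Yes] [No]
    #   [Maybe later]
    return "\n".join(" ".join(f"[{label}]" for label in row) for row in rows)

def render_outbound(content: str, buttons: list[list[str]] | None,
                    inline_keyboards: bool) -> str:
    if buttons and not inline_keyboards:
        return f"{content}\n\n{_buttons_as_text(buttons)}"
    return content  # flag on: content stays clean, buttons ride reply_markup
```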
Locks: canonical ``_buttons_as_text`` format, flag-off send-path
splices labels, flag-on send-path keeps content clean and rides
``reply_markup``.
Made-with: Cursor
Two kill-switch tests for the new inline-keyboards path. Neither is
flashy — they just make sure the next unrelated refactor can't quietly
regress two narrow contracts the PR relies on.
1. TelegramChannel._build_keyboard returns None whenever
TelegramConfig.inline_keyboards is False, even if buttons are
supplied. The flag defaults off; if someone ever flips that default
the change should fail this test before it reaches prod bots.
2. MessageTool rejects malformed `buttons` payloads (non-list, mixed
list/str row, non-str label, None label) up front instead of
letting them slip into the channel layer where Telegram would
silently 400 the send. Parametrized over four shapes the guard
needs to reject.
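The second guard plausibly reads like this; the validator is a minimal
stand-in for MessageTool's real check, and the error type is an
assumption:

```python
import pytest

def validate_buttons(buttons) -> None:
    # Stand-in for MessageTool's guard over the `buttons` payload.
    if not isinstance(buttons, list):
        raise ValueError("buttons must be a list of rows")
    for row in buttons:
        if not isinstance(row, list) or not all(isinstance(b, str) for b in row):
            raise ValueError("each row must be a list of string labels")

@pytest.mark.parametrize("bad", [
    "not a list",           # non-list
    [["Yes", ["nested"]]],  # mixed list/str row
    [[42]],                 # non-str label
    [[None]],               # None label
])
def test_rejects_malformed_buttons(bad):
    with pytest.raises(ValueError):
        validate_buttons(bad)
```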
No production code touched.
Made-with: Cursor