- Use asyncio.to_thread for file I/O to avoid blocking event loop
- Add 200MB upload size limit with early rejection
- Fix file handle leak by using context manager
- Use memoryview for upload chunking to reduce peak memory (see the sketch after this list)
- Add inbound download size check to prevent OOM
- Use asyncio.to_thread for write_bytes in download path
- Extract inline media_type detection to _guess_wecom_media_type()
- Use asyncio.to_thread for file I/O to avoid blocking event loop
- Add 200MB upload size limit with early rejection
- Fix file handle leak by using context manager
- Free raw bytes early after chunking to reduce memory pressure
- Add file attachments to media_paths (was text-only, inconsistent with image)
- Use robust _sanitize_filename() instead of os.path.basename() for path safety
- Remove re-raise in send() for consistency with QQ channel
- Fix truncated media_id logging for short IDs
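A minimal sketch of the upload read path these bullets describe, using illustrative names (`MAX_UPLOAD_BYTES`, `_send_chunk`) rather than the actual identifiers:

```python
import asyncio
from pathlib import Path

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # 200MB early-rejection limit
CHUNK_SIZE = 512 * 1024               # illustrative chunk size

async def _send_chunk(chunk: memoryview) -> None:
    """Stand-in for the real per-chunk network send."""
    await asyncio.sleep(0)

def _read_file(path: Path) -> bytes:
    # Context manager guarantees the handle is closed (the leak fix).
    with open(path, "rb") as f:
        return f.read()

async def upload(path: Path) -> None:
    # Reject oversized files before reading a single byte.
    if path.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError("upload exceeds 200MB limit")
    # Blocking read runs off the event loop.
    raw = await asyncio.to_thread(_read_file, path)
    view = memoryview(raw)  # slicing a memoryview avoids per-chunk copies
    try:
        for off in range(0, len(view), CHUNK_SIZE):
            await _send_chunk(view[off : off + CHUNK_SIZE])
    finally:
        view.release()
        del raw  # drop the reference early to reduce peak memory
```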
QQ channel improvements (on top of nightly):
- Add top-level try/except in _on_message and send() for resilience
- Use defensive getattr() for attachment attributes (botpy version compat)
- Skip file_name for image uploads to avoid QQ rendering as file attachment
- Extract only file_info from upload response to avoid extra fields
- Handle protocol-relative URLs (//...) in attachment downloads (sketched below)
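A rough sketch of the defensive access plus URL normalization (the attribute name `url` is an assumption about botpy's attachment object):

```python
def _attachment_url(attachment) -> str | None:
    # getattr() with a default tolerates attribute differences
    # across botpy versions instead of raising AttributeError.
    url = getattr(attachment, "url", None)
    if not url:
        return None
    # QQ sometimes returns protocol-relative URLs ("//host/path");
    # pin them to https before downloading.
    if url.startswith("//"):
        url = "https:" + url
    return url
```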
WeCom channel improvements:
- Add _upload_media_ws() for WebSocket 3-step media upload protocol
- Send media files (image/video/voice/file) via WeCom rich media API (type mapping sketched below)
- Support progress messages (plain reply) vs final response (streaming)
- Support proactive send when no frame is available (cron push)
- Pass media_paths to message bus for downstream processing
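The media-type mapping might look like this sketch on top of the standard mimetypes module; the four category names (image/voice/video/file) come from the bullets above, while the helper body is an assumption:

```python
import mimetypes

def _guess_wecom_media_type(path: str) -> str:
    # Map a local file onto WeCom's four media categories.
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        return "file"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("video/"):
        return "video"
    if mime.startswith("audio/"):
        return "voice"
    return "file"
```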
* feat(agent): add mid-turn message injection for responsive follow-ups
Allow user messages sent during an active agent turn to be injected
into the running LLM context instead of being queued behind a
per-session lock. Inspired by Claude Code's mid-turn queue drain
mechanism (query.ts:1547-1643).
Key design decisions:
- Messages are injected as natural user messages between iterations,
no tool cancellation or special system prompt needed
- Two drain checkpoints: after tool execution and after the final LLM
response ("last-mile" to prevent dropping late arrivals); sketched below
- Bounded by MAX_INJECTION_CYCLES (5) to prevent consuming the
iteration budget on rapid follow-ups
- had_injections flag bypasses _sent_in_turn suppression so follow-up
responses are always delivered
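Condensed to a single checkpoint for brevity, the drain loop could look like this sketch (the function name and queue parameter are illustrative, not the actual code):

```python
import asyncio

MAX_INJECTION_CYCLES = 5  # bound on rapid follow-ups per turn

async def run_turn(call_llm, pending: asyncio.Queue, messages: list) -> bool:
    """Returns True if any mid-turn messages were injected."""
    had_injections = False
    for _ in range(MAX_INJECTION_CYCLES + 1):
        await call_llm(messages)
        # Drain checkpoint: pick up user messages that arrived mid-turn.
        drained = []
        while not pending.empty():
            drained.append(pending.get_nowait())
        if not drained:
            break
        had_injections = True
        # Injected as natural user messages between iterations;
        # no tool cancellation or special system prompt needed.
        messages.extend({"role": "user", "content": m} for m in drained)
    return had_injections
```

The returned flag is what lets the caller bypass `_sent_in_turn` suppression.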
Closes #1609
* fix(agent): harden mid-turn injection with streaming fix, bounded queue, and message safety
- Fix streaming protocol violation: Checkpoint 2 now checks for injections
BEFORE calling on_stream_end, passing resuming=True when injections are found
so streaming channels (Feishu) don't prematurely finalize the card
- Bound pending queue to maxsize=20 with QueueFull handling (sketched below)
- Add warning log when injection batch exceeds _MAX_INJECTIONS_PER_TURN
- Re-publish leftover queue messages to bus in _dispatch finally block to
prevent silent message loss on early exit (max_iterations, tool_error, cancel)
- Fix PEP 8 blank line before dataclass and logger.info indentation
- Add 12 new tests covering drain, checkpoints, cycle cap, queue routing,
cleanup, and leftover re-publish
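The bounded queue amounts to something like the following (maxsize=20 is from the bullet above; the helper name is an assumption):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)
_pending: asyncio.Queue = asyncio.Queue(maxsize=20)

def enqueue_followup(text: str) -> bool:
    try:
        _pending.put_nowait(text)
        return True
    except asyncio.QueueFull:
        # Never block the channel; the caller can re-publish to the bus.
        logger.warning("mid-turn queue full; follow-up not injected")
        return False
```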
- Add explicit error logging for missing file_key and message_id
- Add logging for download failures
- Change audio extension from .opus to .ogg for better Whisper compatibility
- Feishu voice messages are Opus audio in an OGG container; .ogg is more widely recognized
Rename the auto compact module to autocompact.py for a cleaner path while keeping the AutoCompact type and behavior unchanged. Update the agent loop import to match.
Prefer the more user-friendly idleCompactAfterMinutes name for auto compact while keeping sessionTtlMinutes as a backward-compatible alias. Update tests and README to document the retained recent-context behavior and the new preferred key.
Keep a valid recent suffix in idle auto-compacted sessions so resumed chats preserve their freshest live context while older messages are summarized. Recover persisted summaries even when retained messages remain, and document the new behavior.
Make Consolidator.archive() return the summary string directly instead
of writing to history.jsonl then reading back via get_last_history_entry().
This eliminates a race condition where concurrent _archive calls for
different sessions could read each other's summaries from the shared
history file (cross-user context leak in multi-user deployments).
Also removes Consolidator.get_last_history_entry() — no longer needed.
history.jsonl may contain non-UTF-8 bytes (e.g. from email channel
binary content), causing auto compact to fail when reading the last
entry for summary generation. Catch UnicodeDecodeError alongside
FileNotFoundError and JSONDecodeError.
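The widened handler, roughly (the reader function is hypothetical; only the exception tuple reflects the fix):

```python
import json
from pathlib import Path

def read_last_history_entry(path: Path) -> dict | None:
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
        return json.loads(lines[-1]) if lines else None
    except (FileNotFoundError, json.JSONDecodeError, UnicodeDecodeError):
        # Non-UTF-8 bytes (e.g. binary email content) must not crash
        # auto compact; treat the file as having no usable entry.
        return None
```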
When a user is idle for longer than a configured TTL, nanobot **proactively** compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.
When on_job callbacks call list_jobs() (which triggers _load_store),
the in-memory state is reloaded from disk, discarding the next_run_at_ms
updates that _on_timer is actively computing. This causes jobs to
re-trigger indefinitely on the next tick.
Add an _executing flag around the job execution loop. While set,
_load_store returns the cached store instead of reloading from disk.
Includes regression test.
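A sketch of the guard (class and method names follow the description; persistence details are stubbed):

```python
class JobStore:
    def __init__(self) -> None:
        self._store: dict = {}
        self._executing = False  # set while _on_timer runs due jobs

    def _load_store(self) -> dict:
        if self._executing:
            # on_job callbacks may call list_jobs() re-entrantly; reloading
            # here would clobber in-flight next_run_at_ms updates.
            return self._store
        self._store = self._read_from_disk()
        return self._store

    def _read_from_disk(self) -> dict:
        return {}  # stand-in for the real file read

    def _on_timer(self) -> None:
        self._executing = True
        try:
            pass  # run due jobs, updating next_run_at_ms in self._store
        finally:
            self._executing = False
```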
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each MCP server now connects in its own asyncio.Task to isolate anyio
cancel scopes and prevent 'exit cancel scope in different task' errors
when multiple servers (especially mixed transport types) are configured.
Changes:
- connect_mcp_servers() returns dict[str, AsyncExitStack] instead of None
- Each server runs in a separate task via asyncio.gather() (see sketch below)
- AgentLoop uses _mcp_stacks dict to track per-server stacks
- Tests updated to handle new API
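Schematically (the per-server transport factory is an assumption; the shape of the return value matches the bullet above):

```python
import asyncio
from contextlib import AsyncExitStack

async def connect_mcp_servers(configs: dict) -> dict:
    async def connect_one(name, cfg):
        stack = AsyncExitStack()
        # Entering the transport context inside this coroutine keeps the
        # anyio cancel scope bound to the task gather() creates for it.
        await stack.enter_async_context(cfg.make_transport())
        return name, stack

    pairs = await asyncio.gather(
        *(connect_one(name, cfg) for name, cfg in configs.items())
    )
    return dict(pairs)  # dict[str, AsyncExitStack], one stack per server
```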
Rename the boolean flag in _sanitize_persisted_blocks and alias the imported helper so session persistence cannot crash with TypeError when truncation is enabled.
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
- Wrap both token estimation calls in try/except to prevent silent failures
from crashing the consolidation cycle
- Add _MAX_CHUNK_MESSAGES = 60 to cap messages per consolidation round,
avoiding oversized chunks being sent to the consolidation LLM
- Improve idle log to include unconsolidated message count for easier debugging
These are purely defensive improvements with no behaviour change for
normal sessions.
- Adjusted message handling in AgentRunner to ensure that historical messages remain unchanged during context governance.
- Introduced tests to verify that backfill operations do not alter the saved message boundary, maintaining the integrity of the conversation history.
- Make tool_hint_prefix configurable in FeishuConfig (default: 🔧)
- Delegate tool hint card updates from send() to send_delta() so hints
automatically benefit from _STREAM_EDIT_INTERVAL throttling
- Fix staticmethod calls to use self.__class__ instead of self
- Document all supported metadata keys in send_delta docstring
- Add test for empty/whitespace-only tool hint with active stream buffer
Two display fixes based on real-world Feishu testing:
1. tool_hints.py: format_tool_hints now deduplicates by comparing the
fully formatted hint string instead of tool name alone. This fixes
`ls /Desktop` and `ls /Downloads` being incorrectly merged as
`ls /Desktop × 2`. Truly identical calls still fold correctly.
(_group_consecutive and all abbreviation logic preserved unchanged;
a sketch of the dedup follows this list.)
2. feishu.py: inline tool hints now display one tool per line with
🔧 prefix, and use double-newline trailing to prevent Setext heading
rendering when followed by markdown `---`.
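The dedup change in (1) reduces to grouping on the formatted string; a sketch, leaving out the abbreviation logic:

```python
from itertools import groupby

def fold_hints(formatted: list) -> list:
    # Group consecutive *fully formatted* hint strings, not tool names,
    # so `ls /Desktop` and `ls /Downloads` are never merged.
    out = []
    for text, group in groupby(formatted):
        count = sum(1 for _ in group)
        out.append(text if count == 1 else f"{text} × {count}")
    return out
```

For example, `fold_hints(["ls /a", "ls /a", "ls /b"])` yields `["ls /a × 2", "ls /b"]`.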
Made-with: Cursor
Tool hints should be kept as permanent content in the streaming card
so users can see which tools were called (matching the standalone card
behavior). Previously, hints were stripped when new deltas arrived or
when the stream ended, causing tool call information to disappear.
Now:
- New delta: hint becomes permanent content, delta appends after it
- New tool hint: replaces the previous hint (unchanged)
- Resuming/stream_end: hint is preserved in the final text
Updated 3 tests to verify hint preservation semantics.
Made-with: Cursor
Three fixes for inline tool hints:
1. Consecutive tool hints now replace the previous one instead of
stacking — the old suffix is stripped before appending the new one.
2. When _resuming flushes the buffer, any trailing tool hint suffix
is removed so it doesn't persist into the next streaming segment.
3. When final _stream_end closes the card, tool hint suffix is
cleaned from the text before the final card update.
Adds 3 regression tests covering all three scenarios.
Made-with: Cursor
Two improvements to Feishu streaming card experience:
1. Handle _resuming in send_delta: when a mid-turn _stream_end arrives
with resuming=True (tool call between segments), flush current text
to the card but keep the buffer alive so subsequent segments append
to the same card instead of creating a new one.
2. Inline tool hints into streaming cards: when a tool hint arrives
while a streaming card is active, append it to the card content
(e.g. "🔧 web_fetch(...)") instead of sending a separate card.
The hint is automatically stripped when the next delta arrives.
Made-with: Cursor
- Merged latest main (no conflicts)
- Added warning log when only one of proxy_username/proxy_password is set
- Added test_start_no_proxy_auth_when_only_password for coverage parity
Made-with: Cursor
- Add proxy, proxy_username, proxy_password fields to DiscordConfig
- Pass proxy and proxy_auth to discord.Client
- Add aiohttp.BasicAuth when credentials are provided (sketched below)
- Add tests for proxy configuration scenarios
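Wired up, this looks approximately like the following (discord.py's Client accepts proxy and proxy_auth keyword arguments; the config object shape is an assumption):

```python
import aiohttp
import discord

def build_client(cfg) -> discord.Client:
    kwargs = {}
    if cfg.proxy:
        kwargs["proxy"] = cfg.proxy
        # Only attach auth when both halves are configured; a lone
        # username or password triggers the warning log instead.
        if cfg.proxy_username and cfg.proxy_password:
            kwargs["proxy_auth"] = aiohttp.BasicAuth(
                cfg.proxy_username, cfg.proxy_password
            )
    return discord.Client(intents=discord.Intents.default(), **kwargs)
```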
Add allowed_env_keys config field to selectively forward host environment variables (e.g. GOPATH, JAVA_HOME) into the sandboxed subprocess environment, while keeping the default allow-list unchanged.
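One plausible shape for the forwarding (DEFAULT_ALLOWED stands in for the existing built-in allow-list):

```python
import os

DEFAULT_ALLOWED = {"PATH", "HOME", "LANG"}  # stand-in for the built-in list

def build_sandbox_env(allowed_env_keys: list) -> dict:
    # The default allow-list stays unchanged; user keys only extend it.
    keys = DEFAULT_ALLOWED | set(allowed_env_keys)
    return {k: v for k, v in os.environ.items() if k in keys}
```

For instance, `build_sandbox_env(["GOPATH", "JAVA_HOME"])` forwards those two variables when present.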
- Merged latest main (no conflicts)
- Added test_llm_error_not_appended_to_session_messages: verifies error
content stays out of session messages
- Added test_streamed_flag_not_set_on_llm_error: verifies _streamed is
not set when LLM returns an error, so ChannelManager delivers it
Made-with: Cursor
When the LLM returns an error (e.g. 429 quota exceeded, stream timeout),
streaming channels silently drop the error message because `_streamed=True`
is set in metadata even though no content was actually streamed.
This change:
- Skips setting `_streamed` when stop_reason is "error", so error messages
go through the normal channel.send() path and reach the user (sketched below)
- Stops appending error content to session history, preventing error
messages from polluting subsequent conversation context
- Exposes stop_reason from _run_agent_loop to enable the above check
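The heart of the fix, schematically (`_streamed` and stop_reason come from the description above; the surrounding function is an assumption):

```python
def finalize_turn(result, metadata: dict, session_messages: list) -> None:
    if result.stop_reason == "error":
        # No _streamed flag: the error falls through to channel.send(),
        # and it is never appended to session history.
        return
    metadata["_streamed"] = True
    session_messages.append({"role": "assistant", "content": result.content})
```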