Truncate the "Recent History" section injected by build_system_prompt()
to 32K chars. Without this, many accumulated history.jsonl entries could
still bloat the system prompt even with per-entry truncation in place.
- Convert skills summary from verbose XML (4-5 lines/skill) to compact
markdown list (1 line/skill) with inline path for read_file lookup
- Exclude always-loaded skills (e.g. memory) from the skills index to
avoid duplicating content already in the Active Skills section
- Skip injecting the Memory section when MEMORY.md still matches the
bundled template (i.e. Dream hasn't populated it yet)
ContextBuilder._build_user_content now only handles images (its original
responsibility). Document text extraction (PDF, DOCX, XLSX, PPTX) is
performed by the new _extract_documents() helper in server.py, called
before process_direct(). This keeps the core context builder free of
format-specific dependencies and makes the API boundary the single place
where uploaded files are pre-processed.
Tests updated to reflect the new responsibility boundary.
Made-with: Cursor
Keep the API file upload branch current with main, enforce the documented JSON base64 per-file limit, and avoid leaking document extraction error strings into user prompts.
Made-with: Cursor
When a user is idle for longer than a configured TTL, nanobot **proactively** compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.
Introduce a disabled_skills option in the config schema that allows
users to specify a list of skill names to be excluded. The setting is
threaded from config through Nanobot -> AgentLoop -> ContextBuilder ->
SkillsLoader. Disabled skills are filtered out from list_skills,
get_always_skills, and build_skills_summary. Four new test cases cover
the filtering behavior.
When the Consolidator compresses old session messages into history.jsonl,
those messages are immediately removed from the LLM's context. Dream
processes history.jsonl into long-term memory (memory.md) on a cron
schedule (default every 2h), creating a window where compressed content
is invisible to the LLM.
This change closes the gap by injecting unprocessed history entries
(history.jsonl entries not yet consumed by Dream) directly into the
system prompt as "# Recent History".
Key design notes:
- Uses read_unprocessed_history(since_cursor=last_dream_cursor) so only
entries not yet reflected in long-term memory are included, avoiding
duplication with memory.md
- No overlap with session messages: Consolidator advances
last_consolidated before returning, so archived messages are already
removed from get_history() output
- Token-safe: Consolidator's estimate_session_prompt_tokens calls
build_system_prompt via the same build_messages function, so the
injected entries are included in token budget calculations and will
trigger further consolidation if needed
Signed-off-by: Lingao Meng <menglingao@xiaomi.com>
- Added Jinja2 template support for various agent responses, including identity, skills, and memory consolidation.
- Introduced new templates for evaluating notifications, handling subagent announcements, and managing platform policies.
- Updated the agent context and memory modules to utilize the new templating system for improved readability and maintainability.
- Added a new dependency on Jinja2 in pyproject.toml.
Add agent-level timezone configuration with a UTC default, propagate it into runtime context and heartbeat prompts, and document valid IANA timezone usage in the README.
During testing, we discovered that when a user requests the agent to
send a file (e.g., "send me IMG_1115.png"), the agent would call
read_file to view the content and then reply with text claiming
"file sent" — but never actually deliver the file to the user.
Root cause: The system prompt stated "Reply directly with text for
conversations. Only use the 'message' tool to send to a specific
chat channel", which led the LLM to believe text replies were
sufficient for all responses, including file delivery.
Fix: Add an explicit IMPORTANT instruction in the system prompt
telling the LLM it MUST use the 'message' tool with the 'media'
parameter to send files, and that read_file only reads content
for its own analysis.
Co-Authored-By: qulllee <qullkui@tencent.com>
Keep multimodal tool outputs on the native content-block path while
restoring redirect SSRF checks for web_fetch image responses. Also share
image block construction, simplify persisted history sanitization, and
add regression tests for image reads and blocked private redirects.
Made-with: Cursor
Instead of adding a separate load_skill tool to bypass workspace restrictions,
extend ReadFileTool with extra_allowed_dirs so it can read builtin skill paths
while keeping write/edit tools locked to the workspace. Fixes the original issue
for both main agent and subagents.
Made-with: Cursor
When restrictToWorkspace is enabled, the agent cannot read builtin skill
files via read_file since they live outside the workspace. This adds a
dedicated load_skill tool that reads skills by name through the SkillsLoader,
which accesses files directly via Python without the workspace restriction.
- Add LoadSkillTool to filesystem tools
- Register it in the agent loop
- Update system prompt to instruct agent to use load_skill instead of read_file
- Remove raw filesystem paths from skills summary
Share assistant message construction between the main agent and subagents, and add a regression test to keep reasoning_content and thinking_blocks in follow-up tool rounds.
Feishu downloads images with incorrect extensions (e.g. .jpg for PNG files).
mimetypes.guess_type() relies on the file extension, causing a MIME mismatch
that Anthropic rejects with 'image was specified using image/jpeg but appears
to be image/png'.
Fix: read the first bytes of the image data and detect the real MIME type via
magic bytes (PNG: 0x89PNG, JPEG: 0xFFD8FF, GIF: GIF87a/GIF89a, WEBP: RIFF+WEBP).
Fall back to mimetypes.guess_type() only when magic bytes are inconclusive.
Some LLM providers (Minimax, Dashscope) strictly reject consecutive
messages with the same role. build_messages() was emitting two separate
user messages back-to-back: the runtime context and the actual user
content.
Merge them into a single user message, handling both plain text and
multimodal (image) content. Update _save_turn() to strip the runtime
context prefix from the merged message when persisting to session
history.
Fixes#1414Fixes#1344
- Remove trailing whitespace and normalize blank lines
- Unify string quotes and line breaks for long lines
- Sort imports alphabetically across modules
- Strip non-standard keys like 'reasoning_content' before sending to LLM
- Always include 'content' key in assistant messages (required by StepFun)
- Add _sanitize_messages to LiteLLMProvider to prevent 400 BadRequest errors