nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-04-30 14:56:01 +00:00

Author	SHA1	Message	Date
Xubin Ren	c937c07178	fix: two bugs in document extraction pipeline. Bug 1: _drain_pending did not call extract_documents on follow-up messages arriving mid-turn. Documents attached to queued messages were silently dropped because _build_user_content only handles images. Fix: call extract_documents before _build_user_content in _drain_pending. Bug 2: extract_documents read the entire file into memory (up to 50 MB) just to check 16 bytes of magic header for MIME detection. Fix: read only the first 16 bytes via open()+read(16) instead of Path.read_bytes(). Added regression tests for both bugs. Made-with: Cursor	2026-04-14 13:15:04 +00:00
Xubin Ren	92d6fca323	refactor: centralize document extraction in AgentLoop._process_message. Move extract_documents() to nanobot.utils.document as a reusable helper and call it once in AgentLoop._process_message, the single entry point for all message processing (API + all channels). This replaces the previous API-only _extract_documents() in server.py, ensuring Telegram, Feishu, Slack, WeChat, and all other channels also benefit from automatic document text extraction. Adds a configurable max_file_size guard (default 50 MB) to skip oversized files gracefully, preventing unbounded memory/CPU usage from channel-downloaded attachments. Changes: server.py: removed _extract_documents and related imports; document.py: added extract_documents() with size limit; loop.py: calls extract_documents() at the top of _process_message; tests updated: 70 related tests pass. Made-with: Cursor	2026-04-14 13:10:03 +00:00
Xubin Ren	47f5795708	refactor: move document extraction from ContextBuilder to API layer. ContextBuilder._build_user_content now only handles images (its original responsibility). Document text extraction (PDF, DOCX, XLSX, PPTX) is performed by the new _extract_documents() helper in server.py, called before process_direct(). This keeps the core context builder free of format-specific dependencies and makes the API boundary the single place where uploaded files are pre-processed. Tests updated to reflect the new responsibility boundary. Made-with: Cursor	2026-04-14 13:00:59 +00:00
Xubin Ren	2502fc616b	Merge origin/main into feat/api-file-upload. Keep the API file upload branch current with main, enforce the documented JSON base64 per-file limit, and avoid leaking document extraction error strings into user prompts. Made-with: Cursor	2026-04-14 12:29:43 +00:00
dengjingren	a068df5a79	feat(api): support file uploads via JSON base64 and multipart/form-data	2026-04-08 15:58:52 +08:00

5 Commits
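
The header-only read from c937c07178 and the size guard from 92d6fca323 can be illustrated with a minimal sketch. This is not the project's code: `sniff_mime`, `MAGIC_SIGNATURES`, and the two signatures listed are hypothetical names and choices made for illustration, and the real `extract_documents()` may detect formats differently. Only the 16-byte read and the 50 MB default come from the commit messages above.

```python
from pathlib import Path

# Hypothetical signature table (an assumption, not taken from nanobot).
MAGIC_SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",  # DOCX, XLSX, and PPTX are ZIP containers
}

# 50 MB default, as stated in the commit message for 92d6fca323.
MAX_FILE_SIZE = 50 * 1024 * 1024


def sniff_mime(path, max_file_size=MAX_FILE_SIZE):
    """Guess a MIME type from the first 16 bytes of a file.

    Returns None when the file exceeds max_file_size or no signature matches.
    """
    p = Path(path)
    # Size guard: check the size via stat() so oversized files are never read.
    if p.stat().st_size > max_file_size:
        return None
    # Read only the header instead of loading the whole file with read_bytes().
    with open(p, "rb") as fh:
        header = fh.read(16)
    for magic, mime in MAGIC_SIGNATURES.items():
        if header.startswith(magic):
            return mime
    return None
```

Checking the size with `stat()` before opening the file keeps memory use bounded regardless of attachment size, and the 16-byte read means MIME detection costs the same for a 1 KB file as for a 50 MB one.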