nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-04-30 14:56:01 +00:00

Author	SHA1	Message	Date
Xubin Ren	c937c07178	fix: two bugs in document extraction pipeline. Bug 1: _drain_pending did not call extract_documents on follow-up messages arriving mid-turn. Documents attached to queued messages were silently dropped because _build_user_content only handles images. Fix: call extract_documents before _build_user_content in _drain_pending. Bug 2: extract_documents read the entire file into memory (up to 50 MB) just to check 16 bytes of magic header for MIME detection. Fix: read only the first 16 bytes via open()+read(16) instead of Path.read_bytes(). Added regression tests for both bugs. Made-with: Cursor	2026-04-14 13:15:04 +00:00
Xubin Ren	47f5795708	refactor: move document extraction from ContextBuilder to API layer. ContextBuilder._build_user_content now only handles images (its original responsibility). Document text extraction (PDF, DOCX, XLSX, PPTX) is performed by the new _extract_documents() helper in server.py, called before process_direct(). This keeps the core context builder free of format-specific dependencies and makes the API boundary the single place where uploaded files are pre-processed. Tests updated to reflect the new responsibility boundary. Made-with: Cursor	2026-04-14 13:00:59 +00:00
Xubin Ren	2502fc616b	Merge origin/main into feat/api-file-upload. Keep the API file upload branch current with main, enforce the documented JSON base64 per-file limit, and avoid leaking document extraction error strings into user prompts. Made-with: Cursor	2026-04-14 12:29:43 +00:00
dengjingren	a068df5a79	feat(api): support file uploads via JSON base64 and multipart/form-data	2026-04-08 15:58:52 +08:00

4 Commits
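
The second fix in c937c07178 (reading only the first 16 bytes instead of the whole file) can be sketched as follows. This is a minimal illustration, not the code from the repository: `sniff_mime`, `_MAGIC`, and `header_size` are hypothetical names, and the signature table covers only a few formats.

```python
# Hypothetical sketch of header-only MIME sniffing; names are illustrative
# and do not come from the nanobot source.

# A few well-known magic-number prefixes (not exhaustive).
_MAGIC = (
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # DOCX, XLSX, and PPTX are ZIP containers
    (b"\x89PNG\r\n\x1a\n", "image/png"),
)


def sniff_mime(path, header_size=16):
    """Guess a MIME type from the first `header_size` bytes of a file.

    Only the header is read, so memory use does not grow with file size.
    Returns None when no known signature matches.
    """
    with open(path, "rb") as f:
        header = f.read(header_size)
    for prefix, mime in _MAGIC:
        if header.startswith(prefix):
            return mime
    return None
```

Compared with `Path.read_bytes()`, which loads the entire file, `open()` plus `read(16)` keeps the cost of detection constant regardless of upload size.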