mirror of
https://github.com/HKUDS/nanobot.git
synced 2026-05-21 17:12:32 +00:00
Move the pypdf, python-docx, openpyxl, and python-pptx imports from module level into the _extract_pdf / _extract_docx / _extract_xlsx / _extract_pptx functions that actually use them.

These four libraries became core dependencies in v0.1.5.post2 (~25 MB combined), so every nanobot startup paid their import cost even when the session never parsed a document.

The module-level SUPPORTED_EXTENSIONS set and the extract_text() dispatch stay as-is. The "[error: <lib> not installed]" branches move from the old module-level None sentinels into a try/except ImportError block in the corresponding extractor. The error messages and the output of successful parses are unchanged.

All 20 tests in tests/test_document_parsing.py pass unchanged.

Fixes #3422