nanobot

mirror/nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-21 09:02:32 +00:00

Commit Graph

Author	SHA1	Message	Date

chengyongru

c053f9eba8

fix(multimodal): image OOM guard, Feishu post media extraction, vision fallback

- Add file size pre-check via stat() before read_bytes() to prevent OOM
  on oversized images/audio/video
- Fix _extract_post_content to extract media tags (file_key) from Feishu
  post messages so videos are no longer silently dropped
- Add supports_vision=False guard to downgrade images to text placeholders
- Add video_mime_compat() for video format validation
- Use full file path in content_text so model read_file works if needed
- Pass input_limits to AgentLoop in nanobot.py facade
- Deduplicate _MEDIA_PLACEHOLDER_TYPES from LLMProvider constant
- Remove unused _extract_post_text legacy wrapper
- Add 14 new tests covering vision fallback, count limits, video compat

2026-04-09 01:13:40 +08:00
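
For readers unfamiliar with the pattern, the size pre-check and the vision fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function names `read_media_guarded` and `media_block`, the 10 MiB default, the block dictionary shape, and the `[image: path]` label are all assumptions made for the example. Only the idea of calling `stat()` before `read_bytes()` and of downgrading to a text placeholder comes from the commit message.

```python
from pathlib import Path
from typing import Optional

# Hypothetical limit; the commit message does not state the real default.
MAX_MEDIA_BYTES = 10 * 1024 * 1024


def read_media_guarded(path: Path, max_bytes: int = MAX_MEDIA_BYTES) -> Optional[bytes]:
    """Return the file's bytes, or None if it is unreadable or larger than max_bytes."""
    try:
        size = path.stat().st_size  # metadata lookup only; nothing is loaded yet
    except OSError:
        return None
    if size > max_bytes:
        return None  # refuse before read_bytes() can allocate a huge buffer
    return path.read_bytes()


def media_block(path: Path, kind: str, supported: bool,
                max_bytes: int = MAX_MEDIA_BYTES) -> dict:
    """Build a content block, degrading to a text placeholder when needed.

    The dict layout is illustrative only; the placeholder carries the full
    path so a model could still open the file with a file-reading tool.
    """
    placeholder = {"type": "text", "text": f"[{kind}: {path}]"}
    if not supported:  # e.g. the provider reports supports_vision=False
        return placeholder
    data = read_media_guarded(path, max_bytes)
    if data is None:
        return placeholder
    return {"type": kind, "data": data}
```

Checking the size through `stat()` keeps the cost of rejecting an oversized file constant, whereas reading first and measuring afterwards would already have paid the memory cost the guard is meant to avoid.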

chengyongru

b9346b0d59

feat: generalize multimodal support with audio/video handling

Add comprehensive audio and video support across the agent pipeline:

- Generalize media placeholder system: _strip_image_content → _strip_media_content,
  _media_placeholder with type-specific labels, unified across providers
- Add detect_audio_mime with magic-byte detection and filename fallback
- Add _AUDIO_FORMAT_MAP for correct MIME-to-API-format conversion
- Add InputLimitsConfig with count limits (max_input_audios/videos) and byte limits
- Support input_audio blocks in context builder with OpenAI-compatible format
- Support video_url blocks with base64 inline data
- Add audio/video passthrough in Codex provider, placeholder fallback in Anthropic provider
- Thread supports_vision/audio/video capability flags through AgentLoop
- Unify placeholder format: [audio: path]/[video: path] instead of generic [file: path]
- Optimize file I/O: single read_bytes() instead of header+full double reads
- Extract _STRIP_MEDIA_TYPES as class constant to avoid per-call allocation

2026-04-08 23:14:40 +08:00
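
As a rough illustration of the magic-byte detection with filename fallback and the MIME-to-API-format mapping mentioned in this commit, consider the sketch below. It is not the project's `detect_audio_mime` or `_AUDIO_FORMAT_MAP`: the names `sniff_audio_mime`, `input_audio_block`, `_MAGIC`, `_EXT_FALLBACK`, and `_FORMAT_MAP` are hypothetical, and the signature table is deliberately small. The `input_audio` block layout follows the OpenAI-compatible chat format, and the `[audio: path]` placeholder follows the format named in the commit message.

```python
import base64
from pathlib import PurePath
from typing import Optional

# Leading-byte signatures for a few common containers (not exhaustive).
_MAGIC = (
    (b"fLaC", "audio/flac"),
    (b"OggS", "audio/ogg"),
    (b"ID3", "audio/mpeg"),  # MP3 files that start with an ID3v2 tag
)

# Used only when the bytes are not recognized (hypothetical table).
_EXT_FALLBACK = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
}

# MIME type -> "format" value for an input_audio block (assumed subset).
_FORMAT_MAP = {"audio/mpeg": "mp3", "audio/wav": "wav"}


def sniff_audio_mime(data: bytes, filename: str = "") -> Optional[str]:
    """Guess the audio MIME type from leading bytes, then from the extension."""
    # WAV is a RIFF container whose form type sits at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    for magic, mime in _MAGIC:
        if data.startswith(magic):
            return mime
    return _EXT_FALLBACK.get(PurePath(filename).suffix.lower())


def input_audio_block(data: bytes, filename: str = "") -> dict:
    """Build an input_audio block, or a text placeholder for unmapped formats."""
    fmt = _FORMAT_MAP.get(sniff_audio_mime(data, filename) or "")
    if fmt is None:
        return {"type": "text", "text": f"[audio: {filename}]"}
    encoded = base64.b64encode(data).decode("ascii")
    return {"type": "input_audio", "input_audio": {"data": encoded, "format": fmt}}
```

Inspecting the bytes first means a mislabeled upload (for example an MP3 saved with a `.bin` extension) is still classified correctly, while the extension lookup covers files whose header the small signature table does not recognize.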

2 Commits