mirror of
https://github.com/HKUDS/nanobot.git
synced 2026-04-30 14:56:01 +00:00
Problem: Modern LLMs (GPT-5.4, Claude, Gemini) produce markdown-heavy responses with numbered lists, headers, and nested formatting. The Telegram channel's _markdown_to_telegram_html() converter has gaps that leave these responses poorly formatted:

1. Numbered lists (1. 2. 3.) get no handling at all and are sent as raw text.
2. Headers (# Title) are stripped to plain text, losing their visual hierarchy.
3. Mid-stream edits send raw markdown: users see **bold** and ### headers while the response is generating, before the final HTML conversion.

Root Cause: _markdown_to_telegram_html() handles bullets (- and *) but skips numbered lists entirely. Headers are stripped of their # but given no emphasis. The streaming path in send_delta() sends buf.text as-is during mid-stream edits (plain text, no parse_mode); only the final _stream_end edit converts to HTML.

Fix:

1. Headers now render as <b>bold</b> in the final HTML, using placeholder markers that survive HTML escaping and are restored after all other processing.
2. Numbered lists are normalized: extra whitespace after the dot is removed.
3. A new _strip_md_block() function strips markdown syntax to give a readable plain-text preview during mid-stream edits.

The final _stream_end HTML conversion is unchanged: it still produces full HTML with parse_mode=HTML. Only the intermediate edits are improved.

Tests: Added 10 new tests covering:

- Headers converting to bold HTML
- Numbered list preservation and whitespace normalization
- Headers with HTML special characters
- Mixed formatting (headers + bullets + numbers + bold)
- _strip_md_block for inline formatting, headers, bullets, numbers, and links
- Streaming mid-edit markdown stripping (initial send + edit)
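The placeholder technique behind the header fix can be sketched as follows. This is not the nanobot code: the function name headers_to_bold_html, the header regex, and the choice of Unicode private-use characters as placeholders are all assumptions made for illustration. The idea is that real <b> tags would be mangled by html.escape(), so the header is first wrapped in sentinel characters that escaping ignores, and the sentinels are swapped for tags only afterwards.

```python
import html
import re

# Assumption: Unicode private-use code points serve as placeholders.
# html.escape() leaves them untouched, and LLM output is unlikely to contain them.
BOLD_OPEN = "\ue000"
BOLD_CLOSE = "\ue001"

# A header line: 1-6 '#' characters, at least one space or tab, then the title.
HEADER_RE = re.compile(r"^#{1,6}[ \t]+(.+)$", re.MULTILINE)


def headers_to_bold_html(text: str) -> str:
    """Hypothetical sketch: turn '# Title' lines into <b>Title</b> for Telegram HTML."""
    # 1. Mark each header's title with placeholders instead of real tags.
    marked = HEADER_RE.sub(lambda m: BOLD_OPEN + m.group(1).strip() + BOLD_CLOSE, text)
    # 2. Escape &, < and > in the text; the placeholders pass through unchanged.
    escaped = html.escape(marked, quote=False)
    # 3. Only now swap the placeholders for real tags, so the tags are never escaped.
    return escaped.replace(BOLD_OPEN, "<b>").replace(BOLD_CLOSE, "</b>")


print(headers_to_bold_html("## A & B <c>"))
```

In a full converter the restore step would run last, after every other conversion, which matches the ordering the description gives. A line like #hashtag is left alone because the sketch requires whitespace after the # run.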
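The numbered-list normalization (collapsing extra whitespace after the dot) can be expressed as a single line-anchored regex substitution. The helper name normalize_numbered_lists and the exact pattern are hypothetical; the real converter may match list items differently.

```python
import re

# Hypothetical pattern: optional indentation, digits, a dot, then one or more
# spaces/tabs, anchored at the start of each line.
NUMBERED_RE = re.compile(r"^([ \t]*)(\d+)\.[ \t]+", re.MULTILINE)


def normalize_numbered_lists(text: str) -> str:
    """Sketch: rewrite '1.   item' as '1. item', keeping any indentation."""
    return NUMBERED_RE.sub(lambda m: f"{m.group(1)}{m.group(2)}. ", text)


print(normalize_numbered_lists("1.   First\n2.\tSecond"))
```

Because the pattern is anchored at line start and requires whitespace after the dot, inline numbers such as "Version 1.5" are not touched.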
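A plain-text preview helper in the spirit of _strip_md_block() might look like the sketch below. It is a hypothetical stand-in, not the function added in this change: the name strip_md_block_sketch, the "label (url)" link rendering, and the "•" bullet replacement are assumptions. It covers the categories the tests list (inline formatting, headers, bullets, numbers, links).

```python
import re


def strip_md_block_sketch(text: str) -> str:
    """Hypothetical stand-in for a _strip_md_block()-style helper: turn
    markdown into readable plain text for mid-stream Telegram edits."""
    # Links: [label](url) -> "label (url)"
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r"\1 (\2)", text)
    # Headers: drop the leading '#' run and the whitespace after it
    text = re.sub(r"^[ \t]*#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Bullets: "- item" or "* item" -> "• item", keeping indentation
    text = re.sub(r"^([ \t]*)[-*][ \t]+", r"\1• ", text, flags=re.MULTILINE)
    # Numbered items: collapse the whitespace after "N." to a single space
    text = re.sub(r"^([ \t]*)(\d+)\.[ \t]+", r"\1\2. ", text, flags=re.MULTILINE)
    # Inline markers: keep the content, drop `code`, **bold**, ~~strike~~, *italic*
    text = re.sub(r"`([^`\n]+)`", r"\1", text)
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"~~(.+?)~~", r"\1", text)
    text = re.sub(r"(?<![\w*])\*(?!\s)(.+?)(?<!\s)\*(?![\w*])", r"\1", text)
    return text


print(strip_md_block_sketch("## Plan\n- **Step** one\n2.  see [docs](https://example.com)"))
```

Underscore emphasis (__x__, _x_) is deliberately left alone in this sketch so that identifiers such as __init__ or snake_case names in a preview are not mangled.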
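The resulting split between intermediate and final edits can be summarized as a small decision function. Everything here is hypothetical (the name stream_edit_payload and passing the converters in as callables); only the dictionary keys mirror the Telegram Bot API's editMessageText parameters text and parse_mode. Mid-stream edits carry the stripped preview with no parse_mode, while the _stream_end edit carries full HTML with parse_mode set to "HTML". One plausible reason for keeping mid-stream edits as plain text is that partially streamed markdown can convert to unbalanced HTML, which Telegram would refuse to parse.

```python
from typing import Callable, Dict


def stream_edit_payload(
    buf_text: str,
    is_final: bool,
    to_html: Callable[[str], str],
    to_plain: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical helper: choose what a streaming edit sends to Telegram."""
    if is_final:
        # Final (_stream_end) edit: full HTML conversion, parsed by Telegram.
        return {"text": to_html(buf_text), "parse_mode": "HTML"}
    # Mid-stream edit: readable plain text; without parse_mode Telegram
    # displays it literally, so half-finished markup cannot break the edit.
    return {"text": to_plain(buf_text)}


print(stream_edit_payload("**hi**", False, to_html=lambda s: "<b>hi</b>", to_plain=lambda s: "hi"))
```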