hussein1362 f8a023218d fix(telegram): improve markdown rendering for modern LLM output
Problem:
Modern LLMs (GPT-5.4, Claude, Gemini) produce markdown-heavy responses with
numbered lists, headers, and nested formatting. The Telegram channel's
_markdown_to_telegram_html() converter has gaps that leave these poorly
formatted:

1. Numbered lists (1. 2. 3.) have zero handling — sent as raw text
2. Headers (# Title) are stripped to plain text, losing visual hierarchy
3. Mid-stream edits send raw markdown (users see **bold** and ### headers
   while the response generates, before the final HTML conversion)

Root Cause:
_markdown_to_telegram_html() handles bullets (- *) but skips numbered lists
entirely. Headers are stripped of # but not given any emphasis. The streaming
path in send_delta() sends buf.text as-is during mid-stream edits (plain
text, no parse_mode) — only the final _stream_end edit converts to HTML.

Fix:
1. Headers now render as <b>bold</b> in the final HTML (using placeholder
   markers that survive HTML escaping, restored after all other processing)
2. Numbered lists are normalized (extra whitespace after the dot is cleaned)
3. New _strip_md_block() function strips markdown syntax for readable
   plain-text preview during streaming mid-edits

The final _stream_end HTML conversion is unchanged — it still produces
full HTML with parse_mode=HTML. Only the intermediate edits are improved.

Tests:
Added 10 new tests covering:
- Headers converting to bold HTML
- Numbered list preservation and whitespace normalization
- Headers with HTML special characters
- Mixed formatting (headers + bullets + numbers + bold)
- _strip_md_block for inline formatting, headers, bullets, numbers, links
- Streaming mid-edit markdown stripping (initial send + edit)
2026-04-21 21:35:34 +08:00
..
2026-04-15 16:51:02 +08:00