mirror of https://github.com/HKUDS/nanobot.git
synced 2026-05-21 17:12:32 +00:00
Signal's BodyRange (via signal-cli's textStyle) interprets start/length as UTF-16 code units, but the Phase-3 assembly used Python's len(), which counts code points. A single non-BMP character (e.g. an emoji) earlier in a message therefore shifted every subsequent styled span one code unit to the left, dropping the last letter of bold/italic words.

Track a running UTF-16 offset in the assembly loop, and add regression tests covering emojis, supplementary CJK, ZWJ sequences, and a multi-section message that mirrors the reported failure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
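The following is a minimal sketch of the running-offset idea. It assumes the assembled message is a list of plain and styled segments, and that styles are passed to signal-cli as `start:length:STYLE` strings. `Segment`, `assemble` and `utf16_len` are hypothetical names used for illustration. They are not nanobot's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units (non-BMP characters count as 2)."""
    return len(text.encode("utf-16-le")) // 2


@dataclass
class Segment:
    # Hypothetical shape of one piece of the assembled message.
    text: str
    style: Optional[str] = None  # e.g. "BOLD", "ITALIC"; None for plain text


def assemble(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Join segments and emit "start:length:STYLE" ranges in UTF-16 units.

    Assumption: the downstream consumer (signal-cli textStyle) expects
    offsets in UTF-16 code units, as described in the commit message.
    """
    parts: List[str] = []
    styles: List[str] = []
    offset = 0  # running position in UTF-16 code units, not code points
    for seg in segments:
        length = utf16_len(seg.text)
        if seg.style and length:
            styles.append(f"{offset}:{length}:{seg.style}")
        parts.append(seg.text)
        offset += length  # advance by code units so later spans stay aligned
    return "".join(parts), styles


text, styles = assemble([Segment("🎉 Done: "), Segment("shipped", "BOLD")])
print(styles)  # ['9:7:BOLD']
```

The emoji occupies two UTF-16 code units, so "shipped" starts at unit 9. A code-point count gives 8 instead, and the range `8:7` then covers " shippe". That is the dropped-last-letter symptom described above.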