The existing test only verified the adaptive path. Add two more cases:
- enabled thinking (high): temperature must also be omitted
- no thinking (None): temperature must still be omitted
Made-with: Cursor
Two issues with DeepSeek V4 thinking mode support:
1. Missing thinking parameter injection.
DeepSeek V4 requires `extra_body: {"thinking": {"type": "enabled/disabled"}}`
— identical to VolcEngine/BytePlus. The code had this for volcengine,
byteplus, dashscope, minimax, and kimi but not DeepSeek. This means
`reasoning_effort=minimal` (thinking off) silently has no effect.
Root cause: the thinking-style→wire-format mapping was an if/elif chain
on provider *names*. DeepSeek was forgotten.
Fix: make the mapping declarative via `ProviderSpec.thinking_style`:
- "thinking_type" → {"thinking": {"type": "..."}} (DeepSeek, Volc, BytePlus)
- "enable_thinking" → {"enable_thinking": bool} (DashScope)
- "reasoning_split" → {"reasoning_split": bool} (MiniMax)
`_build_kwargs` now does a single dict lookup. Adding a new provider
with an existing wire format requires zero changes to the function.
2. Legacy session messages crash thinking-mode requests.
When a session was started without thinking mode (or with a different
model), assistant messages lack reasoning_content. DeepSeek V4 in
thinking mode rejects these with 400:
"The reasoning_content in the thinking mode must be passed back to the API."
This affects ALL assistant messages, not just those with tool_calls
(despite the docs only mentioning the tool_calls case).
Fix: `_build_kwargs` backfills `reasoning_content: ""` on every
assistant message missing it, but only when thinking mode is active.
This is semantically neutral — the model treats empty reasoning_content
as "no thinking happened on that turn". The backfill only touches the
in-memory request copy; session files on disk are untouched.
Tests: +5 (3 thinking toggle, 2 backfill). Full suite: 2377 passed.
Made-with: Cursor
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl,
so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
archive() fell back to raw_archive, causing duplicate [RAW] dumps of
the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
each history entry without caps, so any legacy oversized record (or an
unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
one forgotten-cap-away from the same vector.
Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
it summarized or raw-archived — both outcomes materialize the chunk;
still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in
append_history with a once-per-store warning, so any new caller that
forgets its own tighter cap gets caught and observable.
Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt
Made-with: Cursor
Root cause: when consolidation LLM fails, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.
Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.
Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.
Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
public entry points; _cli_progress updated to accept and ignore
tool_events
The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
``InlineKeyboardButton(label, callback_data=label)`` fails Telegram's
API when the label exceeds 64 bytes UTF-8. An LLM-generated long
option (realistic in multilingual flows) used to 400 the ``send_message``
call silently — user got nothing, agent heard a successful retry-then-drop.
Decouple display from wire: button text keeps the full label, callback_data
gets truncated at a UTF-8 char boundary. Tap echoes the prefix back as the
user message; the LLM understands a prefix of its own option just fine,
and the display the user saw was always the full string.
Locks: helper boundary behavior (ASCII, CJK, short labels pass through)
and end-to-end ``_build_keyboard`` integration with an over-cap label.
Made-with: Cursor
Buttons are semantic options, not a separate channel protocol: a user
who taps "Yes" and a user who types "yes" arrive at the agent as the
same string. Dropping ``msg.buttons`` when ``inline_keyboards=False``
was the worst of both worlds — the agent got told "Message sent with
N button(s)" while the user saw a question with no options.
Splice the labels into the message text instead. The LLM produces the
same ``message(buttons=...)`` call regardless of channel; the channel
layer picks the richest rendering it can afford — native keyboard when
enabled, bracketed inline text otherwise. Layout is preserved (one row
per line). Other channels can adopt the same helper incrementally.
Locks: canonical ``_buttons_as_text`` format, flag-off send-path
splices labels, flag-on send-path keeps content clean and rides
``reply_markup``.
Made-with: Cursor
Two kill-switch tests for the new inline-keyboards path. Neither is
flashy — they just make sure the next unrelated refactor can't quietly
regress two narrow contracts the PR relies on.
1. TelegramChannel._build_keyboard returns None whenever
TelegramConfig.inline_keyboards is False, even if buttons are
supplied. The flag defaults off; if someone ever flips that default
the change should fail this test before it reaches prod bots.
2. MessageTool rejects malformed `buttons` payloads (non-list, mixed
list/str row, non-str label, None label) up front instead of
letting them slip into the channel layer where Telegram would
silently 400 the send. Parametrized over four shapes the guard
needs to reject.
No production code touched.
Made-with: Cursor
Replace the dump→resolve→model_validate roundtrip with a recursive walk
that substitutes ${VAR} in string values directly on BaseModel /
__pydantic_extra__ / dict / list nodes. Identity is preserved on any
subtree with no references, so the original Config instance is returned
unchanged when nothing needs resolving.
Side effects:
- exclude=True fields (e.g. DreamConfig.cron) now survive even when
other fields in the same config contain ${VAR} references, closing
the edge case left open by the previous fast-path-only fix.
- _has_env_refs is dropped (the walker short-circuits naturally).
- Added a regression test pairing cron with a resolved providers.groq
api_key to lock the coexistence case.
Made-with: Cursor
`resolve_config_env_vars` unconditionally dumped the config via
`model_dump(mode="json")` and revalidated it, which silently dropped
any field declared with `exclude=True` (e.g. `DreamConfig.cron` —
introduced by the Dream rename refactor in #2717). Result:
`agents.defaults.dream.cron` was never honored at runtime — the gateway
always fell back to the default `every 2h` schedule even when `cron`
was set in config.json.
Fix: skip the roundtrip entirely when the config has no `${VAR}`
references. Env-var interpolation still works unchanged when refs
exist; the legacy `cron` override now survives the common case of
fully-resolved config.
Regression test covers the bug path.
Adds a focused regression test so the fix for tool_result image
handling cannot silently revert. Two cases:
- list content with an image_url + text block -> image_url is
translated to a native Anthropic image block, sibling text passes
through unchanged
- plain string content passes through untouched (the new list branch
must not alter the string path)
These cover the exact symptom surface (silent image drop with a
"Non-transient LLM error with image content" warning) and the only
two content shapes tool results actually take today.
Made-with: Cursor
When the main agent spawns multiple sub-agents, each completion
independently triggered a new _dispatch, causing 3-4 user-visible
responses instead of a single comprehensive report.
- Extend _drain_pending to block-wait on pending_queue when sub-agents
are still running, keeping the runner loop alive for in-order injection
- Pass pending_queue in the system message path so subsequent sub-agent
results can still be injected mid-turn via a new dispatch
Locks in the four behaviors introduced by the fix so they can't silently
revert:
- _should_use_responses_api accepts github_copilot on its non-OpenAI base
- _build_responses_body strips the 'github_copilot/' routing prefix
- /responses failures on github_copilot do not fall back to /chat/completions
Made-with: Cursor
Calling GitHub Copilot with `gpt-5.*` / `o*` models (e.g.
`github_copilot/gpt-5.4`, `github_copilot/gpt-5.4-mini`) failed with a
chain of misleading errors:
1. `Unsupported parameter: 'max_tokens' is not supported with this
model. Use 'max_completion_tokens' instead.`
2. `model "gpt-5.4-mini" is not accessible via the /chat/completions
endpoint` (`unsupported_api_for_model`).
3. `The requested model is not supported.` (`model_not_supported`)
even after routing to /responses.
Root causes (each one masked the next):
* The `github_copilot` ProviderSpec did not opt into
`supports_max_completion_tokens`, so `_build_kwargs` always sent the
legacy `max_tokens` parameter that GPT-5/o-series reject.
* `_should_use_responses_api` was hard-gated to
`spec.name == "openai"` plus a direct-OpenAI base URL, so the
GitHub Copilot backend always went through /chat/completions even
for models the Copilot gateway exposes only via /responses
(e.g. `gpt-5.4-mini`).
* When /responses did fail on github_copilot, the existing
"compatibility marker" heuristic silently fell back to
/chat/completions — which can never succeed for these models — so
the real upstream error was hidden.
* `_build_responses_body` did not honour `spec.strip_model_prefix`,
so the request body sent `model="github_copilot/gpt-5.4-mini"`
(with the routing prefix), which the Copilot gateway rejects with
`model_not_supported`. (`_build_kwargs` already stripped it; this
branch was missed.)
Fix:
* registry.py: set `supports_max_completion_tokens=True` on the
`github_copilot` spec so requests use `max_completion_tokens`.
* openai_compat_provider.py:
- `_should_use_responses_api` now also allows the
`github_copilot` spec, and skips the direct-OpenAI base check
for it (the Copilot gateway is its own base URL).
- `_build_responses_body` now strips the model routing prefix
when `spec.strip_model_prefix` is set, matching `_build_kwargs`.
- `chat` / `chat_stream` no longer fall back from /responses to
/chat/completions on the `github_copilot` spec: the fallback
cannot succeed for GPT-5/o-series and would mask the real
gateway error.
Tests:
* tests/cli/test_commands.py: switched the
`test_github_copilot_provider_refreshes_client_api_key_before_chat`
fixture model from `gpt-5.1` to `gpt-4` so it continues to exercise
the /chat/completions code path it was designed for (gpt-5.1 now
correctly routes to /responses on github_copilot).
* `pytest tests/providers/ tests/cli/test_commands.py` — 314 passed.
* Verified end-to-end against the live Copilot gateway with both
`github_copilot/gpt-5.4` and `github_copilot/gpt-5.4-mini`.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On Windows, opening a directory with O_RDONLY raises PermissionError.
Wrap the directory fsync in a try/except PermissionError — NTFS journals
metadata synchronously so the directory sync is unnecessary there.
Also adjust test assertions to expect 1 fsync call (file only) on
Windows vs 2 (file + directory) on POSIX.
On filesystems with write-back caching (rclone VFS, NFS, FUSE mounts)
the OS page cache may buffer recent session writes. If the process is
killed before the cache flushes, the most recent conversation turns are
silently lost — causing the agent to "forget" recent context and
respond to stale history on the next startup.
Changes:
- session/manager.py: add fsync=True option to save() that flushes the
file and its parent directory to durable storage. Add flush_all() that
re-saves every cached session with fsync. Default save() behavior is
unchanged (no fsync) to avoid performance regression in normal
operation.
- cli/commands.py: call agent.sessions.flush_all() in the gateway
shutdown finally block, after stopping heartbeat/cron/channels.
- tests/session/test_session_fsync.py: 8 tests covering fsync flag
behavior, flush_all with empty/multiple/errored sessions, and
data survival across simulated process restart.
- tests/cli/test_commands.py: add sessions attribute to _FakeAgentLoop
so the gateway health endpoint test passes with the new shutdown
flush.
DashScope rejects the OpenAI-style value "minimal" with
`'reasoning_effort.effort' must be one of: 'none', 'minimum', 'low',
'medium', 'high', 'xhigh'`, but nanobot was passing the string through
verbatim. Users who tried the documented "minimal" to disable thinking
got a 400; users who tried the DashScope-native "minimum" to work
around it got `enable_thinking=True` because the internal comparison
was a hard string match on "minimal".
Introduce a semantic/wire split in `_build_kwargs`:
- `semantic_effort` is the internal canonical form (OpenAI vocabulary).
"minimum" on the way in is normalized to "minimal" here so both
spellings share one meaning.
- `wire_effort` is what we actually serialize. For DashScope with
semantic_effort == "minimal" we translate to "minimum" on the way
out; other providers are unchanged.
- `thinking_enabled` and the Kimi thinking branch now compare on
`semantic_effort`, so either user spelling correctly disables
provider-side thinking.
Tests:
- Strengthen `test_dashscope_thinking_disabled_for_minimal` to assert
the wire value is "minimum" in addition to the extra_body signal;
the original version only checked extra_body and let the
invalid-value bug slip through.
- Add `test_dashscope_thinking_disabled_for_minimum_alias` so a user
who read the DashScope docs and configured "minimum" still gets
thinking off.
- Add `test_non_dashscope_minimal_not_retranslated` to pin down that
the DashScope-specific translation does not leak to OpenAI et al.
- Add ISO-639 pattern validation (2-3 lowercase letters) to schema
- Normalize empty language to None in provider constructors
- Extract shared httpx mock stubs, parameterize provider tests
- Add test for language=None omitting field from multipart body
- Add test for Pydantic pattern validation rejecting invalid codes
Wire up the existing office document extractors in document.py to
ReadFileTool by adding an extension guard and _read_office_doc() method
that follows the established PDF pattern. Handles missing libraries,
corrupt files, empty documents, and 128K truncation consistently.
Non-priority slash commands (e.g. /new, /help, /dream-log) arriving
while a session has an active LLM turn were silently queued into the
pending injection buffer and later injected as raw user messages into
the LLM conversation. This caused the model to respond to "/new" as
plain text instead of executing the command.
Root cause: the run() loop only checked priority commands (/stop,
/restart, /status) before routing messages to the pending queue. All
other command tiers (exact, prefix) bypassed command dispatch entirely.
Changes:
- Add CommandRouter.is_dispatchable_command() to match exact/prefix
tiers, mirroring the existing is_priority() pattern.
- In run(), intercept dispatchable commands before pending queue
insertion and dispatch them directly via _dispatch_command_inline().
- Extract _cancel_active_tasks() from cmd_stop for reuse; cmd_new now
cancels active tasks before clearing the session to prevent shared
mutable state corruption from concurrent asyncio coroutines.
- Update /new semantics: stops active task first, then clears session.
- Update documentation in help text, docs, and Discord command list.
Problem:
Modern LLMs (GPT-5.4, Claude, Gemini) produce markdown-heavy responses with
numbered lists, headers, and nested formatting. The Telegram channel's
_markdown_to_telegram_html() converter has gaps that leave these poorly
formatted:
1. Numbered lists (1. 2. 3.) have zero handling — sent as raw text
2. Headers (# Title) are stripped to plain text, losing visual hierarchy
3. Mid-stream edits send raw markdown (users see **bold** and ### headers
while the response generates, before the final HTML conversion)
Root Cause:
_markdown_to_telegram_html() handles bullets (- *) but skips numbered lists
entirely. Headers are stripped of # but not given any emphasis. The streaming
path in send_delta() sends buf.text as-is during mid-stream edits (plain
text, no parse_mode) — only the final _stream_end edit converts to HTML.
Fix:
1. Headers now render as <b>bold</b> in the final HTML (using placeholder
markers that survive HTML escaping, restored after all other processing)
2. Numbered lists are normalized (extra whitespace after the dot is cleaned)
3. New _strip_md_block() function strips markdown syntax for readable
plain-text preview during streaming mid-edits
The final _stream_end HTML conversion is unchanged — it still produces
full HTML with parse_mode=HTML. Only the intermediate edits are improved.
Tests:
Added 10 new tests covering:
- Headers converting to bold HTML
- Numbered list preservation and whitespace normalization
- Headers with HTML special characters
- Mixed formatting (headers + bullets + numbers + bold)
- _strip_md_block for inline formatting, headers, bullets, numbers, links
- Streaming mid-edit markdown stripping (initial send + edit)
ZhiPu API returns code 1302 with Chinese text "速率限制" instead of
standard HTTP 429 + "rate limit", causing the retry engine to treat
it as non-transient and fail immediately.
Move the non-int cursor guard out of the two consumer sites and into a
shared ``_iter_valid_entries`` iterator so the invariant lives in one
place. Closes three gaps left by the original fix:
* ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in
Python, so the previous guard silently treated ``{"cursor": true}`` as
cursor ``1``.
* Recovery now returns ``max(valid cursors) + 1``. Under adversarial
corruption "first int scanning in reverse" is not the same thing, and
only ``max`` keeps the recovered cursor strictly greater than every
legitimate cursor still on disk.
* Non-int cursors are logged exactly once per ``MemoryStore``. Silently
dropping corrupted entries hides the root cause (an external writer
to ``memory/history.jsonl``); rate-limiting keeps the log clean when
the same poisoned file is read every turn.
All 7 tests from the original fix pass unchanged; 3 new tests pin the
invariants above.
Made-with: Cursor
_next_cursor now checks isinstance(cursor, int) before arithmetic,
falling back to a reverse scan of all entries when the last entry's
cursor is corrupted. read_unprocessed_history skips entries with
non-int cursors instead of crashing on comparison.
Root cause: external callers (cron jobs, plugins) occasionally wrote
string cursors to history.jsonl, which blocked all subsequent
append_history calls with TypeError/ValueError.
Includes 7 regression tests covering string, float, null, and list
cursor types.
The retry branch is only reachable via `except Exception`, and
`CancelledError` inherits from `BaseException`, so today it naturally
bypasses the retry path and /stop still works. Add one focused
regression test so any future refactor that widens the retry catch to
`BaseException`, re-orders the handlers, or adds `CancelledError` to
`_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop.
Made-with: Cursor
When an MCP server restarts or a network connection drops between
tool calls, the existing session throws ClosedResourceError,
BrokenPipeError, ConnectionResetError, etc. Currently these are
caught as generic exceptions and returned as permanent failures
to the LLM, which then tells the user 'my tools are broken.'
This change adds a single automatic retry with a 1-second backoff
for transient connection-class errors in MCPToolWrapper,
MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors
(ValueError, RuntimeError, McpError, etc.) are not retried.
The retry is conservative:
- Only 1 retry (not configurable, to keep the change minimal)
- Only for a specific set of connection-class exceptions
- Matched by exception class name to avoid importing anyio/etc.
- 1s sleep between attempts to allow the server to recover
- Clear logging distinguishes retried vs permanent failures
In production this eliminates most 'MCP tool call failed:
ClosedResourceError' noise when MCP bridge processes restart
(e.g. after config changes or OOM kills).
Tests: 22 new tests covering retry, exhaustion, non-transient
bypass, timeout bypass, and all three wrapper types.
Extend `_merge_consecutive` so the three invariants from
`LLMProvider._enforce_role_alternation` all hold for Anthropic:
1. collapse consecutive same-role turns (unchanged)
2. no trailing assistant — Anthropic rejects prefill (unchanged)
3. no leading assistant — Anthropic requires the first turn be user
4. non-empty messages array — recover the last stripped assistant as a
user turn when every turn got stripped, so callers don't hit a
secondary "messages array empty" 400
Anthropic-specific wrinkle: `tool_use` blocks live inside `content` (not
a separate `tool_calls` field) and are illegal inside user turns, so
both recovery paths skip any message carrying them rather than silently
producing a malformed request.
Adds 4 unit tests covering the new branches, including the tool_use
opt-outs, and updates the existing `test_single_assistant_stripped` to
reflect the new rerouting contract.
Made-with: Cursor
Anthropic does not support assistant-message prefill and returns a 400
error when the conversation ends with an assistant turn. This commonly
happens when heartbeat/system messages accumulate trailing assistant
replies in the session history.
The _merge_consecutive method already handles same-role merging but did
not strip trailing assistant messages. The base provider's
_enforce_role_alternation (used by OpenAI-compat) does strip them, but
AnthropicProvider uses its own _merge_consecutive instead.
Add a trailing-assistant stripping loop to _merge_consecutive, matching
the behavior already present in _enforce_role_alternation.
Includes 7 new tests covering merge + strip behavior.
Add a regression test that actually runs the CancelledError branch of
AgentLoop._dispatch end-to-end and asserts the in-flight checkpoint is
materialized into session.messages before the cancellation unwinds.
The three existing tests call _restore_runtime_checkpoint directly, so
they pass even if the cancel-time restore is ever removed from
_dispatch. This new test is the one that actually locks the fix in
place.
Made-with: Cursor
When a user sends /stop to interrupt an active agent turn, the task is
cancelled via CancelledError. Previously, the cancellation handler just
logged and re-raised, discarding any tool results and assistant messages
accumulated during the interrupted turn.
The runtime checkpoint mechanism already persists partial turn state
(assistant messages, completed tool results, pending tool calls) into
session metadata via _emit_checkpoint. However, this checkpoint was only
materialized into session history on the NEXT incoming message via
_restore_runtime_checkpoint — not at cancellation time.
Now the CancelledError handler in _dispatch calls
_restore_runtime_checkpoint immediately, so the partial context is
preserved in session history. This means the next message the user sends
will see all the work that was done before /stop, rather than starting
from scratch.
Fixes#2966
Includes 3 tests verifying checkpoint restoration on cancellation.
`append_history` previously used `strip_think(entry) or entry.rstrip()`
as a safety net, so if the entire entry was a template-token leak (e.g.
`<think>reasoning</think>` or `<channel|>` alone), the raw leaked text
was still persisted to history — later re-introducing the very content
`strip_think` was meant to scrub, via consolidation / replay.
Persist the cleaned content directly. When cleanup empties a non-empty
entry, log at debug and store an empty-content record (cursor continuity
preserved). Adds 3 regression tests in test_memory_store.py covering:
- Well-formed thinking blocks are stripped before persistence.
- Pure-leak entries persist as empty, not as raw text.
- Malformed prefix leaks (`<channel|>`) also persist as empty.
Some models / Ollama renderers occasionally emit tokenizer-level template
leaks that the existing regexes miss:
1. Malformed opening tags with no closing `>`, running straight into
user-facing content — e.g. `<think广场照明灯目前…` (observed with
Gemma 4 via Ollama). The earlier `<think>[\s\S]*?</think>` and
`^\s*<think>[\s\S]*$` patterns both require `>`, so these leak into
rendered messages.
2. Harmony-style channel markers like `<channel|>` / `<|channel|>` at
the start of a response.
3. Orphan `</think>` / `</thought>` closing tags left behind when only
the opener was consumed upstream.
Handles each case conservatively:
- Malformed `<think` / `<thought` only match when the next char is NOT
a tag-name continuation (`[A-Za-z0-9_\-:>/]`). Explicit ASCII class
instead of `\w` because Python's Unicode `\w` matches CJK and would
defeat the primary fix.
- Orphan closing tags and channel markers are stripped **only at the
start or end of the text**. `strip_think` is also applied before
persisting history (memory.py), so mid-text stripping would silently
rewrite transcripts where the tokens themselves are discussed.
Preserves: `<thinker>`, `<think-foo>`, `<think_foo>`, `<think1>`,
`<think:foo>`, `<thought/>`, literal `` `</think>` `` / `` `<channel|>` ``
inside prose or code blocks.
Adds 16 new regression tests covering both the leak cases and the
preserved-prose cases.
- Fix critical plain-text fallback that was sending raw HTML tags to
users: keep raw markdown available for the fallback path
- Extract TELEGRAM_HTML_MAX_LEN (4096) constant to replace hardcoded
magic number and document the difference from TELEGRAM_MAX_MESSAGE_LEN
- Add fallback to _send_text for extra HTML chunks when HTML parse fails
- Add missing @pytest.mark.asyncio decorator on
test_send_delta_stream_end_html_expansion_does_not_overflow
Cherry-picked from #3311 (stutiredboy). Streaming edits called
edit_message_text(text=buf.text) without chunking, so once accumulated
deltas crossed Telegram's 4096-char limit an ongoing stream would fail
with BadRequest.
Extracts _flush_stream_overflow helper that edits the first chunk in
place, sends any middle chunks, and re-anchors the buffer to a new
message for the tail so subsequent deltas keep streaming.
Co-Authored-By: stutiredboy <stutiredboy@users.noreply.github.com>
Cherry-picked from #3316 (himax12). When streaming completes in send_delta(),
the code was splitting raw markdown text by 4000, then converting to HTML.
The markdown-to-HTML conversion adds 10-33% characters, which could push
the result over Telegram's 4096 character limit.
The fix converts markdown to HTML first, then splits by 4096 (actual Telegram
limit), ensuring the edited message always fits.
Fixes#3315