Add two focused regression tests for the retry-wait leak this PR fixes:
- tests/agent/test_runner.py::test_runner_binds_on_retry_wait_to_retry_callback_not_progress
locks in that `AgentRunSpec.retry_wait_callback` (not `progress_callback`) is
what `_build_request_kwargs` forwards to the provider as `on_retry_wait`.
- tests/channels/test_channel_manager_delta_coalescing.py::TestRetryWaitFiltering
runs `_dispatch_outbound` end-to-end and asserts that `_retry_wait: True`
messages never reach channel send.
Both tests fail on origin/main and pass with this PR's fix applied.
Made-with: Cursor
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法").
Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
"(conversation continued)" when the first non-system message is a bare
assistant (no tool_calls), serving as a safety net for any edge cases
that slip through.
Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
Add a built-in tool that lets the agent inspect and modify its own
runtime state (model, iterations, context window, etc.).
Key features:
- inspect: view current config, usage stats, and subagent status
- modify: adjust parameters at runtime (protected by type/range validation)
- Subagent observability: inspect running subagent tasks (phase,
iteration, tool events, errors) — subagents are no longer a black box
- Watchdog corrects out-of-bounds values on each iteration
- Enabled by default in read-only mode (self_modify: false)
- All changes are in-memory only; restart restores defaults
- Comprehensive test suite (90 tests)
Includes a self-awareness skill (always-on) with progressive disclosure:
SKILL.md for core rules, references/examples.md for detailed scenarios.
Keep late follow-up injections observable when they are drained during max-iteration shutdown so loop-level response suppression still makes the right decision.
Made-with: Cursor
- Migrate "after tools" inline drain to use _try_drain_injections,
completing the refactoring (all 6 drain sites now use the helper).
- Move checkpoint emission into _try_drain_injections via optional
iteration parameter, eliminating the leaky split between helper
and caller for the final-response path.
- Extract _make_injection_callback() test helper to replace 7
identical inject_cb function bodies.
- Add test_injection_cycle_cap_on_error_path to verify the cycle
cap is enforced on error exit paths.
When the agent runner exits due to LLM error, tool error, empty response,
or max_iterations, it breaks out of the iteration loop without draining
the pending injection queue. This causes leftover messages to be
re-published as independent inbound messages, resulting in duplicate or
confusing replies to the user.
Extract the injection drain logic into a `_try_drain_injections` helper
and call it before each break in the error/edge-case paths. If injections
are found, continue the loop instead of breaking. For max_iterations
(where the loop is exhausted), drain injections to prevent re-publish
without continuing.
* feat(agent): add mid-turn message injection for responsive follow-ups
Allow user messages sent during an active agent turn to be injected
into the running LLM context instead of being queued behind a
per-session lock. Inspired by Claude Code's mid-turn queue drain
mechanism (query.ts:1547-1643).
Key design decisions:
- Messages are injected as natural user messages between iterations,
no tool cancellation or special system prompt needed
- Two drain checkpoints: after tool execution and after final LLM
response ("last-mile" to prevent dropping late arrivals)
- Bounded by MAX_INJECTION_CYCLES (5) to prevent consuming the
iteration budget on rapid follow-ups
- had_injections flag bypasses _sent_in_turn suppression so follow-up
responses are always delivered
Closes#1609
* fix(agent): harden mid-turn injection with streaming fix, bounded queue, and message safety
- Fix streaming protocol violation: Checkpoint 2 now checks for injections
BEFORE calling on_stream_end, passing resuming=True when injections found
so streaming channels (Feishu) don't prematurely finalize the card
- Bound pending queue to maxsize=20 with QueueFull handling
- Add warning log when injection batch exceeds _MAX_INJECTIONS_PER_TURN
- Re-publish leftover queue messages to bus in _dispatch finally block to
prevent silent message loss on early exit (max_iterations, tool_error, cancel)
- Fix PEP 8 blank line before dataclass and logger.info indentation
- Add 12 new tests covering drain, checkpoints, cycle cap, queue routing,
cleanup, and leftover re-publish
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
- Adjusted message handling in AgentRunner to ensure that historical messages remain unchanged during context governance.
- Introduced tests to verify that backfill operations do not alter the saved message boundary, maintaining the integrity of the conversation history.
- Merged latest main (no conflicts)
- Added test_llm_error_not_appended_to_session_messages: verifies error
content stays out of session messages
- Added test_streamed_flag_not_set_on_llm_error: verifies _streamed is
not set when LLM returns an error, so ChannelManager delivers it
Made-with: Cursor
When the LLM returns an error (e.g. 429 quota exceeded, stream timeout),
streaming channels silently drop the error message because `_streamed=True`
is set in metadata even though no content was actually streamed.
This change:
- Skips setting `_streamed` when stop_reason is "error", so error messages
go through the normal channel.send() path and reach the user
- Stops appending error content to session history, preventing error
messages from polluting subsequent conversation context
- Exposes stop_reason from _run_agent_loop to enable the above check