141 Commits

Author SHA1 Message Date
Xubin Ren
306958d6e6 add native Bedrock Converse provider
Made-with: Cursor
2026-05-01 18:52:03 +08:00
hanyuanling
3c20d16117 fix subagent max iteration limit 2026-04-30 13:45:40 +08:00
Xubin Ren
3d7099b421 fix(memory): clean atomic write test hygiene
Made-with: Cursor
2026-04-29 16:57:50 +08:00
yorkhellen
2af45945e2 fix(memory): ensure atomic write for history.jsonl
Use temp file + os.replace + fsync to prevent partial writes on crash.
Add tests for atomic write behavior and tmp file cleanup on exception.
2026-04-29 16:57:50 +08:00
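A minimal sketch of the temp file + fsync + os.replace pattern described in 2af45945e2. It assumes a standalone helper that rewrites the whole file. `atomic_write_text` is a hypothetical name, not the project's function.

```python
import os
import tempfile


def atomic_write_text(path: str, data: str) -> None:
    """Hypothetical helper: readers see the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".jsonl")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic replacement of the target file
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file in the target directory matters because os.replace is only atomic within a single filesystem.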
Xubin Ren
48f3cc6390 fix(agent): stop on workspace violations from tool errors
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.

Made-with: Cursor
2026-04-28 15:13:27 +08:00
Xubin Ren
ad4802600e refactor(config): make max messages default explicit
Use 120 as the config-level default and normalize zero back to that limit so session replay always receives an explicit message cap.

Made-with: Cursor
2026-04-28 14:54:32 +08:00
hussein1362
d45ffcf519 feat(config): wire max_messages into session history replay
The max_messages config field in AgentDefaults was accepted by the
schema but never threaded through to the actual get_history() calls
in the agent loop.  Both call sites in _process_message hardcoded the
default, so sessions with slow or local models accumulated unbounded
history that inflated prompt tokens and caused LLM timeouts.

Changes:
- Add max_messages field to AgentDefaults (default 0 = use built-in
  constant, any positive value caps history replay)
- Store the value on AgentLoop and pass it to get_history() when
  non-zero
- Wire the config through all three AgentLoop construction sites in
  commands.py (gateway, API server, CLI chat)
- 14 focused tests covering schema validation, init storage, history
  slicing, boundary alignment, integration wiring, and the
  zero/default path
2026-04-28 14:54:32 +08:00
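A sketch of the behavior after d45ffcf519 and ad4802600e: zero normalizes to the 120-message default, and replay keeps only the newest messages. `resolve_max_messages`, this `get_history` signature, and the rule that a window must not open on an orphaned `tool` result are assumptions standing in for the "boundary alignment" the tests mention.

```python
HISTORY_MAX_MESSAGES = 120  # default value named in the commits


def resolve_max_messages(configured: int) -> int:
    """Hypothetical helper: 0 (or a negative value) falls back to the built-in cap."""
    return configured if configured > 0 else HISTORY_MAX_MESSAGES


def get_history(messages: list, max_messages: int) -> list:
    """Return at most `max_messages` trailing messages (hypothetical signature)."""
    window = messages[-max_messages:] if max_messages > 0 else list(messages)
    # Assumed alignment rule: drop leading tool results whose assistant
    # tool call was sliced off, so replay never starts mid-exchange.
    while window and window[0].get("role") == "tool":
        window = window[1:]
    return window
```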
Xubin Ren
fdfecd3ba6 refactor(codex): name progress delta capability semantically
Use a provider capability name that describes user-visible progress delta support instead of the runner implementation detail.

Made-with: Cursor
2026-04-27 18:48:05 +08:00
hanyuanling
ae14142a87 fix(codex): stream progress deltas to channels 2026-04-27 18:48:05 +08:00
Xubin Ren
e31273ebaa Merge origin/main into fix/discord-allow-channel-threads
Made-with: Cursor
2026-04-27 09:26:24 +00:00
Xubin Ren
eb4b3d9e26 refactor(session): internalize history/file-cap knobs as constants
Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and
sessionFileMaxMessages out of user-facing config into internal
constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000).

- Remove 3 fields from AgentDefaults and config pipeline
- Sink enforce_file_cap into Session (was AgentLoop)
- Auto-derive token budget from context window (was configurable)
- Net -113 lines across 7 files; 723 tests green

Made-with: Cursor
2026-04-27 08:06:50 +00:00
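A hedged sketch of what a Session-owned file cap could look like with FILE_MAX_MESSAGES=2000. The `Session` dataclass is hypothetical, and shifting `last_consolidated` by the number of trimmed messages is an assumption about how the consolidation pointer stays valid.

```python
from dataclasses import dataclass, field

FILE_MAX_MESSAGES = 2000  # constant named in eb4b3d9e26


@dataclass
class Session:
    """Hypothetical, reduced Session with only the fields this sketch needs."""

    messages: list = field(default_factory=list)
    last_consolidated: int = 0  # index of the first message not yet consolidated

    def enforce_file_cap(self, cap: int = FILE_MAX_MESSAGES) -> int:
        """Drop the oldest messages beyond `cap`; return how many were dropped."""
        overflow = len(self.messages) - cap
        if overflow <= 0:
            return 0
        del self.messages[:overflow]
        # Assumption: shift the pointer so it still refers to the same message.
        self.last_consolidated = max(0, self.last_consolidated - overflow)
        return overflow
```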
Xubin Ren
29ebc2d355 Merge origin/main into feat/session-replay-file-cap-invariants
Preserve main's timestamp/tool-context replay semantics while keeping the PR's session history and file-cap budgets.

Made-with: Cursor
2026-04-27 07:32:00 +00:00
Xubin Ren
311a7fe36e fix(session): stop training the model to parrot [Message Time: ...]
Past assistant turns in history were prefixed with "[Message Time: ...]"
just like user turns. The model treated these as in-context demos and
started prefixing its own replies with the same marker, leaking
metadata to the user. Prompt-level warnings could not beat dozens of
prior assistant samples.

Annotate only user turns and proactive deliveries
(_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is
the whole point and which are too infrequent to act as demos). Adjacent
user-side timestamps still pin every normal assistant reply for
relative-time reasoning. The now-redundant identity.md warning is
removed along with the demonstration source.
2026-04-27 07:11:20 +00:00
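A minimal sketch of the annotation rule. It assumes messages are dicts with a hypothetical `timestamp` field; the `_channel_delivery` flag and the `[Message Time: ...]` prefix come from the commit message.

```python
def format_for_replay(msg: dict) -> dict:
    """Prefix the time only on user turns and proactive (cron/heartbeat) deliveries."""
    annotate = msg["role"] == "user" or msg.get("_channel_delivery") is True
    if not annotate or "timestamp" not in msg:  # `timestamp` key is hypothetical
        return {"role": msg["role"], "content": msg["content"]}
    return {
        "role": msg["role"],
        "content": f"[Message Time: {msg['timestamp']}] {msg['content']}",
    }
```

Ordinary assistant replies stay unprefixed, so the model never sees the marker in its own prior turns.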
Xubin Ren
7dcf83e389 test(agent): cover threaded subagent routing
Made-with: Cursor
2026-04-27 14:37:36 +08:00
Xubin Ren
eeaec1f951 fix(agent): prevent message time metadata from leaking into replies 2026-04-27 06:23:43 +00:00
chengyongru
6eb178113e fix(mcp): sanitize MCP capability names for model API compatibility
MCP resource/prompt/tool names containing spaces or special characters
(e.g. "PostgreSQL System Information") were forwarded verbatim to model
provider APIs, causing validation errors from both Anthropic and OpenAI
which require names matching ^[a-zA-Z0-9_-]{1,128}$.

Add _sanitize_name() that replaces invalid characters with underscores
and collapses consecutive underscores. Applied in MCPToolWrapper,
MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools
filtering logic.

Closes #3468
2026-04-27 11:49:50 +08:00
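A sketch of the described `_sanitize_name()` behavior. The 128-character truncation and the fallback for an empty result are assumptions added so the output always matches the provider pattern.

```python
import re

_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")
_UNDERSCORE_RUNS = re.compile(r"_+")
VALID_NAME = re.compile(r"^[a-zA-Z0-9_-]{1,128}$")  # pattern quoted in the commit


def sanitize_name(name: str) -> str:
    """Replace invalid characters with '_' and collapse consecutive underscores."""
    cleaned = _INVALID_CHARS.sub("_", name)
    cleaned = _UNDERSCORE_RUNS.sub("_", cleaned)
    # Assumptions beyond the commit: enforce the length limit and never return "".
    return cleaned[:128] or "unnamed"
```

Distinct MCP names can collapse to the same sanitized name (`a b` and `a!b` both become `a_b`), so a caller registering many capabilities may want to detect collisions.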
Xubin Ren
4a4ba1efc1 Merge branch 'main' into fix/session-history-timestamps
Made-with: Cursor
2026-04-26 18:13:11 +00:00
Xubin Ren
038a140ad3 fix(slack): preserve thread context for proactive replies
Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context.

Made-with: Cursor
2026-04-27 02:10:38 +08:00
Xubin Ren
df37a36174 fix(agent): expose session timestamps in model context
Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors.

Made-with: Cursor
2026-04-26 17:42:58 +00:00
hanyuanling
59dfd74842 feat(session): enforce replay/file-cap invariants for history lifecycle 2026-04-27 00:53:32 +08:00
Xubin Ren
b2aec5528a refactor(agent): move provider refresh into subsystem owners 2026-04-26 14:18:37 +00:00
Xubin Ren
f670da6c70 refactor(providers): move provider snapshot creation into factory 2026-04-26 14:05:13 +00:00
Xubin Ren
65b0ae81af Merge origin/main into webui-settings
Made-with: Cursor
2026-04-26 13:05:32 +00:00
Xubin Ren
727086ddac test: tighten consolidation ratio coverage
Made-with: Cursor
2026-04-26 20:24:42 +08:00
chengyongru
fca56d324a test: add unit tests for configurable consolidation_ratio
Cover ratio propagation, schema validation, and consolidation
behavior with different ratio values (0.1, 0.5, 0.9).
2026-04-26 20:24:42 +08:00
Xubin Ren
b440e76d2f feat(webui): add model settings runtime refresh 2026-04-25 18:05:06 +00:00
Xubin Ren
a58d9fd357 feat(webui): render ask_user choices
Made-with: Cursor
2026-04-25 15:46:47 +00:00
Xubin Ren
403ce23d22 fix(agent): tighten ask_user CLI handling
Made-with: Cursor
2026-04-25 22:10:19 +08:00
Xubin Ren
3b1ea99ee1 fix(agent): render ask_user options without buttons
Made-with: Cursor
2026-04-25 22:10:19 +08:00
Xubin Ren
cfc76ffbbf feat(agent): add ask_user tool
Made-with: Cursor
2026-04-25 22:10:19 +08:00
yorkhellen
076e4166d7 fix(agent): add LLM request timeout to prevent session lock starvation 2026-04-25 03:40:34 +08:00
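A minimal sketch of bounding LLM time while a per-session lock is held. It assumes an `asyncio.Lock` per session and a coroutine-returning `request` callable; both, like `locked_llm_call`, are hypothetical.

```python
import asyncio


async def locked_llm_call(lock: asyncio.Lock, request, timeout: float):
    """Hold the session lock for at most `timeout` seconds of LLM time."""
    async with lock:
        # wait_for cancels the hung request on timeout; leaving the
        # `async with` block then releases the lock for the next message.
        return await asyncio.wait_for(request(), timeout=timeout)
```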
Xubin Ren
4531167c12 fix(agent): bound remaining memory/history pollution paths from #3412
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:

- archive() success path appended uncapped LLM summaries to history.jsonl,
  so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
  archive() fell back to raw_archive, causing duplicate [RAW] dumps of
  the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
  each history entry without caps, so any legacy oversized record (or an
  unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
  one forgotten-cap-away from the same vector.

Changes:

- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
  before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
  it summarized or raw-archived — both outcomes materialize the chunk;
  still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
  Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in
  append_history with a once-per-store warning, so any new caller that
  forgets its own tighter cap gets caught and observable.

Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.

Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
2026-04-24 04:17:19 +08:00
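A sketch of the layered caps from 4531167c12, using the limits stated above. The `truncate` helper, its marker string, and this minimal `MemoryStore` are hypothetical; only the cap values and the once-per-store warning come from the commit.

```python
import logging

logger = logging.getLogger("memory")

RAW_ARCHIVE_MAX_CHARS = 16_000      # raw_archive fallback
ARCHIVE_SUMMARY_MAX_CHARS = 8_000   # _ARCHIVE_SUMMARY_MAX_CHARS
HISTORY_ENTRY_HARD_CAP = 64_000     # _HISTORY_ENTRY_HARD_CAP


def truncate(text: str, limit: int) -> str:
    """Hypothetical helper: cut `text` to at most `limit` chars, marking the cut."""
    if len(text) <= limit:
        return text
    marker = "\n...[truncated]"
    if limit <= len(marker):
        return text[:limit]
    return text[: limit - len(marker)] + marker


class MemoryStore:
    """Reduced, hypothetical store keeping entries in memory."""

    def __init__(self) -> None:
        self.entries: list = []
        self._warned_hard_cap = False  # warn once per store

    def append_history(self, entry: str) -> None:
        # Wide default cap: only catches callers that forgot a tighter one.
        if len(entry) > HISTORY_ENTRY_HARD_CAP:
            if not self._warned_hard_cap:
                logger.warning("history entry of %d chars hit the hard cap", len(entry))
                self._warned_hard_cap = True
            entry = truncate(entry, HISTORY_ENTRY_HARD_CAP)
        self.entries.append(entry)
```

Callers apply their own tighter cap first, e.g. `store.append_history(truncate(summary, ARCHIVE_SUMMARY_MAX_CHARS))`; the 64K default only fires when a caller forgets.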
Xubin Ren
81a5af2352 test(consolidation): add regression tests for tiktoken truncation path and history char cap
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt

Made-with: Cursor
2026-04-24 03:57:59 +08:00
chengyongru
2848f69897 fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation
Root cause: when consolidation LLM fails, raw_archive() dumped full message
content (~1MB) into history.jsonl with no size limit. Since build_system_prompt()
injects history.jsonl into every system prompt, all subsequent LLM calls exceeded
the 200K context window with error 1261.

Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.

Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
  chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
  input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
2026-04-24 03:57:59 +08:00
Xubin Ren
469fc90fe6 fix(agent): on_progress tool_events only when callback accepts; align progress tests with main
Made-with: Cursor
2026-04-23 20:06:11 +08:00
Pablo Cabeza
c23d719780 feat(agent): emit structured _tool_events progress metadata
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.

Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
  payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
  tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
  tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
  on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
  the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
  metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
  public entry points; _cli_progress updated to accept and ignore
  tool_events

The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 20:06:11 +08:00
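A sketch of the additive contract together with the follow-up in 469fc90fe6, which passes `tool_events` only when the callback accepts it. Detecting support via `inspect.signature` is an assumption about the mechanism, `emit_progress` is a hypothetical name, and the event dict shape is illustrative.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Optional

ProgressCallback = Callable[..., Awaitable[None]]


def accepts_tool_events(callback: ProgressCallback) -> bool:
    """True if the callback declares `tool_events` or takes **kwargs."""
    params = inspect.signature(callback).parameters.values()
    return any(
        p.name == "tool_events" or p.kind is inspect.Parameter.VAR_KEYWORD
        for p in params
    )


async def emit_progress(
    callback: ProgressCallback,
    content: str,
    *,
    tool_hint: bool = False,
    tool_events: Optional[list] = None,
) -> None:
    kwargs: dict = {"tool_hint": tool_hint}
    # Legacy (content, *, tool_hint) callbacks never receive the extra keyword.
    if tool_events is not None and accepts_tool_events(callback):
        kwargs["tool_events"] = tool_events
    await callback(content, **kwargs)
```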
Bongjin Lee
93ca791ac6 fix(discord): full thread support with session isolation and allowlist enforcement
Discord threads use their own channel IDs, so allowChannels was blocking
thread replies unless each thread ID was listed explicitly.

- Include the thread parent channel ID as an allowlist candidate
- Enforce allow_channels on slash commands (previously bypassed)
- Show parent channel ID in runtime context, reply to the thread
- Fix subagent cancel key via effective_key propagation
- Detect bot mentions via raw_mentions and reply-to-bot references
- Cache seen thread channels for outbound delivery
- Ignore system messages that become empty prompts
2026-04-23 04:05:39 +09:00
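A sketch of the allowlist check with the thread's parent channel as an extra candidate. IDs are shown as strings, `is_channel_allowed` is a hypothetical name, and treating an empty allowlist as unrestricted is an assumption.

```python
from typing import Iterable, Optional


def is_channel_allowed(
    channel_id: str,
    parent_id: Optional[str],
    allow_channels: Iterable[str],
) -> bool:
    """A thread passes if its own ID or its parent channel ID is allowlisted."""
    allowed = set(allow_channels)
    if not allowed:
        return True  # assumption: empty allowlist means no restriction
    candidates = {channel_id}
    if parent_id is not None:
        candidates.add(parent_id)
    return not candidates.isdisjoint(allowed)
```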
Xubin Ren
61a28c2c0a feat(webui): support image uploads in composer and message bubbles 2026-04-23 00:07:27 +08:00
chengyongru
42c4af2118 fix(agent): prevent duplicate responses when sub-agents complete concurrently
When the main agent spawns multiple sub-agents, each completion
independently triggered a new _dispatch, causing 3-4 user-visible
responses instead of a single comprehensive report.

- Extend _drain_pending to block-wait on pending_queue when sub-agents
  are still running, keeping the runner loop alive for in-order injection
- Pass pending_queue in the system message path so subsequent sub-agent
  results can still be injected mid-turn via a new dispatch
2026-04-22 20:02:19 +08:00
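A minimal asyncio sketch of the block-wait drain. It assumes each sub-agent puts its result on the queue before removing itself from a hypothetical `running_subagents` set; `drain_pending` and the 50 ms re-check interval are also assumptions.

```python
import asyncio


async def drain_pending(queue: asyncio.Queue, running_subagents: set, poll: float = 0.05) -> list:
    """Collect sub-agent results, blocking while any sub-agent is still running."""
    results = []
    while True:
        # Snapshot first: if nothing is running now, every result is already queued.
        done = not running_subagents
        while not queue.empty():
            results.append(queue.get_nowait())
        if done:
            return results
        try:
            results.append(await asyncio.wait_for(queue.get(), timeout=poll))
        except asyncio.TimeoutError:
            pass  # re-check whether sub-agents are still running
```

All results then land in one turn, so the user sees a single report instead of one response per completion.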
chengyongru
e15705b471 fix(tests): add _cancel_active_tasks mock to cmd_new test fixtures
The existing test_unified_session tests construct a SimpleNamespace
loop mock that now needs _cancel_active_tasks since cmd_new calls it.
2026-04-21 21:50:37 +08:00
Xubin Ren
c1957e14ff refactor(memory): centralize cursor validation behind a single gate
Move the non-int cursor guard out of the two consumer sites and into a
shared ``_iter_valid_entries`` iterator so the invariant lives in one
place.  Closes three gaps left by the original fix:

* ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in
  Python, so the previous guard silently treated ``{"cursor": true}`` as
  cursor ``1``.
* Recovery now returns ``max(valid cursors) + 1``.  Under adversarial
  corruption "first int scanning in reverse" is not the same thing, and
  only ``max`` keeps the recovered cursor strictly greater than every
  legitimate cursor still on disk.
* Non-int cursors are logged exactly once per ``MemoryStore``.  Silently
  dropping corrupted entries hides the root cause (an external writer
  to ``memory/history.jsonl``); rate-limiting keeps the log clean when
  the same poisoned file is read every turn.

All 7 tests from the original fix pass unchanged; 3 new tests pin the
invariants above.

Made-with: Cursor
2026-04-21 14:02:53 +08:00
Muata Kamdibe
c0a11c7cf4 fix(memory): harden cursor recovery against non-integer corruption
_next_cursor now checks isinstance(cursor, int) before arithmetic,
falling back to a reverse scan of all entries when the last entry's
cursor is corrupted. read_unprocessed_history skips entries with
non-int cursors instead of crashing on comparison.

Root cause: external callers (cron jobs, plugins) occasionally wrote
string cursors to history.jsonl, which blocked all subsequent
append_history calls with TypeError/ValueError.

Includes 7 regression tests covering string, float, null, and list
cursor types.
2026-04-21 14:02:53 +08:00
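A sketch of the single validation gate from c1957e14ff, assuming `history.jsonl` lines are JSON objects with a `cursor` key. `CursorValidator` is a hypothetical name. The bool rejection, `max(valid cursors) + 1` recovery, and once-only warning follow the commit message; silently skipping unparseable lines and starting at cursor 1 on an empty file are assumptions.

```python
import json
import logging
from typing import Iterable, Iterator

logger = logging.getLogger("memory")


class CursorValidator:
    """Hypothetical single gate for history.jsonl cursor validation."""

    def __init__(self) -> None:
        self._warned = False  # log non-int cursors once per store

    def iter_valid_entries(self, lines: Iterable[str]) -> Iterator[dict]:
        for line in lines:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: unparseable lines are skipped
            cursor = entry.get("cursor") if isinstance(entry, dict) else None
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(cursor, int) and not isinstance(cursor, bool):
                yield entry
            elif not self._warned:
                logger.warning("skipping history entry with non-int cursor: %r", cursor)
                self._warned = True

    def next_cursor(self, lines: Iterable[str]) -> int:
        # max() keeps the new cursor above every valid cursor still on disk.
        cursors = [e["cursor"] for e in self.iter_valid_entries(lines)]
        return max(cursors, default=0) + 1
```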
Xubin Ren
82aa9efc02 test(mcp): pin CancelledError short-circuits the retry loop
The retry branch is only reachable via `except Exception`, and
`CancelledError` inherits from `BaseException`, so today it naturally
bypasses the retry path and /stop still works.  Add one focused
regression test so any future refactor that widens the retry catch to
`BaseException`, re-orders the handlers, or adds `CancelledError` to
`_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop.

Made-with: Cursor
2026-04-21 13:24:40 +08:00
hussein1362
368752e707 fix(mcp): retry once on transient connection errors
When an MCP server restarts or a network connection drops between
tool calls, the existing session throws ClosedResourceError,
BrokenPipeError, ConnectionResetError, etc. Currently these are
caught as generic exceptions and returned as permanent failures
to the LLM, which then tells the user 'my tools are broken.'

This change adds a single automatic retry with a 1-second backoff
for transient connection-class errors in MCPToolWrapper,
MCPResourceWrapper, and MCPPromptWrapper. Non-transient errors
(ValueError, RuntimeError, McpError, etc.) are not retried.

The retry is conservative:
- Only 1 retry (not configurable, to keep the change minimal)
- Only for a specific set of connection-class exceptions
- Matched by exception class name to avoid importing anyio/etc.
- 1s sleep between attempts to allow the server to recover
- Clear logging distinguishes retried vs permanent failures

In production this eliminates most 'MCP tool call failed:
ClosedResourceError' noise when MCP bridge processes restart
(e.g. after config changes or OOM kills).

Tests: 22 new tests covering retry, exhaustion, non-transient
bypass, timeout bypass, and all three wrapper types.
2026-04-21 13:24:40 +08:00
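A sketch of the retry-once policy and of why `/stop` is unaffected: `CancelledError` derives from `BaseException`, so `except Exception` never sees it. `call_with_retry` is a hypothetical helper, and the name set below lists only the exceptions named in the commit, not the project's full `_TRANSIENT_EXC_NAMES`.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

logger = logging.getLogger("mcp")
T = TypeVar("T")

# Matched by class name so no anyio import is needed (subset from the commit).
_TRANSIENT_EXC_NAMES = frozenset(
    {"ClosedResourceError", "BrokenPipeError", "ConnectionResetError"}
)


async def call_with_retry(call: Callable[[], Awaitable[T]], delay: float = 1.0) -> T:
    """Run `call`; retry exactly once after `delay` seconds on a transient error."""
    try:
        return await call()
    except Exception as exc:  # CancelledError is a BaseException: never caught here
        if type(exc).__name__ not in _TRANSIENT_EXC_NAMES:
            raise  # permanent failure, reported as before
        logger.warning("transient MCP error %s; retrying once", type(exc).__name__)
    await asyncio.sleep(delay)
    return await call()  # a second failure propagates (retry exhausted)
```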
Xubin Ren
00de55072d test(agent): exercise /stop cancellation through _dispatch
Add a regression test that actually runs the CancelledError branch of
AgentLoop._dispatch end-to-end and asserts the in-flight checkpoint is
materialized into session.messages before the cancellation unwinds.

The three existing tests call _restore_runtime_checkpoint directly, so
they pass even if the cancel-time restore is ever removed from
_dispatch. This new test is the one that actually locks the fix in
place.

Made-with: Cursor
2026-04-21 01:14:41 +08:00
hussein1362
847c50b2de fix(loop): preserve partial context when /stop cancels a task
When a user sends /stop to interrupt an active agent turn, the task is
cancelled via CancelledError. Previously, the cancellation handler just
logged and re-raised, discarding any tool results and assistant messages
accumulated during the interrupted turn.

The runtime checkpoint mechanism already persists partial turn state
(assistant messages, completed tool results, pending tool calls) into
session metadata via _emit_checkpoint. However, this checkpoint was only
materialized into session history on the NEXT incoming message via
_restore_runtime_checkpoint — not at cancellation time.

Now the CancelledError handler in _dispatch calls
_restore_runtime_checkpoint immediately, so the partial context is
preserved in session history. This means the next message the user sends
will see all the work that was done before /stop, rather than starting
from scratch.

Fixes #2966

Includes 3 tests verifying checkpoint restoration on cancellation.
2026-04-21 01:14:41 +08:00
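A reduced sketch of the cancellation path. It uses a hypothetical `Loop` class that keeps the checkpoint in a plain list, whereas the real code stores it in session metadata.

```python
import asyncio


class Loop:
    """Hypothetical stand-in for AgentLoop, reduced to the /stop path."""

    def __init__(self) -> None:
        self.session_messages: list = []
        self.checkpoint: list = []  # partial turn state, as _emit_checkpoint would save

    def _restore_runtime_checkpoint(self) -> None:
        # Materialize the partial turn into session history, then clear it.
        self.session_messages.extend(self.checkpoint)
        self.checkpoint = []

    async def _dispatch(self, turn) -> None:
        try:
            await turn(self)
        except asyncio.CancelledError:
            # Persist partial work before unwinding, then keep cancelling.
            self._restore_runtime_checkpoint()
            raise
```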
hlg
899a9073ce fix(memory): do not fall back to raw entry when strip_think empties it
`append_history` previously used `strip_think(entry) or entry.rstrip()`
as a safety net, so if the entire entry was a template-token leak (e.g.
`<think>reasoning</think>` or `<channel|>` alone), the raw leaked text
was still persisted to history — later re-introducing the very content
`strip_think` was meant to scrub, via consolidation / replay.

Persist the cleaned content directly. When cleanup empties a non-empty
entry, log at debug and store an empty-content record (cursor continuity
preserved). Adds 3 regression tests in test_memory_store.py covering:

  - Well-formed thinking blocks are stripped before persistence.
  - Pure-leak entries persist as empty, not as raw text.
  - Malformed prefix leaks (`<channel|>`) also persist as empty.
2026-04-20 17:04:48 +08:00
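A sketch contrasting with the old `strip_think(entry) or entry.rstrip()` fallback. The regexes are assumptions covering only the two leak shapes the tests mention, not the project's real `strip_think`, and `prepare_history_entry` is a hypothetical name.

```python
import logging
import re

logger = logging.getLogger("memory")

# Hypothetical patterns: <think>...</think> blocks and bare template tokens like <channel|>.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
_TEMPLATE_TOKEN = re.compile(r"<\|?[a-z_]+\|>")


def strip_think(text: str) -> str:
    text = _THINK_BLOCK.sub("", text)
    text = _TEMPLATE_TOKEN.sub("", text)
    return text.strip()


def prepare_history_entry(entry: str) -> str:
    cleaned = strip_think(entry)
    # No fallback to the raw entry: a pure leak is stored as "" (cursor still advances).
    if entry.strip() and not cleaned:
        logger.debug("history entry was entirely leaked template text; storing empty content")
    return cleaned
```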
chengyongru
68466b1c2a fix(agent): propagate effective session key through subagent pipeline
The previous fix hardcoded session_key_override as channel:chat_id which
broke unified session mode where pending queues use "unified:default".
Propagate the effective key from _set_tool_context through SpawnTool
into the origin dict so _announce_result routes to the correct pending
queue in both normal and unified session modes.
2026-04-20 14:47:14 +08:00
Xubin Ren
56a779c128 fix(session): repair read-only corrupt session paths 2026-04-20 00:17:50 +08:00
aiguozhi123456
efb04a1712 fix(session): use atomic writes and add corrupt-file repair
SessionManager.save() previously used bare open("w") which could
truncate the JSONL file if the process crashed mid-write. Now writes
to a .tmp file and atomically replaces via os.replace(), matching the
pattern already used in qq.py.

_load() now attempts _repair() before returning None, recovering
valid lines from partially-written files. 12 new tests cover atomic
save correctness, temp-file cleanup on failure, and repair of
truncated/corrupt JSONL.

cowork-with:opencode(glm-5.1)
2026-04-20 00:17:50 +08:00
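A sketch of line-level JSONL recovery in the spirit of `_repair()`. `repair_jsonl` is a hypothetical name. It only recovers lines that still parse as JSON objects and does not show whether the repaired file is written back, which the commit does not specify.

```python
import json


def repair_jsonl(raw: str) -> list:
    """Recover every line that still parses as a JSON object; drop the rest."""
    recovered = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. the last line, truncated by a crash mid-write
        if isinstance(obj, dict):
            recovered.append(obj)
    return recovered
```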