Keep the new turn-end signal scoped to WebSocket clients, preserve pending tool-call state across trailing tool result rows, and drop the accidental npm lockfile from the Bun-based WebUI.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: allow_patterns take priority over deny_patterns in ExecTool
Previously deny_patterns were checked first with no bypass, meaning
allow_patterns could never exempt commands from the built-in deny list.
This made it impossible to whitelist destructive commands for specific
directories (e.g. build/cleanup tasks).
Changes:
- shell.py: check allow_patterns first; if matched, skip deny check
- shell.py: deny_patterns now appends to built-in list (not replaces)
- schema.py: add allow_patterns/deny_patterns to ExecToolConfig
- loop.py/subagent.py: pass allow_patterns/deny_patterns to ExecTool
- Add test_exec_allow_patterns.py covering priority semantics
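A minimal sketch of the new priority semantics. Names and the pattern syntax (`fnmatch`-style globs) are assumptions; the real `_guard_command` in shell.py differs in detail:

```python
import fnmatch

# Illustrative stand-in for the built-in deny list.
BUILTIN_DENY = ["rm -rf *", "mkfs*"]

def guard_command(cmd, allow_patterns=None, deny_patterns=None):
    """Return None if the command may run, else a block reason.

    allow_patterns take priority: an explicit allow match skips the
    deny check entirely, so destructive commands can be whitelisted
    for specific directories.
    """
    allow_patterns = allow_patterns or []
    # User deny_patterns append to the built-in list rather than replace it.
    deny = BUILTIN_DENY + list(deny_patterns or [])
    if any(fnmatch.fnmatch(cmd, pat) for pat in allow_patterns):
        return None  # explicitly allowed: bypass the deny list
    for pat in deny:
        if fnmatch.fnmatch(cmd, pat):
            return f"Command blocked by deny pattern filter: {pat}"
    return None
```

With this ordering, `allow_patterns=["rm -rf build*"]` exempts a build-cleanup command that the built-in deny list would otherwise reject.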
* fix: separate deny pattern errors from workspace violation detection
The deny pattern error message "Command blocked by safety guard" was
included in _WORKSPACE_BLOCK_MARKERS, causing deny_pattern blocks to be
misclassified as fatal workspace violations. This meant LLMs had no
chance to retry with a different command — the turn was aborted
immediately.
Changes:
- shell.py: deny/allowlist error messages now use distinct phrasing
("blocked by deny pattern filter" / "blocked by allowlist filter")
- runner.py: remove "blocked by safety guard" from
_WORKSPACE_BLOCK_MARKERS so deny_pattern errors are treated as normal
tool errors (LLM can retry) instead of fatal violations
- workspace path errors still use "blocked by safety guard" and remain
fatal as intended
* fix: update test assertions to match new deny pattern error message
* fix: indentation error in test file
* fix: restore SSRF fatal classification and tidy exec pattern plumbing
Address review feedback on the deny/allow_patterns rework:
- runner.py: re-add "internal/private url detected" to
_WORKSPACE_BLOCK_MARKERS. The earlier marker removal also stripped
fatal classification from SSRF / internal-URL rejections (whose
message still says "blocked by safety guard"), turning a hard
security boundary into something the LLM could retry.
- loop.py / subagent.py: drop `or None` between ExecToolConfig and
ExecTool. The schema default is an empty list and ExecTool already
normalizes None back to [], so the indirection was a no-op.
- shell.py: extract `explicitly_allowed` flag in _guard_command so
allow_patterns are scanned once instead of twice and the control
flow no longer relies on a no-op `pass + else` branch.
- tests/agent/test_runner.py: add a regression test asserting that
the SSRF block message is treated as fatal, while deny/allowlist
filter messages are deliberately non-fatal.
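The resulting classification can be sketched as follows. Marker strings follow the commit text, but treating them as a simple substring match is an assumption about runner.py:

```python
# Markers whose presence in a tool error aborts the turn (fatal).
# "blocked by safety guard" stays for workspace path violations;
# "internal/private url detected" is re-added so SSRF rejections
# remain a hard security boundary. Deny/allowlist filter messages
# match neither marker, so the LLM may retry those.
_WORKSPACE_BLOCK_MARKERS = (
    "blocked by safety guard",
    "internal/private url detected",
)

def is_fatal_tool_error(message: str) -> bool:
    msg = message.lower()
    return any(marker in msg for marker in _WORKSPACE_BLOCK_MARKERS)
```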
* fix: remove unused exec allow-pattern test import
Keep the new ExecTool allow-pattern coverage clean under ruff.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Xubin Ren <xubinrencs@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Treat workspace and safety guard failures as fatal regardless of whether they arrive from tool preparation, returned tool output, or raised exceptions.
Made-with: Cursor
The max_messages config field in AgentDefaults was accepted by the
schema but never threaded through to the actual get_history() calls
in the agent loop. Both call sites in _process_message hardcoded the
default, so sessions with slow or local models accumulated unbounded
history that inflated prompt tokens and caused LLM timeouts.
Changes:
- Add max_messages field to AgentDefaults (default 0 = use built-in
constant, any positive value caps history replay)
- Store the value on AgentLoop and pass it to get_history() when
non-zero
- Wire the config through all three AgentLoop construction sites in
commands.py (gateway, API server, CLI chat)
- 14 focused tests covering schema validation, init storage, history
slicing, boundary alignment, integration wiring, and the
zero/default path
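The zero-means-default contract can be sketched like this; `AgentDefaults`, `AgentLoop`, and the built-in constant name come from the commit, while the helper method is hypothetical:

```python
from dataclasses import dataclass

HISTORY_MAX_MESSAGES = 120  # built-in constant used when config is 0

@dataclass
class AgentDefaults:
    # 0 means "use the built-in constant"; any positive value caps
    # how many history messages are replayed into the prompt.
    max_messages: int = 0

class AgentLoop:
    def __init__(self, defaults: AgentDefaults):
        self._max_messages = defaults.max_messages

    def _history_limit(self) -> int:
        # Mirrors the get_history() call sites: pass the configured
        # cap only when it is non-zero, else fall back to the constant.
        if self._max_messages > 0:
            return self._max_messages
        return HISTORY_MAX_MESSAGES
```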
Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and
sessionFileMaxMessages out of user-facing config into internal
constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000).
- Remove 3 fields from AgentDefaults and config pipeline
- Sink enforce_file_cap into Session (was AgentLoop)
- Auto-derive token budget from context window (was configurable)
- Net -113 lines across 7 files; 723 tests green
Made-with: Cursor
Past assistant turns in history were prefixed with "[Message Time: ...]"
just like user turns. The model treated these as in-context demos and
started prefixing its own replies with the same marker, leaking
metadata to the user. Prompt-level warnings could not beat dozens of
prior assistant samples.
Annotate only user turns and proactive deliveries
(_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is
the whole point and which are too infrequent to act as demos). Adjacent
user-side timestamps still pin every normal assistant reply for
relative-time reasoning. The now-redundant identity.md warning is
removed along with the demonstration source.
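The annotation rule reduces to one predicate. Function name and timestamp format here are illustrative, not the actual helper:

```python
def annotate_turn(role: str, content: str, *,
                  channel_delivery: bool = False, ts: str = "") -> str:
    """Prefix '[Message Time: ...]' only where it cannot act as an
    in-context demo: user turns, plus proactive deliveries
    (cron/heartbeat pushes, where the timing is the point)."""
    if role == "user" or channel_delivery:
        return f"[Message Time: {ts}] {content}"
    return content  # past assistant turns stay unannotated
```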
MCP resource/prompt/tool names containing spaces or special characters
(e.g. "PostgreSQL System Information") were forwarded verbatim to model
provider APIs, causing validation errors from both Anthropic and OpenAI
which require names matching ^[a-zA-Z0-9_-]{1,128}$.
Add _sanitize_name() that replaces invalid characters with underscores
and collapses consecutive underscores. Applied in MCPToolWrapper,
MCPResourceWrapper, MCPPromptWrapper constructors and the enabled_tools
filtering logic.
Closes #3468
Capture Slack thread metadata for cron and message-tool deliveries so replies stay in the originating thread, and hydrate first thread mentions with recent Slack context.
Made-with: Cursor
Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors.
Made-with: Cursor
#3412 stopped the headline raw_archive bloat but left four adjacent leaks
on the same pollution chain:
- archive() success path appended uncapped LLM summaries to history.jsonl,
so a misbehaving LLM could re-open the #3412 bug from the happy path.
- maybe_consolidate_by_tokens did not advance last_consolidated when
archive() fell back to raw_archive, causing duplicate [RAW] dumps of
the same chunk on every subsequent call.
- Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and
each history entry without caps, so any legacy oversized record (or an
unbounded user edit) would blow past the context window every dream.
- append_history itself had no default cap, leaving future new callers
one forgotten-cap-away from the same vector.
Changes:
- Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS)
before writing to history.jsonl.
- Advance session.last_consolidated after archive() regardless of whether
it summarized or raw-archived — both outcomes materialize the chunk;
still break the round loop on fallback so a degraded LLM isn't hammered.
- Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's
Phase 1 prompt preview (Phase 2 still reaches full files via read_file).
- Add _HISTORY_ENTRY_HARD_CAP (64K) as a belt-and-suspenders default in
  append_history with a once-per-store warning, so any new caller that
  forgets its own tighter cap is caught and remains observable.
Layer the caps by scope: raw_archive=16K, archive summary=8K,
append_history default=64K. Tight per-caller values cover expected
payloads; the wide default only catches regressions.
Tests: +9 regression tests covering each fix. Full suite: 2372 passed.
Made-with: Cursor
Cover two untested boundaries from #3412:
- _truncate_to_token_budget with positive budget exercises tiktoken
- _MAX_HISTORY_CHARS caps Recent History section in system prompt
Made-with: Cursor
Root cause: when the consolidation LLM failed, raw_archive() dumped full
message content (~1MB) into history.jsonl with no size limit. Since
build_system_prompt() injects history.jsonl into every system prompt, all
subsequent LLM calls exceeded the 200K context window with error 1261.
Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation
to get stuck on sessions with long tool chains (200+ iterations), triggering
the raw_archive fallback in the first place.
Three-layer fix:
- Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive
chunk sizing based solely on token budget
- Truncate archive() input: use tiktoken to cap formatted text to the model's
input token budget before sending to consolidation LLM
- Truncate raw_archive() output: cap history.jsonl entries at 16K chars
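The token-budget truncation can be sketched as below. The real code uses tiktoken; here the tokenizer is injected (and a rough 4-chars-per-token fallback stands in when it is absent), so the function name and fallback ratio are assumptions:

```python
def truncate_to_token_budget(text: str, budget: int, encode=None) -> str:
    """Cap text to a token budget before the consolidation LLM call.

    `encode` is a tokenizer callable (e.g. a tiktoken encoding's
    .encode); without one we approximate 4 chars per token.
    """
    if budget <= 0:
        return text  # non-positive budget means "no cap" (assumption)
    if encode is None:
        return text[: budget * 4]
    tokens = encode(text)
    if len(tokens) <= budget:
        return text
    # Keep a proportional head; real code may keep head + tail.
    return text[: len(text) * budget // len(tokens)]
```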
Extend the existing on_progress callback to carry structured tool-event
payloads alongside the plain-text hint, so channels can render rich
tool execution state (start/finish/error, arguments, results, file
attachments) rather than only the pre-formatted hint string.
Changes
-------
- AgentLoop._tool_event_start_payload() — builds a version-1 start
payload from a ToolCallRequest
- AgentLoop._tool_event_result_extras() — extracts files/embeds from a
tool result dict
- AgentLoop._tool_event_finish_payloads() — maps tool_calls +
tool_results + tool_events from AgentHookContext into finish payloads
- _LoopHook.before_execute_tools() — passes tool_events=[...] to
on_progress together with the existing tool_hint flag
- _LoopHook.after_iteration() — emits a second on_progress call with
the finish payloads once tool results are available
- _bus_progress() — forwards tool_events as _tool_events in OutboundMessage
metadata so channel implementations can read them
- on_progress type widened to Callable[..., Awaitable[None]] on all
public entry points; _cli_progress updated to accept and ignore
tool_events
The contract is additive: callers that only accept (content, *, tool_hint)
continue to work unchanged. Callers that also accept tool_events receive
the structured data.
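The additive dispatch can be sketched via signature introspection; how the real loop detects legacy callers is not stated in the commit, so this helper and its name are assumptions:

```python
import asyncio
import inspect

async def emit_progress(on_progress, content: str, *,
                        tool_hint: bool = False, tool_events=None):
    """Pass tool_events only to callbacks that declare it, so legacy
    (content, *, tool_hint) callers keep working unchanged."""
    params = inspect.signature(on_progress).parameters
    if "tool_events" in params:
        await on_progress(content, tool_hint=tool_hint,
                          tool_events=tool_events or [])
    else:
        await on_progress(content, tool_hint=tool_hint)
```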
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Discord threads use their own channel IDs, so allowChannels was blocking
thread replies unless each thread ID was listed explicitly.
- Include the thread parent channel ID as an allowlist candidate
- Enforce allow_channels on slash commands (previously bypassed)
- Show parent channel ID in runtime context, reply to the thread
- Fix subagent cancel key via effective_key propagation
- Detect bot mentions via raw_mentions and reply-to-bot references
- Cache seen thread channels for outbound delivery
- Ignore system messages that become empty prompts
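The allowlist-candidate change from the first bullet can be sketched as follows (function shape and the empty-allowlist-means-unrestricted behavior are assumptions):

```python
def channel_allowed(channel_id, parent_channel_id, allow_channels) -> bool:
    """Discord threads carry their own channel IDs, so also offer the
    thread's parent channel as an allowlist candidate."""
    if not allow_channels:
        return True  # empty allowlist = no restriction (assumption)
    candidates = {channel_id}
    if parent_channel_id:
        candidates.add(parent_channel_id)
    return bool(candidates & set(allow_channels))
```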
When the main agent spawns multiple sub-agents, each completion
independently triggered a new _dispatch, causing 3-4 user-visible
responses instead of a single comprehensive report.
- Extend _drain_pending to block-wait on pending_queue when sub-agents
are still running, keeping the runner loop alive for in-order injection
- Pass pending_queue in the system message path so subsequent sub-agent
results can still be injected mid-turn via a new dispatch
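The block-wait behavior can be sketched with an `asyncio.Queue`; the polling interval and the `subagents_running` predicate are illustrative stand-ins for the runner's real state:

```python
import asyncio

async def drain_pending(pending_queue: asyncio.Queue, subagents_running) -> list:
    """Keep draining while sub-agents are still running, so their
    results are injected in order into one turn instead of each
    completion triggering its own dispatch."""
    results = []
    while subagents_running() or not pending_queue.empty():
        try:
            results.append(
                await asyncio.wait_for(pending_queue.get(), timeout=0.05))
        except asyncio.TimeoutError:
            continue  # re-check whether sub-agents are still running
    return results
```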
Move the non-int cursor guard out of the two consumer sites and into a
shared ``_iter_valid_entries`` iterator so the invariant lives in one
place. Closes three gaps left by the original fix:
* ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in
Python, so the previous guard silently treated ``{"cursor": true}`` as
cursor ``1``.
* Recovery now returns ``max(valid cursors) + 1``. Under adversarial
  corruption, "first int found while scanning in reverse" is not the same
  thing; only ``max`` keeps the recovered cursor strictly greater than
  every legitimate cursor still on disk.
* Non-int cursors are logged exactly once per ``MemoryStore``. Silently
dropping corrupted entries hides the root cause (an external writer
to ``memory/history.jsonl``); rate-limiting keeps the log clean when
the same poisoned file is read every turn.
All 7 tests from the original fix pass unchanged; 3 new tests pin the
invariants above.
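The three invariants can be sketched together; the entry shape (dicts with a ``cursor`` key) and the starting cursor of 0 are assumptions:

```python
import logging

logger = logging.getLogger(__name__)

class MemoryStore:
    """Sketch of the shared cursor-validation iterator."""

    def __init__(self, entries):
        self._entries = entries
        self._warned_bad_cursor = False  # log corruption once per store

    def _iter_valid_entries(self):
        for entry in self._entries:
            cursor = entry.get("cursor")
            # bool is an int subclass in Python: reject it explicitly so
            # {"cursor": true} is not silently treated as cursor 1.
            if isinstance(cursor, int) and not isinstance(cursor, bool):
                yield entry
            elif not self._warned_bad_cursor:
                logger.warning("non-int cursor in history: %r", cursor)
                self._warned_bad_cursor = True

    def _next_cursor(self) -> int:
        cursors = [e["cursor"] for e in self._iter_valid_entries()]
        # max(...) + 1 keeps recovery strictly greater than every
        # legitimate cursor still on disk.
        return (max(cursors) + 1) if cursors else 0
```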
Made-with: Cursor
_next_cursor now checks isinstance(cursor, int) before arithmetic,
falling back to a reverse scan of all entries when the last entry's
cursor is corrupted. read_unprocessed_history skips entries with
non-int cursors instead of crashing on comparison.
Root cause: external callers (cron jobs, plugins) occasionally wrote
string cursors to history.jsonl, which blocked all subsequent
append_history calls with TypeError/ValueError.
Includes 7 regression tests covering string, float, null, and list
cursor types.
The retry branch is only reachable via `except Exception`, and
`CancelledError` inherits from `BaseException`, so today it naturally
bypasses the retry path and /stop still works. Add one focused
regression test so any future refactor that widens the retry catch to
`BaseException`, re-orders the handlers, or adds `CancelledError` to
`_TRANSIENT_EXC_NAMES` fails CI instead of silently swallowing /stop.
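The invariant the test pins is standard Python semantics: since 3.8, ``asyncio.CancelledError`` derives from ``BaseException``, so an ``except Exception`` retry branch never swallows it. A minimal sketch (the retry shape is illustrative, not the actual runner code):

```python
import asyncio

async def run_with_retry(op):
    """except Exception does not catch asyncio.CancelledError, so
    cancellation (and hence /stop) bypasses the retry path naturally."""
    try:
        return await op()
    except Exception:
        return await op()  # retry once for ordinary transient failures
```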
Made-with: Cursor