nanobot

mirror of https://github.com/HKUDS/nanobot.git synced 2026-05-19 16:12:30 +00:00

Author	SHA1	Message	Date
Xubin Ren	eb0ff3ad1d	fix(memory): refresh session before empty guard	2026-05-18 01:16:47 +08:00
chengyongru	888d54790d	fix(memory): add session-refresh guard to maybe_consolidate_by_tokens When background consolidation runs with a stale session reference (captured before AutoCompact replaced the session via compact_idle_session), it could operate on outdated data. Now, after acquiring the per-session lock, the method refreshes its session reference from SessionManager.get_or_create(). If the session was replaced, it swaps in the fresh reference before doing any consolidation work. This prevents a race where AutoCompact truncates an idle session while a background maybe_consolidate_by_tokens call is in flight with the old session object.	2026-05-18 01:16:47 +08:00
chengyongru	48d35bd2d9	feat(consolidator): add compact_idle_session method with lock-protected truncation Add Consolidator.compact_idle_session(session_key, max_suffix=8) that performs hard-truncation of idle sessions under the per-session consolidation lock. This is the single lock-protected path for AutoCompact to use instead of modifying session state directly, fixing the race condition between AutoCompact and Consolidator. Behavior: - Acquires per-session consolidation lock - Invalidates cache and reloads fresh from disk - Splits unconsolidated tail into archive prefix and retained suffix - Archives prefix via LLM (with raw_archive fallback on failure) - Persists _last_summary in session metadata on success - Returns summary text, None on LLM failure, or '' if nothing to archive Tests: 6 new tests covering prefix archival, empty session timestamp refresh, (nothing) summary exclusion, LLM failure fallback, last_consolidated offset, and lock acquisition verification.	2026-05-18 01:16:47 +08:00
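The session-refresh guard described in the two `fix(memory)` commits above can be sketched as follows. This is a minimal illustration, not the repository's code: the `Session`, `SessionManager`, and `Consolidator` classes here are hypothetical stand-ins, and only the method names `get_or_create` and `maybe_consolidate_by_tokens` come from the commit messages. The point is the ordering: take the per-session lock, re-fetch the session, and only then apply the empty check.

```python
import asyncio
from collections import defaultdict


class Session:
    """Hypothetical session object: a key plus a list of messages."""

    def __init__(self, key, messages=None):
        self.key = key
        self.messages = list(messages or [])


class SessionManager:
    """Hypothetical stand-in that owns the current Session object per key."""

    def __init__(self):
        self._sessions = {}

    def get_or_create(self, key):
        return self._sessions.setdefault(key, Session(key))

    def replace(self, key, session):
        # Simulates AutoCompact swapping in a freshly truncated session.
        self._sessions[key] = session


class Consolidator:
    def __init__(self, sessions):
        self.sessions = sessions
        self._locks = defaultdict(asyncio.Lock)  # one lock per session key

    async def maybe_consolidate_by_tokens(self, session):
        async with self._locks[session.key]:
            # Refresh AFTER acquiring the lock: the caller's reference may
            # have been captured before the session was replaced.
            fresh = self.sessions.get_or_create(session.key)
            if fresh is not session:
                session = fresh
            # The empty guard runs against the refreshed session.
            if not session.messages:
                return None
            # Placeholder for the real consolidation work.
            return len(session.messages)
```

A caller holding a stale reference then operates on the replacement object rather than the outdated one, which is the race the commits describe.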
Xubin Ren	1c2ea1aad2	feat(goal): /goal command & long-running tasks (long_task) * feat(long-task): add LongTaskTool for multi-step agent tasks Implements a meta-ReAct loop where long-running tasks are broken into sequential subagent steps, each starting fresh with the original goal and progress from the previous step. This prevents context drift when agents work on complex, multi-step tasks. - Extract build_tool_registry() from SubagentManager for reuse - Add run_step() for synchronous subagent execution (no bus announcement) - Add HandoffTool and CompleteTool as signal mechanisms via shared dict - Add LongTaskTool orchestrator with simplified prompt (8 iterations/step) - Register LongTaskTool in main agent loop - Add _extract_handoff_from_messages fallback for robustness * fix(long-task): add debug logging for step-level observability * feat(long-task): major overhaul with structured handoffs, validation, and observability - Structured HandoffState: HandoffTool now accepts files_created, files_modified, next_step_hint, and verification fields instead of a plain string. Progress is passed between steps as structured data. - Completion validation round: After complete() is called, a dedicated validator step runs to verify the claim against the original goal. If validation fails, the task continues rather than returning a false completion. - Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle, step_final) selected based on step number. Final steps get tighter budget and stronger "wrap up" guidance. - Automatic file change tracking: Extracts write_file/edit_file events from tool_events and injects them into the next step's context if the subagent forgot to report them explicitly. - Budget tracking & adaptive strategy: Cumulative token usage is tracked across steps. Per-step tool budget drops from 8 to 4 in the last two steps to force handoff/completion. - Crash retry with graceful degradation: A step that crashes is retried once. 
Persistent crashes terminate the task and return partial progress. - Full observability hooks for future WebUI integration: - set_hooks() with on_step_start, on_step_complete, on_handoff, on_validation_started, on_validation_passed, on_validation_failed, on_task_complete, on_task_error, and catch-all on_event. - Readable state properties: current_step, total_steps, status, last_handoff, cumulative_usage, goal. - inject_correction() allows external code to send user corrections that are injected into the next step's prompt. - run_step() accepts optional max_iterations for dynamic budget control. All 27 long-task tests and 11 subagent tests pass. * test(long-task): add boundary tests and fix race conditions - Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries - Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format - Remove asyncio timing hack from test_state_exposure - Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation - All 34 tests passing * fix(long-task): address code review findings - Declare _scopes = {"core"} explicitly to prevent recursive nesting in subagent scope - Document fragile coupling in _extract_file_changes: path extraction depends on write_file/edit_file detail format; add debug log for unexpected formats - Align final-template threshold (max_steps - 2) with budget switch threshold - Eliminate hasattr(self, "_state") in _reset_state by initializing in __init__ * fix(long-task): honor final signal and file tracking Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long-task): improve prompt structure and agent contract - Expand LongTaskTool.description to instruct parent agent on goal construction, return value semantics, and 
how to handle results. - Expand CompleteTool.description to emphasize that the summary IS the final answer returned to the parent agent. - Prefix validated return value with an explicit "final answer" directive to stop parent agent from re-running work. - Redesign step_start.md: Step 1 is now explicitly for exploration, planning, and skeleton-building. complete() is discouraged. - Remove bulky payload debug logging from _emit(); add targeted info/warning/error logs at key state transitions instead. - Add signal_type to HandoffState for cleaner signal detection. * test(long-task): expect wrapped completion message after validation Align assertions with LongTaskTool final return shape on main. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): turn timing strip, latency, and session-switch restore - Agent loop: publish goal_status run/idle for WebSocket turns; attach wall-clock latency_ms on turn_end and persisted assistant metadata. - WebSocket channel: forward goal_status and latency fields to clients. - NanobotClient: track goal_status started_at per chat without requiring onChat; useNanobotStream restores run strip when returning to a chat. - Thread UI: composer/shell viewport hooks for run duration and latency; format helpers and i18n strings. - MessageBubble: drop trailing StreamCursor (layout artifact vs block markdown). - Builtin / tests: model command coverage, websocket and loop tests. Covers multi-session UX and round-trip timing visibility for the WebUI. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: keep message-tool file attachments after canonical history hydrate - MessageTool records per-turn media paths delivered to the active chat. - nanobot.utils.session_attachments stages out-of-media-root files and merges into the last assistant message before save (loop stays a thin call). - WebUI MediaCell: use a signed URL as a real download link when present. 
Fixes attachments flashing then vanishing on turn_end when paths lived outside get_media_dir (e.g. workspace files). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): agent activity cluster, stable keys, LTR sheen labels - Group reasoning and tool traces in AgentActivityCluster with i18n summaries - Stabilize React list keys for activity clusters (first message id anchor) - Replace background-clip shimmer with overlay sheen for streaming labels - ThreadMessages/MessageList integration and locale strings Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): render assistant reasoning with Markdown + deferred stream - Use MarkdownText for ReasoningBubble body (same GFM/KaTeX path as replies) - Apply muted/italic prose tokens so thinking stays visually subordinate - useDeferredValue while reasoningStreaming to ease parser work during deltas - Preload markdown chunk when trace opens; add regression test with preloaded renderer Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): default-collapse agent activity cluster while Working Outer fold no longer auto-expands during isTurnStreaming; user opens to see traces. Header sheen and live summary unchanged. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(long_task): cumulative run history, file union, and prompt tuning Inject cross-step summaries and merged file paths into middle/final step templates so chains do not lose early context. Strip the last run-history block when it duplicates Previous Progress to save tokens. Add optional cumulative_prompt_max_chars and cumulative_step_body_max_chars parameters with clamped defaults. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): session switch keeps in-flight thread and replays buffered WS Save the prior chat message list to the per-chat cache in a layout effect when chatId changes (before stale writes could corrupt another chat). Skip one post-switch layout cache tick so we do not snapshot the wrong tab. 
Buffer inbound events per chat_id when no onChat subscriber is registered (e.g. user focused another session) and drain on resubscribe up to a cap, so streaming deltas are not lost while off-tab. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): snap thread scroll to bottom on session open (no smooth glide) Use scroll-behavior auto on the viewport, instant programmatic scroll when following new messages and on scrollToBottomSignal. Keep smooth only for the explicit scroll-to-bottom button. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): respect manual scroll-up after opening a session Track when the user leaves the bottom with a ref and skip ResizeObserver and deferred bottom snaps until they return or the conversation is reset. Remove the time-based force-bottom window that overrode atBottom. Multi-frame scrollToBottom honours the same guard unless force (scroll button). Co-authored-by: Cursor <cursoragent@cursor.com> * Publish long_task UI snapshots on outbound metadata - Add OUTBOUND_META_AGENT_UI (_agent_ui) for channel-agnostic structured state - LongTaskTool publishes {kind: long_task, data: snapshot} on the bus with _progress - WebSocket send forwards metadata as agent_ui for WebUI clients - Tests for bus payload, WS frame, and progress assertions - Fix loop progress tests: ignore _goal_status in streaming final filter and avoid brittle outbound[-1] ordering after goal status idle messages Co-authored-by: Cursor <cursoragent@cursor.com> * feat: WebUI long_task activity card and resilient history merge Add optional ui_summary to the long_task tool for one-line UI labels. Stream long_task agent_ui into a dedicated message row with timeline, markdown peek, and a right sheet for details. Merge canonical history after turn_end while re-inserting long_task rows before the final assistant reply. Collapse duplicate task_start/step_start steps in the timeline and extend i18n. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor: align long_task with thread_goal and drop orchestrator UI - Persist sustained objectives via session metadata (long_task / complete_goal); no subagent wiring or tool-driven agent_ui payloads. - Remove WebUI long-task activity UI, types, and translations; history merge preserves trace replay only, with legacy long_task rows normalized to traces. - Drop long_task prompt templates and get_long_task_run_dir; add webui thread disk helper for gateway persistence tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(agent): thread goal runtime context, tools, and skill - Add thread_goal_state helper and mirror active objectives into Runtime Context - Wire loop/context/memory/events as needed for goal metadata in turns - Expand long_task / complete_goal semantics (pivot/cancel/honest recap) - Add always-on thread-goal SKILL.md; align /goal command prompt - Tests for context builder and thread goal state - Remove unused webui ChatPane component Co-authored-by: Cursor <cursoragent@cursor.com> * feat(thread-goal): add websocket snapshot helper and publish goal updates from long_task Introduce thread_goal_ws_blob for bounded JSON snapshots, attach snapshots to websocket turn_end metadata in AgentLoop, and let long_task fan-out dedicated thread_goal frames on the websocket channel after persisting session metadata. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): websocket thread_goal frames, turn_end replay, and session API scrub for subagent inject Emit thread_goal events and optional thread_goal on turn_end; scrub persisted subagent announce blobs on GET /api/sessions/.../messages and shorten session list previews so WebUI does not surface full Task/Summarize scaffolding. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): merge ephemeral traces per user turn when reconciling canonical history Preserve disk/live trace rows inside the matching user–assistant segment instead of stacking every trace before the final assistant reply (fixes inflated tool counts after refresh or session switch). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): show assistant reply copy only on the last slice before the next user turn Avoid duplicate copy affordances on intermediate assistant bubbles that precede more agent activity in the same turn (tools or further assistant text). Co-authored-by: Cursor <cursoragent@cursor.com> * feat(webui): thread_goal stream plumbing, composer goal strip, sky glow, and client-side subagent scrub projection Track thread_goal and turn_goal snapshots in NanobotClient, hydrate React state from thread_goal frames and turn_end, surface objective/elapsed in the composer, add breathing sky halo CSS while goals are active, mirror server scrub logic on history hydration and webui_thread snapshots, and extend tests/client mocks. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(channels): add Slack Socket Mode connect timeout with actionable timeout errors Abort hung websockets.connect handshakes after a bounded wait, log REST-vs-WSS guidance, surface RuntimeError to channel startup, and log successful WSS setup. Co-authored-by: Cursor <cursoragent@cursor.com> * webui: expand thread goal in composer bottom sheet Add ChevronUp control on the run/goal strip that opens a bottom Sheet with full ui_summary and objective. Inline preview logic in RunElapsedStrip, add i18n strings across locales, and a composer unit test. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(webui): widen dedupeToolCallsForUi input for session API typing fetchSessionMessages types tool_calls as unknown; accept unknown so tsc build passes when passing message.tool_calls through. 
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(agent): extract WebSocket turn run status to webui_turn_helpers * refactor(skills): rename thread-goal to long-task and document idempotent goals * feat(skills): rename sustained-goal skill to long-goal and tighten long_task guidance * chore: remove unused subagent/context/router helpers * feat(session): rename sustained goal to goal_state and align WS/WebUI - Move helpers from agent/thread_goal_state to session/goal_state: GOAL_STATE_KEY, goal_state_runtime_lines, goal_state_ws_blob, parse_goal_state. - Session metadata now uses "goal_state"; still read legacy "thread_goal"; long_task writes drop the legacy key after save. - WebSocket: event/field goal_state, _goal_state_sync; turn_end carries goal_state; accept legacy _thread_goal_sync/thread_goal inbound metadata for dispatch. - WebUI: GoalStateWsPayload, goalState hook/client props, i18n keys goalState. - Runtime Context copy uses "Goal (active):" instead of "Thread goal". feat(agent): stream Anthropic thinking deltas and fix stream idle timeout * refactor(webui): transcript jsonl as sole timeline source * fix(agent): reject mismatched WS message chat_id and stream reasoning deltas * feat(webui): hydrate sustained goal and run timer after websocket subscribe * chore(webui,websocket): remove unused fetch helpers and legacy thread_goal WS paths * Raise default max_tokens and context window in agent schema. Align AgentDefaults and ModelPresetConfig with typical Claude-scale usage (32k completion budget, 256k context window) and update migration tests. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(gateway): bootstrap prefers in-memory model; clarify websocket naming * fix(websocket): websocket _handle_message passes is_dm; refresh /status test expectations --------- Co-authored-by: chengyongru <2755839590@qq.com> Co-authored-by: chengyongru <chengyongru.ai@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-16 01:14:11 +08:00
chengyongru	a6e993df25	fix(agent): move archived summary into system prompt for KV cache stability - Append [Archived Context Summary] to system prompt instead of injecting it into the user message runtime context, improving KV cache reuse across turns and avoiding consecutive same-role messages. - _last_summary persists in metadata (no pop) for restart survival; summary is re-injected every turn via the stable system prompt. - Remove dynamic "Inactive for X minutes" from _format_summary — use static last_active timestamp instead to preserve KV cache stability. - Pass session_summary through build_messages() so both normal and ask_user paths receive the archived summary in the system prompt. - estimate_session_prompt_tokens now reads _last_summary from metadata to include the summary in token budget estimation. - Remove obsolete session_summary parameter from maybe_consolidate_by_tokens and estimate_session_prompt_tokens call sites in loop.py (summary flows through build_messages instead). - Ensure /new (session.clear()) clears _last_summary from metadata.	2026-05-11 01:25:15 +08:00
Xubin Ren	9252f4d826	Revert "fix(agent): persist _last_summary across restarts with used sentinel" This reverts commit e5a1416a37b423de95b0fa279e9473110a678112.	2026-05-09 15:00:54 +08:00
chengyongru	e5a1416a37	fix(agent): persist _last_summary across restarts with used sentinel The previous implementation popped _last_summary from session.metadata after injecting it into the prompt, then saved the session. This caused the summary to be permanently lost after a process restart, making the AI forget archived context and appear to ignore memory or reference non-existent previous messages. Replace the destructive pop with a _last_summary_used sentinel: - _last_summary stays in metadata for restart survival - _last_summary_used prevents duplicate injection within the same turn - Clear the sentinel whenever a new summary is generated Updates tests to match the new persistence behavior.	2026-05-09 14:58:38 +08:00
Xubin Ren	cbd5b06075	fix(memory): align replay overflow with history trimming Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:37:03 +08:00
Xubin Ren	91ade9eaac	fix(memory): consolidate history hidden by replay window Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 20:37:03 +08:00
Jefsky	44a341335a	fix(dream): restore cursor with memory state Track the Dream cursor in memory versioning so restores do not skip history after rolling back Dream commits. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-07 01:06:05 +08:00
Jiajun Xie	9fa90b1034	fix: only advance dream_cursor on completed batches to prevent silent loss	2026-05-05 22:22:40 +08:00
yorkhellen	c4170fa9ba	feat: Add sender_id to LLM runtime context	2026-05-01 19:43:38 +08:00
Jack Lu	d9800ecdd2	refactor: replace try-except blocks with contextlib.suppress for cleaner error handling across multiple files	2026-05-01 19:30:11 +08:00
LZDQ	58ae2d5b7e	Claude: replace module-level file read states with per-loop per-session state class. fixes #3571	2026-05-01 19:15:07 +08:00
Xubin Ren	3d7099b421	fix(memory): clean atomic write test hygiene Made-with: Cursor	2026-04-29 16:57:50 +08:00
yorkhellen	53ca2836e7	fix(memory): also fsync directory for rename durability	2026-04-29 16:57:50 +08:00
yorkhellen	2af45945e2	fix(memory): ensure atomic write for history.jsonl Use temp file + os.replace + fsync to prevent partial writes on crash. Add tests for atomic write behavior and tmp file cleanup on exception.	2026-04-29 16:57:50 +08:00
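The two durability commits above (temp file + `os.replace` + `fsync`, then a directory `fsync`) describe a standard crash-safe write pattern. A minimal sketch of that pattern follows; the function name `atomic_write_lines` and its signature are hypothetical, and the repository's actual helper may differ in naming and error handling.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_lines(path, entries):
    """Write entries as JSON lines so readers never see a partial file.

    Hypothetical helper illustrating temp file + fsync + os.replace.
    """
    path = Path(path)
    # The temp file must live in the same directory so os.replace stays
    # on one filesystem and is therefore atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for entry in entries:
                tmp.write(json.dumps(entry, ensure_ascii=False) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # force file contents to disk
        os.replace(tmp_name, path)  # atomic swap into place
    except BaseException:
        # Clean up the temp file on any failure, then re-raise.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
    # Also fsync the directory so the rename itself survives a crash.
    # O_DIRECTORY is POSIX-only, so this step is skipped elsewhere.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(path.parent, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If serialization fails partway through, the original file is left untouched and no `.tmp` file remains, which matches the "tmp file cleanup on exception" behavior the commit message says is tested.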
Xubin Ren	df37a36174	fix(agent): expose session timestamps in model context Include persisted turn timestamps when assembling LLM prompts so relative-date references like yesterday and today have concrete anchors. Made-with: Cursor	2026-04-26 17:42:58 +00:00
Xubin Ren	b2aec5528a	refactor(agent): move provider refresh into subsystem owners	2026-04-26 14:18:37 +00:00
Subal	80ee4483f8	feat: make consolidation ratio configurable	2026-04-26 20:24:42 +08:00
Xubin Ren	4531167c12	fix(agent): bound remaining memory/history pollution paths from #3412 #3412 stopped the headline raw_archive bloat but left four adjacent leaks on the same pollution chain: - archive() success path appended uncapped LLM summaries to history.jsonl, so a misbehaving LLM could re-open the #3412 bug from the happy path. - maybe_consolidate_by_tokens did not advance last_consolidated when archive() fell back to raw_archive, causing duplicate [RAW] dumps of the same chunk on every subsequent call. - Dream's Phase 1/2 prompt injected MEMORY.md / SOUL.md / USER.md and each history entry without caps, so any legacy oversized record (or an unbounded user edit) would blow past the context window every dream. - append_history itself had no default cap, leaving future new callers one forgotten-cap-away from the same vector. Changes: - Cap LLM-produced summaries at 8K chars (_ARCHIVE_SUMMARY_MAX_CHARS) before writing to history.jsonl. - Advance session.last_consolidated after archive() regardless of whether it summarized or raw-archived — both outcomes materialize the chunk; still break the round loop on fallback so a degraded LLM isn't hammered. - Truncate MEMORY.md / SOUL.md / USER.md and each history entry in Dream's Phase 1 prompt preview (Phase 2 still reaches full files via read_file). - Add _HISTORY_ENTRY_HARD_CAP (64K) as belt-and-suspenders default in append_history with a once-per-store warning, so any new caller that forgets its own tighter cap gets caught and observable. Layer the caps by scope: raw_archive=16K, archive summary=8K, append_history default=64K. Tight per-caller values cover expected payloads; the wide default only catches regressions. Tests: +9 regression tests covering each fix. Full suite: 2372 passed. Made-with: Cursor	2026-04-24 04:17:19 +08:00
chengyongru	2848f69897	fix(agent): prevent history.jsonl bloat from raw_archive and stuck consolidation Root cause: when consolidation LLM fails, raw_archive() dumped full message content (~1MB) into history.jsonl with no size limit. Since build_system_prompt() injects history.jsonl into every system prompt, all subsequent LLM calls exceeded the 200K context window with error 1261. Additionally, _cap_consolidation_boundary's 60-message cap caused consolidation to get stuck on sessions with long tool chains (200+ iterations), triggering the raw_archive fallback in the first place. Three-layer fix: - Remove _cap_consolidation_boundary: let pick_consolidation_boundary drive chunk sizing based solely on token budget - Truncate archive() input: use tiktoken to cap formatted text to the model's input token budget before sending to consolidation LLM - Truncate raw_archive() output: cap history.jsonl entries at 16K chars	2026-04-24 03:57:59 +08:00
Xubin Ren	c1957e14ff	refactor(memory): centralize cursor validation behind a single gate Move the non-int cursor guard out of the two consumer sites and into a shared ``_iter_valid_entries`` iterator so the invariant lives in one place. Closes three gaps left by the original fix: * ``bool`` is now rejected — ``isinstance(True, int)`` is ``True`` in Python, so the previous guard silently treated ``{"cursor": true}`` as cursor ``1``. * Recovery now returns ``max(valid cursors) + 1``. Under adversarial corruption "first int scanning in reverse" is not the same thing, and only ``max`` keeps the recovered cursor strictly greater than every legitimate cursor still on disk. * Non-int cursors are logged exactly once per ``MemoryStore``. Silently dropping corrupted entries hides the root cause (an external writer to ``memory/history.jsonl``); rate-limiting keeps the log clean when the same poisoned file is read every turn. All 7 tests from the original fix pass unchanged; 3 new tests pin the invariants above. Made-with: Cursor	2026-04-21 14:02:53 +08:00
Muata Kamdibe	c0a11c7cf4	fix(memory): harden cursor recovery against non-integer corruption _next_cursor now checks isinstance(cursor, int) before arithmetic, falling back to a reverse scan of all entries when the last entry's cursor is corrupted. read_unprocessed_history skips entries with non-int cursors instead of crashing on comparison. Root cause: external callers (cron jobs, plugins) occasionally wrote string cursors to history.jsonl, which blocked all subsequent append_history calls with TypeError/ValueError. Includes 7 regression tests covering string, float, null, and list cursor types.	2026-04-21 14:02:53 +08:00
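The cursor-hardening commits above rest on two details that are easy to get wrong: `bool` is a subclass of `int` in Python, and recovery should use the maximum valid cursor rather than the first one found in a reverse scan. The sketch below illustrates both. The function names are hypothetical (the commit message names `_iter_valid_entries`, `_next_cursor`, and `read_unprocessed_history`, whose real signatures are not shown here), the starting cursor of 1 for an empty history is an assumption, and the once-per-store logging is omitted.

```python
def is_valid_cursor(value):
    # isinstance(True, int) is True, so bool must be excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def iter_valid_entries(entries):
    """Yield only entries whose cursor is a real integer (single gate)."""
    for entry in entries:
        if is_valid_cursor(entry.get("cursor")):
            yield entry


def next_cursor(entries):
    """Recover the next cursor as max(valid cursors) + 1."""
    cursors = [e["cursor"] for e in iter_valid_entries(entries)]
    # Assumption: an empty or fully corrupted history starts at 1.
    return max(cursors) + 1 if cursors else 1


def read_unprocessed(entries, since):
    """Return valid entries strictly newer than `since`."""
    return [e for e in iter_valid_entries(entries) if e["cursor"] > since]
```

With entries whose cursors are `1`, `"7"`, `True`, `5`, `2.0`, `None`, and one entry with no cursor key at all, only `1` and `5` pass the gate, so the recovered cursor is `6` regardless of the order in which the corrupted entries appear.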
hlg	899a9073ce	fix(memory): do not fall back to raw entry when strip_think empties it `append_history` previously used `strip_think(entry) or entry.rstrip()` as a safety net, so if the entire entry was a template-token leak (e.g. `<think>reasoning</think>` or `<channel\|>` alone), the raw leaked text was still persisted to history — later re-introducing the very content `strip_think` was meant to scrub, via consolidation / replay. Persist the cleaned content directly. When cleanup empties a non-empty entry, log at debug and store an empty-content record (cursor continuity preserved). Adds 3 regression tests in test_memory_store.py covering: - Well-formed thinking blocks are stripped before persistence. - Pure-leak entries persist as empty, not as raw text. - Malformed prefix leaks (`<channel\|>`) also persist as empty.	2026-04-20 17:04:48 +08:00
Xubin Ren	ccd6c05f71	fix: include pending summaries in consolidation estimates Made-with: Cursor	2026-04-19 20:06:11 +08:00
Jiajun Xie	d95bc9c9c4	fix: unify summary injection strategy between consolidation paths - Track last_summary in maybe_consolidate_by_tokens() to persist the summary - Change return to break in the consolidation loop to allow summary persistence - Save summary to session.metadata['_last_summary'] for consistency with AutoCompact._archive() - Ensures compressed content remains visible to the model via prepare_session() injection Fixes #3274	2026-04-19 20:06:11 +08:00
Cheng Yongru	aabc3d5017	fix(memory): fall back to raw_archive on LLM error response When chat_with_retry returns an error response (finish_reason='error') instead of raising an exception, archive() previously treated the error message as a valid summary and wrote it to history.jsonl, while the original session data was already cleared by /new — causing irreversible data loss. Fix: check finish_reason after the LLM call and raise RuntimeError on error responses, which naturally falls through to the existing raw_archive fallback. This preserves the original messages in history.jsonl instead of losing them. Fixes #3244	2026-04-17 20:15:07 +08:00
Xubin Ren	cc5a666d5d	review(dream): harden line-age annotation per review feedback Follow-up to #3212, fully backward compatible: - Extract the 14-day staleness threshold as `_STALE_THRESHOLD_DAYS` module constant and pass it into the Phase 1 prompt template as `{{ stale_threshold_days }}`. The number lived in three places before (code threshold, prompt instruction, docstring); now there is one. - Add `DreamConfig.annotate_line_ages` (default True = current behavior) and propagate it through `Dream.__init__` and the gateway wiring in cli/commands.py. Gives users a knob to disable the feature without a code patch if an LLM reacts poorly to the `← Nd` suffix. - Harden `_annotate_with_ages` against dirty working trees: when HEAD blob line count disagrees with the working-tree content length, skip annotation entirely instead of assigning ages to the wrong lines. The previous `i >= len(ages)` guard only handled one direction of the mismatch. - Inline-comment the `max_iterations` 10→15 bump with a pointer to exp002 so future blame has context. - Add 4 regression tests: end-to-end `← 30d` reaches prompt, 14/15 threshold boundary, `annotate_line_ages=False` bypasses git entirely (verified via `assert_not_called`), length-mismatch defense, and template-var rendering. Made-with: Cursor	2026-04-17 13:45:38 +08:00
chengyongru	35f3084c03	feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15 Four improvements to Dream's memory consolidation: 1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent. LLM uses content judgment, not just age, to decide what to prune. 2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts + deduplicate existing files) with explicit redundancy patterns to scan for. Validated through 20 experiments (exp-002 prompt + max_iter=15 was best, averaging -1643 chars/5.4% compression per run). 3. Phase 1 analysis as commit body: dream git commits now include the full Phase 1 analysis for transparency via /dream-log. 4. max_iterations raised from 10 to 15: 30% improvement over 10 with no risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).	2026-04-17 13:45:38 +08:00
chengyongru	d64e963258	test(memory): add regression tests for missing cursor key Cover read_unprocessed_history skipping cursorless entries and _next_cursor safe fallback when last entry has no cursor.	2026-04-16 12:32:38 +08:00
chengyongru	524c097f76	refactor(memory): simplify read_unprocessed_history cursor guard Replace verbose loop with one-liner list comprehension using e.get("cursor", 0) to handle missing cursor keys.	2026-04-16 12:32:38 +08:00
Jiajun Xie	f4a7ad16aa	fix(memory): handle missing cursor key in history entries - Use .get('cursor') instead of direct dict access to prevent KeyError - Skip entries without cursor and log a warning - Fix _next_cursor fallback to safely check for cursor existence Fixes #3190	2026-04-16 12:32:38 +08:00
Xubin Ren	7a7f5c9689	fix(dream): use valid builtin skill template paths Point Dream skill creation at a readable builtin skill-creator template, keep skill writes rooted at the workspace, and document the new skill discovery behavior in README. Made-with: Cursor	2026-04-12 16:49:55 +08:00
chengyongru	2a243bfe4f	feat(agent): integrate skill discovery into Dream consolidation Instead of a separate skill discovery system, extend Dream's two-phase pipeline to also detect reusable behavioral patterns from conversation history and generate SKILL.md files. Phase 1 gains a [SKILL] output type for pattern detection. Phase 2 gains write_file (scoped to skills/) and read access to builtin skills, enabling it to check for duplicates and follow skill-creator's format conventions before creating new skills. Inspired by PR #3039 by @wanghesong2019. Co-authored-by: wanghesong2019 <wanghesong2019@users.noreply.github.com>	2026-04-12 16:49:55 +08:00
chengyongru	d03458f034	fix(agent): eliminate race condition in auto compact summary retrieval Make Consolidator.archive() return the summary string directly instead of writing to history.jsonl then reading back via get_last_history_entry(). This eliminates a race condition where concurrent _archive calls for different sessions could read each other's summaries from the shared history file (cross-user context leak in multi-user deployments). Also removes Consolidator.get_last_history_entry() — no longer needed.	2026-04-11 15:56:41 +08:00
chengyongru	69d60e2b06	fix(agent): handle UnicodeDecodeError in _read_last_entry history.jsonl may contain non-UTF-8 bytes (e.g. from email channel binary content), causing auto compact to fail when reading the last entry for summary generation. Catch UnicodeDecodeError alongside FileNotFoundError and JSONDecodeError.	2026-04-11 15:56:41 +08:00
chengyongru	fb6dd111e1	feat(agent): auto compact — proactive session compression to reduce token cost and latency (#2982) When a user is idle for longer than a configured TTL, nanobot proactively compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.	2026-04-11 15:56:41 +08:00
Xubin Ren	c579d67887	fix(memory): preserve consolidation turn boundaries under chunk cap Made-with: Cursor	2026-04-10 12:58:58 +08:00
comadreja	bfe53ebb10	fix(memory): harden consolidation with try/except on token estimation and chunk size cap - Wrap both token estimation calls in try/except to prevent silent failures from crashing the consolidation cycle - Add _MAX_CHUNK_MESSAGES = 60 to cap messages per consolidation round, avoiding oversized chunks being sent to the consolidation LLM - Improve idle log to include unconsolidated message count for easier debugging These are purely defensive improvements with no behaviour change for normal sessions.	2026-04-10 12:58:58 +08:00
chengyongru	b4f985f3dc	feat(memory):dream enhancement (#2887) * feat(dream): enhance memory cleanup with staleness detection - Phase 1: add [FILE-REMOVE] directive and staleness patterns (14-day threshold, completed tasks, superseded info, resolved tracking) - Phase 2: add explicit cleanup rules, file paths section, and deletion guidance to prevent LLM path confusion - Inject current date and file sizes into Phase 1 context for age-aware analysis - Add _dream_debug() helper for observability (dream-debug.log in workspace) - Log Phase 1 analysis output and Phase 2 tool events for debugging Tested with glm-5-turbo: MEMORY.md reduced from 149 to 108-129 lines across two rounds, correctly identifying and removing weather data, detailed incident info, completed research, and stale discussions. * refactor(dream): replace _dream_debug file logger with loguru Remove the custom _dream_debug() helper that wrote to dream-debug.log and use the existing loguru logger instead. Phase 1 analysis is logged at debug level, tool events at info level — consistent with the rest of the codebase and no extra log file to manage. * fix(dream): make stale scan independent of conversation history Reframe Phase 1 from a single comparison task to two independent tasks: history diff AND proactive stale scan. The LLM was skipping stale content that wasn't referenced in conversation history (e.g. old triage snapshots). Now explicitly requires scanning memory files for staleness patterns on every run. * fix(dream): correct old_text param name and truncate debug log - Phase 2 prompt: old_string -> old_text to match EditFileTool interface - Phase 1 debug log: truncate analysis to 500 chars to avoid oversized lines * refactor(dream): streamline prompts by separating concerns Phase 1 owns all staleness judgment logic; Phase 2 is pure execution guidance. Remove duplicated cleanup rules from Phase 2 since Phase 1 already determines what to add/remove. Fix remaining old_string -> old_text. Total prompt size reduced ~45% (870 -> 480 tokens). * fix(dream): add FILE-REMOVE execution guidance to Phase 2 prompt Phase 2 was only processing [FILE] additions and ignoring [FILE-REMOVE] deletions after the cleanup rules were removed. Add explicit mapping: [FILE] → add content, [FILE-REMOVE] → delete content.	2026-04-07 22:39:47 +08:00
chengyongru	401d1f57fa	fix(dream): allow LLM to retry on tool errors instead of failing immediately Dream Phase 2 uses fail_on_tool_error=True, which terminates the entire run on the first tool error (e.g. old_text not found in edit_file). Normal agent runs default to False so the LLM can self-correct and retry. Dream should behave the same way.	2026-04-05 22:10:34 +08:00
Xubin Ren	c3b4ebae53	refactor(agent): move internal prompts into packaged templates	2026-04-04 11:09:37 +00:00
Xubin Ren	04419326ad	fix(memory): migrate legacy HISTORY.md even when history.jsonl is empty	2026-04-04 10:11:53 +00:00
Xubin Ren	0a3a60a7a4	refactor(memory): simplify Dream config naming and rename gitstore module	2026-04-04 10:01:45 +00:00
Xubin Ren	6e896249c8	feat(memory): harden legacy history migration and Dream UX	2026-04-04 08:41:46 +00:00
Jack Lu	d436a1d678	feat: integrate Jinja2 templating for agent responses and memory consolidation - Added Jinja2 template support for various agent responses, including identity, skills, and memory consolidation. - Introduced new templates for evaluating notifications, handling subagent announcements, and managing platform policies. - Updated the agent context and memory modules to utilize the new templating system for improved readability and maintainability. - Added a new dependency on Jinja2 in pyproject.toml.	2026-04-04 14:18:22 +08:00
Xubin Ren	7e0c196797	fix(memory): repair Dream follow-up paths and move GitStore to utils Made-with: Cursor	2026-04-04 04:49:42 +00:00
chengyongru	f824a629a8	feat(memory): add git-backed version control for dream memory files - Add GitStore class wrapping dulwich for memory file versioning - Auto-commit memory changes during Dream consolidation - Add /dream-log and /dream-restore commands for history browsing - Pass tracked_files as constructor param, generate .gitignore dynamically	2026-04-03 00:32:54 +08:00
chengyongru	a9e01bf838	fix(memory): extract successful solutions in consolidate prompt Add "Solutions" category to consolidate prompt so trial-and-error workflows that reach a working approach are captured in history for Dream to persist. Remove overly broad "debug steps" skip rule that discarded these valuable findings.	2026-04-02 23:02:42 +08:00

74 Commits