* refactor(dream): replace two-phase Dream class with simple cron + process_direct
- Remove the heavyweight Dream class (AgentRunner-based two-phase system)
from nanobot/agent/memory.py
- Delete dream_phase1.md and dream_phase2.md templates
- New dream.md template serves as the consolidation prompt
- Cron callback uses agent.process_direct(prompt, session_key=\"dream\")
instead of agent.dream.run()
- Always performs git auto_commit after execution
- /dream command updated to use process_direct + git commit
- DreamConfig kept for backward compatibility; deprecated fields
(model_override, max_batch_size, max_iterations, annotate_line_ages)
are ignored but accepted in config
- interval_h remains configurable via agents.defaults.dream.interval_h
- Update tests and webui settings to match new architecture
* feat(loop): add ephemeral mode to process_direct, skip history writes for Dream
When ephemeral=True, _state_save skips enforce_file_cap (which calls
raw_archive -> append_history) and consolidator.maybe_consolidate_by_tokens.
This prevents Dream sessions from creating a positive feedback loop where
they process their own output. The session IS still saved to disk.
* fix(loop): skip extra hooks for ephemeral sessions (Dream)
* feat(dream): per-run timestamped sessions with rotation for WebUI
* test(config): restore DreamConfig schedule and alias tests
* fix(dream): include LLM response summary in git auto-commit message
The old two-phase Dream class included the Phase 1 analysis in the git
commit message body. The new single-phase version lost this. Restore it
by extracting resp.content from the process_direct return value and
appending it to the commit message in both the cron handler and the
/dream command.
* fix(test): accept ephemeral kwarg in test_openai_api fake_process
* refactor(dream): merge dream_session.py into MemoryStore
The standalone dream_session.py module only contained three small helpers
that all revolve around MemoryStore concerns (session keys, commit messages,
file pruning). Fold them into MemoryStore as @staticmethod to reduce
indirection and avoid a 35-line module with no independent reason to exist.
* fix(test): address code review — patch correct instance, use actual function
- Fix test_ephemeral_skips_raw_archive to patch loop.context.memory
instead of the fixture's separate MemoryStore instance
- Fix TestDreamCommitMessage to call MemoryStore.build_dream_commit_message
instead of reimplementing the logic inline
- Move Dream helpers in memory.py above the Consolidator section comment
to avoid misleading visual boundary
* fix(dream): gate cursor advancement and restrict tools
maintainer edit: Dream now processes backlog from the oldest unprocessed entries, only advances the cursor after a completed ephemeral run, and uses a restricted file-only tool registry for background consolidation.
* fix(dream): skip idle compact for dream sessions
Dream runs use internal dream:* sessions that are pruned by Dream retention. Exclude them from AutoCompact scheduling, archive execution, and summary injection so idle-session compaction cannot truncate Dream transcripts.
* fix(dream): keep batched history isolated
* feat(dream): tag archived memory for single-phase Dream
---------
Co-authored-by: Xubin Ren <52506698+Re-bin@users.noreply.github.com>
Replace AutoCompact._archive() direct session mutation with delegation
to Consolidator.compact_idle_session(). Remove _split_unconsolidated()
method since that logic now lives inside compact_idle_session.
All session mutation for idle compaction now goes through the
Consolidator's lock, eliminating the race condition between
background token consolidation and idle TTL compaction.
Changes:
- autocompact.py: rewrite _archive() to call compact_idle_session,
remove _split_unconsolidated(), clean up unused imports
- test_autocompact_unit.py: replace TestArchive/TestSplitUnconsolidated
with TestArchiveDelegates that verifies delegation behavior
- test_auto_compact.py: convert all consolidator.archive mocks to
consolidator.compact_idle_session mocks via _make_fake_compact helper
- Assert pending_user_turn is cleared from session metadata after
shortcut commands (e.g. /help) in test_auto_compact.py.
- Add test for None allow_from / allowFrom values in
test_base_channel.py to prevent TypeError regressions.
Shortcut commands (e.g. /help, /pairing) skip BUILD and SAVE states,
so their turns were never persisted to the session. This caused WebUI
chats to appear empty after _turn_end because history hydration reads
from the session file.
Fix by persisting the user message and assistant response inside
_state_command, but tag them with _command=True so Session.get_history
filters them out of LLM context. /new is excluded because it
intentionally clears the session.
- AgentLoop._persist_user_message_early now accepts **kwargs so
_state_command can pass _command=True for the user turn.
- Session.get_history skips messages with _command=True.
- Append [Archived Context Summary] to system prompt instead of injecting
it into the user message runtime context, improving KV cache reuse across
turns and avoiding consecutive same-role messages.
- _last_summary persists in metadata (no pop) for restart survival;
summary is re-injected every turn via the stable system prompt.
- Remove dynamic "Inactive for X minutes" from _format_summary — use
static last_active timestamp instead to preserve KV cache stability.
- Pass session_summary through build_messages() so both normal and
ask_user paths receive the archived summary in the system prompt.
- estimate_session_prompt_tokens now reads _last_summary from metadata
to include the summary in token budget estimation.
- Remove obsolete session_summary parameter from
maybe_consolidate_by_tokens and estimate_session_prompt_tokens
call sites in loop.py (summary flows through build_messages instead).
- Ensure /new (session.clear()) clears _last_summary from metadata.
The previous implementation popped _last_summary from session.metadata
after injecting it into the prompt, then saved the session. This caused
the summary to be permanently lost after a process restart, making the
AI forget archived context and appear to ignore memory or reference
non-existent previous messages.
Replace the destructive pop with a _last_summary_used sentinel:
- _last_summary stays in metadata for restart survival
- _last_summary_used prevents duplicate injection within the same turn
- Clear the sentinel whenever a new summary is generated
Updates tests to match the new persistence behavior.
Move sessionHistoryMaxMessages, sessionHistoryMaxTokens, and
sessionFileMaxMessages out of user-facing config into internal
constants (HISTORY_MAX_MESSAGES=120, FILE_MAX_MESSAGES=2000).
- Remove 3 fields from AgentDefaults and config pipeline
- Sink enforce_file_cap into Session (was AgentLoop)
- Auto-derive token budget from context window (was configurable)
- Net -113 lines across 7 files; 723 tests green
Made-with: Cursor
Prevent proactive compaction from archiving sessions that have an
in-flight agent task, avoiding mid-turn context truncation when a
task runs longer than the idle TTL.
Prefer the more user-friendly idleCompactAfterMinutes name for auto compact while keeping sessionTtlMinutes as a backward-compatible alias. Update tests and README to document the retained recent-context behavior and the new preferred key.
Keep a legal recent suffix in idle auto-compacted sessions so resumed chats preserve their freshest live context while older messages are summarized. Recover persisted summaries even when retained messages remain, and document the new behavior.
Make Consolidator.archive() return the summary string directly instead
of writing to history.jsonl then reading back via get_last_history_entry().
This eliminates a race condition where concurrent _archive calls for
different sessions could read each other's summaries from the shared
history file (cross-user context leak in multi-user deployments).
Also removes Consolidator.get_last_history_entry() — no longer needed.
When a user is idle for longer than a configured TTL, nanobot **proactively** compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.