88 Commits

Author SHA1 Message Date
Xubin Ren
ccd6c05f71 fix: include pending summaries in consolidation estimates
Made-with: Cursor
2026-04-19 20:06:11 +08:00
Xubin Ren
54b659929e test: cover summary persistence after token consolidation
Made-with: Cursor
2026-04-19 20:06:11 +08:00
chengyongru
5818569e8f feat(wizard): auto-detect Literal fields as select menus
Literal["standard", "persistent"] fields are now rendered as select
dropdowns instead of free-text input. This makes provider_retry_mode
and any future Literal fields self-documenting in the wizard.
2026-04-18 21:56:10 +08:00
chengyongru
ebb5179cab feat(wizard): add Channel Common, API Server menus and field constraint validation
- Add [H] Channel Common menu to configure send_progress, send_tool_hints,
  send_max_retries, and transcription_provider
- Add [I] API Server menu to configure host, port, timeout
- Add real-time Pydantic field constraint validation (ge/gt/le/lt/min_length/max_length)
  with constraint hints shown in field display (e.g. "Send Max Retries (0-10)")
- Add _pause() to View Configuration Summary to prevent immediate screen clear
- Fix _format_value dict branch to handle BaseModel instances without crashing
2026-04-18 21:56:10 +08:00
chengyongru
34e8f97b1f refactor(templates): separate identity and SOUL responsibilities
Move all behavioral instructions out of identity.md into SOUL.md so that
each file has a single clear purpose:

- identity.md: capability facts only (runtime, workspace, format hints,
  tool guidance, untrusted content warning)
- SOUL.md: behavioral rules (name, personality, execution rules)

The "Act, don't narrate" rule is refined into layered behavior: act
immediately on single-step tasks, plan first for multi-step tasks. This
eliminates the contradiction where identity said "never end with a plan"
but user SOUL.md said "always plan first".
2026-04-18 21:55:56 +08:00
Xubin Ren
70a1279b86 test: pin retry-wait callback routing so internal heartbeats stay off channels
Add two focused regression tests for the retry-wait leak this PR fixes:

- tests/agent/test_runner.py::test_runner_binds_on_retry_wait_to_retry_callback_not_progress
  locks in that `AgentRunSpec.retry_wait_callback` (not `progress_callback`) is
  what `_build_request_kwargs` forwards to the provider as `on_retry_wait`.

- tests/channels/test_channel_manager_delta_coalescing.py::TestRetryWaitFiltering
  runs `_dispatch_outbound` end-to-end and asserts that `_retry_wait: True`
  messages never reach channel send.

Both tests fail on origin/main and pass with this PR's fix applied.

Made-with: Cursor
2026-04-18 13:50:05 +08:00
Xubin Ren
c8d834a504 fix(loop): document subagent-followup persistence and guard empty content
- Add inline rationale for persisting before ContextBuilder and for
  passing current_message="" on subagent follow-ups (avoids
  double-projection after merge).
- Skip persistence for empty subagent content (no-op messages should
  not pollute history).
- Add regression test covering the empty-content guard.

Made-with: Cursor
2026-04-18 13:30:22 +08:00
xzq.xu
1c939e8a5f fix(loop): persist subagent follow-up events in history 2026-04-18 13:30:22 +08:00
JunghwanNA
34fccb2ee9 Prevent self-inspection from leaking configured secrets
MyTool blocks direct access to sensitive nested paths, but its formatter
still printed scalar fields for small config objects. That let
`my(action="check", key="web_config.search")` expose `api_key` in plain
text even though the docs promise sensitive sub-fields are protected.

This keeps the change narrow: sensitive nested config fields are omitted
from MyTool's formatted output, and regression coverage locks the
behavior in.

Constraint: Must preserve existing read-only inspection behavior for non-sensitive fields
Constraint: Keep scope limited to MyTool rather than introducing broader redaction plumbing
Rejected: Rework global context/tool redaction around MyTool | broader than needed for the leak path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more nested config rendering is added later, filter sensitive field names at the formatter boundary as well as the path resolver
Tested: PYTHONPATH=$PWD pytest -q tests/agent/tools/test_self_tool.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_my_tool_secret_leak_regression.py
Not-tested: Full repository test suite
Related: #3259
2026-04-18 00:59:08 +08:00
Cheng Yongru
aabc3d5017 fix(memory): fall back to raw_archive on LLM error response
When chat_with_retry returns an error response (finish_reason='error')
instead of raising an exception, archive() previously treated the error
message as a valid summary and wrote it to history.jsonl, while the
original session data was already cleared by /new — causing irreversible
data loss.

Fix: check finish_reason after the LLM call and raise RuntimeError on
error responses, which naturally falls through to the existing raw_archive
fallback. This preserves the original messages in history.jsonl instead
of losing them.

Fixes #3244
2026-04-17 20:15:07 +08:00
Mariano Campo
d0e65ebf70 fix(exec): pass allowed_env_keys to exec tool calls in subagents 2026-04-17 16:32:25 +08:00
chengyongru
8c0c4e5b31 refactor(agent): tighten comments, extract constant, strengthen edge case test
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
2026-04-17 16:20:53 +08:00
chengyongru
44b526c4ee fix(agent): preserve user message in _snip_history to prevent GLM error 1214
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法").

Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
   the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
   "(conversation continued)" when the first non-system message is a bare
   assistant (no tool_calls), serving as a safety net for any edge cases
   that slip through.

Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
2026-04-17 16:20:53 +08:00
Xubin Ren
cc5a666d5d review(dream): harden line-age annotation per review feedback
Follow-up to #3212, fully backward compatible:

- Extract the 14-day staleness threshold as `_STALE_THRESHOLD_DAYS` module
  constant and pass it into the Phase 1 prompt template as
  `{{ stale_threshold_days }}`. The number lived in three places before
  (code threshold, prompt instruction, docstring); now there is one.
- Add `DreamConfig.annotate_line_ages` (default True = current behavior)
  and propagate it through `Dream.__init__` and the gateway wiring in
  cli/commands.py. Gives users a knob to disable the feature without a
  code patch if an LLM reacts poorly to the `← Nd` suffix.
- Harden `_annotate_with_ages` against dirty working trees: when HEAD
  blob line count disagrees with the working-tree content length, skip
  annotation entirely instead of assigning ages to the wrong lines. The
  previous `i >= len(ages)` guard only handled one direction of the
  mismatch.
- Inline-comment the `max_iterations` 10→15 bump with a pointer to
  exp002 so future blame has context.
- Add 4 regression tests: end-to-end `← 30d` reaches prompt, 14/15
  threshold boundary, `annotate_line_ages=False` bypasses git entirely
  (verified via `assert_not_called`), length-mismatch defense, and
  template-var rendering.

Made-with: Cursor
2026-04-17 13:45:38 +08:00
chengyongru
35f3084c03 feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15
Three improvements to Dream's memory consolidation:

1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes
   (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent.
   LLM uses content judgment, not just age, to decide what to prune.

2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts +
   deduplicate existing files) with explicit redundancy patterns to scan for.
   Validated through 20 experiments (exp-002 prompt + max_iter=15 was best,
   averaging -1643 chars/5.4% compression per run).

3. Phase 1 analysis as commit body: dream git commits now include the full
   Phase 1 analysis for transparency via /dream-log.

4. max_iterations raised from 10 to 15: 30% improvement over 10 with no
   risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).
2026-04-17 13:45:38 +08:00
Xubin Ren
90b7d940e8 refactor(config): nest MyTool settings under tools.my (with legacy-key migration) 2026-04-16 15:58:20 +00:00
chengyongru
b51da93cbb feat(agent): add SelfTool for runtime self-inspection and configuration
Add a built-in tool that lets the agent inspect and modify its own
runtime state (model, iterations, context window, etc.).

Key features:
- inspect: view current config, usage stats, and subagent status
- modify: adjust parameters at runtime (protected by type/range validation)
- Subagent observability: inspect running subagent tasks (phase,
  iteration, tool events, errors) — subagents are no longer a black box
- Watchdog corrects out-of-bounds values on each iteration
- Enabled by default in read-only mode (self_modify: false)
- All changes are in-memory only; restart restores defaults
- Comprehensive test suite (90 tests)

Includes a self-awareness skill (always-on) with progressive disclosure:
SKILL.md for core rules, references/examples.md for detailed scenarios.
2026-04-16 23:44:26 +08:00
Xubin Ren
92a5125108
Merge PR #3141: fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions
fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions
2026-04-16 20:07:15 +08:00
chengyongru
d64e963258 test(memory): add regression tests for missing cursor key
Cover read_unprocessed_history skipping cursorless entries and
_next_cursor safe fallback when last entry has no cursor.
2026-04-16 12:32:38 +08:00
chengyongru
015833e34b
Merge branch 'main' into fix/skills-yaml-frontmatter 2026-04-15 16:56:23 +08:00
chengyongru
6fbada5363 refactor(context): deduplicate system prompt — markdown skills index, skip template MEMORY.md
- Convert skills summary from verbose XML (4-5 lines/skill) to compact
  markdown list (1 line/skill) with inline path for read_file lookup
- Exclude always-loaded skills (e.g. memory) from the skills index to
  avoid duplicating content already in the Active Skills section
- Skip injecting the Memory section when MEMORY.md still matches the
  bundled template (i.e. Dream hasn't populated it yet)
2026-04-15 15:49:30 +08:00
yanghan-cyber
a1b544fd23 fix(skills): use yaml.safe_load for frontmatter parsing to handle multiline descriptions
The hand-rolled line-by-line YAML parser treated each line independently,
so YAML multiline scalars (folded `>` and literal `|`) were captured as
the literal characters ">" or "|" instead of the actual text content.
2026-04-14 15:29:59 +08:00
yeyitech
65a15f39ee test(loop): cover /stop checkpoint recovery 2026-04-14 14:15:22 +08:00
yeyitech
ee061f0595 fix(web): serialize duckduckgo search calls 2026-04-14 14:10:06 +08:00
Xubin Ren
a38bc637bd fix(runner): preserve injection flag after max-iteration drain
Keep late follow-up injections observable when they are drained during max-iteration shutdown so loop-level response suppression still makes the right decision.

Made-with: Cursor
2026-04-14 00:30:30 +08:00
chengyongru
a1e1eed2f1 refactor(runner): consolidate all injection drain paths and deduplicate tests
- Migrate "after tools" inline drain to use _try_drain_injections,
  completing the refactoring (all 6 drain sites now use the helper).
- Move checkpoint emission into _try_drain_injections via optional
  iteration parameter, eliminating the leaky split between helper
  and caller for the final-response path.
- Extract _make_injection_callback() test helper to replace 7
  identical inject_cb function bodies.
- Add test_injection_cycle_cap_on_error_path to verify the cycle
  cap is enforced on error exit paths.
2026-04-14 00:30:30 +08:00
chengyongru
d849a3fa06 fix(agent): drain injection queue on error/edge-case exit paths
When the agent runner exits due to LLM error, tool error, empty response,
or max_iterations, it breaks out of the iteration loop without draining
the pending injection queue. This causes leftover messages to be
re-published as independent inbound messages, resulting in duplicate or
confusing replies to the user.

Extract the injection drain logic into a `_try_drain_injections` helper
and call it before each break in the error/edge-case paths. If injections
are found, continue the loop instead of breaking. For max_iterations
(where the loop is exhausted), drain injections to prevent re-publish
without continuing.
2026-04-14 00:30:30 +08:00
chengyongru
becaff3e9d fix(agent): skip auto-compact for sessions with active agent tasks
Prevent proactive compaction from archiving sessions that have an
in-flight agent task, avoiding mid-turn context truncation when a
task runs longer than the idle TTL.
2026-04-13 12:51:37 +08:00
Xubin Ren
6484c7c47a fix(agent): close interrupted early-persisted user turns
Track text-only user messages that were flushed before the turn loop completes, then materialize an interrupted assistant placeholder on the next request so session history stays legal and later turns do not skip their own assistant reply.

Made-with: Cursor
2026-04-13 10:26:09 +08:00
Xubin Ren
b964a894d2 test(agent): cover early user-message persistence
Use session.add_message for the pre-turn user-message flush and add focused regression tests for crash-time persistence and duplicate-free successful saves.

Made-with: Cursor
2026-04-13 10:26:09 +08:00
Xubin Ren
7a7f5c9689 fix(dream): use valid builtin skill template paths
Point Dream skill creation at a readable builtin skill-creator template, keep skill writes rooted at the workspace, and document the new skill discovery behavior in README.

Made-with: Cursor
2026-04-12 16:49:55 +08:00
Xubin Ren
09c238ca0f Merge origin/main into pr-2959
Resolve the config plumbing conflicts and keep disabled skill filtering consistent for subagent prompts after syncing with main.

Made-with: Cursor
2026-04-12 02:02:39 +00:00
layla
f25cdb7138
Merge branch 'main' into fix/tool-call-result-order-2943 2026-04-11 22:00:07 +08:00
04cb
4cd4ed8ada fix(agent): preserve tool results on fatal error to prevent orphan tool_calls (#2943) 2026-04-11 21:50:44 +08:00
Xubin Ren
cf8381f517 feat(agent): enhance message injection handling and content merging 2026-04-11 21:43:23 +08:00
Xubin Ren
f6c39ec946 feat(agent): enhance session key handling for follow-up messages 2026-04-11 21:43:23 +08:00
chengyongru
36d2a11e73 feat(agent): mid-turn message injection for responsive follow-ups (#2985)
* feat(agent): add mid-turn message injection for responsive follow-ups

Allow user messages sent during an active agent turn to be injected
into the running LLM context instead of being queued behind a
per-session lock. Inspired by Claude Code's mid-turn queue drain
mechanism (query.ts:1547-1643).

Key design decisions:
- Messages are injected as natural user messages between iterations,
  no tool cancellation or special system prompt needed
- Two drain checkpoints: after tool execution and after final LLM
  response ("last-mile" to prevent dropping late arrivals)
- Bounded by MAX_INJECTION_CYCLES (5) to prevent consuming the
  iteration budget on rapid follow-ups
- had_injections flag bypasses _sent_in_turn suppression so follow-up
  responses are always delivered

Closes #1609

* fix(agent): harden mid-turn injection with streaming fix, bounded queue, and message safety

- Fix streaming protocol violation: Checkpoint 2 now checks for injections
  BEFORE calling on_stream_end, passing resuming=True when injections found
  so streaming channels (Feishu) don't prematurely finalize the card
- Bound pending queue to maxsize=20 with QueueFull handling
- Add warning log when injection batch exceeds _MAX_INJECTIONS_PER_TURN
- Re-publish leftover queue messages to bus in _dispatch finally block to
  prevent silent message loss on early exit (max_iterations, tool_error, cancel)
- Fix PEP 8 blank line before dataclass and logger.info indentation
- Add 12 new tests covering drain, checkpoints, cycle cap, queue routing,
  cleanup, and leftover re-publish
2026-04-11 21:43:23 +08:00
Xubin Ren
322142f7ad Merge origin/main into main 2026-04-11 09:32:05 +00:00
Xubin Ren
84e840659a refactor(config): rename auto compact config key
Prefer the more user-friendly idleCompactAfterMinutes name for auto compact while keeping sessionTtlMinutes as a backward-compatible alias. Update tests and README to document the retained recent-context behavior and the new preferred key.
2026-04-11 15:56:41 +08:00
Xubin Ren
1cb28b39a3 feat(agent): retain recent context during auto compact
Keep a legal recent suffix in idle auto-compacted sessions so resumed chats preserve their freshest live context while older messages are summarized. Recover persisted summaries even when retained messages remain, and document the new behavior.
2026-04-11 15:56:41 +08:00
chengyongru
d03458f034 fix(agent): eliminate race condition in auto compact summary retrieval
Make Consolidator.archive() return the summary string directly instead
of writing to history.jsonl then reading back via get_last_history_entry().
This eliminates a race condition where concurrent _archive calls for
different sessions could read each other's summaries from the shared
history file (cross-user context leak in multi-user deployments).

Also removes Consolidator.get_last_history_entry() — no longer needed.
2026-04-11 15:56:41 +08:00
chengyongru
fb6dd111e1 feat(agent): auto compact — proactive session compression to reduce token cost and latency (#2982)
When a user is idle for longer than a configured TTL, nanobot **proactively** compresses the session context into a summary. This reduces token cost and first-token latency when the user returns — instead of re-processing a long stale context with an expired KV cache, the model receives a compact summary and fresh input.
2026-04-11 15:56:41 +08:00
Xubin Ren
2bef9cb650 fix(agent): preserve interrupted tool-call turns
Keep tool-call assistant messages valid across provider sanitization and avoid trailing user-only history after model errors. This prevents follow-up requests from sending broken tool chains back to the gateway.
2026-04-10 05:37:25 +00:00
Xubin Ren
c579d67887 fix(memory): preserve consolidation turn boundaries under chunk cap
Made-with: Cursor
2026-04-10 12:58:58 +08:00
Xubin Ren
363a0704db refactor(runner): update message processing to preserve historical context
- Adjusted message handling in AgentRunner to ensure that historical messages remain unchanged during context governance.
- Introduced tests to verify that backfill operations do not alter the saved message boundary, maintaining the integrity of the conversation history.
2026-04-10 04:46:48 +00:00
Xubin Ren
c625c0c2a7 Merge origin/main and add regression tests for streaming error delivery
- Merged latest main (no conflicts)
- Added test_llm_error_not_appended_to_session_messages: verifies error
  content stays out of session messages
- Added test_streamed_flag_not_set_on_llm_error: verifies _streamed is
  not set when LLM returns an error, so ChannelManager delivers it

Made-with: Cursor
2026-04-09 23:10:46 +08:00
yanghan-cyber
10f6c875a5 fix(agent): deliver LLM errors to streaming channels and avoid polluting session context
When the LLM returns an error (e.g. 429 quota exceeded, stream timeout),
streaming channels silently drop the error message because `_streamed=True`
is set in metadata even though no content was actually streamed.

This change:
- Skips setting `_streamed` when stop_reason is "error", so error messages
  go through the normal channel.send() path and reach the user
- Stops appending error content to session history, preventing error
  messages from polluting subsequent conversation context
- Exposes stop_reason from _run_agent_loop to enable the above check
2026-04-09 23:10:46 +08:00
chenyahui
e9c4fe6824 feat(skills): add disabled_skills config to exclude skills from loading
Introduce a disabled_skills option in the config schema that allows
users to specify a list of skill names to be excluded. The setting is
threaded from config through Nanobot -> AgentLoop -> ContextBuilder ->
SkillsLoader. Disabled skills are filtered out from list_skills,
get_always_skills, and build_skills_summary. Four new test cases cover
the filtering behavior.
2026-04-09 14:11:47 +08:00
Xubin Ren
dadf453097 Merge origin/main into fix/sanitize-messages-non-claude
Resolved conflict in azure_openai_provider.py by keeping main's
Responses API implementation (role alternation not needed for the
Responses API input format).

Made-with: Cursor
2026-04-09 04:45:45 +00:00
whs
b4c7cd654e fix: use effective session key for _active_tasks in unified mode 2026-04-09 11:09:25 +08:00