17 Commits

Author SHA1 Message Date
Xubin Ren
b8406be215 fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605)
Replaces PR #3493's blanket fatal abort with a "tell the model + throttle
the bypass loop" policy.  Workspace-bound rejections are now ordinary
recoverable tool errors enriched with a structured "this is a hard policy
boundary" instruction; SSRF stays the only marker that aborts the turn.

Why the fatal-abort approach broke
----------------------------------
PR #3493 promoted every shell `_guard_command` and filesystem path-resolution
rejection to a turn-fatal RuntimeError.  Two of those messages (`path
outside working dir` and `path traversal detected`) are heuristic substring
scans on the raw command, so legitimate commands like `rm <ws>/x.txt
2>/dev/null` or `find . -type f` killed the user's turn (#3599).  On
channels with outbound dedupe (Telegram) the user just saw silence (#3605),
and the noise polluted the LLM's context until it started hallucinating
guard rejections on plain relative paths (#3597).

Why we still need *some* throttle
---------------------------------
The original #3493 pain point was real: the LLM, refused once, would
swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c
-> ln -sf -> python -c open(...).  Just removing the fatal escape lets
that loop run wild until max_iterations.

What this commit does
---------------------
- `nanobot/utils/runtime.py`: add `workspace_violation_signature` and
  `repeated_workspace_violation_error`.  The signature normalizes
  filesystem `path` arguments and the first absolute path inside an
  exec command, so swapping tools against the same outside target hits
  the same throttle bucket.  Two soft attempts are allowed; the third
  attempt's tool result is replaced with a hard "stop trying to bypass"
  message that quotes the target path and tells the model to ask the
  user for help.

- `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation`
  (still fatal) and `_is_workspace_violation` (now soft).  All three
  failure branches in `_run_tool` (prep_error / exception / Error
  result) route through a shared `_classify_violation` that bumps the
  per-turn workspace_violation_counts dict and either keeps the tool's
  own message or substitutes the throttle escalation.  `_execute_tools`
  now threads that dict alongside the existing external_lookup_counts.

- `nanobot/agent/tools/shell.py`: append a structured boundary note to
  every workspace-bound guard rejection (`working_dir could not be
  resolved`, `working_dir is outside`, `path outside working dir`,
  `path traversal detected`).  SSRF errors stay short and direct so the
  model doesn't try to "phrase around" them.  Existing `2>/dev/null`
  allow-list and benign device passthrough from the previous commit
  remain.

- `nanobot/agent/tools/filesystem.py`: append the same boundary note to
  the `outside allowed directory` PermissionError so read_file / write_file
  / list_dir errors give the LLM the same explicit hint.
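
A minimal sketch of the signature-plus-throttle idea described in the list
above, not the code in `nanobot/utils/runtime.py`.  The function names, the
regex, the `command` argument key and the escalation wording are all
assumptions made for illustration; only the "two soft attempts, third one
escalates" policy comes from this message.

```python
from __future__ import annotations

import posixpath
import re

# Assumption: "first absolute path inside an exec command" is approximated
# by the first '/'-rooted token that is not part of a relative path.
_ABS_PATH_RE = re.compile(r"(?<![\w.])/[^\s'\"|;&<>()]+")
SOFT_ATTEMPTS = 2  # from the commit message: two soft attempts are allowed


def violation_signature(tool_name: str, args: dict) -> str | None:
    """Map different tools aimed at the same outside target to one key."""
    if tool_name == "exec":
        match = _ABS_PATH_RE.search(args.get("command", ""))
        raw = match.group(0) if match else None
    else:
        raw = args.get("path")
    return posixpath.normpath(raw) if raw else None


def record_violation(counts: dict[str, int], signature: str, tool_message: str) -> str:
    """Bump the per-turn counter; keep the tool's own message or escalate."""
    counts[signature] = counts.get(signature, 0) + 1
    if counts[signature] <= SOFT_ATTEMPTS:
        return tool_message
    # Hypothetical wording for the hard "stop trying to bypass" message.
    return (
        f"Error: repeated attempts to reach {signature!r} outside the workspace. "
        "Stop trying to bypass this boundary and ask the user for help."
    )
```

Because the counter is keyed by the normalized target rather than by tool
name, `read_file`, `exec cat` and `python -c open(...)` against one path
share a single budget, while a different path starts from zero.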

Tests
-----
- `tests/utils/test_workspace_violation_throttle.py` (new): signature
  collapses across read_file/exec/python -c against the same path,
  different paths get independent budgets, escalation fires only on
  the third attempt, not before.

- `tests/agent/test_runner.py`:
  - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2
    contract: filesystem PermissionError is now soft, runner moves to
    the next iteration and finalizes cleanly.
  - `test_is_ssrf_violation_remains_fatal` + the existing
    `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the
    first attempt.
  - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end
    to end recovery from `path outside working dir`.
  - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four
    bypass attempts against the same outside target produce at least
    one `workspace_violation_escalated` event and the run completes
    naturally without aborting the turn.
  - The two `_execute_tools` direct-call tests now pass the new
    workspace_violation_counts dict.

- `tests/tools/test_tool_validation.py`: relax three `==` assertions
  to `startswith` + "hard policy boundary" substring check to match
  the new structured error messages.

- `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null`
  regression and the `> /etc/issue` negative case from the previous
  commit on this branch -- they still pass under the new policy.

Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2
on origin/main).  Ruff is clean for every file touched in this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 01:18:39 +08:00
Xubin Ren
2fa15ccf1b fix: improve media failure diagnostics and token fallback coverage 2026-05-02 11:37:07 +00:00
Xubin Ren
188e6df757 fix(utils): cover complete trailing think markers
Made-with: Cursor
2026-05-01 20:09:59 +08:00
bravel
2c397ad442 fix: strip partial think tags in streaming output 2026-05-01 20:09:59 +08:00
Xubin Ren
1fe3f0eb22 fix(restart): preserve channel metadata across /restart so reply lands in thread
cmd_restart only persisted channel + chat_id across the os.execv boundary, so
when the new process announced "Restart completed" the OutboundMessage had
no Slack thread_ts and the reply fell back to the channel root.

Serialize msg.metadata into NANOBOT_RESTART_NOTIFY_METADATA, restore it on the
RestartNotice, and forward it to OutboundMessage so the completion message
follows the same routing as the original /restart invocation.
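
A rough sketch of the round trip described above, assuming the metadata is
JSON-serializable.  Only the environment variable name comes from this
message; the helper names and the "fall back to an empty dict" behaviour
are hypothetical.

```python
import json
import os

ENV_KEY = "NANOBOT_RESTART_NOTIFY_METADATA"  # name taken from the commit message


def stash_metadata(metadata, environ=os.environ):
    """Serialize message metadata so it survives the os.execv boundary."""
    environ[ENV_KEY] = json.dumps(metadata or {})


def restore_metadata(environ=os.environ):
    """Read the metadata back in the new process, tolerating bad input."""
    raw = environ.pop(ENV_KEY, "")
    if not raw:
        return {}
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Assumption: anything that is not a JSON object is ignored.
    return value if isinstance(value, dict) else {}
```

With something like this in place, a Slack `thread_ts` stored in the
metadata before the restart is available again when the completion notice
is built.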

Made-with: Cursor
2026-04-27 12:45:00 +08:00
Xubin Ren
61a28c2c0a feat(webui): support image uploads in composer and message bubbles 2026-04-23 00:07:27 +08:00
hlg
8e7d8bef6a fix(utils): handle malformed think tags and channel markers in strip_think
Some models / Ollama renderers occasionally emit tokenizer-level template
leaks that the existing regexes miss:

  1. Malformed opening tags with no closing `>`, running straight into
     user-facing content — e.g. `<think广场照明灯目前…` (observed with
     Gemma 4 via Ollama). The earlier `<think>[\s\S]*?</think>` and
     `^\s*<think>[\s\S]*$` patterns both require `>`, so these leak into
     rendered messages.
  2. Harmony-style channel markers like `<channel|>` / `<|channel|>` at
     the start of a response.
  3. Orphan `</think>` / `</thought>` closing tags left behind when only
     the opener was consumed upstream.

Handles each case conservatively:

  - Malformed `<think` / `<thought` only match when the next char is NOT
    a tag-name continuation (`[A-Za-z0-9_\-:>/]`). Explicit ASCII class
    instead of `\w` because Python's Unicode `\w` matches CJK and would
    defeat the primary fix.
  - Orphan closing tags and channel markers are stripped **only at the
    start or end of the text**. `strip_think` is also applied before
    persisting history (memory.py), so mid-text stripping would silently
    rewrite transcripts where the tokens themselves are discussed.

Preserves: `<thinker>`, `<think-foo>`, `<think_foo>`, `<think1>`,
`<think:foo>`, `<thought/>`, literal `` `</think>` `` / `` `<channel|>` ``
inside prose or code blocks.
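
To illustrate why the explicit ASCII class matters, here is a small sketch
of the lookahead only.  The pattern names are hypothetical and the real
`strip_think` may combine or order its regexes differently; the character
class itself is the one quoted above.

```python
import re

# Characters that would continue a tag name (or close/self-close the tag).
_TAG_CONTINUATION = r"[A-Za-z0-9_\-:>/]"

# Matches a malformed opener such as `<think` followed directly by content.
MALFORMED_OPENER = re.compile(rf"<(?:think|thought)(?!{_TAG_CONTINUATION})")

# Same idea with `\w`: in Python, `\w` on str patterns is Unicode-aware,
# so CJK characters count as tag-name continuation and the leak is missed.
NAIVE_OPENER = re.compile(r"<(?:think|thought)(?![\w\-:>/])")

leak = "<think广场照明灯目前"
print(bool(MALFORMED_OPENER.search(leak)))  # ASCII class: detected
print(bool(NAIVE_OPENER.search(leak)))      # \w class: not detected
```

The ASCII-class pattern leaves `<thinker>`, `<think-foo>`, `<think1>` and
`<thought/>` alone because the character after the tag name falls inside
the class.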

Adds 16 new regression tests covering both the leak cases and the
preserved-prose cases.
2026-04-20 17:04:48 +08:00
Xubin Ren
e08507f3ce fix: handle git worktrees in GitStore nested repo protection
Treat `.git` files the same as `.git` directories so GitStore refuses to initialize inside git worktrees, and add a focused regression test for that checkout shape.

Made-with: Cursor
2026-04-19 03:38:22 +08:00
longle325
fb28678b64 fix: prevent GitStore from creating nested repos and overwriting .gitignore (#2980)
GitStore.init() now checks if the workspace is already inside a git
repository before calling porcelain.init(). If so, it refuses to create
a nested repo. Additionally, existing .gitignore files are preserved
by appending only missing Dream-specific entries rather than overwriting.
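
The two guards can be sketched roughly as follows.  This is an
illustration under assumptions, not GitStore's code: the helper names are
hypothetical, the check is assumed to include the workspace itself as well
as its ancestors, and the ignore entries in the usage note are made up.

```python
from pathlib import Path


def is_inside_git_repo(workspace: Path) -> bool:
    """True if the workspace or any ancestor carries a `.git` entry.

    `exists()` is true for both a `.git` directory (normal clone) and a
    `.git` file (git worktree), so both checkout shapes are covered.
    """
    for candidate in (workspace, *workspace.parents):
        if (candidate / ".git").exists():
            return True
    return False


def merge_gitignore(existing: str, required: list[str]) -> str:
    """Append only the entries that are missing; never rewrite existing ones."""
    present = {line.strip() for line in existing.splitlines()}
    missing = [entry for entry in required if entry not in present]
    if not missing:
        return existing
    # Make sure appended entries start on their own line.
    prefix = existing if existing.endswith("\n") or not existing else existing + "\n"
    return prefix + "\n".join(missing) + "\n"
```

For example, `merge_gitignore("node_modules/\n", ["node_modules/", ".dream/"])`
keeps the user's line and adds only `.dream/`.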

Closes #2980
2026-04-19 03:38:22 +08:00
chengyongru
35f3084c03 feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15
Four improvements to Dream's memory consolidation:

1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes
   (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent.
   LLM uses content judgment, not just age, to decide what to prune.

2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts +
   deduplicate existing files) with explicit redundancy patterns to scan for.
   Validated through 20 experiments (exp-002 prompt + max_iter=15 was best,
   averaging -1643 chars/5.4% compression per run).

3. Phase 1 analysis as commit body: dream git commits now include the full
   Phase 1 analysis for transparency via /dream-log.

4. max_iterations raised from 10 to 15: 30% improvement over 10 with no
   risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).
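
As a sketch of the age annotation in item 1, assuming the per-line
last-change timestamps have already been obtained from blame data.  The
function name, the exact suffix spacing and the choice to skip blank lines
are assumptions; only the `← Nd` marker and the N>14 threshold come from
this message.

```python
from datetime import datetime, timedelta, timezone

AGE_THRESHOLD_DAYS = 14  # from the commit message: annotate only when N > 14


def annotate_ages(lines, last_changed, now):
    """Append `← Nd` to lines whose last change is older than the threshold."""
    annotated = []
    for line, changed in zip(lines, last_changed):
        age_days = (now - changed).days
        # Assumption: blank lines are left untouched.
        if age_days > AGE_THRESHOLD_DAYS and line.strip():
            annotated.append(f"{line} ← {age_days}d")
        else:
            annotated.append(line)
    return annotated


now = datetime(2026, 4, 17, tzinfo=timezone.utc)
lines = ["- likes tea", "- uses vim"]
changed = [now - timedelta(days=30), now - timedelta(days=14)]
print(annotate_ages(lines, changed, now))
```

Only the first line is annotated here, since the second sits exactly at
14 days and the threshold is strict.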
2026-04-17 13:45:38 +08:00
04cb
e392c27f7e fix(utils): anchor unclosed think-tag regex to string start (#3004) 2026-04-11 13:46:15 +08:00
chengyongru
e0c6e6f180 test: add regression tests for <thought> tag stripping 2026-04-10 12:10:23 +08:00
chengyongru
3e3a7654f8 fix(agent): address code review findings for tool hint enhancement
- C1: Fix IndexError on empty list arguments via _get_args() helper
- I1: Remove redundant branch in _fmt_known
- I2: Export abbreviate_path from nanobot.utils.__init__
- I3: Fix _abbreviate_url negative-budget format consistency
- S1: Move FORMATS to class-level _TOOL_HINT_FORMATS constant
- S2: Add list_dir to FORMATS registry (ls path)
- G1-G5: Add tests for empty list args, None args, URL edge cases,
  mixed folding groups, and list_dir format
2026-04-07 15:15:07 +08:00
chengyongru
f452af6c62 feat(utils): add abbreviate_path for smart path/URL truncation 2026-04-07 15:15:07 +08:00
Xubin Ren
dad9c07843 fix(tests): update Tavily usage tests to match actual API response shape
The _parse_tavily_usage implementation was updated to use the real
{account: {plan_usage, plan_limit, ...}} structure, but the tests
still used the old flat {used, limit, breakdown} format.
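
For illustration, reading the nested shape mentioned above might look like
the sketch below.  The function name and the returned keys are
hypothetical, and only the `account`, `plan_usage` and `plan_limit` fields
named in this message are used; this is not `_parse_tavily_usage` itself.

```python
def parse_usage(payload):
    """Pull usage numbers out of a nested {account: {...}} response."""
    account = payload.get("account") or {}
    return {
        "used": account.get("plan_usage"),
        "limit": account.get("plan_limit"),
    }


# Old flat payloads carry no "account" key, so both values come back as None.
print(parse_usage({"account": {"plan_usage": 120, "plan_limit": 1000}}))
```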

Made-with: Cursor
2026-04-06 19:17:55 +08:00
Xubin Ren
7ffd93f48d refactor: move search_usage to utils/searchusage, remove brave stub
- Rename agent/tools/search_usage.py → utils/searchusage.py
  (not an LLM tool, matches utils/ naming convention)
- Remove redundant _fetch_brave_usage — handled by else branch
- Move test to tests/utils/test_searchusage.py

Made-with: Cursor
2026-04-06 13:37:55 +08:00
imfondof
896d578677 fix(restart): show restart completion with elapsed time across channels 2026-04-04 02:21:42 +08:00