95 Commits

Author SHA1 Message Date
Xubin Ren
eb3aed359f Refine file edit progress gating 2026-05-18 01:59:55 +08:00
Xubin Ren
112f40ad67 fix(agent): refresh llm runtime for background tasks 2026-05-18 00:35:12 +08:00
Xubin Ren
c8bb04a8fe feat(webui): persist agent activity events 2026-05-17 23:51:52 +08:00
Xubin Ren
1c2ea1aad2
feat(goal): /goal command & long-running tasks (long_task)
* feat(long-task): add LongTaskTool for multi-step agent tasks

Implements a meta-ReAct loop where long-running tasks are broken into
sequential subagent steps, each starting fresh with the original goal
and progress from the previous step. This prevents context drift when
agents work on complex, multi-step tasks.

- Extract build_tool_registry() from SubagentManager for reuse
- Add run_step() for synchronous subagent execution (no bus announcement)
- Add HandoffTool and CompleteTool as signal mechanisms via shared dict
- Add LongTaskTool orchestrator with simplified prompt (8 iterations/step)
- Register LongTaskTool in main agent loop
- Add _extract_handoff_from_messages fallback for robustness

* fix(long-task): add debug logging for step-level observability

* feat(long-task): major overhaul with structured handoffs, validation, and observability

- Structured HandoffState: HandoffTool now accepts files_created,
  files_modified, next_step_hint, and verification fields instead of
  a plain string. Progress is passed between steps as structured data.

- Completion validation round: After complete() is called, a dedicated
  validator step runs to verify the claim against the original goal.
  If validation fails, the task continues rather than returning
  a false completion.

- Dynamic prompt system: 3 Jinja2 templates (step_start, step_middle,
  step_final) selected based on step number. Final steps get tighter
  budget and stronger "wrap up" guidance.

- Automatic file change tracking: Extracts write_file/edit_file events
  from tool_events and injects them into the next step's context if
  the subagent forgot to report them explicitly.

- Budget tracking & adaptive strategy: Cumulative token usage is tracked
  across steps. Per-step tool budget drops from 8 to 4 in the last
  two steps to force handoff/completion.

- Crash retry with graceful degradation: A step that crashes is retried
  once. Persistent crashes terminate the task and return partial progress.

- Full observability hooks for future WebUI integration:
  - set_hooks() with on_step_start, on_step_complete, on_handoff,
    on_validation_started, on_validation_passed, on_validation_failed,
    on_task_complete, on_task_error, and catch-all on_event.
  - Readable state properties: current_step, total_steps, status,
    last_handoff, cumulative_usage, goal.
  - inject_correction() allows external code to send user corrections
    that are injected into the next step's prompt.

- run_step() accepts optional max_iterations for dynamic budget control.

All 27 long-task tests and 11 subagent tests pass.

* test(long-task): add boundary tests and fix race conditions

- Add 7 edge-case tests: validation crash resilience, hook exception safety, mid-run correction injection, FIFO correction ordering, explicit file changes overriding auto-detection, final budget for max_steps=1, and dynamic budget switching boundaries

- Fix assertion in test_long_task_completes_after_multiple_handoffs to match exact prompt format

- Remove asyncio timing hack from test_state_exposure

- Add asyncio.sleep(0) yield in test_inject_correction_during_execution to prevent race between signal injection and step continuation

- All 34 tests passing

* fix(long-task): address code review findings

- Declare _scopes = {"core"} explicitly to prevent recursive nesting in subagent scope
- Document fragile coupling in _extract_file_changes: path extraction depends on
  write_file/edit_file detail format; add debug log for unexpected formats
- Align final-template threshold (max_steps - 2) with budget switch threshold
- Eliminate hasattr(self, "_state") in _reset_state by initializing in __init__

* fix(long-task): honor final signal and file tracking

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(long-task): improve prompt structure and agent contract

- Expand LongTaskTool.description to instruct parent agent on goal
  construction, return value semantics, and how to handle results.
- Expand CompleteTool.description to emphasize that the summary IS the
  final answer returned to the parent agent.
- Prefix validated return value with an explicit "final answer" directive
  to stop parent agent from re-running work.
- Redesign step_start.md: Step 1 is now explicitly for exploration,
  planning, and skeleton-building. complete() is discouraged.
- Remove bulky payload debug logging from _emit(); add targeted
  info/warning/error logs at key state transitions instead.
- Add signal_type to HandoffState for cleaner signal detection.

* test(long-task): expect wrapped completion message after validation

Align assertions with LongTaskTool final return shape on main.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(webui): turn timing strip, latency, and session-switch restore

- Agent loop: publish goal_status run/idle for WebSocket turns; attach
  wall-clock latency_ms on turn_end and persisted assistant metadata.
- WebSocket channel: forward goal_status and latency fields to clients.
- NanobotClient: track goal_status started_at per chat without requiring
  onChat; useNanobotStream restores run strip when returning to a chat.
- Thread UI: composer/shell viewport hooks for run duration and latency;
  format helpers and i18n strings.
- MessageBubble: drop trailing StreamCursor (layout artifact vs block markdown).
- Builtin / tests: model command coverage, websocket and loop tests.

Covers multi-session UX and round-trip timing visibility for the WebUI.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: keep message-tool file attachments after canonical history hydrate

- MessageTool records per-turn media paths delivered to the active chat.
- nanobot.utils.session_attachments stages out-of-media-root files and
  merges into the last assistant message before save (loop stays a thin call).
- WebUI MediaCell: use a signed URL as a real download link when present.

Fixes attachments flashing then vanishing on turn_end when paths lived
outside get_media_dir (e.g. workspace files).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(webui): agent activity cluster, stable keys, LTR sheen labels

- Group reasoning and tool traces in AgentActivityCluster with i18n summaries
- Stabilize React list keys for activity clusters (first message id anchor)
- Replace background-clip shimmer with overlay sheen for streaming labels
- ThreadMessages/MessageList integration and locale strings

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): render assistant reasoning with Markdown + deferred stream

- Use MarkdownText for ReasoningBubble body (same GFM/KaTeX path as replies)
- Apply muted/italic prose tokens so thinking stays visually subordinate
- useDeferredValue while reasoningStreaming to ease parser work during deltas
- Preload markdown chunk when trace opens; add regression test with preloaded renderer

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): default-collapse agent activity cluster while Working

Outer fold no longer auto-expands during isTurnStreaming; user opens to see traces.
Header sheen and live summary unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(long_task): cumulative run history, file union, and prompt tuning

Inject cross-step summaries and merged file paths into middle/final step
templates so chains do not lose early context. Strip the last run-history
block when it duplicates Previous Progress to save tokens. Add optional
cumulative_prompt_max_chars and cumulative_step_body_max_chars parameters
with clamped defaults.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): session switch keeps in-flight thread and replays buffered WS

Save the prior chat message list to the per-chat cache in a layout effect
when chatId changes (before stale writes could corrupt another chat).
Skip one post-switch layout cache tick so we do not snapshot the wrong tab.

Buffer inbound events per chat_id when no onChat subscriber is registered
(e.g. user focused another session) and drain on resubscribe up to a cap,
so streaming deltas are not lost while off-tab.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): snap thread scroll to bottom on session open (no smooth glide)

Use scroll-behavior auto on the viewport, instant programmatic scroll when
following new messages and on scrollToBottomSignal. Keep smooth only for
the explicit scroll-to-bottom button.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): respect manual scroll-up after opening a session

Track when the user leaves the bottom with a ref and skip ResizeObserver
and deferred bottom snaps until they return or the conversation is reset.
Remove the time-based force-bottom window that overrode atBottom.

Multi-frame scrollToBottom honours the same guard unless force (scroll button).

Co-authored-by: Cursor <cursoragent@cursor.com>

* Publish long_task UI snapshots on outbound metadata

- Add OUTBOUND_META_AGENT_UI (_agent_ui) for channel-agnostic structured state
- LongTaskTool publishes {kind: long_task, data: snapshot} on the bus with _progress
- WebSocket send forwards metadata as agent_ui for WebUI clients
- Tests for bus payload, WS frame, and progress assertions
- Fix loop progress tests: ignore _goal_status in streaming final filter and
  avoid brittle outbound[-1] ordering after goal status idle messages

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: WebUI long_task activity card and resilient history merge

Add optional ui_summary to the long_task tool for one-line UI labels. Stream
long_task agent_ui into a dedicated message row with timeline, markdown peek,
and a right sheet for details. Merge canonical history after turn_end while
re-inserting long_task rows before the final assistant reply. Collapse
duplicate task_start/step_start steps in the timeline and extend i18n.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor: align long_task with thread_goal and drop orchestrator UI

- Persist sustained objectives via session metadata (long_task / complete_goal); no subagent wiring or tool-driven agent_ui payloads.\n- Remove WebUI long-task activity UI, types, and translations; history merge preserves trace replay only, with legacy long_task rows normalized to traces.\n- Drop long_task prompt templates and get_long_task_run_dir; add webui thread disk helper for gateway persistence tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(agent): thread goal runtime context, tools, and skill

- Add thread_goal_state helper and mirror active objectives into Runtime Context
- Wire loop/context/memory/events as needed for goal metadata in turns
- Expand long_task / complete_goal semantics (pivot/cancel/honest recap)
- Add always-on thread-goal SKILL.md; align /goal command prompt
- Tests for context builder and thread goal state
- Remove unused webui ChatPane component

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(thread-goal): add websocket snapshot helper and publish goal updates from long_task

Introduce thread_goal_ws_blob for bounded JSON snapshots, attach snapshots to
websocket turn_end metadata in AgentLoop, and let long_task fan-out dedicated
thread_goal frames on the websocket channel after persisting session metadata.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(channels): websocket thread_goal frames, turn_end replay, and session API scrub for subagent inject

Emit thread_goal events and optional thread_goal on turn_end; scrub persisted
subagent announce blobs on GET /api/sessions/.../messages and shorten session
list previews so WebUI does not surface full Task/Summarize scaffolding.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(webui): merge ephemeral traces per user turn when reconciling canonical history

Preserve disk/live trace rows inside the matching user–assistant segment instead
of stacking every trace before the final assistant reply (fixes inflated tool
counts after refresh or session switch).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(webui): show assistant reply copy only on the last slice before the next user turn

Avoid duplicate copy affordances on intermediate assistant bubbles that precede
more agent activity in the same turn (tools or further assistant text).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(webui): thread_goal stream plumbing, composer goal strip, sky glow, and client-side subagent scrub projection

Track thread_goal and turn_goal snapshots in NanobotClient, hydrate React state
from thread_goal frames and turn_end, surface objective/elapsed in the composer,
add breathing sky halo CSS while goals are active, mirror server scrub logic on
history hydration and webui_thread snapshots, and extend tests/client mocks.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(channels): add Slack Socket Mode connect timeout with actionable timeout errors

Abort hung websockets.connect handshakes after a bounded wait, log REST-vs-WSS
guidance, surface RuntimeError to channel startup, and log successful WSS setup.

Co-authored-by: Cursor <cursoragent@cursor.com>

* webui: expand thread goal in composer bottom sheet

Add ChevronUp control on the run/goal strip that opens a bottom Sheet
with full ui_summary and objective. Inline preview logic in RunElapsedStrip,
add i18n strings across locales, and a composer unit test.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(webui): widen dedupeToolCallsForUi input for session API typing

fetchSessionMessages types tool_calls as unknown; accept unknown so tsc
build passes when passing message.tool_calls through.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(agent): extract WebSocket turn run status to webui_turn_helpers

* refactor(skills): rename thread-goal to long-task and document idempotent goals

* feat(skills): rename sustained-goal skill to long-goal and tighten long_task guidance

* chore: remove unused subagent/context/router helpers

* feat(session): rename sustained goal to goal_state and align WS/WebUI

- Move helpers from agent/thread_goal_state to session/goal_state:
  GOAL_STATE_KEY, goal_state_runtime_lines, goal_state_ws_blob, parse_goal_state.
- Session metadata now uses "goal_state"; still read legacy "thread_goal";
  long_task writes drop the legacy key after save.
- WebSocket: event/field goal_state, _goal_state_sync; turn_end carries goal_state;
  accept legacy _thread_goal_sync/thread_goal inbound metadata for dispatch.
- WebUI: GoalStateWsPayload, goalState hook/client props, i18n keys goalState*.
- Runtime Context copy uses "Goal (active):" instead of "Thread goal".

* feat(agent): stream Anthropic thinking deltas and fix stream idle timeout

* refactor(webui): transcript jsonl as sole timeline source

* fix(agent): reject mismatched WS message chat_id and stream reasoning deltas

* feat(webui): hydrate sustained goal and run timer after websocket subscribe

* chore(webui,websocket): remove unused fetch helpers and legacy thread_goal WS paths

* Raise default max_tokens and context window in agent schema.

Align AgentDefaults and ModelPresetConfig with typical Claude-scale usage
(32k completion budget, 256k context window) and update migration tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(gateway): bootstrap prefers in-memory model; clarify websocket naming

* fix(websocket): websocket _handle_message passes is_dm; refresh /status test expectations

---------

Co-authored-by: chengyongru <2755839590@qq.com>
Co-authored-by: chengyongru <chengyongru.ai@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-16 01:14:11 +08:00
chengyongru
fe90edd71f refactor(tools): remove GlobTool
GlobTool is redundant — GrepTool already supports glob-based file
filtering via its `glob` parameter, making a standalone glob-only
tool unnecessary. Removing it simplifies the tool surface and reduces
LLM confusion between glob and grep.
2026-05-15 17:19:00 +08:00
Xubin Ren
01fa362c03 Merge origin/main into feat/show-reasoning
Resolves conflicts after main landed the state-machine turn refactor
and the test_runner.py 9-file split:

- nanobot/agent/loop.py: take main's `_state_build`/`_persist_user_message_early`
  flow; restore the `reasoning: bool` parameter on `_build_bus_progress_callback`
  so the loop hook can mark progress as reasoning-channel without coupling to
  the answer stream.
- nanobot/cli/stream.py: keep main's configurable `bot_name`/`bot_icon` header
  while preserving the PR's `transient=True` Live + `self._console` routing
  + `_renderable()` final-render path that fixed TUI duplication.
- tests/agent/test_runner.py was deleted on main and split into 9 focused
  files; relocated all 6 reasoning tests into a new `test_runner_reasoning.py`
  matching the new layout, deduplicated the per-test `ReasoningHook` boilerplate
  through a shared `_RecordingHook` helper.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-13 05:07:14 +00:00
Xubin Ren
352aaf0627 refactor(reasoning): unify reasoning extraction across providers
Reasoning surfacing was split across three branches in runner.py plus
two separate streaming buffers (loop hook and runner progress stream),
with three independent display-side gates in the CLI. This collapsed
the policy into one source of truth and fixed two real bugs:

- Structured `reasoning_content` was suppressed whenever the answer was
  streamed, because the runner gated emission on `streamed_content`.
  Providers don't stream `reasoning_content`; it only arrives on the
  final response, so the answer stream and the reasoning channel are
  independent. Added `streamed_reasoning` to `AgentHookContext` to track
  the right bit.
- `channels.showReasoning` was subordinated to `sendProgress`. They are
  orthogonal — turning off progress streaming shouldn't silence
  reasoning. Reworked the CLI gates accordingly.

Single-helper consolidation:

- `extract_reasoning(reasoning_content, thinking_blocks, content)`
  returns `(reasoning_text, cleaned_content)` with a defined fallback
  order: dedicated field → Anthropic thinking_blocks → inline
  `<think>`/`<thought>` tags. Models that expose none of these
  short-circuit to `(None, content)` — zero overhead.
- `IncrementalThinkExtractor` replaces the ad-hoc `emit_incremental_think`
  function and its hand-rolled "emitted cursor" state in both the loop
  hook and the runner progress stream.

Also documented the new `showReasoning` channel option in
docs/configuration.md and noted its independence from sendProgress.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-12 17:14:19 +00:00
Flinn Xie
3a851f8f8d feat(reasoning): add inline think tag extraction and Anthropic thinking_blocks support
Add extract_think() and emit_incremental_think() helpers to extract thinking content from inline <think> and <thought> tags in the content field. This handles models served via Ollama, self-hosted vLLM, or other compatible endpoints that embed reasoning as inline tags instead of using the dedicated reasoning_content API field.

Also adds Anthropic thinking_blocks support for extended thinking via the thinking content blocks array.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-12 23:02:59 +08:00
chengyongru
73a8d8a875 fix(utils): remove unreachable dead code in find_legal_message_start
The for loop at line 168 never executes because start is assigned
i + 1 immediately before slicing messages[start : i + 1], which
is always an empty list. Remove the dead code.

Fixes #3716
2026-05-09 18:53:13 +08:00
Xubin Ren
3231aaf9ee fix(image): prevent duplicate delivery and replay artifacts 2026-05-09 05:45:13 +00:00
Xubin Ren
e936ed48bd feat: add image generation tool and WebUI mode
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 20:06:23 +08:00
Xubin Ren
790a03ec28 feat(webui): polish chat layout and titles
Align the WebUI sidebar and chat chrome with the updated design, and generate WebUI session titles asynchronously without blocking turns.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-06 22:20:35 +08:00
Tim O'Brien
99209a806d fix(tool_hints): pass max_length to abbreviate_path for is_path tools
The is_path branch in _fmt_known was not passing max_length to
abbreviate_path, so read_file, write_file, edit, list_dir, and
web_fetch always truncated paths at 40 chars regardless of config.

Now all three branches (is_path, is_command, fallback) honor the
configured toolHintMaxLength.
2026-05-06 21:18:39 +08:00
Tim O'Brien
daa4a25c9b feat(config): add toolHintMaxLength to control tool hint truncation
Add  to  config (default: 40, range: 20-500).
Controls how many characters of tool hints are shown in progress updates
(e.g. '$ cd …/project && npm test').

Set to 120+ to see full commands instead of truncated hints:

```json
{
  "agents": {
    "defaults": {
      "toolHintMaxLength": 120
    }
  }
}
```

- Thread max_length through format_tool_hints → _fmt_known/_fmt_mcp/_fmt_fallback
- Make path abbreviation in _abbreviate_command proportional to max_length
- Add TestToolHintMaxLength test class with 5 tests
- All 41 existing tests pass
2026-05-06 21:18:39 +08:00
chengyongru
05e0106592 refactor(logging): preserve tracebacks and add channel context
- Preserve tracebacks: logger.error in except blocks → logger.exception
- Channel context: BaseChannel injects self.logger = logger.bind(channel=name)
- Third-party bridge: redirect_lib_logging() replaces ad-hoc stdlib-to-loguru bridges
- Log levels: network timeouts downgraded from ERROR → WARNING
- Fix --verbose flag to actually work with loguru (set handler to DEBUG)
2026-05-06 21:17:45 +08:00
Xubin Ren
2a7433b7ec chore(runner): tighten workspace guard comments and Windows tests
Keep the workspace-boundary changes easier to review by trimming long explanatory comments down to short local notes. Also make the #3599 POSIX command regression skip on Windows and normalize workspace violation signatures to POSIX separators so the throttle tests are platform-stable.

Tests:
- uv run pytest tests/tools/test_exec_security.py tests/utils/test_workspace_violation_throttle.py -q
- uv run pytest -q

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 01:18:39 +08:00
Xubin Ren
b8406be215 fix(runner): soft workspace boundary + per-target throttle (#3493 #3599 #3605)
Replaces PR #3493's blanket fatal abort with a "tell the model + throttle
the bypass loop" policy.  Workspace-bound rejections are now ordinary
recoverable tool errors enriched with a structured "this is a hard policy
boundary" instruction; SSRF stays the only marker that aborts the turn.

Why the fatal-abort approach broke
----------------------------------
PR #3493 promoted every shell `_guard_command` and filesystem path-resolution
rejection to a turn-fatal RuntimeError.  Two of those messages (`path
outside working dir` and `path traversal detected`) are heuristic substring
scans on the raw command, so legitimate commands like `rm <ws>/x.txt
2>/dev/null` or `find . -type f` killed the user's turn (#3599).  On
channels with outbound dedupe (Telegram) the user just saw silence (#3605),
and the noise polluted the LLM's context until it started hallucinating
guard rejections on plain relative paths (#3597).

Why we still need *some* throttle
---------------------------------
The original #3493 pain point was real: the LLM, refused once, would
swap tools and try again -- read_file -> exec cat -> exec cp -> bash -c
-> ln -sf -> python -c open(...).  Just removing the fatal escape lets
that loop run wild until max_iterations.

What this commit does
---------------------
- `nanobot/utils/runtime.py`: add `workspace_violation_signature` and
  `repeated_workspace_violation_error`.  The signature normalizes
  filesystem `path` arguments and the first absolute path inside an
  exec command, so swapping tools against the same outside target hits
  the same throttle bucket.  Two soft attempts are allowed; the third
  attempt's tool result is replaced with a hard "stop trying to bypass"
  message that quotes the target path and tells the model to ask the
  user for help.

- `nanobot/agent/runner.py`: split classification into `_is_ssrf_violation`
  (still fatal) and `_is_workspace_violation` (now soft).  All three
  failure branches in `_run_tool` (prep_error / exception / Error
  result) route through a shared `_classify_violation` that bumps the
  per-turn workspace_violation_counts dict and either keeps the tool's
  own message or substitutes the throttle escalation.  `_execute_tools`
  now threads that dict alongside the existing external_lookup_counts.

- `nanobot/agent/tools/shell.py`: append a structured boundary note to
  every workspace-bound guard rejection (`working_dir could not be
  resolved`, `working_dir is outside`, `path outside working dir`,
  `path traversal detected`).  SSRF errors stay short and direct so the
  model doesn't try to "phrase around" them.  Existing `2>/dev/null`
  allow-list and benign device passthrough from the previous commit
  remain.

- `nanobot/agent/tools/filesystem.py`: append the same boundary note to
  the `outside allowed directory` PermissionError so read_file / write_file
  / list_dir errors give the LLM the same explicit hint.

Tests
-----
- `tests/utils/test_workspace_violation_throttle.py` (new): signature
  collapses across read_file/exec/python -c against the same path,
  different paths get independent budgets, escalation only fires after
  the third attempt.

- `tests/agent/test_runner.py`:
  - `test_runner_does_not_abort_on_workspace_violation_anymore` -- v2
    contract: filesystem PermissionError is now soft, runner moves to
    the next iteration and finalizes cleanly.
  - `test_is_ssrf_violation_remains_fatal` + the existing
    `test_runner_aborts_on_ssrf_violation` -- SSRF still aborts on the
    first attempt.
  - `test_runner_lets_llm_recover_from_shell_guard_path_outside` -- end
    to end recovery from `path outside working dir`.
  - `test_runner_throttles_repeated_workspace_bypass_attempts` -- four
    bypass attempts against the same outside target produce at least
    one `workspace_violation_escalated` event and the run completes
    naturally without aborting the turn.
  - The two `_execute_tools` direct-call tests now pass the new
    workspace_violation_counts dict.

- `tests/tools/test_tool_validation.py`: relax three `==` assertions
  to `startswith` + "hard policy boundary" substring check to match
  the new structured error messages.

- `tests/tools/test_exec_security.py` keeps the prior `2>/dev/null`
  regression and the `> /etc/issue` negative case from the previous
  commit on this branch -- they still pass under the new policy.

Coverage status: full pytest 2648 passed / 2 skipped (was 2638 / 2
on origin/main).  Ruff is clean for every file touched in this commit.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-04 01:18:39 +08:00
yorkhellen
ee364c6ac1 fix(helpers): restore tiktoken fallback in estimate_prompt_tokens_chain 2026-05-02 00:07:45 +08:00
Xubin Ren
188e6df757 fix(utils): cover complete trailing think markers
Made-with: Cursor
2026-05-01 20:09:59 +08:00
bravel
2c397ad442 fix: strip partial think tags in streaming output 2026-05-01 20:09:59 +08:00
Jack Lu
d9800ecdd2 refactor: replace try-except blocks with contextlib.suppress for cleaner error handling across multiple files 2026-05-01 19:30:11 +08:00
Xubin Ren
1fe3f0eb22 fix(restart): preserve channel metadata across /restart so reply lands in thread
cmd_restart only persisted channel + chat_id across the os.execv boundary, so
when the new process announced "Restart completed" the OutboundMessage had
no Slack thread_ts and the reply fell back to the channel root.

Serialize msg.metadata into NANOBOT_RESTART_NOTIFY_METADATA, restore it on the
RestartNotice, and forward it to OutboundMessage so the completion message
follows the same routing as the original /restart invocation.

Made-with: Cursor
2026-04-27 12:45:00 +08:00
Matt Van Horn
ee14e2df56 perf(document): lazy-import heavy document parsers
Move pypdf, python-docx, openpyxl, and python-pptx imports from module
level into the _extract_pdf / _extract_docx / _extract_xlsx /
_extract_pptx functions that actually use them. These four libraries
became core dependencies in v0.1.5.post2 (~25 MB combined) and were
paying the import cost on every nanobot startup even when no document
parsing was needed for the session.

The module-level SUPPORTED_EXTENSIONS set and the extract_text()
dispatch stay as-is; the "[error: <lib> not installed]" branches move
from the old module-level None sentinels into the corresponding
extractor's try/except ImportError block. Behavior for the error
message and for successful parses is identical.

All 20 tests in tests/test_document_parsing.py pass unchanged.

Fixes #3422
2026-04-25 02:10:30 +08:00
Xubin Ren
52855d463e refactor(agent): move progress event helpers out of loop
Made-with: Cursor
2026-04-23 20:06:11 +08:00
Xubin Ren
61a28c2c0a feat(webui): support image uploads in composer and message bubbles 2026-04-23 00:07:27 +08:00
彭星杰
46864b0911 fix: use try/finally in _extract_xlsx to prevent resource leak 2026-04-21 22:01:17 +08:00
彭星杰
a00beebd06 fix: use context manager in _extract_xlsx to prevent resource leak 2026-04-21 22:01:17 +08:00
hlg
8e7d8bef6a fix(utils): handle malformed think tags and channel markers in strip_think
Some models / Ollama renderers occasionally emit tokenizer-level template
leaks that the existing regexes miss:

  1. Malformed opening tags with no closing `>`, running straight into
     user-facing content — e.g. `<think广场照明灯目前…` (observed with
     Gemma 4 via Ollama). The earlier `<think>[\s\S]*?</think>` and
     `^\s*<think>[\s\S]*$` patterns both require `>`, so these leak into
     rendered messages.
  2. Harmony-style channel markers like `<channel|>` / `<|channel|>` at
     the start of a response.
  3. Orphan `</think>` / `</thought>` closing tags left behind when only
     the opener was consumed upstream.

Handles each case conservatively:

  - Malformed `<think` / `<thought` only match when the next char is NOT
    a tag-name continuation (`[A-Za-z0-9_\-:>/]`). Explicit ASCII class
    instead of `\w` because Python's Unicode `\w` matches CJK and would
    defeat the primary fix.
  - Orphan closing tags and channel markers are stripped **only at the
    start or end of the text**. `strip_think` is also applied before
    persisting history (memory.py), so mid-text stripping would silently
    rewrite transcripts where the tokens themselves are discussed.

Preserves: `<thinker>`, `<think-foo>`, `<think_foo>`, `<think1>`,
`<think:foo>`, `<thought/>`, literal `` `</think>` `` / `` `<channel|>` ``
inside prose or code blocks.

Adds 16 new regression tests covering both the leak cases and the
preserved-prose cases.
2026-04-20 17:04:48 +08:00
Xubin Ren
e08507f3ce fix: handle git worktrees in GitStore nested repo protection
Treat `.git` files the same as `.git` directories so GitStore refuses to initialize inside git worktrees, and add a focused regression test for that checkout shape.

Made-with: Cursor
2026-04-19 03:38:22 +08:00
longle325
fb28678b64 fix: prevent GitStore from creating nested repos and overwriting .gitignore (#2980)
GitStore.init() now checks if the workspace is already inside a git
repository before calling porcelain.init(). If so, it refuses to create
a nested repo. Additionally, existing .gitignore files are preserved
by appending only missing Dream-specific entries rather than overwriting.

Closes #2980
2026-04-19 03:38:22 +08:00
04cb
c27b4d07c4 fix(utils): recurse into PPTX groups and tables when extracting text (#3250) 2026-04-18 12:30:42 +08:00
Xubin Ren
14ee7cb121 style: revert unrelated Black-style formatting churn (#3220)
The earlier commits picked up a large amount of Black-style reformatting
(multi-line frozenset / keyword-arg wrapping / docstring blanks / removed
parens) on top of the actual guard fix. @chengyongru flagged it; the
first pass reverted some but not all.

This restores nanobot/providers/base.py, runner.py, heartbeat/service.py,
and utils/evaluator.py to origin/main and reapplies only the guard logic:

  - base.py: add should_execute_tools property
  - runner.py / heartbeat/service.py / utils/evaluator.py: route through it
    + log a warning when has_tool_calls but finish_reason is anomalous

Net diff vs main is now +87/-4 (was +211/-102) — roughly 30 lines of real
logic, which is what the PR is actually about.

Behavior unchanged from previous HEAD; full suite still 2014 passed.

Made-with: Cursor
2026-04-17 20:39:46 +08:00
Subal
322da6ca06 fix: guard tool execution against non-compliant API gateway injection 2026-04-17 20:39:46 +08:00
chengyongru
35f3084c03 feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15
Three improvements to Dream's memory consolidation:

1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes
   (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent.
   LLM uses content judgment, not just age, to decide what to prune.

2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts +
   deduplicate existing files) with explicit redundancy patterns to scan for.
   Validated through 20 experiments (exp-002 prompt + max_iter=15 was best,
   averaging -1643 chars/5.4% compression per run).

3. Phase 1 analysis as commit body: dream git commits now include the full
   Phase 1 analysis for transparency via /dream-log.

4. max_iterations raised from 10 to 15: 30% improvement over 10 with no
   risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).
2026-04-17 13:45:38 +08:00
chengyongru
e1fdca7d40 fix(status): correct context percentage calculation and sync consolidator
- Pass resolved self.context_window_tokens to Consolidator instead of
  raw parameter that could be None, preventing consolidation failures
- Calculate percentage against input budget (ctx - max_completion - 1024)
  instead of raw context window, consistent with Consolidator/snip formulas
- Pass actual max_completion_tokens from provider to build_status_content
- Cap percentage display at 999 to prevent runaway values
- Add tests for budget-based percentage and cap behavior
2026-04-16 20:30:39 +08:00
aiguozhi123456
634f4b45c1 feat: show active task count in /status output 2026-04-15 01:49:42 +08:00
Xubin Ren
c937c07178 fix: two bugs in document extraction pipeline
Bug 1: _drain_pending did not call extract_documents on follow-up
messages arriving mid-turn. Documents attached to queued messages were
silently dropped because _build_user_content only handles images.
Fix: call extract_documents before _build_user_content in _drain_pending.

Bug 2: extract_documents read the entire file into memory (up to 50 MB)
just to check 16 bytes of magic header for MIME detection.
Fix: read only the first 16 bytes via open()+read(16) instead of
Path.read_bytes().

Added regression tests for both bugs.

Made-with: Cursor
2026-04-14 13:15:04 +00:00
Xubin Ren
92d6fca323 refactor: centralize document extraction in AgentLoop._process_message
Move extract_documents() to nanobot.utils.document as a reusable helper
and call it once in AgentLoop._process_message, the single entry point
for all message processing (API + all channels).

This replaces the previous API-only _extract_documents() in server.py,
ensuring Telegram, Feishu, Slack, WeChat, and all other channels also
benefit from automatic document text extraction.

Adds a configurable max_file_size guard (default 50 MB) to skip
oversized files gracefully, preventing unbounded memory/CPU usage
from channel-downloaded attachments.

- server.py: removed _extract_documents and related imports
- document.py: added extract_documents() with size limit
- loop.py: calls extract_documents() at the top of _process_message
- Tests updated: 70 related tests pass

Made-with: Cursor
2026-04-14 13:10:03 +00:00
Xubin Ren
2502fc616b Merge origin/main into feat/api-file-upload
Keep the API file upload branch current with main, enforce the documented JSON base64 per-file limit, and avoid leaking document extraction error strings into user prompts.

Made-with: Cursor
2026-04-14 12:29:43 +00:00
04cb
e392c27f7e fix(utils): anchor unclosed think-tag regex to string start (#3004) 2026-04-11 13:46:15 +08:00
flobo3
6b7e78a8e0 fix: strip <thought> blocks from Gemma 4 and similar models 2026-04-10 12:10:23 +08:00
Alfredo Arenas
6445b3b0cf fix(helpers): repair corrupted split_message and ensure content never None
Fix accidental line corruption in split_message() where 'break' was
merged with unrelated code during manual editing.

The actual fix: build_assistant_message() now returns content or ""
instead of content (which could be None), preventing providers like
MiMo V2 Omni from rejecting tool-call messages with missing text field.

Fixes #2519
2026-04-09 11:42:53 +08:00
Alfredo Arenas
6d74c88014 fix(helpers): ensure assistant message content is never None 2026-04-09 11:42:53 +08:00
Leo fu
66409784f4 fix(status): use consistent divisor (1000) for token count display
The /status command divided context_used by 1000 but context_total by
1024, producing inconsistent values. For example a 128000-token window
displayed as 125k instead of 128k. Tokens are not a binary unit, so
both should use 1000.
2026-04-09 10:40:20 +08:00
Xubin Ren
c092896922 fix(tool-hint): handle quoted paths in exec hints
Preserve path folding for quoted exec command paths with spaces so hint previews do not fall back to mid-path truncation. Add regression coverage for quoted Unix and Windows path cases.

Made-with: Cursor
2026-04-08 23:05:52 +08:00
chengyongru
b16865722b fix(tool-hint): fold paths in exec commands and deduplicate by formatted string
1. exec tool hints previously used val[:40] blind character truncation,
   cutting paths mid-segment. Now detects file paths via regex and
   abbreviates them with abbreviate_path. Supports Windows, Unix
   absolute, and ~/ home paths.

2. Deduplication now compares fully formatted hint strings instead of
   tool names alone. Fixes ls /Desktop and ls /Downloads being
   incorrectly merged as "ls /Desktop × 2".

Co-authored-by: xzq.xu <zhiqiang.xu@nodeskai.com>
2026-04-08 23:05:52 +08:00
dengjingren
a068df5a79 feat(api): support file uploads via JSON base64 and multipart/form-data 2026-04-08 15:58:52 +08:00
Xubin Ren
edb821e10d feat(agent): prompt behavior directives, tool descriptions, and loop robustness 2026-04-08 02:22:25 +08:00
Xubin Ren
82dec12f66 refactor: extract tool hint formatting to utils/tool_hints.py
- Move _tool_hint implementation from loop.py to nanobot/utils/tool_hints.py
- Keep thin delegation in AgentLoop._tool_hint for backward compat
- Update test imports to test format_tool_hints directly

Made-with: Cursor
2026-04-07 15:15:07 +08:00
chengyongru
3e3a7654f8 fix(agent): address code review findings for tool hint enhancement
- C1: Fix IndexError on empty list arguments via _get_args() helper
- I1: Remove redundant branch in _fmt_known
- I2: Export abbreviate_path from nanobot.utils.__init__
- I3: Fix _abbreviate_url negative-budget format consistency
- S1: Move FORMATS to class-level _TOOL_HINT_FORMATS constant
- S2: Add list_dir to FORMATS registry (ls path)
- G1-G5: Add tests for empty list args, None args, URL edge cases,
  mixed folding groups, and list_dir format
2026-04-07 15:15:07 +08:00