`append_history` previously used `strip_think(entry) or entry.rstrip()`
as a safety net, so if the entire entry was a template-token leak (e.g.
`<think>reasoning</think>` or `<channel|>` alone), the raw leaked text
was still persisted to history — later re-introducing the very content
`strip_think` was meant to scrub, via consolidation / replay.
Persist the cleaned content directly. When cleanup empties a non-empty
entry, log at debug and store an empty-content record (cursor continuity
preserved). Adds 3 regression tests in test_memory_store.py covering:
- Well-formed thinking blocks are stripped before persistence.
- Pure-leak entries persist as empty, not as raw text.
- Malformed prefix leaks (`<channel|>`) also persist as empty.
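A minimal sketch of the changed persistence logic (the `strip_think` body here
is a simplified stand-in for the repo's real, broader implementation, and the
history record shape is illustrative):

```python
import logging
import re

logger = logging.getLogger(__name__)

# Simplified stand-in for strip_think: removes well-formed thinking blocks
# and a leading channel marker; the shipped version covers more leak shapes.
def strip_think(text: str) -> str:
    text = re.sub(r"<think>[\s\S]*?</think>", "", text)
    text = re.sub(r"^\s*<\|?channel\|>", "", text)
    return text.strip()

def append_history(history: list, entry: str) -> None:
    cleaned = strip_think(entry)
    # Old code: history.append(cleaned or entry.rstrip()) -- a pure-leak
    # entry fell back to the raw text and re-entered history on replay.
    if not cleaned and entry.strip():
        logger.debug("entry emptied by cleanup; persisting empty record")
    history.append(cleaned)  # empty record preserves cursor continuity
```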
Some models / Ollama renderers occasionally emit tokenizer-level template
leaks that the existing regexes miss:
1. Malformed opening tags with no closing `>`, running straight into
user-facing content — e.g. `<think广场照明灯目前…` (observed with
Gemma 4 via Ollama). The earlier `<think>[\s\S]*?</think>` and
`^\s*<think>[\s\S]*$` patterns both require `>`, so these leak into
rendered messages.
2. Harmony-style channel markers like `<channel|>` / `<|channel|>` at
the start of a response.
3. Orphan `</think>` / `</thought>` closing tags left behind when only
the opener was consumed upstream.
Handles each case conservatively:
- Malformed `<think` / `<thought` only match when the next char is NOT
a tag-name continuation (`[A-Za-z0-9_\-:>/]`). Explicit ASCII class
instead of `\w` because Python's Unicode `\w` matches CJK and would
defeat the primary fix.
- Orphan closing tags and channel markers are stripped **only at the
start or end of the text**. `strip_think` is also applied before
persisting history (memory.py), so mid-text stripping would silently
rewrite transcripts where the tokens themselves are discussed.
Preserves: `<thinker>`, `<think-foo>`, `<think_foo>`, `<think1>`,
`<think:foo>`, `<thought/>`, literal `` `</think>` `` / `` `<channel|>` ``
inside prose or code blocks.
Adds 16 new regression tests covering both the leak cases and the
preserved-prose cases.
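The conservative matching rules above can be sketched as follows (illustrative
patterns, not the exact regexes shipped in `strip_think`):

```python
import re

# Malformed opener: `<think` / `<thought` matches only when the next char
# cannot continue a tag name. Explicit ASCII class rather than \w, because
# Python's Unicode \w matches CJK and the observed leak is `<think` running
# straight into CJK content.
MALFORMED_OPEN = re.compile(r"<(?:think|thought)(?![A-Za-z0-9_\-:>/])[\s\S]*$")

# Orphan closers and channel markers, anchored to start/end of the text only
# so mid-text mentions of the tokens survive.
LEADING_LEAK = re.compile(r"^\s*(?:</(?:think|thought)>|<\|?channel\|>)\s*")
TRAILING_LEAK = re.compile(r"\s*</(?:think|thought)>\s*$")

def scrub(text: str) -> str:
    text = MALFORMED_OPEN.sub("", text)
    text = LEADING_LEAK.sub("", text)
    text = TRAILING_LEAK.sub("", text)
    return text
```

The negative lookahead is what keeps `<thinker>`, `<think-foo>`, `<think1>`,
`<think:foo>`, and `<thought/>` intact: each has a tag-name continuation
character right after the prefix, so the malformed-open pattern refuses to fire.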
- Fix critical plain-text fallback that was sending raw HTML tags to
users: keep raw markdown available for the fallback path
- Extract TELEGRAM_HTML_MAX_LEN (4096) constant to replace hardcoded
magic number and document the difference from TELEGRAM_MAX_MESSAGE_LEN
- Add fallback to _send_text for extra HTML chunks when HTML parse fails
- Add missing @pytest.mark.asyncio decorator on
test_send_delta_stream_end_html_expansion_does_not_overflow
Cherry-picked from #3311 (stutiredboy). Streaming edits called
edit_message_text(text=buf.text) without chunking, so once accumulated
deltas crossed Telegram's 4096-char limit an ongoing stream would fail
with BadRequest.
Extracts _flush_stream_overflow helper that edits the first chunk in
place, sends any middle chunks, and re-anchors the buffer to a new
message for the tail so subsequent deltas keep streaming.
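The chunk arithmetic behind that helper can be sketched as a pure function
(`split_stream_buffer` is a hypothetical name; the real `_flush_stream_overflow`
also performs the edit, sends, and re-anchoring side effects):

```python
def split_stream_buffer(text: str, limit: int = 4096) -> tuple[str, list, str]:
    """Split an over-limit stream buffer into (first, middles, tail).

    first   -> edited into the currently anchored message in place
    middles -> sent as standalone messages
    tail    -> becomes the new anchor; subsequent deltas keep streaming here
    """
    if len(text) <= limit:
        return text, [], ""
    chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
    return chunks[0], chunks[1:-1], chunks[-1]
```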
Co-Authored-By: stutiredboy <stutiredboy@users.noreply.github.com>
Cherry-picked from #3316 (himax12). When streaming completes in send_delta(),
the code was splitting raw markdown text by 4000, then converting to HTML.
The markdown-to-HTML conversion adds 10-33% characters, which could push
the result over Telegram's 4096 character limit.
The fix converts markdown to HTML first, then splits by 4096 (actual Telegram
limit), ensuring the edited message always fits.
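The order-of-operations fix, in sketch form (`markdown_to_html` is a
hypothetical stand-in for the real converter, and the naive character slice is
for length arithmetic only; the shipped code splits at safe boundaries so HTML
tags are not cut in half):

```python
TELEGRAM_HTML_MAX_LEN = 4096  # Telegram's per-message limit

def markdown_to_html(md: str) -> str:
    # Toy converter: real conversion typically grows output by 10-33%.
    return md.replace("**", "<b>", 1).replace("**", "</b>", 1)

def chunks_for_edit(raw_markdown: str) -> list[str]:
    # Old order: split raw markdown by 4000, then convert each chunk;
    # conversion growth could push a chunk past 4096.
    # New order: convert first, then split by the actual limit.
    html = markdown_to_html(raw_markdown)
    return [html[i:i + TELEGRAM_HTML_MAX_LEN]
            for i in range(0, len(html), TELEGRAM_HTML_MAX_LEN)] or [""]
```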
Fixes #3315
The previous fix hardcoded session_key_override as channel:chat_id which
broke unified session mode where pending queues use "unified:default".
Propagate the effective key from _set_tool_context through SpawnTool
into the origin dict so _announce_result routes to the correct pending
queue in both normal and unified session modes.
Cron jobs now pass on_progress=_silent to process_direct, matching
the heartbeat pattern. Previously, tool hints and streaming deltas
were published to the user channel via bus during execution, but the
final response could be rejected by evaluate_response — leaving users
with confusing partial output and no conclusion.
Closes #3319
SessionManager.save() previously used bare open("w") which could
truncate the JSONL file if the process crashed mid-write. Now writes
to a .tmp file and atomically replaces via os.replace(), matching the
pattern already used in qq.py.
_load() now attempts _repair() before returning None, recovering
valid lines from partially-written files. 12 new tests cover atomic
save correctness, temp-file cleanup on failure, and repair of
truncated/corrupt JSONL.
cowork-with:opencode(glm-5.1)
The old test `test_on_message_ignores_bot_messages` asserted the
previous (incorrect) contract that ALL bot-authored messages are
dropped. With #3217 only self-loops are dropped, so this test was
replaced with three more precise tests:
- test_on_message_ignores_self_messages: verifies self-loop guard
(author_id == _bot_user_id is dropped)
- test_on_message_accepts_messages_from_other_bots: new test for
the fix itself — other bots' messages flow through
- test_on_message_stops_typing_on_handle_exception: preserves the
typing cleanup assertion from the original test
Net result: +1 behavior tested, same behaviors retained.
Co-authored with Claude Opus 4.7
PR #3125 added a top-level `oneOf` branch to `_CRON_PARAMETERS` to
advertise per-action required fields. OpenAI Codex/Responses rejects
`oneOf`/`anyOf`/`allOf`/`enum`/`not` at the root of function
parameters, so any agent that registers the cron tool now fails to
start with:
HTTP 400: Invalid schema for function 'cron': schema must have
type 'object' and not have 'oneOf'/'anyOf'/'allOf'/'enum'/'not'
at the top level.
Remove the top-level `oneOf`. The original intent of #3125 (stop LLMs
from looping on the #3113 contract mismatch) is preserved by:
- `validate_params` — runtime-enforces `message` for `action='add'`
and `job_id` for `action='remove'`
- field descriptions — each schema field already flags
"REQUIRED when action='...'" so the LLM sees the contract
The regression test is updated to lock the invariant in the other
direction: the top-level schema must not contain
`oneOf`/`anyOf`/`allOf`/`not`, and the REQUIRED hints must stay on
`message` and `job_id`.
Verified:
- tests/cron/ 70 passed
- tests/agent/test_loop_cron_timezone.py + tests/providers/ 232 passed
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Replace fixed sleep-based waits with condition polling in cron tests and mock the restart delay in CLI restart tests to reduce suite runtime without changing behavior.
When the Responses API fails repeatedly (3 consecutive compatibility
errors), skip it and fall back directly to Chat Completions. Unlike a
permanent disable, the circuit re-probes after 5 minutes so recovery
is automatic when the API comes back. Success resets the counter.
Keyed per (model, reasoning_effort) so a failure with one model does
not affect others.
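The circuit-breaker state machine can be sketched as (names and the module-level
dict are illustrative; the real implementation lives in the provider layer):

```python
import time

_FAILURE_THRESHOLD = 3        # consecutive compat errors before skipping
_REPROBE_AFTER_SECONDS = 300  # re-probe the Responses API after 5 minutes

# Keyed per (model, reasoning_effort) so one model's failures stay isolated.
_circuit: dict = {}

def responses_api_available(model: str, effort: str) -> bool:
    state = _circuit.get((model, effort))
    if state is None or state["failures"] < _FAILURE_THRESHOLD:
        return True
    # Circuit open: allow a probe once the cool-down has elapsed.
    return time.monotonic() - state["opened_at"] >= _REPROBE_AFTER_SECONDS

def record_result(model: str, effort: str, ok: bool) -> None:
    if ok:
        _circuit.pop((model, effort), None)  # success resets the counter
        return
    state = _circuit.setdefault(
        (model, effort), {"failures": 0, "opened_at": 0.0})
    state["failures"] += 1
    if state["failures"] >= _FAILURE_THRESHOLD:
        state["opened_at"] = time.monotonic()
```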
The old test `test_make_console_uses_force_terminal` hardcoded
`force_terminal is True`, which contradicts the fix: we now defer
to sys.stdout.isatty() so piped / non-TTY output gets plain text
instead of ANSI escape codes.
Split into two tests covering both branches:
- test_make_console_force_terminal_when_stdout_is_tty: TTY path
(force_terminal=True, rich output)
- test_make_console_force_terminal_false_when_stdout_is_not_tty:
non-TTY path (force_terminal=False, plain text) — regression
guard for the bug reported in #3265
Co-authored with Claude Opus 4.7
The global guard changed baseline agent and subagent behavior without
proving a real no-progress loop. Keep this PR focused on the cron
contract hardening and validation fixes.
Made-with: Cursor
Treat `.git` files the same as `.git` directories so GitStore refuses to initialize inside git worktrees, and add a focused regression test for that checkout shape.
Made-with: Cursor
GitStore.init() now checks if the workspace is already inside a git
repository before calling porcelain.init(). If so, it refuses to create
a nested repo. Additionally, existing .gitignore files are preserved
by appending only missing Dream-specific entries rather than overwriting.
Closes #2980
Literal["standard", "persistent"] fields are now rendered as select
dropdowns instead of free-text input. This makes provider_retry_mode
and any future Literal fields self-documenting in the wizard.
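The detection step can be sketched with `typing.get_origin`/`get_args`
(`choices_for_field` is a hypothetical helper name, not the wizard's actual
API):

```python
from typing import Literal, get_args, get_origin

def choices_for_field(annotation):
    # If the annotation is a Literal[...], return its options so the wizard
    # can render a select dropdown instead of free-text input.
    if get_origin(annotation) is Literal:
        return [str(v) for v in get_args(annotation)]
    return None
```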
- Add [H] Channel Common menu to configure send_progress, send_tool_hints,
send_max_retries, and transcription_provider
- Add [I] API Server menu to configure host, port, timeout
- Add real-time Pydantic field constraint validation (ge/gt/le/lt/min_length/max_length)
with constraint hints shown in field display (e.g. "Send Max Retries (0-10)")
- Add _pause() to View Configuration Summary to prevent immediate screen clear
- Fix _format_value dict branch to handle BaseModel instances without crashing
Move all behavioral instructions out of identity.md into SOUL.md so that
each file has a single clear purpose:
- identity.md: capability facts only (runtime, workspace, format hints,
tool guidance, untrusted content warning)
- SOUL.md: behavioral rules (name, personality, execution rules)
The "Act, don't narrate" rule is refined into layered behavior: act
immediately on single-step tasks, plan first for multi-step tasks. This
eliminates the contradiction where identity said "never end with a plan"
but user SOUL.md said "always plan first".
Add two focused regression tests for the retry-wait leak this PR fixes:
- tests/agent/test_runner.py::test_runner_binds_on_retry_wait_to_retry_callback_not_progress
locks in that `AgentRunSpec.retry_wait_callback` (not `progress_callback`) is
what `_build_request_kwargs` forwards to the provider as `on_retry_wait`.
- tests/channels/test_channel_manager_delta_coalescing.py::TestRetryWaitFiltering
runs `_dispatch_outbound` end-to-end and asserts that `_retry_wait: True`
messages never reach channel send.
Both tests fail on origin/main and pass with this PR's fix applied.
Made-with: Cursor
- Add inline rationale for persisting before ContextBuilder and for
passing current_message="" on subagent follow-ups (avoids
double-projection after merge).
- Skip persistence for empty subagent content (no-op messages should
not pollute history).
- Add regression test covering the empty-content guard.
Made-with: Cursor
MyTool blocks direct access to sensitive nested paths, but its formatter
still printed scalar fields for small config objects. That let
`my(action="check", key="web_config.search")` expose `api_key` in plain
text even though the docs promise sensitive sub-fields are protected.
This keeps the change narrow: sensitive nested config fields are omitted
from MyTool's formatted output, and regression coverage locks the
behavior in.
Constraint: Must preserve existing read-only inspection behavior for non-sensitive fields
Constraint: Keep scope limited to MyTool rather than introducing broader redaction plumbing
Rejected: Rework global context/tool redaction around MyTool | broader than needed for the leak path
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If more nested config rendering is added later, filter sensitive field names at the formatter boundary as well as the path resolver
Tested: PYTHONPATH=$PWD pytest -q tests/agent/tools/test_self_tool.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_my_tool_secret_leak_regression.py
Not-tested: Full repository test suite
Related: #3259
The streaming API currently logs backend exceptions but still emits the
same `finish_reason: "stop"` + `[DONE]` terminator used for successful
responses. That makes a failed streamed request look successful to
OpenAI-compatible clients.
This keeps the fix narrow: track whether the stream backend failed and
suppress the success terminator in that case. A regression test locks in
the expected behavior.
Constraint: Keep the non-streaming response path untouched
Constraint: Follow up on the known limitation called out during PR #3222 review without redesigning the SSE protocol
Rejected: Introduce a custom SSE error event shape in the same patch | expands API surface and review scope
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: If explicit streamed error events are added later, keep them distinct from the success stop+[DONE] terminator to preserve client retry semantics
Tested: PYTHONPATH=$PWD pytest -q tests/test_api_stream.py /Users/jh0927/Workspace/nanobot-validation-artifacts-2026-04-18/test_api_stream_error_regression.py
Not-tested: Full repository test suite
Related: #3260
Related: #3222
The previous patch promoted `message` into top-level `required`, which solved
the `add` loop but broke `list` and `remove`: `ToolRegistry.prepare_call`
enforces `required` via `validate_params`, so `cron(action="list")` and
`cron(action="remove", job_id=...)` — both documented in `SKILL.md` — started
failing schema validation with the same "missing required message" shape that
#3113 describes for `add`.
Instead:
- Keep `required=["action"]` so `list`/`remove` stay callable.
- Prefix `message`'s description with `REQUIRED when action='add'.` and
`job_id`'s with `REQUIRED when action='remove'.` so LLMs see the real
per-action contract up front.
- Keep the improved runtime error message from the previous commit for the
case an LLM still omits `message` on `add`.
Also add `tests/cron/test_cron_tool_schema_contract.py` to lock in:
- `list` and `remove` pass schema validation with no `message`
- `add` with `message` passes
- `add` without `message` surfaces the actionable runtime error
- field descriptions carry the REQUIRED hints
- top-level `required` stays `["action"]`
Existing `tests/cron/test_cron_tool_list.py` cases bypass schema validation by
calling `_list_jobs()` / `_remove_job()` directly, which is why CI didn't catch
the regression; the new test goes through `ToolRegistry.prepare_call`.
Two small follow-ups to the guard:
1. Fix the should_execute_tools docstring so it matches the actual code.
The previous version said "Only execute when finish_reason explicitly
signals tool intent" but the code also accepts finish_reason == "stop".
Explain why (some compliant providers emit "stop" with legitimate tool
calls — openai_compat_provider.py already mirrors this at lines ~633 /
~678 where ("tool_calls", "stop") are both treated as the terminal
tool-call state). Without this, a strict "tool_calls"-only guard would
regress 15 existing runner tests that construct LLMResponse with
tool_calls but no explicit finish_reason (default = "stop").
2. Add tests/providers/test_llm_response.py. This locks the three cases:
- no tool calls -> never executes
- tool calls + "tool_calls"/stop -> executes
- tool calls + refusal / content_filter / error / length / ... -> blocked
These are exactly the boundary cases the #3220 fix is about; without a
test here a future refactor could silently revert the guard.
Body + tests only, no behavior change beyond the existing PR's intent.
Made-with: Cursor
When chat_with_retry returns an error response (finish_reason='error')
instead of raising an exception, archive() previously treated the error
message as a valid summary and wrote it to history.jsonl, while the
original session data was already cleared by /new — causing irreversible
data loss.
Fix: check finish_reason after the LLM call and raise RuntimeError on
error responses, which naturally falls through to the existing raw_archive
fallback. This preserves the original messages in history.jsonl instead
of losing them.
Fixes #3244
The original regression only exercised a from_address match with all three
identity fields set to the same value, so it couldn't distinguish whether
_self_addresses actually picks up smtp_username and imap_username or just
collapses on from_address. Add a parametrized test covering:
- smtp_username-only match (from_address empty, imap_username different) —
simulates SMTP relays that rewrite outbound From to the login identity.
- imap_username-only match — simulates mailbox-identity setups.
- Case-insensitive match — inbound From arriving upper-cased must still hit.
No production code changes.
Made-with: Cursor
Skip inbound emails that come from the bot's own configured addresses so a mailbox wired to the same SMTP/IMAP account does not trigger infinite reply loops.
- Extract synthetic user message string to module-level constant
- Tighten comments in _snip_history recovery branch
- Strengthen no-user edge case test to verify safety net interaction
When _snip_history truncates the message history and the only user message
ends up outside the kept window, providers like GLM reject the resulting
system→assistant sequence with error 1214 ("messages 参数非法",
i.e. "invalid messages parameter").
Two-layer fix:
1. _snip_history now walks backwards through non_system messages to recover
the nearest user message when none exists in the kept window.
2. _enforce_role_alternation inserts a synthetic user message
"(conversation continued)" when the first non-system message is a bare
assistant (no tool_calls), serving as a safety net for any edge cases
that slip through.
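The layer-2 safety net can be sketched as (illustrative version of
`_enforce_role_alternation`; message dicts approximate the real shape):

```python
SYNTHETIC_USER_MESSAGE = "(conversation continued)"

def enforce_role_alternation(messages: list) -> list:
    # If the first non-system message is a bare assistant turn (no
    # tool_calls), insert a synthetic user message so providers that demand
    # system -> user -> assistant ordering (e.g. GLM error 1214) accept it.
    for i, msg in enumerate(messages):
        if msg["role"] == "system":
            continue
        if msg["role"] == "assistant" and not msg.get("tool_calls"):
            return (messages[:i]
                    + [{"role": "user", "content": SYNTHETIC_USER_MESSAGE}]
                    + messages[i:])
        break  # first non-system message is fine; nothing to do
    return messages
```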
Co-authored-by: darlingbud <darlingbud@users.noreply.github.com>
Follow-ups from review of #3194:
- ci.yml: drop unconditional --ignore=tests/channels/test_matrix_channel.py.
That test file already calls pytest.importorskip("nio") at module top, so
it self-skips on Windows (where nio isn't installed) without also hiding
62 tests from Linux CI.
- filesystem.py: hoist `import os` to the module top and drop the duplicate
inline import in ReadFileTool.execute. Document the CRLF->LF normalization
as intentional (primarily a Windows UX fix so downstream StrReplace/Grep
match consistently regardless of where the file was written).
- test_read_enhancements.py: lock down two new behaviors
* TestFileStateHashFallback: check_read warns when content changes but
mtime is unchanged (coarse-mtime filesystems on Windows).
* TestReadFileLineEndingNormalization: ReadFileTool strips CRLF and
preserves LF-only files untouched.
- test_tool_validation.py: restore list2cmdline/shlex.quote in
test_exec_head_tail_truncation. The temp_path-based form was correct,
but dropping the quoting broke on any Windows path containing spaces
(e.g. C:\Users\John Doe\...). CI runners happen not to have spaces so
this slipped through.
Tests: 1993 passed locally.
Made-with: Cursor
Complete the symmetry left by #3214: ChannelManager._resolve_transcription_base
already resolves providers.openai.api_base, but BaseChannel.transcribe_audio
instantiated OpenAITranscriptionProvider without forwarding it, and the provider
__init__ did not accept the parameter. Self-hosted OpenAI-compatible Whisper
endpoints (LiteLLM, vLLM, etc.) configured via config.json were therefore
ignored for the OpenAI backend.
- OpenAITranscriptionProvider.__init__ now accepts api_base with env fallback
(OPENAI_TRANSCRIPTION_BASE_URL) matching the Groq pattern.
- BaseChannel.transcribe_audio forwards self.transcription_api_base to OpenAI.
- Tests mirror the existing Groq coverage: manager propagation for provider
"openai", BaseChannel-to-provider argument passing, and provider default vs
override for api_url.
Fully backward-compatible: when api_base is None and the env var is unset,
the default https://api.openai.com/v1/audio/transcriptions is used.
Refs #3213, follow-up to #3214.
Follow-up to #3212, fully backward compatible:
- Extract the 14-day staleness threshold as `_STALE_THRESHOLD_DAYS` module
constant and pass it into the Phase 1 prompt template as
`{{ stale_threshold_days }}`. The number lived in three places before
(code threshold, prompt instruction, docstring); now there is one.
- Add `DreamConfig.annotate_line_ages` (default True = current behavior)
and propagate it through `Dream.__init__` and the gateway wiring in
cli/commands.py. Gives users a knob to disable the feature without a
code patch if an LLM reacts poorly to the `← Nd` suffix.
- Harden `_annotate_with_ages` against dirty working trees: when HEAD
blob line count disagrees with the working-tree content length, skip
annotation entirely instead of assigning ages to the wrong lines. The
previous `i >= len(ages)` guard only handled one direction of the
mismatch.
- Inline-comment the `max_iterations` 10→15 bump with a pointer to
exp002 so future blame has context.
- Add 5 regression tests: end-to-end `← 30d` reaches prompt, 14/15
threshold boundary, `annotate_line_ages=False` bypasses git entirely
(verified via `assert_not_called`), length-mismatch defense, and
template-var rendering.
Made-with: Cursor