24 Commits

Author SHA1 Message Date
hussein1362
e72c415473 fix(heartbeat): prevent internal reasoning leaks and finalization fallback in delivery
Three failure modes addressed:

1. Model reflects HEARTBEAT.md instructions back as output instead of
   executing them ("HEARTBEAT.md has active tasks listed...")
2. Model narrates decision logic ("Best judgment call: stay quiet")
3. Model produces empty output for silence, runner treats it as failure,
   finalization retry generates "couldn't produce a final answer" which
   gets delivered to the user

Changes:
- Add _is_deliverable() pre-filter in HeartbeatService._tick() that catches
  finalization fallback messages and leaked reasoning patterns before they
  reach the evaluator
- Wrap Phase 2 task input with a delivery-awareness preamble telling the
  model its output goes directly to the user's messaging app
- Add meta-reasoning suppression criterion to evaluator template

No changes to agent/loop.py, runner.py, providers, or config schema.
2026-04-27 18:14:13 +08:00
Xubin Ren
311a7fe36e fix(session): stop training the model to parrot [Message Time: ...]
Past assistant turns in history were prefixed with "[Message Time: ...]"
just like user turns. The model treated these as in-context demos and
started prefixing its own replies with the same marker, leaking
metadata to the user. Prompt-level warnings could not beat dozens of
prior assistant samples.

Annotate only user turns and proactive deliveries
(_channel_delivery=True, i.e. cron / heartbeat pushes whose timing is
the whole point and which are too infrequent to act as demos). Adjacent
user-side timestamps still pin every normal assistant reply for
relative-time reasoning. The now-redundant identity.md warning is
removed along with the demonstration source.
2026-04-27 07:11:20 +00:00
Xubin Ren
eeaec1f951 fix(agent): prevent message time metadata from leaking into replies 2026-04-27 06:23:43 +00:00
Xubin Ren
be05189f39 feat(channels): add video support for Telegram and WebSocket
Telegram previously sent all video files as documents via send_document,
so users saw a file icon instead of an inline player. WebSocket only
accepted image MIME types, rejecting video uploads entirely.

Telegram:
- Recognize video extensions (mp4/mov/avi/mkv/webm/3gp) in _get_media_type
- Route videos through send_video with supports_streaming=True
- Add VIDEO/VIDEO_NOTE/ANIMATION to inbound message filters
- Add video MIME mappings to _get_extension
- Fix: local file sends now use _call_with_retry (previously no retry)

WebSocket:
- Expand upload MIME whitelist with video/mp4, video/webm, video/quicktime
- Add per-type size limits (_MAX_VIDEO_BYTES=20MB, _MAX_VIDEOS_PER_MESSAGE=1)
- Expand media serving endpoint to serve video with correct Content-Type

Agent:
- Add "video" to message tool media parameter description
- Add .mp4 example to identity.md system prompt

Made-with: Cursor
2026-04-25 02:20:13 +08:00
chengyongru
58110afb88 fix(templates): keep Search & Discovery heading in identity.md
No reason to rename it to "Tools" — the section still covers the
same grep/glob search tips as before.
2026-04-18 21:55:56 +08:00
chengyongru
34e8f97b1f refactor(templates): separate identity and SOUL responsibilities
Move all behavioral instructions out of identity.md into SOUL.md so that
each file has a single clear purpose:

- identity.md: capability facts only (runtime, workspace, format hints,
  tool guidance, untrusted content warning)
- SOUL.md: behavioral rules (name, personality, execution rules)

The "Act, don't narrate" rule is refined into layered behavior: act
immediately on single-step tasks, plan first for multi-step tasks. This
eliminates the contradiction where identity said "never end with a plan"
but user SOUL.md said "always plan first".
2026-04-18 21:55:56 +08:00
Xubin Ren
cc5a666d5d review(dream): harden line-age annotation per review feedback
Follow-up to #3212, fully backward compatible:

- Extract the 14-day staleness threshold as `_STALE_THRESHOLD_DAYS` module
  constant and pass it into the Phase 1 prompt template as
  `{{ stale_threshold_days }}`. The number lived in three places before
  (code threshold, prompt instruction, docstring); now there is one.
- Add `DreamConfig.annotate_line_ages` (default True = current behavior)
  and propagate it through `Dream.__init__` and the gateway wiring in
  cli/commands.py. Gives users a knob to disable the feature without a
  code patch if an LLM reacts poorly to the `← Nd` suffix.
- Harden `_annotate_with_ages` against dirty working trees: when HEAD
  blob line count disagrees with the working-tree content length, skip
  annotation entirely instead of assigning ages to the wrong lines. The
  previous `i >= len(ages)` guard only handled one direction of the
  mismatch.
- Inline-comment the `max_iterations` 10→15 bump with a pointer to
  exp002 so future blame has context.
- Add 4 regression tests: end-to-end `← 30d` reaches prompt, 14/15
  threshold boundary, `annotate_line_ages=False` bypasses git entirely
  (verified via `assert_not_called`), length-mismatch defense, and
  template-var rendering.

Made-with: Cursor
2026-04-17 13:45:38 +08:00
chengyongru
35f3084c03 feat(dream): per-line age annotations + dedup-aware prompt + max_iter=15
Three improvements to Dream's memory consolidation:

1. Per-line git-blame age annotations: MEMORY.md lines get `← Nd` suffixes
   (N>14) from dulwich annotate. SOUL.md/USER.md excluded as permanent.
   LLM uses content judgment, not just age, to decide what to prune.

2. Dedup-aware Phase 1 prompt: reframed as dual-task (extract facts +
   deduplicate existing files) with explicit redundancy patterns to scan for.
   Validated through 20 experiments (exp-002 prompt + max_iter=15 was best,
   averaging -1643 chars/5.4% compression per run).

3. Phase 1 analysis as commit body: dream git commits now include the full
   Phase 1 analysis for transparency via /dream-log.

4. max_iterations raised from 10 to 15: 30% improvement over 10 with no
   risk; 20 showed diminishing returns (exp-020: -701 vs exp-017: -1643).
2026-04-17 13:45:38 +08:00
chengyongru
6fbada5363 refactor(context): deduplicate system prompt — markdown skills index, skip template MEMORY.md
- Convert skills summary from verbose XML (4-5 lines/skill) to compact
  markdown list (1 line/skill) with inline path for read_file lookup
- Exclude always-loaded skills (e.g. memory) from the skills index to
  avoid duplicating content already in the Active Skills section
- Skip injecting the Memory section when MEMORY.md still matches the
  bundled template (i.e. Dream hasn't populated it yet)
2026-04-15 15:49:30 +08:00
Xubin Ren
7a7f5c9689 fix(dream): use valid builtin skill template paths
Point Dream skill creation at a readable builtin skill-creator template, keep skill writes rooted at the workspace, and document the new skill discovery behavior in README.

Made-with: Cursor
2026-04-12 16:49:55 +08:00
chengyongru
2a243bfe4f feat(agent): integrate skill discovery into Dream consolidation
Instead of a separate skill discovery system, extend Dream's two-phase
pipeline to also detect reusable behavioral patterns from conversation
history and generate SKILL.md files.

Phase 1 gains a [SKILL] output type for pattern detection.
Phase 2 gains write_file (scoped to skills/) and read access to builtin
skills, enabling it to check for duplicates and follow skill-creator's
format conventions before creating new skills.

Inspired by PR #3039 by @wanghesong2019.

Co-authored-by: wanghesong2019 <wanghesong2019@users.noreply.github.com>
2026-04-12 16:49:55 +08:00
Xubin Ren
c7d10de253 feat(soul): restore friendly and curious tone to SOUL.md
Made-with: Cursor
2026-04-08 02:22:25 +08:00
Xubin Ren
edb821e10d feat(agent): prompt behavior directives, tool descriptions, and loop robustness 2026-04-08 02:22:25 +08:00
chengyongru
b4f985f3dc feat(memory):dream enhancement (#2887)
* feat(dream): enhance memory cleanup with staleness detection

- Phase 1: add [FILE-REMOVE] directive and staleness patterns (14-day
  threshold, completed tasks, superseded info, resolved tracking)
- Phase 2: add explicit cleanup rules, file paths section, and deletion
  guidance to prevent LLM path confusion
- Inject current date and file sizes into Phase 1 context for age-aware
  analysis
- Add _dream_debug() helper for observability (dream-debug.log in workspace)
- Log Phase 1 analysis output and Phase 2 tool events for debugging

Tested with glm-5-turbo: MEMORY.md reduced from 149 to 108-129 lines
across two rounds, correctly identifying and removing weather data,
detailed incident info, completed research, and stale discussions.

* refactor(dream): replace _dream_debug file logger with loguru

Remove the custom _dream_debug() helper that wrote to dream-debug.log
and use the existing loguru logger instead. Phase 1 analysis is logged
at debug level, tool events at info level — consistent with the rest
of the codebase and no extra log file to manage.

* fix(dream): make stale scan independent of conversation history

Reframe Phase 1 from a single comparison task to two independent
tasks: history diff AND proactive stale scan. The LLM was skipping
stale content that wasn't referenced in conversation history (e.g.
old triage snapshots). Now explicitly requires scanning memory files
for staleness patterns on every run.

* fix(dream): correct old_text param name and truncate debug log

- Phase 2 prompt: old_string -> old_text to match EditFileTool interface
- Phase 1 debug log: truncate analysis to 500 chars to avoid oversized lines

* refactor(dream): streamline prompts by separating concerns

Phase 1 owns all staleness judgment logic; Phase 2 is pure execution
guidance. Remove duplicated cleanup rules from Phase 2 since Phase 1
already determines what to add/remove. Fix remaining old_string -> old_text.
Total prompt size reduced ~45% (870 -> 480 tokens).

* fix(dream): add FILE-REMOVE execution guidance to Phase 2 prompt

Phase 2 was only processing [FILE] additions and ignoring [FILE-REMOVE]
deletions after the cleanup rules were removed. Add explicit mapping:
[FILE] → add content, [FILE-REMOVE] → delete content.
2026-04-07 22:39:47 +08:00
Xubin Ren
c9d4b7b905 Merge remote-tracking branch 'origin/main' into pr-2449
Made-with: Cursor

# Conflicts:
#	nanobot/utils/evaluator.py
2026-04-06 06:30:11 +00:00
Xubin Ren
33bef8d508 Merge remote-tracking branch 'origin/main' into feat/search-tools
Made-with: Cursor
2026-04-04 14:37:59 +00:00
Xubin Ren
c3b4ebae53 refactor(agent): move internal prompts into packaged templates 2026-04-04 11:09:37 +00:00
Jack Lu
d436a1d678 feat: integrate Jinja2 templating for agent responses and memory consolidation
- Added Jinja2 template support for various agent responses, including identity, skills, and memory consolidation.
- Introduced new templates for evaluating notifications, handling subagent announcements, and managing platform policies.
- Updated the agent context and memory modules to utilize the new templating system for improved readability and maintainability.
- Added a new dependency on Jinja2 in pyproject.toml.
2026-04-04 14:18:22 +08:00
Xubin Ren
15cc9b23b4 feat(agent): add built-in grep and glob search tools 2026-04-02 15:37:57 +00:00
Re-bin
c05cb2ef64 refactor(cron): remove CLI cron commands and unify scheduling via cron tool 2026-03-03 05:51:24 +00:00
Re-bin
cdbede2fa8 refactor: simplify /stop dispatch, inline commands, trim verbose docstrings 2026-02-25 17:04:08 +00:00
Re-bin
30361c9307 refactor: replace cron usage docs in TOOLS.md with reference to cron skill 2026-02-23 18:28:09 +00:00
Re-bin
35e3f7ed26 fix(templates): tighten AGENTS.md tool call guidelines to reduce hallucinations 2026-02-23 14:10:43 +00:00
Re-bin
577b3d104a refactor: move workspace/ to nanobot/templates/ for packaging 2026-02-23 08:08:01 +00:00