nanobot/docs/CONTEXT_BUDGET.md
Jesse 528b3cfe5a
feat: configurable context budget for tool-loop iterations (#2317)
* feat: add contextBudgetTokens config field for tool-loop trimming

* feat: implement _trim_history_for_budget for tool-loop cost reduction

* feat: thread contextBudgetTokens into AgentLoop constructor

* feat: wire context budget trimming into agent loop

* refactor: move trim_history_for_budget to helpers and add docs

- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>
2026-03-23 18:13:03 +08:00


# Context Budget (`context_budget_tokens`)
Caps how many tokens of old session history are sent to the LLM during tool-loop iterations 2+. Reduces cost and first-token latency by trimming history between turns.
## How It Works
During multi-turn tool-use sessions, each iteration re-sends the full conversation history. `context_budget_tokens` limits how many old tokens are included:
- **Iteration 1** — always receives full context (no trimming)
- **Iteration 2+** — old history is trimmed to fit within the budget; current turn is never trimmed
- **Memory consolidation** — runs before/after the loop and always sees the full canonical history; trimming only affects the LLM's view
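The trimming logic described above can be sketched as a pure function. This is an illustrative sketch, not nanobot's actual `trim_history_for_budget()` implementation; the message shape and the 4-characters-per-token heuristic are assumptions.

```python
# Hypothetical sketch of budget-based history trimming.
# Message shape and token heuristic are illustrative assumptions.

def estimate_tokens(message: dict) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(message.get("content", "")) // 4)

def trim_history_for_budget(history: list[dict], current_turn: list[dict],
                            budget_tokens: int) -> list[dict]:
    """Keep the most recent old messages that fit within the budget.

    The current turn is always kept in full; only older history is
    trimmed. A budget of 0 disables trimming entirely.
    """
    if budget_tokens <= 0:
        return history + current_turn

    kept: list[dict] = []
    used = 0
    # Walk old history newest-to-oldest, keeping whatever still fits.
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept + current_turn
```

Note that trimming drops whole messages from the oldest end first, so the most recent exchanges survive longest.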
## Configuration
```json
{
  "agents": {
    "defaults": {
      "context_budget_tokens": 1000
    }
  }
}
```
| Value | Behavior |
|---|---|
| `0` (default) | No trimming — full history sent every iteration |
| `4000` | Conservative — barely trims in practice; good for multi-step tasks |
| `1000` | Aggressive — significant savings; works well for typical linear tasks |
| `< 500` | Clamped to a `500` minimum when positive (roughly 12 message pairs at typical token density) |
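The zero-disables / 500-floor rule in the table can be expressed as a small resolver. This is a hypothetical helper for illustration, not nanobot's actual code:

```python
def resolve_budget(configured: int) -> int:
    """Resolve a configured context budget.

    0 (or any non-positive value) disables trimming; any positive
    value is clamped to a 500-token floor.
    """
    if configured <= 0:
        return 0
    return max(configured, 500)
```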
## Trade-offs
**Cost & latency** — Trimming reduces the tokens sent each iteration, which saves money and lowers time to first token (TTFT). This is nanobot's primary sweet spot.
**Context loss** — Older context is not visible to the LLM in later iterations. For tasks that genuinely require 20+ iterations of history to stay coherent, consider `0` or `4000`.
**Tool-result truncation** — Large results from a previous turn (e.g., reading a 10,000-line file in Round 1, then editing in Round 2) can be trimmed. The agent can re-read the file via its tools — this is a 1-tool-call recovery cost, not a failure.
**Prefix caching** — Some providers (e.g., DeepSeek) use implicit prefix-based caching. Aggressive trimming breaks prefix matching and can reduce cache hit rates. For these providers, `0` or a high value may be more cost-effective overall.
## When to Use
| Use case | Recommended value |
|---|---|
| Simple read → process → act chains | `1000` |
| Multi-step reasoning with tool chains | `4000` |
| Complex debugging / long task traces | `0` |
| Providers with implicit prefix caching | `0` or `4000` |
| Long file operations across turns | `0` or re-read via tools |
## Example
```
Turn 1: User asks to read a.py (10k lines)
Turn 2: User asks to edit line 100
```
With `context_budget_tokens=500`, the file-content result from Turn 1 may be trimmed before Turn 2. The agent will re-read the file to perform the edit — a 1-call recovery. This is normal behavior for the feature; it is not a bug.