* feat: add contextBudgetTokens config field for tool-loop trimming
* feat: implement _trim_history_for_budget for tool-loop cost reduction
* feat: thread contextBudgetTokens into AgentLoop constructor
* feat: wire context budget trimming into agent loop
* refactor: move trim_history_for_budget to helpers and add docs
  - Extract trim_history_for_budget() as a pure function in helpers.py
  - AgentLoop._trim_history_for_budget becomes a thin wrapper
  - Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
  - Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>
Context Budget (context_budget_tokens)
Caps how many tokens of old session history are sent to the LLM during tool-loop iterations 2+. Reduces cost and first-token latency by trimming history between turns.
How It Works
During multi-turn tool-use sessions, each iteration re-sends the full conversation history. context_budget_tokens limits how many old tokens are included:
- Iteration 1 — always receives full context (no trimming)
- Iteration 2+ — old history is trimmed to fit within the budget; current turn is never trimmed
- Memory consolidation — runs before/after the loop and always sees the full canonical history; trimming only affects the LLM's view
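The rules above can be sketched roughly as follows. This is an illustrative approximation, not the actual `trim_history_for_budget()` helper: the function name `sketch_trim_history`, its parameters, the 4-characters-per-token estimate, and the "keep the newest old messages first" policy are all assumptions made for this example. The 500-token minimum described under Configuration is left out to keep the sketch short.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    # Assumption: roughly 4 characters per token. A real tokenizer will differ.
    return max(1, len(message.get("content", "")) // 4)


def sketch_trim_history(
    old_history: List[Message],
    current_turn: List[Message],
    budget_tokens: int,
    iteration: int,
) -> List[Message]:
    """Hypothetical sketch: return the message list the LLM would see."""
    # Iteration 1, or a budget of 0, sends the full history untouched.
    if iteration <= 1 or budget_tokens <= 0:
        return list(old_history) + list(current_turn)

    kept: List[Message] = []
    used = 0
    # Assumption: walk from newest to oldest and keep messages while they fit.
    for message in reversed(old_history):
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()

    # The current turn is never trimmed; the input lists are not mutated,
    # so the canonical history stays intact for memory consolidation.
    return kept + list(current_turn)
```

Because the sketch returns a new list and leaves `old_history` unchanged, it mirrors the point that trimming only affects the LLM's view of the conversation.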
Configuration
{
  "agents": {
    "defaults": {
      "context_budget_tokens": 1000
    }
  }
}
| Value | Behavior |
|---|---|
| 0 (default) | No trimming: full history is sent every iteration |
| 4000 | Conservative: barely trims in practice; good for multi-step tasks |
| 1000 | Aggressive: significant savings; works well for typical linear tasks |
| < 500 | Clamped to a 500 minimum when positive (1–2 message pairs at typical token density) |
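The way a configured value maps to the budget actually applied can be sketched as below. The function name `effective_budget` and the constant `MIN_POSITIVE_BUDGET` are hypothetical, and treating negative values the same as 0 is an assumption; only the "0 disables trimming" and "positive values below 500 are clamped to 500" rules come from the table.

```python
MIN_POSITIVE_BUDGET = 500  # minimum applied to any positive setting


def effective_budget(configured: int) -> int:
    """Hypothetical sketch of how a configured value becomes a budget."""
    # 0 (the default) disables trimming entirely.
    # Assumption: negative values are treated like 0.
    if configured <= 0:
        return 0
    # Positive values below the minimum are raised to it.
    return max(configured, MIN_POSITIVE_BUDGET)
```

Under these assumptions, a setting of 200 behaves like 500, while 1000 and 4000 pass through unchanged.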
Trade-offs
Cost & latency — Trimming reduces tokens sent each iteration, which saves money and lowers first-token time (TTFT). This is nanobot's primary sweet spot.
Context loss — Older context is not visible to the LLM in later iterations. For tasks that genuinely require 20+ iterations of history to stay coherent, consider 0 or 4000.
Tool-result truncation — Large results from a previous turn (e.g., reading a 10,000-line file in Round 1, then editing in Round 2) can be trimmed. The agent can re-read the file via its tools — this is a 1-tool-call recovery cost, not a failure.
Prefix caching — Some providers (e.g., DeepSeek) use implicit prefix-based caching. Aggressive trimming breaks prefix matching and can reduce cache hit rates. For these providers, 0 or a high value may be more cost-effective overall.
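The prefix-caching trade-off can be illustrated with a small sketch that compares two consecutive requests. Everything here is hypothetical: the message strings are made up, the assumption that the system prompt survives trimming is not confirmed by this document, and real provider caches match on tokens rather than whole messages.

```python
from typing import List


def shared_prefix_len(previous: List[str], current: List[str]) -> int:
    """Count how many leading messages two requests have in common."""
    count = 0
    for before, after in zip(previous, current):
        if before != after:
            break
        count += 1
    return count


# Hypothetical request sent on iteration 1.
request_1 = ["system prompt", "user: read a.py", "tool: <file contents>"]

# Iteration 2 without trimming: the old request is extended, so the
# whole previous request is still a prefix of the new one.
untrimmed_2 = request_1 + ["assistant: file loaded", "tool: <edit result>"]

# Iteration 2 with aggressive trimming: old messages are dropped, so the
# new request diverges from the previous one right after the system prompt.
trimmed_2 = ["system prompt", "assistant: file loaded", "tool: <edit result>"]

untrimmed_overlap = shared_prefix_len(request_1, untrimmed_2)
trimmed_overlap = shared_prefix_len(request_1, trimmed_2)
```

In this toy case the untrimmed request shares all three earlier messages with its predecessor, while the trimmed one shares only the first, which is why a prefix-based cache would have less to reuse.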
When to Use
| Use case | Recommended value |
|---|---|
| Simple read → process → act chains | 1000 |
| Multi-step reasoning with tool chains | 4000 |
| Complex debugging / long task traces | 0 |
| Providers with implicit prefix caching | 0 or 4000 |
| Long file operations across turns | 0 or re-read via tools |
Example
Turn 1: User asks to read a.py (10k lines)
Turn 2: User asks to edit line 100
With context_budget_tokens=500, the file-content result from Turn 1 may be trimmed before Turn 2. The agent will re-read the file to perform the edit — a 1-call recovery. This is normal behavior for the feature; it is not a bug.
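A back-of-the-envelope check shows why a file of that size cannot survive a 500-token budget. The helper `fits_in_budget`, the synthetic file contents, and the 4-characters-per-token ratio are assumptions for illustration only; actual token counts depend on the model's tokenizer.

```python
def fits_in_budget(text: str, budget_tokens: int, chars_per_token: int = 4) -> bool:
    """Hypothetical check: does the text fit under a rough token estimate?"""
    estimated_tokens = len(text) // chars_per_token
    return estimated_tokens <= budget_tokens


# Synthetic stand-in for a 10,000-line source file.
file_text = "\n".join(f"x = 1  # line {i}" for i in range(10_000))

# Even with short lines, the estimate is far above 500 tokens, so the
# tool result from Turn 1 would not fit in the budget.
file_fits = fits_in_budget(file_text, budget_tokens=500)
```

Here `file_fits` is `False`, which matches the behavior described above: the earlier file content is dropped and the agent re-reads the file when it needs it.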