nanobot/docs/CONTEXT_BUDGET.md
Jesse 528b3cfe5a
feat: configurable context budget for tool-loop iterations (#2317)
* feat: add contextBudgetTokens config field for tool-loop trimming

* feat: implement _trim_history_for_budget for tool-loop cost reduction

* feat: thread contextBudgetTokens into AgentLoop constructor

* feat: wire context budget trimming into agent loop

* refactor: move trim_history_for_budget to helpers and add docs

- Extract trim_history_for_budget() as a pure function in helpers.py
- AgentLoop._trim_history_for_budget becomes a thin wrapper
- Add docs/CONTEXT_BUDGET.md with usage guide and trade-off notes
- Replace wrapper tests with direct helper unit tests

---------

Co-authored-by: chengyongru <chengyongru.ai@gmail.com>
2026-03-23 18:13:03 +08:00


# Context Budget (`context_budget_tokens`)
Caps how many tokens of old session history are sent to the LLM during tool-loop iterations 2+. Reduces cost and first-token latency by trimming history between turns.
## How It Works
During multi-turn tool-use sessions, each iteration re-sends the full conversation history. `context_budget_tokens` limits how many old tokens are included:
- **Iteration 1** — always receives full context (no trimming)
- **Iteration 2+** — old history is trimmed to fit within the budget; current turn is never trimmed
- **Memory consolidation** — runs before/after the loop and always sees the full canonical history; trimming only affects the LLM's view
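The trimming logic described above can be sketched as a pure function. This is an illustrative sketch, not nanobot's actual `trim_history_for_budget()` implementation; the message shape and the 4-characters-per-token heuristic are assumptions.

```python
# Hypothetical sketch of budget-based history trimming.
# Message shape and token heuristic are illustrative assumptions.

def estimate_tokens(message: dict) -> int:
    # Rough heuristic: ~4 characters per token.
    return max(1, len(message.get("content", "")) // 4)

def trim_history_for_budget(history: list[dict], current_turn: list[dict],
                            budget_tokens: int) -> list[dict]:
    """Keep the most recent old messages that fit within the budget.

    The current turn is always kept in full; only older history is
    trimmed. A budget of 0 disables trimming entirely.
    """
    if budget_tokens <= 0:
        return history + current_turn

    kept: list[dict] = []
    used = 0
    # Walk old history newest-to-oldest, keeping whatever still fits.
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    return kept + current_turn
```

Note that trimming drops whole messages from the oldest end first, so the most recent exchanges survive longest.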
## Configuration
```json
{
  "agents": {
    "defaults": {
      "context_budget_tokens": 1000
    }
  }
}
```
| Value | Behavior |
|---|---|
| `0` (default) | No trimming — full history sent every iteration |
| `4000` | Conservative — barely trims in practice; good for multi-step tasks |
| `1000` | Aggressive — significant savings; works well for typical linear tasks |
| `< 500` | Clamped to a `500` minimum when positive (roughly 12 message pairs at typical token density) |
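The zero-disables / 500-floor rule in the table can be expressed as a small resolver. This is a hypothetical helper for illustration, not nanobot's actual code:

```python
def resolve_budget(configured: int) -> int:
    """Resolve a configured context budget.

    0 (or any non-positive value) disables trimming; any positive
    value is clamped to a 500-token floor.
    """
    if configured <= 0:
        return 0
    return max(configured, 500)
```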
## Trade-offs
**Cost & latency** — Trimming reduces the tokens sent each iteration, which saves money and lowers time to first token (TTFT). This is nanobot's primary sweet spot.
**Context loss** — Older context is not visible to the LLM in later iterations. For tasks that genuinely require 20+ iterations of history to stay coherent, consider `0` or `4000`.
**Tool-result truncation** — Large results from a previous turn (e.g., reading a 10,000-line file in Round 1, then editing in Round 2) can be trimmed. The agent can re-read the file via its tools — this is a 1-tool-call recovery cost, not a failure.
**Prefix caching** — Some providers (e.g., DeepSeek) use implicit prefix-based caching. Aggressive trimming breaks prefix matching and can reduce cache hit rates. For these providers, `0` or a high value may be more cost-effective overall.
## When to Use
| Use case | Recommended value |
|---|---|
| Simple read → process → act chains | `1000` |
| Multi-step reasoning with tool chains | `4000` |
| Complex debugging / long task traces | `0` |
| Providers with implicit prefix caching | `0` or `4000` |
| Long file operations across turns | `0` or re-read via tools |
## Example
```
Turn 1: User asks to read a.py (10k lines)
Turn 2: User asks to edit line 100
```
With `context_budget_tokens=500`, the file-content result from Turn 1 may be trimmed before Turn 2. The agent will re-read the file to perform the edit — a 1-call recovery. This is normal behavior for the feature; it is not a bug.