VITOHJL dbc518098e refactor: implement token-based context compression mechanism
Major changes:
- Replace message-count-based memory window with token-budget-based compression
- Add max_tokens_input, compression_start_ratio, compression_target_ratio config
- Implement _maybe_compress_history() that triggers based on prompt token usage
- Use _build_compressed_history_view() to provide compressed history to LLM
- Refactor MemoryStore.consolidate() -> consolidate_chunk() for chunk-based compression
- Remove last_consolidated from Session, use _compressed_until metadata instead
- Add background compression scheduling to avoid blocking message processing

Key improvements:
- Compression now based on actual token usage, not arbitrary message counts
- Better handling of long conversations with large context windows
- Non-destructive compression: old messages remain in session, but excluded from prompt
- Automatic compression when history exceeds configured token thresholds
2026-03-08 14:20:16 +08:00
..
2026-03-08 03:24:15 +00:00
2026-02-28 17:55:18 +00:00
2026-02-01 07:36:42 +00:00