Tiered Memory for Long-Running Agents
Long-running agents accumulate more state than a single context window can hold. The naive fix — paste everything back each turn — burns tokens and degrades recall as the window fills. We separate memory into three tiers by access frequency.
The tiers
A small hot tier stays resident every turn. A larger warm tier is retrieved on demand. A cold tier is archived and only reached through explicit search.
| Tier | Resident | Typical size | Access |
|---|---|---|---|
| Hot | Always | ~2 KB | Every turn |
| Warm | On demand | ~50 KB | Retrieval |
| Cold | Never | Unbounded | Explicit search |
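One way to picture the three tiers is as a small container with a different access path per tier. The sketch below is illustrative only: the class name `TieredMemory`, its method names, the key-based warm lookup, and the substring-based cold search are all assumptions made for the example, not the system described here. The hot budget is taken from the table.

```python
from dataclasses import dataclass, field
from typing import Optional

HOT_BUDGET_BYTES = 2 * 1024  # ~2 KB, the typical hot-tier size from the table


@dataclass
class TieredMemory:
    """Hypothetical three-tier store; names and lookup methods are assumptions."""
    hot: dict[str, str] = field(default_factory=dict)   # resident every turn
    warm: dict[str, str] = field(default_factory=dict)  # retrieved on demand
    cold: list[str] = field(default_factory=list)       # explicit search only

    def context_prefix(self) -> str:
        """Render the hot tier only; this is what every turn would carry."""
        return "\n".join(f"{k}: {v}" for k, v in self.hot.items())

    def hot_bytes(self) -> int:
        """Size of the rendered hot tier in UTF-8 bytes."""
        return len(self.context_prefix().encode("utf-8"))

    def retrieve(self, key: str) -> Optional[str]:
        """Warm tier: fetched only when a turn asks (by key in this sketch)."""
        return self.warm.get(key)

    def search(self, query: str) -> list[str]:
        """Cold tier: case-insensitive substring search over archived entries."""
        q = query.lower()
        return [doc for doc in self.cold if q in doc.lower()]


mem = TieredMemory()
mem.hot["user_name"] = "Ada"
mem.warm["project_notes"] = "Uses Postgres 15"
mem.cold.append("Archived transcript: discussed the migration plan")

print(mem.context_prefix())           # only the hot tier appears here
print(mem.retrieve("project_notes"))  # warm entry, fetched on demand
print(mem.search("migration"))        # cold entry, found by explicit search
```

A real system would likely replace the key lookup and substring search with whatever retrieval mechanism it already uses; the point of the sketch is only that each tier has its own access path and that `context_prefix` never touches the warm or cold tiers.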
Why it holds up
Keeping the hot tier small keeps the fixed per-turn cost flat as total memory grows: a turn pays for the hot tier plus whatever it retrieves, and nothing else. Recall does not suffer, because the warm and cold tiers remain reachable; they are paged in only when a turn actually needs them.
The cost of a turn should track what the turn needs, not what the agent has ever seen.
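The arithmetic behind the flat per-turn cost can be made concrete with a rough comparison. This is a back-of-the-envelope sketch: it uses bytes as a stand-in for tokens, the function names are hypothetical, and the only figure taken from the text is the ~2 KB hot tier.

```python
HOT_BYTES = 2 * 1024  # ~2 KB hot tier, as in the table


def naive_turn_cost(total_memory_bytes: int) -> int:
    """Paste-everything-back: each turn carries all accumulated memory."""
    return total_memory_bytes


def tiered_turn_cost(retrieved_bytes: int = 0, hot_bytes: int = HOT_BYTES) -> int:
    """Tiered: each turn carries the hot tier plus whatever it retrieves."""
    return hot_bytes + retrieved_bytes


for total in (10_000, 100_000, 1_000_000):
    print(total, naive_turn_cost(total), tiered_turn_cost())
# 10000 10000 2048
# 100000 100000 2048
# 1000000 1000000 2048
```

The naive cost grows with total memory, while the tiered cost depends only on the hot tier and on what the turn pulls in, for example `tiered_turn_cost(retrieved_bytes=5000)` gives 7048 regardless of how much is archived.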
This is not a new idea (it mirrors CPU cache hierarchies), but it maps cleanly onto agent context budgets. Limitations: tier placement is a heuristic, and a fact that belongs in the hot tier but was placed in a lower one still costs a retrieval round-trip to recover.
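To illustrate what a placement heuristic based on access frequency might look like, and where it can go wrong, here is a minimal sketch. The function `assign_tiers`, the slot counts, and the sample access counts are all invented for the example; the text does not specify how placement is actually decided.

```python
def assign_tiers(
    access_counts: dict[str, int], hot_slots: int, warm_slots: int
) -> dict[str, str]:
    """Rank keys by access count (hypothetical rule) and fill tiers top-down."""
    # Sort by descending count; break ties by key so the result is deterministic.
    ranked = sorted(access_counts, key=lambda k: (-access_counts[k], k))
    tiers: dict[str, str] = {}
    for rank, key in enumerate(ranked):
        if rank < hot_slots:
            tiers[key] = "hot"
        elif rank < hot_slots + warm_slots:
            tiers[key] = "warm"
        else:
            tiers[key] = "cold"
    return tiers


# Made-up access counts, for illustration only.
counts = {"user_name": 40, "current_task": 35, "repo_layout": 6, "old_ticket": 1}
placement = assign_tiers(counts, hot_slots=2, warm_slots=1)
print(placement)
```

Under this rule `repo_layout` lands in the warm tier because it has been accessed rarely so far. If the next several turns all need it, each one pays a retrieval round-trip until the counts catch up, which is the mis-placement cost noted above.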