Tiered Memory for Long-Running Agents

Long-running agents accumulate more state than a single context window can hold. The naive fix — paste everything back each turn — burns tokens and degrades recall as the window fills. We separate memory into three tiers by access frequency.

The tiers

A small hot tier stays resident every turn. A larger warm tier is retrieved on demand. A cold tier is archived and only reached through explicit search.

Tier	Resident	Typical size	Access
Hot	Always	~2 KB	Every turn
Warm	On demand	~50 KB	Retrieval
Cold	Never	Unbounded	Explicit search
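
The three tiers can be sketched as a small data structure. This is a minimal illustration, not a reference implementation: the class name TieredMemory, its method names, the dictionary-backed storage, and the substring search are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier store. All names here are hypothetical."""

    hot: dict[str, str] = field(default_factory=dict)   # resident every turn
    warm: dict[str, str] = field(default_factory=dict)  # retrieved on demand
    cold: dict[str, str] = field(default_factory=dict)  # explicit search only

    def resident_context(self) -> str:
        """Text included in every turn: the hot tier and nothing else."""
        return "\n".join(f"{k}: {v}" for k, v in self.hot.items())

    def retrieve(self, key: str) -> str | None:
        """On-demand lookup: hot first, then warm. Never touches cold."""
        if key in self.hot:
            return self.hot[key]
        return self.warm.get(key)

    def search_archive(self, term: str) -> list[tuple[str, str]]:
        """Explicit search over the cold tier (naive substring match)."""
        needle = term.lower()
        return [
            (k, v)
            for k, v in self.cold.items()
            if needle in k.lower() or needle in v.lower()
        ]


# Example data is invented for illustration.
memory = TieredMemory(
    hot={"user_name": "Ada"},
    warm={"project_deadline": "2025-03-01"},
    cold={"meeting_2023_11": "Discussed the database migration plan."},
)
```

Here memory.resident_context() returns only the hot entry, memory.retrieve("project_deadline") reaches into the warm tier, and the cold entry is visible only through memory.search_archive("migration"). A real system would likely replace the substring match with an index or embedding search.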

Why it holds up

Keeping the hot tier small and fixed in size means the baseline per-turn cost stays flat as total memory grows; a turn pays more only when it retrieves something. Recall holds up because the warm and cold tiers remain reachable and are paged in only when a turn actually needs them.
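
A back-of-the-envelope comparison makes the cost argument concrete. The sketch below measures cost in bytes of context per turn. The ~2 KB hot tier comes from the table; the 1 KB size of a retrieved warm chunk and the function names are assumptions chosen for illustration.

```python
HOT_BYTES = 2 * 1024      # ~2 KB hot tier, from the table
WARM_FETCH_BYTES = 1024   # assumed size of one retrieved warm chunk


def naive_turn_bytes(total_memory_bytes: int) -> int:
    """Paste-everything baseline: cost grows with total memory."""
    return total_memory_bytes


def tiered_turn_bytes(warm_fetches: int = 0) -> int:
    """Tiered cost: the hot tier plus whatever this turn retrieves."""
    return HOT_BYTES + warm_fetches * WARM_FETCH_BYTES


for total in (10_000, 100_000, 1_000_000):
    print(
        f"total={total:>9}  naive={naive_turn_bytes(total):>9}  "
        f"tiered={tiered_turn_bytes()}  "
        f"tiered+2 fetches={tiered_turn_bytes(warm_fetches=2)}"
    )
```

Under these assumptions the naive cost scales with total memory, while the tiered cost stays at 2048 bytes on a turn with no retrieval and 4096 bytes on a turn with two fetches, regardless of how much the agent has stored.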

The cost of a turn should track what the turn needs, not what the agent has ever seen.

This is not a new idea: it mirrors CPU cache hierarchies, and it maps cleanly onto agent context budgets. Limitations: tier placement is a heuristic, and a fact that belongs in the hot tier but was placed in warm or cold still costs a retrieval round-trip to recover.
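
One possible placement heuristic, shown only as a sketch: rank facts by how often they have been accessed and fill the hot tier greedily until a byte budget is reached. The function name place_hot, the access-count input, and the greedy strategy are assumptions for illustration; the text above does not prescribe a specific rule.

```python
from __future__ import annotations


def place_hot(
    facts: dict[str, str],
    access_counts: dict[str, int],
    budget_bytes: int = 2048,
) -> tuple[dict[str, str], dict[str, str]]:
    """Greedy split: most-accessed facts go hot until the budget is full.

    Returns (hot, rest). Facts that do not fit are left for the lower tiers.
    """
    hot: dict[str, str] = {}
    rest: dict[str, str] = {}
    used = 0
    # Most frequently accessed first; unseen facts count as zero accesses.
    ranked = sorted(facts, key=lambda k: access_counts.get(k, 0), reverse=True)
    for key in ranked:
        size = len(f"{key}: {facts[key]}".encode("utf-8"))
        if used + size <= budget_bytes:
            hot[key] = facts[key]
            used += size
        else:
            rest[key] = facts[key]
    return hot, rest
```

Past access counts are only a proxy for what the next turn needs, which is exactly where the misplacement cost described above comes from: a rarely used fact that suddenly matters will sit outside the hot tier until it is retrieved.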