Roketus Research


Tiered Memory for Long-Running Agents

Published May 26, 2026 · Roketus Research

Long-running agents accumulate more state than a single context window can hold. The naive fix — paste everything back each turn — burns tokens and degrades recall as the window fills. We separate memory into three tiers by access frequency.

The tiers

A small hot tier stays resident every turn. A larger warm tier is retrieved on demand. A cold tier is archived and only reached through explicit search.

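| Tier | Resident  | Typical size | Access          |
|------|-----------|--------------|-----------------|
| Hot  | Always    | ~2 KB        | Every turn      |
| Warm | On demand | ~50 KB       | Retrieval       |
| Cold | Never     | Unbounded    | Explicit search |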
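To make the split concrete, here is a minimal sketch of a three-tier store in Python. Everything in it is an assumption for illustration: the class name `TieredMemory`, the `pin` flag, and the keyword match that stands in for real retrieval are hypothetical. Only the ~2 KB hot budget comes from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Illustrative three-tier store; all names here are hypothetical."""

    hot_budget: int = 2048  # bytes, matching the ~2 KB figure in the table
    hot: dict[str, str] = field(default_factory=dict)
    warm: dict[str, str] = field(default_factory=dict)
    cold: dict[str, str] = field(default_factory=dict)

    def hot_bytes(self) -> int:
        # Size of the resident tier, counted as UTF-8 bytes of the values.
        return sum(len(v.encode("utf-8")) for v in self.hot.values())

    def remember(self, key: str, value: str, pin: bool = False) -> str:
        """Store a fact and return the name of the tier it landed in."""
        # Drop any older copy so a key lives in exactly one tier.
        for tier in (self.hot, self.warm, self.cold):
            tier.pop(key, None)
        size = len(value.encode("utf-8"))
        # Pinned facts go hot only while the budget allows it.
        if pin and self.hot_bytes() + size <= self.hot_budget:
            self.hot[key] = value
            return "hot"
        self.warm[key] = value
        return "warm"

    def archive(self, key: str) -> None:
        # Demote a warm entry to the cold tier.
        self.cold[key] = self.warm.pop(key)

    def context_for_turn(self, query: str = "") -> list[str]:
        # Hot entries are always included; warm entries only when their key
        # appears in the query (a stand-in for a real retrieval step).
        words = set(query.lower().split())
        items = list(self.hot.values())
        items += [v for k, v in self.warm.items() if k.lower() in words]
        return items

    def search_cold(self, needle: str) -> list[str]:
        # Cold entries never enter the context unless explicitly searched.
        return [v for v in self.cold.values() if needle.lower() in v.lower()]


mem = TieredMemory()
mem.remember("goal", "Ship the billing migration", pin=True)
mem.remember("schema", "invoices(id, amount, due_date)")
mem.remember("postmortem", "Old outage traced to a stale cache")
mem.archive("postmortem")
print(mem.context_for_turn("check the schema"))
print(mem.search_cold("outage"))
```

A real system would likely replace the keyword match with whatever retrieval it already has. The point of the sketch is only that the hot tier is the one thing assembled unconditionally.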

Why it holds up

Keeping the hot tier tiny means the baseline per-turn cost stays flat as total memory grows, because only the hot tier is paid for on every turn. Recall does not suffer, because the warm and cold tiers remain reachable: they are paged in only when a turn actually needs them.
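A rough cost model shows why the per-turn cost stays flat. The sketch below compares pasting all memory back each turn against paying only for the hot tier plus an occasional retrieval. The 2 KB hot size is taken from the table. The 4 KB retrieval size and the function names are assumptions chosen for illustration, and cost is measured in bytes of context rather than tokens.

```python
HOT_BYTES = 2 * 1024         # ~2 KB resident tier, from the table
WARM_FETCH_BYTES = 4 * 1024  # assumed size of one on-demand retrieval


def naive_cost(total_memory_bytes: int) -> int:
    # Paste-everything baseline: the turn pays for all accumulated memory.
    return total_memory_bytes


def tiered_cost(total_memory_bytes: int, needs_retrieval: bool) -> int:
    # The hot tier is always paid for; a retrieval is paid for only when
    # the turn needs it. The cost can never exceed what is actually stored.
    fetched = WARM_FETCH_BYTES if needs_retrieval else 0
    return min(total_memory_bytes, HOT_BYTES + fetched)


for total in (10 * 1024, 100 * 1024, 1024 * 1024):
    print(
        f"total={total:>8} B  naive={naive_cost(total):>8} B  "
        f"tiered={tiered_cost(total, False)} B  "
        f"tiered+fetch={tiered_cost(total, True)} B"
    )
```

Under these assumptions the naive cost grows linearly with total memory, while the tiered cost is bounded by the hot tier plus one retrieval, however large the archive becomes.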

The cost of a turn should track what the turn needs, not what the agent has ever seen.

This is not a new idea (it mirrors CPU cache hierarchies), but it maps cleanly onto agent context budgets. Limitations: tier placement is a heuristic, and a fact that belongs in the hot tier but was placed in a lower one still costs a retrieval round-trip to recover.

#agents #memory #context
