Problem
When AI costs rise, many companies react by cutting tokens. Shorter prompts, less context, fewer examples, less memory. Sometimes it works. Sometimes it blinds the agent.
Context isn’t fat that can be cut without thinking. It’s the informational environment where the agent decides. If you remove critical context, the agent consumes less but fails more. If you put everything in, it consumes more and can get confused.
The challenge isn’t “less context.” It’s better context budgeting.
Thesis
Context Budgeting should be its own discipline within the AI operating model.
It involves deciding what information goes in, where it’s placed, how long it lasts, when it’s cached, when it expires, what’s retrieved on demand, and what should never be included.
Good context budgeting reduces cost without destroying quality. Bad budgeting saves tokens by buying rework.
Framework
Divide context into five budgets:
- Stable: instructions, policies, criteria, schemas, and lasting examples.
- Situational: case data, user, customer, channel, or task.
- Retrieved: documents, tickets, memory, knowledge, or sources.
- Transient: tool outputs, temporary logs, and intermediate steps.
- Prohibited: secrets, unnecessary data, noise, and unauthorized context.
Mini-case: a legal agent receives a contract, internal policies, customer history, redline examples, and tool outputs. If everything enters as a flat block, cost rises and precision drops. If stable policies are cached, the contract enters as a case, sources are retrieved with permissions, and tool outputs expire, the system decides better and costs less.
Measurable signal: cost per accepted outcome after separating stable, situational, retrieved, and transient context.
Posture: context is inventory. If you don’t budget it, it becomes expensive garbage.
Why It Matters Now
Anthropic documents prompt caching to reuse stable content like tool definitions, system instructions, context, and examples. AWS announced in January 2026 a 1-hour TTL option for prompt caching in Amazon Bedrock with selected Claude models, aimed at long agentic workflows, tool use, retrieval, and orchestration. OpenAI documents agents and SDKs where tools, memory, and execution structure become explicit pieces of the system.
All these pieces point to the same problem: long agents need to manage context as an operational resource, not as glued text.
The cost of context doesn’t just appear on the bill. It appears in latency, errors, data exposure, and debugging difficulty.
Anti-Example
“Let’s put the entire knowledge base in the context so it doesn’t fail.”
That usually fails expensively. It increases tokens, includes outdated documents, mixes permissions, and makes it hard to know which source influenced the response. An agent doesn’t need everything; it needs sufficient, relevant, authorized, and fresh context.
Protocol (3 steps)
- Mark context by useful life. Minutes, hours, days, release, contract, or permanent.
- Cache the stable, retrieve the dynamic. Don’t treat policies and case data as the same thing.
- Measure blindness and noise. If cost drops but rework rises, the cut was false savings.
| Type | Strategy | Risk |
|---|---|---|
| stable | cache | old version |
| situational | inject per case | lack of context |
| retrieved | RAG with permissions | wrong source |
| transient | expire | contaminated memory |
| prohibited | block | data leak |
Related
- Token-to-Outcome: the KPI that separates used AI from profitable AI
- Context Architecture: why prompt engineering doesn’t scale business
- Enterprise AI Search: why internal search is becoming an operating system
Sources Consulted
- Anthropic: Prompt caching
- AWS: Amazon Bedrock now supports 1-hour duration for prompt caching
- OpenAI Agents SDK: Memory
Next Step
Take an expensive workflow and paint its context in five colors: stable, situational, retrieved, transient, and prohibited. There you’ll see what part gets cached, what part gets retrieved, and what part is surplus.
Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.