Skip to content
Back to Magazine
ai-operating-models 4 min read

Context Budgeting: Saving Tokens Without Blinding the Agent

Does this apply to your company?

Free 30-min AI diagnostic →

Key Takeaways

  • - Stable: instructions, policies, criteria, schemas, and lasting examples.
  • - Situational: case data, user, customer, channel, or task.
  • - Retrieved: documents, tickets, memory, knowledge, or sources.
  • - Transient: tool outputs, temporary logs, and intermediate steps.

Decision

Decide what governance, ownership or cadence is missing before scaling AI.

Room

Executive committee, AI portfolio review, transformation steering.

Risk

Mistaking activity, pilots and tooling for real operating capability.

Agent prompt: map decision rights, KPIs, risks and the next operational move

Problem

When AI costs rise, many companies react by cutting tokens. Shorter prompts, less context, fewer examples, less memory. Sometimes it works. Sometimes it blinds the agent.

Context isn’t fat that can be cut without thinking. It’s the informational environment where the agent decides. If you remove critical context, the agent consumes less but fails more. If you put everything in, it consumes more and can get confused.

The challenge isn’t “less context.” It’s better context budgeting.

Thesis

Context Budgeting should be its own discipline within the AI operating model.

It involves deciding what information goes in, where it’s placed, how long it lasts, when it’s cached, when it expires, what’s retrieved on demand, and what should never be included.

Good context budgeting reduces cost without destroying quality. Bad budgeting saves tokens by buying rework.

Framework

Divide context into five budgets:

  • Stable: instructions, policies, criteria, schemas, and lasting examples.
  • Situational: case data, user, customer, channel, or task.
  • Retrieved: documents, tickets, memory, knowledge, or sources.
  • Transient: tool outputs, temporary logs, and intermediate steps.
  • Prohibited: secrets, unnecessary data, noise, and unauthorized context.

Mini-case: a legal agent receives a contract, internal policies, customer history, redline examples, and tool outputs. If everything enters as a flat block, cost rises and precision drops. If stable policies are cached, the contract enters as a case, sources are retrieved with permissions, and tool outputs expire, the system decides better and costs less.

Measurable signal: cost per accepted outcome after separating stable, situational, retrieved, and transient context.

Posture: context is inventory. If you don’t budget it, it becomes expensive garbage.

Why It Matters Now

Anthropic documents prompt caching to reuse stable content like tool definitions, system instructions, context, and examples. AWS announced in January 2026 a 1-hour TTL option for prompt caching in Amazon Bedrock with selected Claude models, aimed at long agentic workflows, tool use, retrieval, and orchestration. OpenAI documents agents and SDKs where tools, memory, and execution structure become explicit pieces of the system.

All these pieces point to the same problem: long agents need to manage context as an operational resource, not as glued text.

The cost of context doesn’t just appear on the bill. It appears in latency, errors, data exposure, and debugging difficulty.

Anti-Example

“Let’s put the entire knowledge base in the context so it doesn’t fail.”

That usually fails expensively. It increases tokens, includes outdated documents, mixes permissions, and makes it hard to know which source influenced the response. An agent doesn’t need everything; it needs sufficient, relevant, authorized, and fresh context.

Protocol (3 steps)

  1. Mark context by useful life. Minutes, hours, days, release, contract, or permanent.
  2. Cache the stable, retrieve the dynamic. Don’t treat policies and case data as the same thing.
  3. Measure blindness and noise. If cost drops but rework rises, the cut was false savings.
TypeStrategyRisk
stablecacheold version
situationalinject per caselack of context
retrievedRAG with permissionswrong source
transientexpirecontaminated memory
prohibitedblockdata leak

Sources Consulted

Next Step

Take an expensive workflow and paint its context in five colors: stable, situational, retrieved, transient, and prohibited. There you’ll see what part gets cached, what part gets retrieved, and what part is surplus.


Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.

context-engineering token-economics prompt-caching ai-operating-models
Cite this article

Berthelius, V. (2026). “Context Budgeting: Saving Tokens Without Blinding the Agent”. BRTHLS Magazine. https://www.brthls.com/magazine/context-budgeting-saving-tokens-without-blinding-agent-en

Fractional CAIO · Free diagnostic

Is your company ready to operate with AI?

30 minutes. No pitch. An honest read on where you are and what to move first.

Book free diagnostic