automation-aiops 4 min read

AI Traces: the layer that turns agents into auditable systems


Key Takeaways

  • Model: provider, model, parameters, tokens, latency and response.
  • Context: retrieved documents, source, ranking, permissions and freshness.
  • Tools: tool call, arguments, result, error, retry and side effect.
  • Decision: why the agent chose one path over another.

Decision

Separate reliable automation from a fragile demo before granting it autonomy.

Forum

Operations review, architecture, security or platform.

Risk

Gaining speed without observability, rollback, ownership, or a stop criterion.


Problem

A traditional application fails and you can usually follow the trail: request, service, database, error, log, alert. An agent fails differently. It may retrieve incorrect context, choose a bad tool, repeat a call, ignore an instruction, spend tokens without producing a result, or declare success when nothing real happened.

If you only look at input and output, the system looks like a black box. You see the question and the answer, but you don’t know which path it took.

That is not enough for enterprise operation. An agent that acts without a trace is not autonomous; it is automated opacity.

Thesis

AI Traces will be a mandatory layer for any serious agentic system.

Not because every team needs sophisticated dashboards from day one, but because without traces there is no debugging, evaluation, compliance, cost optimization, or learning.

The trace turns a run into evidence: which model was called, which prompt was used, which context was supplied, which tool calls occurred, which failures appeared, which decision was made, and how much it cost.

Framework

A useful trace should capture five levels:

  • Model: provider, model, parameters, tokens, latency and response.
  • Context: retrieved documents, source, ranking, permissions and freshness.
  • Tools: tool call, arguments, result, error, retry and side effect.
  • Decision: why the agent chose one path over another.
  • Outcome: what changed in the external system and whether it was verified.
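As a minimal sketch, assuming Python 3.9+ and plain dataclasses, the five levels can be modeled as one record per run. Every class and field name here is hypothetical and chosen for illustration; it is not a vendor schema or a standard.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ModelCall:
    provider: str
    model: str
    parameters: dict[str, Any]
    input_tokens: int
    output_tokens: int
    latency_ms: float
    response: str


@dataclass
class ContextItem:
    document_id: str
    source: str
    rank: int
    permitted: bool    # was the agent allowed to read this document?
    retrieved_at: str  # ISO-8601 timestamp, used to judge freshness


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    result: Optional[Any] = None
    error: Optional[str] = None
    retries: int = 0
    side_effect: bool = False  # does this call change an external system?


@dataclass
class Decision:
    chosen: str
    alternatives: list[str]
    rationale: str


@dataclass
class Outcome:
    description: str
    verified: bool  # confirmed against the external system, not just claimed


@dataclass
class AgentTrace:
    run_id: str
    model_calls: list[ModelCall] = field(default_factory=list)
    context: list[ContextItem] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)
    outcome: Optional[Outcome] = None

    def total_tokens(self) -> int:
        # Sum input and output tokens across every model call in the run.
        return sum(m.input_tokens + m.output_tokens for m in self.model_calls)
```

Keeping the outcome as its own optional field makes "no verified outcome" visible as a missing value rather than something buried in a log line.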

Mini-case: an operations agent says it has updated a CRM. Without a trace, you only see a convincing response. With a trace, you see that it retrieved the correct contact, called the API with the correct ID, received an HTTP 200, wrote the expected field, and then verified the status. That difference is what separates a demo from production.
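The verification step in the mini-case can be sketched as follows. `FakeCRM` is a hypothetical in-memory stand-in, not a real CRM client, and the trace is a plain list of dicts; the point is that the HTTP status is recorded, but only reading the field back counts as verification.

```python
from typing import Any


class FakeCRM:
    """Hypothetical in-memory stand-in for a CRM API (not a real client)."""

    def __init__(self) -> None:
        self.contacts: dict[str, dict[str, Any]] = {"c-42": {"status": "lead"}}

    def update(self, contact_id: str, field_name: str, value: Any) -> int:
        # Returns an HTTP-like status code.
        if contact_id not in self.contacts:
            return 404
        self.contacts[contact_id][field_name] = value
        return 200

    def get(self, contact_id: str) -> dict[str, Any]:
        return dict(self.contacts.get(contact_id, {}))


def update_and_verify(crm: FakeCRM, contact_id: str, field_name: str,
                      value: Any, trace: list) -> bool:
    # Record the tool call and its raw result.
    status = crm.update(contact_id, field_name, value)
    trace.append({"step": "tool_call", "tool": "crm.update",
                  "args": {"id": contact_id, "field": field_name, "value": value},
                  "status": status})
    if status != 200:
        trace.append({"step": "outcome", "verified": False, "reason": f"HTTP {status}"})
        return False
    # A 200 is not proof: read the record back and compare with what was intended.
    observed = crm.get(contact_id).get(field_name)
    verified = observed == value
    trace.append({"step": "verification", "observed": observed, "verified": verified})
    return verified
```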

Measurable signal: percentage of critical runs that can be reproduced or audited from a complete trace.
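One way to compute that signal, assuming each run is already tagged as critical or not and as having a complete trace or not. Both flags and the `RunRecord` type are hypothetical; how a team defines "complete" is its own decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    run_id: str
    critical: bool        # e.g. the run touched a production system
    trace_complete: bool  # all five levels captured and the outcome verified


def auditable_rate(runs: list[RunRecord]) -> Optional[float]:
    """Percentage of critical runs that can be audited from a complete trace."""
    critical = [r for r in runs if r.critical]
    if not critical:
        return None  # undefined: there were no critical runs to audit
    complete = sum(1 for r in critical if r.trace_complete)
    return 100.0 * complete / len(critical)
```

For three critical runs of which two have complete traces, the rate is about 66.7%.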

Position: an agent without a trace should not touch production systems.

Why it matters now

OpenTelemetry already maintains semantic conventions for generative AI systems, covering model and agent spans, framework-specific signals, events, exceptions, and metrics. LangSmith documents agent observability that traces calls, steps, and decisions. OpenAI, in its practical guide to building agents, treats guardrails, tool safeguards, and output validation as production components, not extras.

The direction of the market is clear: AI observability is moving beyond “saving prompts” and starting to resemble distributed tracing with agent semantics.

That has deep implications. If each provider stores traces in its own format, the team becomes locked into that provider's tooling. If traces follow shared conventions, the company can compare, migrate, and audit across tools.
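To make "shared conventions" concrete, here is a sketch that maps a model call and a tool call onto attribute names from the OpenTelemetry GenAI semantic conventions. Those conventions are still marked as in development and some names have changed between versions, so treat the exact keys as an assumption to check against the version your exporter targets; the two helper functions are hypothetical.

```python
from typing import Any

# Attribute keys follow the OpenTelemetry GenAI semantic conventions
# (development status); verify them against the version you export to.


def model_call_attributes(operation: str, model: str,
                          input_tokens: int, output_tokens: int) -> dict[str, Any]:
    return {
        "gen_ai.operation.name": operation,  # e.g. "chat"
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


def tool_call_attributes(tool_name: str) -> dict[str, Any]:
    return {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": tool_name,
    }
```

With the `opentelemetry-api` package, a dict like this can be passed as the `attributes` argument of `tracer.start_as_current_span(...)`, so any backend that understands the conventions reads the same fields.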

Anti-example

“We store all prompts and responses in a table.”

It’s a good start, but it’s not a trace. The causal chain is missing: retrieval, tools, errors, retries, permissions, cost, and external effects. The textual log tells what the agent said. The trace tells what it did.
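The difference between a table of prompts and a trace is the parent link between steps. A minimal sketch with a hypothetical `Span` type: each step records which step caused it, so from a failed or retried tool call you can walk back to the model call and decision that led there.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the run
    kind: str                 # e.g. "agent", "retrieval", "model", "tool"
    summary: str


def causal_chain(spans: list[Span], span_id: str) -> list[Span]:
    """Return the path from the root span down to `span_id`."""
    by_id = {s.span_id: s for s in spans}
    chain: list[Span] = []
    seen: set[str] = set()
    current = by_id.get(span_id)
    # Follow parent links upward; `seen` guards against malformed cycles.
    while current is not None and current.span_id not in seen:
        seen.add(current.span_id)
        chain.append(current)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return list(reversed(chain))
```

A flat table of prompts and responses has no `parent_id`, which is exactly why it cannot answer "what led to this error?".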

Protocol (3 steps)

  1. Trace workflows with side effects first. If it modifies data, sends messages, or executes actions, it must leave a trace.
  2. Combine trace and outcome. A successful execution only counts if the external system confirms the expected change.
  3. Label failures by layer. Model, context, tool, permission, network, criterion, or verification. Without a taxonomy, each incident starts from scratch.

| Layer | What it captures | What it's for |
|---|---|---|
| model | tokens, latency, response | cost and performance |
| context | sources and permissions | trust and compliance |
| tool | arguments and result | debugging |
| decision | chosen path | evaluation |
| outcome | verified effect | real ROI |
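The three protocol steps can be sketched as plain functions over run records. The `Run` type, its flags, and the rule that an unconfirmed claimed success counts as a verification failure are illustrative assumptions, not a prescribed implementation:

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureLayer(Enum):
    # Step 3: one label per layer, so incidents can be grouped and compared.
    MODEL = "model"
    CONTEXT = "context"
    TOOL = "tool"
    PERMISSION = "permission"
    NETWORK = "network"
    CRITERION = "criterion"
    VERIFICATION = "verification"


@dataclass
class Run:
    workflow: str
    side_effects: bool            # modifies data, sends messages, or executes actions
    agent_reported_success: bool  # what the agent claims
    externally_confirmed: bool    # what the external system actually shows
    failure: Optional[FailureLayer] = None


def requires_trace(run: Run) -> bool:
    # Step 1: any workflow with side effects must leave a trace.
    return run.side_effects


def counts_as_success(run: Run) -> bool:
    # Step 2: success only counts when the external system confirms it.
    return run.agent_reported_success and run.externally_confirmed


def failures_by_layer(runs: list[Run]) -> Counter:
    # Step 3: tally non-successful runs by the layer where they failed.
    counts: Counter = Counter()
    for run in runs:
        if counts_as_success(run):
            continue
        layer = run.failure
        if layer is None and run.agent_reported_success:
            # The agent claimed success but the system does not confirm it.
            layer = FailureLayer.VERIFICATION
        counts[layer.value if layer is not None else "unlabeled"] += 1
    return counts
```

An "unlabeled" bucket that keeps growing is itself a signal that the taxonomy is not being applied.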


Next step

Pick an agent that already delivers value. Don’t start by improving prompts. Start by instrumenting a full trace of a real run and look for where evidence is lost.
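A first pass at "where evidence is lost" can be a completeness check over the spans of one instrumented run. This sketch assumes each span carries a hypothetical `layer` key using the five levels of the framework, plus a `verified` flag on outcome spans:

```python
REQUIRED_LAYERS = ("model", "context", "tool", "decision", "outcome")


def evidence_gaps(spans: list[dict]) -> list[str]:
    """Return the trace levels for which a run left no usable evidence."""
    present = {s.get("layer") for s in spans}
    gaps = [layer for layer in REQUIRED_LAYERS if layer not in present]
    # An outcome that was recorded but never verified is also a gap.
    if any(s.get("layer") == "outcome" and not s.get("verified", False) for s in spans):
        gaps.append("outcome (unverified)")
    return gaps
```

A run that logged a model call, a tool call, and an unverified outcome would report `context`, `decision`, and `outcome (unverified)` as the places where evidence is lost.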


Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.

ai-traces observability agentic-ai automation-aiops
Cite this article

Berthelius, V. (2026). “AI Traces: the layer that turns agents into auditable systems”. BRTHLS Magazine. https://www.brthls.com/magazine/ai-traces-auditable-agents-en
