# AI Traces: the layer that turns agents into auditable systems

> Explains why AI tracing is essential for enterprise agents, outlining a five-level framework and steps to implement auditable runs.

- Author: Viktor Berthelius (BRTHLS)
- Published: 2026-06-24
- Updated: 2026-06-29
- Category: automation aiops
- Tags: ai-traces, observability, agentic-ai, automation-aiops
- Language: en
- Canonical: https://www.brthls.com/magazine/ai-traces-auditable-agents-en
- Source: BRTHLS Magazine — https://www.brthls.com

---

## Problem

A traditional application fails and you can usually follow the trail: request, service, database, error, log, alert. An agent fails differently. It may retrieve incorrect context, choose a bad tool, repeat a call, ignore an instruction, spend tokens without producing a result, or declare success when nothing real happened.

If you only look at input and output, the system looks like a black box. You see the question and the answer, but you don't know which path it took.

That is not enough for enterprise operation. An agent that acts without a trace is not autonomous; it is automated opacity.

## Thesis

`AI Traces` will be a mandatory layer for any serious agentic system.

Not because every team needs sophisticated dashboards from day one, but because without traces there is no debugging, evaluation, compliance, cost optimization, or learning.

The trace turns a run into evidence: which model was called, which prompt was used, which context the agent received, which tool calls occurred, which failures appeared, which decision was made, and how much it cost.

## Framework

A useful trace should capture five levels:

- **Model:** provider, model, parameters, tokens, latency and response.
- **Context:** retrieved documents, source, ranking, permissions and freshness.
- **Tools:** tool call, arguments, result, error, retry and side effect.
- **Decision:** why the agent chose one path over another.
- **Outcome:** what changed in the external system and whether it was verified.
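
The five levels above can be sketched as a single record per run. This is a minimal illustration, not a standard schema: every class and field name below (`RunTrace`, `ModelCall`, `missing_levels`, and so on) is an assumption made for this example.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Optional
import json


@dataclass
class ModelCall:
    provider: str
    model: str
    parameters: dict[str, Any]
    input_tokens: int
    output_tokens: int
    latency_ms: float
    response: str


@dataclass
class ContextItem:
    source: str
    rank: int
    permitted: bool     # was the agent allowed to read this document?
    retrieved_at: str   # ISO 8601 timestamp, used to judge freshness


@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any]
    result: Any = None
    error: Optional[str] = None
    retries: int = 0
    side_effect: bool = False


@dataclass
class Decision:
    chosen: str
    alternatives: list[str]
    rationale: str      # why this path was chosen over the others


@dataclass
class Outcome:
    change: str         # what changed in the external system
    verified: bool      # whether the change was confirmed afterwards


@dataclass
class RunTrace:
    run_id: str
    model_calls: list[ModelCall] = field(default_factory=list)
    context: list[ContextItem] = field(default_factory=list)
    tool_calls: list[ToolCall] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)
    outcome: Optional[Outcome] = None

    def missing_levels(self) -> list[str]:
        """Return the levels for which this run left no evidence."""
        present = {
            "model": bool(self.model_calls),
            "context": bool(self.context),
            "tools": bool(self.tool_calls),
            "decision": bool(self.decisions),
            "outcome": self.outcome is not None and self.outcome.verified,
        }
        return [level for level, ok in present.items() if not ok]

    def to_json(self) -> str:
        """Serialize the whole trace so it can be stored and audited later."""
        return json.dumps(asdict(self))
```

In this sketch an unverified outcome counts as missing, which mirrors the idea that the outcome level records both the change and its verification.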

Mini-case: an operations agent says it has updated a CRM. Without a trace, you only see a convincing response. With a trace, you see that it retrieved the correct contact, called the API with the correct ID, received a 200, wrote the expected field, and then verified the status. That difference separates demo from production.
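
The mini-case can be sketched as a write followed by a read-back. `FakeCRM` is a hypothetical in-memory stand-in, not a real CRM client, and the trace entries are plain dictionaries whose keys are assumptions made for this example.

```python
class FakeCRM:
    """Hypothetical in-memory stand-in for a CRM API."""

    def __init__(self):
        self.contacts = {"c-42": {"status": "lead"}}

    def update(self, contact_id, field_name, value):
        # Mimic an HTTP-style status code: 404 if the contact does not exist.
        if contact_id not in self.contacts:
            return 404
        self.contacts[contact_id][field_name] = value
        return 200

    def get(self, contact_id):
        return dict(self.contacts.get(contact_id, {}))


def update_with_trace(crm, contact_id, field_name, value):
    """Perform the update, then read the record back and record both steps."""
    trace = []

    status = crm.update(contact_id, field_name, value)
    trace.append({
        "step": "tool_call",
        "tool": "crm.update",
        "arguments": {"contact_id": contact_id, field_name: value},
        "status": status,
    })

    # A 200 alone is not evidence: confirm the field holds the expected value.
    observed = crm.get(contact_id).get(field_name)
    trace.append({
        "step": "verification",
        "expected": value,
        "observed": observed,
        "verified": status == 200 and observed == value,
    })
    return trace
```

Calling `update_with_trace(FakeCRM(), "c-42", "status", "customer")` yields two entries, the tool call and its verification, which is the evidence the convincing response alone does not provide.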

**Measurable signal:** percentage of critical runs that can be reproduced or audited from a complete trace.
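
As a rough sketch, this signal can be computed from a list of run summaries. The record shape (`critical`, `levels`) and the rule that a trace is "complete" only when all five levels are present are assumptions made for this example.

```python
REQUIRED_LEVELS = ("model", "context", "tools", "decision", "outcome")


def auditable_share(runs):
    """Percentage of critical runs whose trace covers every required level.

    Returns None when there are no critical runs, to avoid reporting a
    misleading 0% or 100%.
    """
    critical = [run for run in runs if run["critical"]]
    if not critical:
        return None
    complete = sum(
        1 for run in critical
        if all(level in run["levels"] for level in REQUIRED_LEVELS)
    )
    return 100.0 * complete / len(critical)


full = {"model", "context", "tools", "decision", "outcome"}
runs = [
    {"run_id": "a", "critical": True, "levels": full},
    {"run_id": "b", "critical": True, "levels": {"model", "tools"}},
    {"run_id": "c", "critical": False, "levels": set()},
    {"run_id": "d", "critical": True, "levels": full},
    {"run_id": "e", "critical": True, "levels": full},
]
print(auditable_share(runs))  # 3 of 4 critical runs are complete: 75.0
```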

**Position:** an agent without a trace should not touch production systems.

## Why it matters now

OpenTelemetry already maintains semantic conventions for generative AI systems, covering model spans, agent and framework spans, events, exceptions, and metrics. LangSmith documents observability for agents with tracing of calls, steps, and decisions. OpenAI, in its practical guide to building agents, treats guardrails, tool safeguards, and output validation as production components, not as extras.

The market direction is clear: AI observability is no longer just “saving prompts”; it is starting to resemble distributed tracing with agent semantics.

That has deep implications. If each provider stores traces in its own format, the team is locked into that provider's tooling. If traces follow shared conventions, the company can compare, migrate, and audit.
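
A small sketch of what shared conventions buy you: two provider-specific records are mapped onto the same attribute names and become directly comparable. The vendors and their field names are invented for this example. The target names (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`) are intended to follow the OpenTelemetry generative AI conventions, but check them against the current specification before relying on them.

```python
# Hypothetical per-vendor field names mapped to shared attribute names.
FIELD_MAPS = {
    "vendor_a": {
        "model_name": "gen_ai.request.model",
        "prompt_tokens": "gen_ai.usage.input_tokens",
        "completion_tokens": "gen_ai.usage.output_tokens",
    },
    "vendor_b": {
        "engine": "gen_ai.request.model",
        "tokens_in": "gen_ai.usage.input_tokens",
        "tokens_out": "gen_ai.usage.output_tokens",
    },
}


def normalize(vendor, record):
    """Translate a vendor-specific record into shared attribute names.

    Fields without a mapping are dropped; a real pipeline would likely keep
    them under a vendor-specific namespace instead.
    """
    mapping = FIELD_MAPS[vendor]
    return {
        shared: record[local]
        for local, shared in mapping.items()
        if local in record
    }


a = normalize("vendor_a", {"model_name": "m-1", "prompt_tokens": 120, "completion_tokens": 30})
b = normalize("vendor_b", {"engine": "m-1", "tokens_in": 120, "tokens_out": 30, "extra": "x"})
print(a == b)  # True: both records now use the same keys
```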

## Anti-example

"We store all prompts and responses in a table."

It's a good start, but it's not a trace. The causal chain is missing: retrieval, tools, errors, retries, permissions, cost, and external effects. The textual log tells you what the agent said. The trace tells you what it did.

## Protocol (3 steps)

1. **Trace workflows with side effects first.** If it modifies data, sends messages, or executes actions, it must leave a trace.
2. **Combine trace and outcome.** A successful execution only counts if the external system confirms the expected change.
3. **Label failures by layer.** Model, context, tool, permission, network, criterion, or verification. Without a taxonomy, every incident investigation starts from scratch.
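
Step 3 can be sketched as a fixed set of labels plus a few triage rules. The labels come from the list above. The event fields (`http_status`, `error`, `tool_error`, `retrieved_docs`, `verified`) and the rules themselves are assumptions made for this example, and a real system would need rules matched to its own events.

```python
from collections import Counter
from enum import Enum


class FailureLayer(Enum):
    MODEL = "model"
    CONTEXT = "context"
    TOOL = "tool"
    PERMISSION = "permission"
    NETWORK = "network"
    CRITERION = "criterion"
    VERIFICATION = "verification"


def label_failure(event):
    """Assign a layer to a failure event, or None if no rule matches.

    Unmatched events are left for manual triage rather than guessed at.
    The rules are illustrative and only cover some of the layers.
    """
    if event.get("http_status") in (401, 403):
        return FailureLayer.PERMISSION
    if event.get("error") in ("timeout", "connection_reset"):
        return FailureLayer.NETWORK
    if event.get("tool_error"):
        return FailureLayer.TOOL
    if event.get("retrieved_docs") == 0:
        return FailureLayer.CONTEXT
    if event.get("verified") is False:
        return FailureLayer.VERIFICATION
    return None


events = [
    {"http_status": 403},
    {"error": "timeout"},
    {"tool_error": "invalid argument"},
    {"verified": False},
    {"http_status": 401},
]
counts = Counter(label_failure(event) for event in events)
print(counts[FailureLayer.PERMISSION])  # 2
```

Counting incidents per layer is what lets a team see where failures concentrate instead of treating each one as new.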

| Layer | What it captures | What it's for |
| --- | --- | --- |
| model | tokens, latency, response | cost and performance |
| context | sources and permissions | trust and compliance |
| tool | arguments and result | debugging |
| decision | chosen path | evaluation |
| outcome | verified effect | real ROI |

## Related

- [Agent Memory from Trace: useful memory doesn't live in the chat, it lives in the operation](/magazine/agent-memory-from-trace-en)
- [Token-to-Outcome: the KPI that separates used AI from profitable AI](/magazine/token-to-outcome-kpi-ai-profitability-en)
- [Agent Reliability Score: how to know if an agent deserves autonomy](/magazine/agent-reliability-score-autonomy-en)

## Sources consulted

- [OpenTelemetry: Semantic conventions for generative AI systems](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
- [LangSmith Observability](https://docs.langchain.com/oss/python/langchain/observability)
- [OpenAI: A practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf)

## Next step

Pick an agent that already delivers value. Don't start by improving prompts. Start by instrumenting a full trace of a real run and look for where evidence is lost.

---

*Translated from the Spanish original with AI assistance and reviewed for accuracy. [Read the original in Spanish](/magazine/ai-traces-capa-convierte-agentes-sistemas-auditables-es).*

---

_Cite as: Berthelius, V. (2026). "AI Traces: the layer that turns agents into auditable systems". BRTHLS Magazine. https://www.brthls.com/magazine/ai-traces-auditable-agents-en_