Problem
Many teams only instrument agents when something breaks.
That leaves out half the problem. An agent can work flawlessly and still destroy margin: too many tokens, too much latency, too many retries, or too much cost per user for the value delivered.
If observability only helps you investigate incidents, you’re too late. The system’s economics have already taken a hit.
Thesis
The new generation of AI observability matters because it shifts the focus from technical debugging to managing the product’s economics.
It’s not enough to know what prompt went in and what output came out. You need to see:
- how much each conversation costs
- which organization or client consumes the most
- where latency degrades the experience
- which traces or sessions explain the cost
When this layer exists, the agent stops being an expensive magic trick and becomes an operable unit.
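As a rough illustration of the first two questions, the sketch below turns token counts into cost and rolls it up per conversation and per organization. The prices, field names (`conversation_id`, `org_id`, `input_tokens`, `output_tokens`), and sample data are all assumptions made for the example, not values from any provider or from PostHog.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K = {"input": 0.0005, "output": 0.0015}

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one model call under the assumed prices."""
    return (input_tokens * PRICE_PER_1K["input"]
            + output_tokens * PRICE_PER_1K["output"]) / 1000

def rollup(calls, key):
    """Sum call cost grouped by one field, e.g. 'conversation_id' or 'org_id'."""
    totals = defaultdict(float)
    for c in calls:
        totals[c[key]] += call_cost(c["input_tokens"], c["output_tokens"])
    return dict(totals)

# Illustrative call records (invented data).
calls = [
    {"conversation_id": "c1", "org_id": "acme", "input_tokens": 2000, "output_tokens": 1000},
    {"conversation_id": "c1", "org_id": "acme", "input_tokens": 4000, "output_tokens": 2000},
    {"conversation_id": "c2", "org_id": "globex", "input_tokens": 1000, "output_tokens": 0},
]

per_conversation = rollup(calls, "conversation_id")
per_org = rollup(calls, "org_id")
top_org = max(per_org, key=per_org.get)  # organization with the highest spend
```

The same `rollup` helper answers both questions because every call record carries its conversation and its organization; that tagging is the part that has to exist at capture time.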
Framework
Business-oriented AI observability needs five views:
- Conversation: what went in, what came out, and with what context.
- Trace: which chain of model calls and tool invocations ran.
- Cost: how much the agent consumes per chat, user, org, or workflow.
- Performance: latency, errors, and throughput.
- Session: how multiple interactions connect within the actual journey.
Mini-case: a support agent seems useful because it responds well. But when you combine cost, latency, and sessions, you see that 20% of clients trigger expensive loops when they switch languages and attach screenshots. This finding doesn’t appear in a table of prompts. It appears when observability data and product data intersect.
Measurable signal: total cost per useful outcome, separated by workflow and client segment.
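One possible way to compute this signal is sketched below. It assumes each session record already carries a cost, a workflow, a client segment, and a boolean `useful` flag (for example, "ticket resolved without escalation"); how "useful" is defined is a product decision the article leaves open, and all field names and data here are hypothetical.

```python
from collections import defaultdict

def cost_per_useful_outcome(sessions):
    """Return {(workflow, segment): total cost / number of useful outcomes}.

    Groups with zero useful outcomes map to None: they spent money
    without producing a result, so the ratio is undefined.
    """
    cost = defaultdict(float)
    useful = defaultdict(int)
    for s in sessions:
        group = (s["workflow"], s["segment"])
        cost[group] += s["cost_usd"]          # failed sessions still count as cost
        useful[group] += 1 if s["useful"] else 0
    return {g: (cost[g] / useful[g] if useful[g] else None) for g in cost}

# Illustrative session records (invented data).
sessions = [
    {"workflow": "refund", "segment": "smb", "cost_usd": 0.25, "useful": True},
    {"workflow": "refund", "segment": "smb", "cost_usd": 0.75, "useful": False},
    {"workflow": "refund", "segment": "enterprise", "cost_usd": 0.5, "useful": True},
    {"workflow": "onboarding", "segment": "smb", "cost_usd": 0.5, "useful": False},
]

signal = cost_per_useful_outcome(sessions)
```

Dividing by useful outcomes rather than by sessions is the design choice that matters: the failed `refund` session for the `smb` segment raises that group's figure instead of disappearing into an average.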
Why It Matters Now
PostHog’s official documentation positions AI Observability as a layer to capture LLM conversations, tokens, cost, latency, errors, traces, and multi-conversation sessions. It also highlights something relevant for the operating model: how much each chat, user, or organization is costing.
The product page itself reinforces this reading. It talks about cost analysis, performance monitoring, traces, and native integrations, presenting them as regular events within the product system. This combination matters because it brings AI into a language that business and product already understand.
The consequence is clear: agent observability is no longer just a console for engineers. It’s becoming a layer of operational accounting.
Anti-Example
“We already have prompt logs.”
That explains a single call. It doesn’t explain accumulated cost, performance per client, long sessions, margin leaks, or comparisons between workflows.
Protocol (3 steps)
- Link observability and P&L. Don’t measure tokens without tying them to an outcome.
- Look by organization and workflow. Average cost hides expensive problems.
- Label loops and retries. Many leaks come from silent iterations.
| Layer | Question | Risk if missing |
|---|---|---|
| conversation | what happened in each call | partial picture |
| trace | what chain produced it | slow debugging |
| cost | who pays how much | blind margin |
| performance | where experience drops | slow responses accepted as normal |
| session | what pattern repeats | invisible leak |
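The second and third protocol steps can be sketched in a few lines. The definitions used here are assumptions chosen for illustration: a "retry" is a trace step that repeats the immediately preceding tool and input, and an "expensive" organization is one whose cost exceeds twice the average across organizations. Real systems will likely need fuzzier matching and a threshold tuned to their own data.

```python
from statistics import mean

def count_retries(steps):
    """Count steps that repeat the immediately preceding (tool, input) pair.

    This is one working definition of a 'silent iteration', not a standard.
    """
    return sum(1 for prev, cur in zip(steps, steps[1:]) if prev == cur)

def outlier_orgs(cost_by_org, factor=2.0):
    """Return orgs whose cost exceeds `factor` times the average across orgs."""
    avg = mean(cost_by_org.values())
    return sorted(org for org, c in cost_by_org.items() if c > factor * avg)

# Illustrative trace: each step is a (tool name, input) pair (invented data).
trace = [
    ("search", "order 123"),
    ("search", "order 123"),
    ("search", "order 123"),
    ("reply", "found it"),
]
retries = count_retries(trace)

# Illustrative per-organization cost in USD (invented data).
cost_by_org = {"acme": 1.0, "globex": 1.0, "initech": 1.0, "umbrella": 9.0}
flagged = outlier_orgs(cost_by_org)
```

In this sample the average cost is 3.0 per organization, a number that describes none of the four; only the per-organization view shows that a single client accounts for most of the spend.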
Related
- AI Traces: the layer that turns agents into auditable systems
- Agent Memory from Trace: useful memory doesn’t live in the chat, it lives in operation
- Token-to-Outcome: the KPI that separates used AI from profitable AI
Sources Consulted
- Getting started with AI Observability
- AI Observability - PostHog
- Manual capture AI Observability installation
Next Step
Choose an agent already in production and calculate three things in the same view: cost per organization, latency per workflow, and percentage of sessions with retries. That’s where the conversation about real margin usually starts.
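A minimal sketch of that single view, assuming session records with hypothetical fields `org`, `workflow`, `cost_usd`, `latency_s`, and `retries`. The median is used for latency as one reasonable choice; a percentile such as p95 may suit your case better.

```python
from collections import defaultdict
from statistics import median

def margin_view(sessions):
    """Compute cost per org, median latency per workflow, and % of sessions with retries."""
    if not sessions:
        raise ValueError("need at least one session")
    cost_by_org = defaultdict(float)
    latencies = defaultdict(list)
    for s in sessions:
        cost_by_org[s["org"]] += s["cost_usd"]
        latencies[s["workflow"]].append(s["latency_s"])
    with_retries = sum(1 for s in sessions if s["retries"] > 0)
    return {
        "cost_by_org": dict(cost_by_org),
        "median_latency_by_workflow": {w: median(v) for w, v in latencies.items()},
        "pct_sessions_with_retries": 100 * with_retries / len(sessions),
    }

# Illustrative session records (invented data).
sessions = [
    {"org": "acme", "workflow": "support", "cost_usd": 0.5, "latency_s": 2.0, "retries": 0},
    {"org": "acme", "workflow": "support", "cost_usd": 1.5, "latency_s": 6.0, "retries": 3},
    {"org": "globex", "workflow": "support", "cost_usd": 0.25, "latency_s": 4.0, "retries": 0},
    {"org": "globex", "workflow": "billing", "cost_usd": 0.25, "latency_s": 1.0, "retries": 0},
]

view = margin_view(sessions)
```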
Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.