The Jagged Frontier of AI: The Failure Map Every Team Needs Before Automating

Problem

The conversation about AI usually treats capability as a continuous line: this model is better than the previous one, this benchmark rises, this task can already be automated, that other one cannot yet.

In practice it does not work that way. Two tasks that seem equally difficult for a person can fall on opposite sides of the model’s capability. One is solved with surprising quality. The other produces a convincing but wrong answer.

That pattern is known as the jagged frontier. In 2026 the idea matters more than ever, because agents no longer only respond: they act.

Thesis

Before automating, every team should draw its jagged‑frontier map.

It is not enough to ask “can AI do this?” One must ask: in which variants does it perform well, in which does it appear to perform well but fail, what signals anticipate the failure, and where should it stop before executing.

The jagged frontier turns governance into concrete work. It is not a committee saying “be careful with AI”; it is a matrix of tasks, errors, tests, and operational limits.

Framework

Map each workflow into four zones:

Green zone: repeatable, verifiable tasks with low error cost.
Yellow zone: useful tasks that need light human review.
Red zone: tasks where a plausible answer could cause operational harm.
Gray zone: tasks where there is still insufficient evidence to decide.

Mini‑case: an agent can summarize tickets and extract entities with good performance. It can also propose refund policies for exceptional cases. Both look like text tasks, but they do not belong to the same zone. The first can be validated against data. The second mixes judgment, exceptions, legal risk, and customer experience.
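One way to make the four zones concrete is a small classification rule. The sketch below is illustrative only: the `Variant` fields, the `MIN_RUNS` threshold, and the order of the checks are assumptions, not something the framework prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    GRAY = "gray"


@dataclass(frozen=True)
class Variant:
    name: str
    verifiable: bool     # can the output be checked against data?
    harm_if_wrong: bool  # could a plausible wrong answer cause operational harm?
    evaluated_runs: int  # controlled runs that back this judgment


MIN_RUNS = 30  # assumed evidence threshold; pick your own


def assign_zone(v: Variant) -> Zone:
    # Known harm wins over everything else.
    if v.harm_if_wrong:
        return Zone.RED
    # Too little evidence to decide: keep it gray.
    if v.evaluated_runs < MIN_RUNS:
        return Zone.GRAY
    # Verifiable output can run with sampling; the rest needs light review.
    return Zone.GREEN if v.verifiable else Zone.YELLOW


summarize = Variant("summarize tickets", True, False, 200)
refund = Variant("propose refund exception", False, True, 200)
print(assign_zone(summarize).value, assign_zone(refund).value)
```

With these assumed inputs, the two text tasks from the mini-case land in different zones even though both produce text.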

Measurable signal: the share of failures detected before execution, compared with the share discovered afterwards by the client, user, or downstream team.
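As a rough sketch, the signal can be computed as a single ratio. The function name and the example counts are hypothetical; the only input it assumes is that each failure is logged as caught before or found after execution.

```python
def pre_execution_detection_rate(caught_before: int, found_after: int) -> float:
    """Share of all recorded failures that were caught before execution."""
    if caught_before < 0 or found_after < 0:
        raise ValueError("counts must be non-negative")
    total = caught_before + found_after
    if total == 0:
        raise ValueError("no failures recorded, the rate is undefined")
    return caught_before / total


# Hypothetical month: 18 failures stopped by checks, 6 reported by customers.
rate = pre_execution_detection_rate(18, 6)
print(f"{rate:.0%}")  # prints "75%"
```

Tracking this ratio per variant rather than per workflow is likely more informative, since the frontier differs from one variant to the next.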

Stance: automating without a frontier map is delegating not only work but also ignorance.

Why it matters now

The Harvard Business School and BCG study on the jagged frontier showed a paradox: inside the frontier, participants with AI completed more tasks, faster and with higher quality; outside it, they were less likely to produce correct solutions than those who did not use AI.

In March 2026, Harvard revisited the research after its formal publication in Organization Science. The lesson remains uncomfortable: AI does not simply help or fail. Sometimes it helps just enough for the failure to look professional.

Stanford HAI, in its AI Index 2026, also highlights governance, transparency, and hallucination problems. The context is not “AI is useless”; it is that systems need situated evaluation, not blind faith in the model.

Anti-example

“The benchmark says the model is good at reasoning, so it can make this decision.”

A general benchmark does not describe your local frontier. Your process has incomplete data, exceptions, internal policies, legacy systems, real customers, and consequences. The jagged frontier is not bought on a spec sheet; it is discovered in controlled execution.

Protocol (3 steps)

List variants, not tasks. “Respond to tickets” is not one task; a simple refund, a potential fraud case, and an enterprise client are distinct variants.
Force frontier cases. Test ambiguity, contradictory data, incomplete instructions, and rare exceptions.
Define an automatic stop. If a red signal appears, the agent must escalate, not improvise.
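The third step can be sketched as a gate that runs before any action is executed. The signal names in `RED_SIGNALS` and the `Decision` type are hypothetical; a real list would come from the failures found while forcing frontier cases.

```python
from dataclasses import dataclass, field

# Hypothetical red signals; replace with the ones your own tests surface.
RED_SIGNALS = {
    "contradictory_data",
    "missing_required_field",
    "policy_exception",
    "possible_fraud",
}


@dataclass
class Decision:
    action: str  # "execute" or "escalate"
    reasons: list[str] = field(default_factory=list)


def gate(signals: set[str]) -> Decision:
    """Escalate if any red signal is present; otherwise allow execution."""
    hits = sorted(signals & RED_SIGNALS)
    if hits:
        return Decision("escalate", hits)
    return Decision("execute")


print(gate({"possible_fraud", "long_ticket"}))
print(gate({"simple_refund"}))
```

The design choice here is that the gate never tries to resolve a red signal itself: it only records why it stopped and hands the case to a person.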

Zone	Example	Minimum control
green	repeatable ticket classification	sampling and metrics
yellow	drafting a sensitive response	light review
red	approving an economic exception	responsible human
gray	new process without history	closed pilot
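The table can also live next to the agent as a small lookup, so that the minimum control is read from configuration rather than remembered. This is a sketch: the `control_for` helper and its fallback to the gray-zone control for unmapped zones are assumptions.

```python
# Minimum control per zone, copied from the table above.
MINIMUM_CONTROL = {
    "green": "sampling and metrics",
    "yellow": "light review",
    "red": "responsible human",
    "gray": "closed pilot",
}


def control_for(zone: str) -> str:
    """Return the minimum control for a zone.

    Assumption: an unknown or unmapped zone is treated as gray,
    because there is no evidence yet to justify anything looser.
    """
    return MINIMUM_CONTROL.get(zone.strip().lower(), MINIMUM_CONTROL["gray"])


print(control_for("Red"))
print(control_for("not mapped yet"))
```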

Sources consulted

Next step

Choose a process you want to automate and do not run a generic test. Build twenty variants: ten easy, five ambiguous, three rare, and two dangerous. That’s where your real map starts.
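A short check can confirm that a test set matches the twenty-variant mix before any run starts. The label names and the `missing_variants` helper are hypothetical; only the 10/5/3/2 split comes from the text.

```python
from collections import Counter

# Target mix from the text: ten easy, five ambiguous, three rare, two dangerous.
TARGET_MIX = {"easy": 10, "ambiguous": 5, "rare": 3, "dangerous": 2}


def missing_variants(labels: list[str]) -> dict[str, int]:
    """Return how many variants of each kind are still missing."""
    counts = Counter(labels)
    return {
        kind: needed - counts[kind]
        for kind, needed in TARGET_MIX.items()
        if counts[kind] < needed
    }


# Hypothetical draft suite that still lacks the hardest cases.
draft = ["easy"] * 10 + ["ambiguous"] * 5 + ["rare"] * 2
print(missing_variants(draft))  # prints {'rare': 1, 'dangerous': 2}
```

In practice the hard kinds tend to be the ones left for last, so a gap report like this is a cheap reminder that the dangerous variants are the point of the exercise.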

Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.
