The Jagged Frontier of AI: The Failure Map Every Team Needs Before Automating

Key Takeaways

  • Green zone: repeatable, verifiable tasks with low error cost.
  • Yellow zone: useful tasks with light human review.
  • Red zone: tasks where a plausible answer could cause operational harm.
  • Gray zone: tasks where there is still insufficient evidence to decide.

Decision

Decide what governance, ownership or cadence is missing before scaling AI.

Room

Executive committee, AI portfolio review, transformation steering.

Risk

Mistaking activity, pilots and tooling for real operating capability.

Problem

The conversation about AI usually treats capability as a continuous line: this model is better than the previous one, this benchmark rises, this task can already be automated, that other one cannot yet.

In practice it does not work that way. Two tasks that seem equally difficult for a person can fall on opposite sides of the model’s capability frontier. One is solved with surprising quality. The other produces a convincing but wrong answer.

That pattern is known as the jagged frontier. In 2026 it matters more than ever, because agents no longer only respond: they act.

Thesis

Before automating, every team should draw its jagged‑frontier map.

It is not enough to ask “can AI do this?” One must ask: in which variants does it perform well, in which does it appear to perform well but fail, what signals anticipate the failure, and where should it stop before executing.

The jagged frontier turns governance into concrete work. It is not a committee saying “be careful with AI”; it is a matrix of tasks, errors, tests, and operational limits.

Framework

Map each workflow into four zones:

  • Green zone: repeatable, verifiable tasks with low error cost.
  • Yellow zone: useful tasks with light human review.
  • Red zone: tasks where a plausible answer could cause operational harm.
  • Gray zone: tasks where there is still insufficient evidence to decide.

Mini‑case: an agent can summarize tickets and extract entities with good performance. It can also propose refund policies in exceptional cases. Both look like text tasks, but they do not belong to the same zone. The first can be validated against data. The second mixes judgment, exceptions, legal risk, and customer experience.

Measurable signal: percentage of failures detected before execution versus failures discovered by the client, user, or downstream team.
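
As a rough illustration of the measurable signal, the sketch below computes the share of logged failures that were caught before execution. The `Failure` record, its field names, and the sample log are hypothetical; a real team would feed it from its own incident or review data.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    variant: str                   # which variant of the workflow failed (hypothetical labels)
    caught_before_execution: bool  # True if a check, reviewer, or stop rule caught it first


def pre_execution_detection_rate(failures):
    """Return the fraction of failures caught before execution, or None if none were logged."""
    if not failures:
        return None  # no failures logged: the signal is undefined, not perfect
    caught = sum(1 for f in failures if f.caught_before_execution)
    return caught / len(failures)


# Illustrative log only; these entries are made up for the example.
failure_log = [
    Failure("simple refund", True),
    Failure("potential fraud", True),
    Failure("enterprise client", False),  # discovered by the client or a downstream team
    Failure("simple refund", True),
]

rate = pre_execution_detection_rate(failure_log)
print(f"{rate:.0%} caught before execution")  # prints "75% caught before execution"
```

Returning `None` for an empty log is a deliberate choice in this sketch: a process with no recorded failures may simply not be measured yet, which is closer to the gray zone than to the green one.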

Stance: automating without a frontier map is delegating not only work but also ignorance.

Why it matters now

The Harvard Business School and BCG study on the jagged frontier showed a paradox: inside the frontier, participants with AI completed more tasks, faster and with higher quality; outside it, they were less likely to produce correct solutions than those who did not use AI.

In March 2026, Harvard revisited the research after its formal publication in Organization Science. The lesson remains uncomfortable: AI does not simply help or fail. Sometimes it helps just enough for the failure to look professional.

Stanford HAI, in its AI Index 2026, also highlights governance, transparency, and hallucination problems. The context is not “AI is useless”; it is that systems need situated evaluation, not blind faith in the model.

Anti-example

“The benchmark says the model is good at reasoning, so it can make this decision.”

A general benchmark does not describe your local frontier. Your process has incomplete data, exceptions, internal policies, legacy systems, real customers, and consequences. The jagged frontier is not bought on a spec sheet; it is discovered in controlled execution.

Protocol (3 steps)

  1. List variants, not tasks. “Respond to tickets” is too coarse to test; simple refund, potential fraud, and enterprise client are distinct variants.
  2. Force frontier cases. Test ambiguity, contradictory data, incomplete instructions, and rare exceptions.
  3. Define an automatic stop. If a red signal appears, the agent must escalate, not improvise.

Zone     Example                            Minimum control
green    repeatable ticket classification   sampling and metrics
yellow   drafting sensitive response        light review
red      approving economic exception       responsible human
gray     new process without history        closed pilot
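
The automatic stop in step 3 and the minimum controls in the zone table can be sketched as a small routing function. This is a minimal sketch, not a reference implementation: the `route` function, the action labels, and the variant-to-zone mapping are assumptions made for illustration, reusing the examples from the table. Treating unmapped variants as gray is also an assumption, chosen so that unknown work never executes by default.

```python
from enum import Enum


class Zone(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"
    GRAY = "gray"


# Minimum control per zone, taken from the table above.
MINIMUM_CONTROL = {
    Zone.GREEN: "sampling and metrics",
    Zone.YELLOW: "light review",
    Zone.RED: "responsible human",
    Zone.GRAY: "closed pilot",
}

# Hypothetical action labels; a real system would map these to its own workflow steps.
ACTION = {
    Zone.GREEN: "execute",
    Zone.YELLOW: "execute_with_review",
    Zone.RED: "escalate",
    Zone.GRAY: "pilot_only",
}

# Illustrative mapping built from the table's examples; each team fills in its own map.
VARIANT_ZONE = {
    "repeatable ticket classification": Zone.GREEN,
    "drafting sensitive response": Zone.YELLOW,
    "approving economic exception": Zone.RED,
}


def route(variant, red_signals=()):
    """Return (action, minimum control) for a variant, escalating on any red signal."""
    zone = VARIANT_ZONE.get(variant, Zone.GRAY)  # unmapped variants are treated as gray
    if red_signals:
        zone = Zone.RED  # automatic stop: a red signal overrides the mapped zone
    return ACTION[zone], MINIMUM_CONTROL[zone]


print(route("repeatable ticket classification"))
print(route("repeatable ticket classification", red_signals=["contradictory data"]))
print(route("new supplier onboarding"))
```

The point of the override is that the zone belongs to the situation, not only to the task: a normally green variant stops being green as soon as a red signal, such as contradictory data, shows up.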

Next step

Choose a process you want to automate and do not run a generic test. Build twenty variants: ten easy, five ambiguous, three rare, and two dangerous. That’s where your real map starts.
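
One way to keep that twenty-variant suite honest is to check its mix mechanically. The sketch below assumes each variant is stored as a `(description, category)` pair; the `check_mix` helper and the placeholder descriptions are hypothetical, and only the 10/5/3/2 split comes from the text above.

```python
from collections import Counter

# Target mix from the article: ten easy, five ambiguous, three rare, two dangerous.
TARGET_MIX = {"easy": 10, "ambiguous": 5, "rare": 3, "dangerous": 2}


def check_mix(variants, target=TARGET_MIX):
    """Return {category: (actual, expected)} for every category that is off target."""
    counts = Counter(kind for _, kind in variants)
    kinds = set(target) | set(counts)  # also flags categories that are not in the target
    return {
        k: (counts[k], target.get(k, 0))
        for k in sorted(kinds)
        if counts[k] != target.get(k, 0)
    }


# Placeholder descriptions; a real suite would hold concrete cases from the process.
variants = (
    [(f"easy case {i}", "easy") for i in range(1, 11)]
    + [(f"ambiguous case {i}", "ambiguous") for i in range(1, 6)]
    + [(f"rare case {i}", "rare") for i in range(1, 4)]
    + [(f"dangerous case {i}", "dangerous") for i in range(1, 3)]
)

gaps = check_mix(variants)
print("suite matches the target mix" if not gaps else f"off target: {gaps}")
```

An empty result means the suite has the intended shape; it says nothing about whether the individual cases are good, which still takes people who know the process.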


Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.

Cite this article

Berthelius, V. (2026). “The Jagged Frontier of AI: The Failure Map Every Team Needs Before Automating”. BRTHLS Magazine. https://www.brthls.com/magazine/jagged-ai-frontier-failure-map-en
