Sandboxed Work: the new execution perimeter for production agents

Key Takeaways

  • Filesystem: what it can read, create, or modify.
  • Network: which endpoints it may touch and from where.
  • Secrets: which credentials are injected and for how long.
  • Tools: which commands, APIs, or MCP servers it may invoke.

Decision

Separate reliable automation from a fragile demo before granting it autonomy.

Room

Operations review, architecture, security or platform.

Risk

Adding speed with no observability, rollback, ownership or stop criterion.

Agent prompt: identify guardrails, control points, likely failures and autonomy criteria

Problem

An agent that only drafts text can fail with little damage. An agent that executes code, queries private systems, installs dependencies, writes files, or touches infrastructure fails at a different level.

The error is no longer a bad response. It is a side effect.

Many companies try to solve it with prompts: “don’t do anything dangerous”. That is not a control. It is a hope written in natural language.

Thesis

Sandboxed Work will be the new perimeter for agents that do real work.

The point is not to cage the model. The model does not execute. The point is to cage the action: filesystem, network, secrets, tools, processes, time, cost, and permissions.

A mature architecture separates the brain (the model), the orchestrator, the sandbox, and the target systems.

Framework

An agentic sandbox must define five limits:

  • Filesystem: what it can read, create, or modify.
  • Network: which endpoints it may touch and from where.
  • Secrets: which credentials are injected and for how long.
  • Tools: which commands, APIs, or MCP servers it may invoke.
  • Timebox: how long the work runs before the environment is killed.
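
The five limits above can be written down as one explicit policy object that an orchestrator checks before every action. The following is a minimal sketch, not a reference implementation: the `SandboxPolicy` class, its field names, and the example values (`/work`, `api.example.internal`, the tool names) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SandboxPolicy:
    """Hypothetical policy: one field per limit in the list above."""

    writable_root: str             # filesystem: the only tree the agent may modify
    allowed_hosts: frozenset[str]  # network: endpoint allowlist
    secret_ttl_seconds: int        # secrets: lifetime of injected credentials
    allowed_tools: frozenset[str]  # tools: commands, APIs, or MCP servers
    timebox_seconds: int           # timebox: deadline after which the environment is killed

    def may_write(self, path: str) -> bool:
        # Reject relative paths and any ".." component instead of resolving them.
        candidate = PurePosixPath(path)
        if not candidate.is_absolute() or ".." in candidate.parts:
            return False
        return candidate.is_relative_to(PurePosixPath(self.writable_root))

    def may_call(self, host: str) -> bool:
        return host.lower() in self.allowed_hosts

    def may_invoke(self, tool: str) -> bool:
        return tool in self.allowed_tools


# Example values are assumptions for illustration only.
policy = SandboxPolicy(
    writable_root="/work",
    allowed_hosts=frozenset({"api.example.internal"}),
    secret_ttl_seconds=900,
    allowed_tools=frozenset({"git", "pytest"}),
    timebox_seconds=600,
)

print(policy.may_write("/work/repo/patch.diff"))
print(policy.may_write("/work/../etc/passwd"))
print(policy.may_invoke("curl"))
```

A declarative object like this does not enforce anything by itself. Enforcement still has to happen in the sandbox runtime (mounts, network rules, credential brokers), but the policy gives reviewers one place to read what the agent is allowed to do.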

Mini-case: a support agent reproduces a client bug. In a sandbox it can clone the repo, install dependencies, run tests, read sanitized logs, and propose a patch. On a shared machine it could contaminate the environment, leak secrets, or leave processes alive.

Measurable signal: percentage of agent actions that run in an ephemeral environment, with logs and declared limits.
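
One way to compute that signal, assuming each agent action is recorded with three boolean flags. The `ActionRecord` type and its field names are hypothetical; a real system would derive them from orchestrator or audit logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical audit record for a single agent action."""

    ephemeral_env: bool    # ran in an environment that is destroyed afterwards
    logged: bool           # produced a log of what it did
    limits_declared: bool  # ran under an explicit, declared policy


def contained_share(actions: list[ActionRecord]) -> float:
    """Percentage of actions meeting all three conditions; 0.0 for an empty list."""
    if not actions:
        return 0.0
    contained = sum(
        1 for a in actions if a.ephemeral_env and a.logged and a.limits_declared
    )
    return 100.0 * contained / len(actions)


sample = [
    ActionRecord(True, True, True),
    ActionRecord(True, True, True),
    ActionRecord(True, True, True),
    ActionRecord(False, True, True),  # ran on a shared machine
]
print(contained_share(sample))  # 75.0
```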

Stance: if the agent can execute, it must also be containable.

Why it matters now

Cloudflare has pushed sandboxes for agents as isolated and scalable environments. Anthropic has introduced Managed Agents with external sandboxes and MCP tunnels. AWS has taken MCP to cloud operations with IAM, CloudWatch, CloudTrail, and bounded execution.

The trend is not just “more agents”. It is agents with hands.

And when a system has hands, it needs gloves, a cleanroom, and movement logs.

Anti-example

“We have a shared runner for all agents.”

That mixes contexts, permissions, and execution residue. An exploration agent should not live in the same perimeter as an agent that touches production.

Protocol (3 steps)

  1. Classify actions by risk: read, write, execute, private-network access, secrets, and production.
  2. Create sandboxes by class. Do not use the same environment for investigation, build, and sensitive systems.
  3. Destroy by default. The environment must expire, log, and clean up.

Layer        Minimum control        Operational question
filesystem   isolated directory     what it can read
network      allowlist              where it can call
secrets      ephemeral tokens       how long they last
tools        permissions per tool   what it can do
time         timeout                when it dies
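
The three protocol steps can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not real isolation: the `RiskClass` names, the per-class timeouts, and the `run_ephemeral` helper are hypothetical, and a temporary directory plus a timeout only demonstrates the "separate by class, destroy by default" lifecycle. A production sandbox would also need a container or microVM, network rules, and scoped credentials.

```python
import logging
import os
import subprocess
import sys
import tempfile
from enum import Enum
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sandbox")


class RiskClass(Enum):
    """Step 1: hypothetical risk classes for agent work."""

    INVESTIGATION = "investigation"
    BUILD = "build"
    SENSITIVE = "sensitive"


# Step 2: each class gets its own limits (values are assumptions).
TIMEOUT_SECONDS = {
    RiskClass.INVESTIGATION: 30,
    RiskClass.BUILD: 300,
    RiskClass.SENSITIVE: 60,
}


def run_ephemeral(command: list[str], risk: RiskClass) -> dict:
    """Step 3: run in a throwaway directory that is deleted when the block exits."""
    with tempfile.TemporaryDirectory(prefix=f"agent-{risk.value}-") as workdir:
        log.info("start class=%s workdir=%s command=%s", risk.value, workdir, command)
        try:
            result = subprocess.run(
                command,
                cwd=workdir,
                env={"PATH": os.defpath},  # do not inherit the parent's secrets
                capture_output=True,
                text=True,
                timeout=TIMEOUT_SECONDS[risk],  # child is killed on expiry
            )
            outcome = {
                "exit_code": result.returncode,
                "stdout": result.stdout,
                "timed_out": False,
            }
        except subprocess.TimeoutExpired:
            outcome = {"exit_code": None, "stdout": "", "timed_out": True}
    # The directory no longer exists at this point.
    outcome["workdir_removed"] = not Path(workdir).exists()
    log.info("end class=%s outcome=%s", risk.value, outcome)
    return outcome


report = run_ephemeral([sys.executable, "-c", "print('ok')"], RiskClass.INVESTIGATION)
```

Keeping cleanup in a context manager means the environment is destroyed even when the command fails or times out, which is the behavior "destroy by default" asks for.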

Next step

Draw an execution diagram of your most dangerous agent: model, orchestrator, sandbox, secrets, network, tools, logs, and target system. If you don’t know where the limit is, you still don’t have a perimeter.


Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.

Cite this article

Berthelius, V. (2026). “Sandboxed Work: the new execution perimeter for production agents”. BRTHLS Magazine. https://www.brthls.com/magazine/sandboxed-work-execution-perimeter-en
