Problem
The question “which agent framework do we use?” sounds technical. In reality it often hides a more uncomfortable decision: which part of the operation you want to govern with agents.
In 2026 the conversation is no longer only about LangGraph or CrewAI. Eve, Flue, Microsoft Agent Framework, OpenAI Agents SDK, Vercel AI SDK and Factory have joined it. Some are frameworks. Others are harnesses. Others are platforms. Others are software factories disguised as products.
If you compare them as if they were equivalent, you will choose badly. Not because they are bad tools, but because they address different layers.
Thesis
You do not choose an agent framework for its hype. You choose it for the control layer your system is missing.
The right decision starts with operational questions:
- do you need a durable runtime?
- do you need a programmable harness?
- do you need state orchestration?
- do you need a multi‑agent setup quickly?
- do you need enterprise integration?
- do you need a software factory?
- do you just need to set up a first case with guardrails?
The tool comes later.
Framework
Think of the agent market in five layers:
| Layer | What it solves | The right question |
|---|---|---|
| Light SDK | tools, handoffs, guardrails, tracing | who controls the app and the state |
| Runtime/orchestrator | state, memory, branching, long tasks | how a complex flow is executed |
| Harness | channels, workflows, policies, durable cycles | how you package the agent as a system |
| Platform | hosting, sandbox, permissions, deployment, observability | where the agent lives and operates |
| Factory | SDLC, QA, review, flow learning | how software work as a whole scales |
The common mistake is buying an upper layer to fix a deficiency in a lower one. Or the reverse: standing up a low‑level framework when the real problem is deployment, permissions and responsibility.
Quick map
| Option | Primary layer | Best for | Main risk | Don’t use it if |
|---|---|---|---|---|
| Eve | durable framework/platform on Vercel | backend agents with filesystem, sandbox, approvals and subagents | beta and lock‑in to Vercel stack | you need total infrastructure neutrality |
| Flue | durable TypeScript harness | programmable agents with workflows, channels and policies | young ecosystem | you lack the technical capacity to maintain the runtime |
| LangGraph | stateful runtime/orchestrator | complex flows, memory, human‑in‑the‑loop and granular control | over‑engineering | you only need a simple business case |
| CrewAI | high‑level multi‑agent framework | crews, flows and operational prototypes with clear roles | role theater if there is no evaluation | you still don’t know which decision each agent should make |
| OpenAI Agents SDK | light SDK | apps that already control their stack and want tools, handoffs and guardrails | your team has to maintain the operation itself | you need a full out‑of‑the‑box platform |
| Microsoft Agent Framework | enterprise framework | Microsoft/.NET/Python/Azure teams that need MCP, A2A and AutoGen/Semantic Kernel continuity | enterprise ecosystem dependency | your stack doesn’t live near Microsoft |
| Vercel AI SDK | AI product SDK | interfaces, tool calling and web apps with simple agents | does not replace an agent operating system | you need a complex durable workflow |
| Factory | software platform/factory | scaling development, QA and agent‑native software delivery | not a generalist framework | you want to build general‑purpose business agents |
The table does not decide for you. Its job is to stop you from comparing things that do not belong to the same category.
Why it matters now
Eve sends a clear signal: Vercel does not present it as another chatbot. It targets durable, filesystem‑first backend agents with sandbox, workflows, approvals, subagents and evals. In other words, it wants the agent to have a workplace, not just a conversation.
Flue pushes another reading: the agent as a programmable harness. Its proposal revolves around TypeScript, tasks, workflows, channels, policies and runtime. Cloudflare positions it as a layer that can build on the primitives of its Agents SDK. The important word is not “agent”; it is “harness”.
LangGraph remains strong when you need state control, memory, persistence and complex loops. It does not compete with a pretty landing page or a simple wrapper. It competes against the chaos of workflows that need to be traceable and resumable.
CrewAI has another place: accelerating the construction of crews and multi‑agent flows with a more opinionated layer. It is useful when the team understands roles, responsibilities and outputs. It is dangerous when it is only used to assign human names to processes that no one has designed.
OpenAI Agents SDK and Vercel AI SDK are lighter. They are useful when your app already has an architecture and you want to add tools, handoffs, guardrails, streaming or model calls without buying an entire platform.
Microsoft Agent Framework carries weight for another reason: enterprise continuity. If your organization lives in .NET, Python, Azure, Microsoft 365, AutoGen or Semantic Kernel, the decision is not only technical. It is about integration, compliance and internal support.
Factory plays in a different league. It should not be listed as an “agent framework” at the same level. It is a bet on software factories: agents and systems that observe, test and improve the delivery chain. It helps you understand where the market is heading, but it does not replace a general SDK for building business agents.
Anti-example
“Let’s try Eve, Flue, LangGraph, CrewAI and Factory and see which wins.”
That benchmark measures nothing. Eve and Flue compete more on how to run durable agents. LangGraph competes on state control. CrewAI competes on speed of modeling crews. OpenAI Agents SDK competes on lightness within an app. Microsoft Agent Framework competes on enterprise fit. Factory competes on transforming software delivery.
If every option goes into the same table with no distinction between layers, the decision is compromised from the start.
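One way to keep a comparison honest is to group candidates by what they compete on before benchmarking anything. The following is a minimal Python sketch: the `COMPETES_ON` values are taken from the paragraph above, while the dictionary itself and the `comparable_groups` helper are illustrative names, not part of any of these tools.

```python
from collections import defaultdict

# What each option competes on, as stated in the anti-example above.
COMPETES_ON = {
    "Eve": "running durable agents",
    "Flue": "running durable agents",
    "LangGraph": "state control",
    "CrewAI": "speed of modeling crews",
    "OpenAI Agents SDK": "lightness within an app",
    "Microsoft Agent Framework": "enterprise fit",
    "Factory": "transforming software delivery",
}

def comparable_groups(candidates):
    """Group candidates by what they compete on.

    Only options that land in the same group belong in one benchmark.
    """
    groups = defaultdict(list)
    for name in candidates:
        groups[COMPETES_ON[name]].append(name)
    return dict(groups)

# The shortlist from the anti-example.
shortlist = ["Eve", "Flue", "LangGraph", "CrewAI", "Factory"]
groups = comparable_groups(shortlist)
for axis, names in groups.items():
    print(f"{axis}: {', '.join(names)}")
```

With this shortlist, only Eve and Flue end up in the same group. The other three each stand alone, which is the point: a single head‑to‑head benchmark across all five has no shared axis.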
Protocol (3 steps)
- Define the work that must survive the prompt. Decide which state, permissions, memory, artifacts and evidence must persist once the conversation ends.
- Choose the missing layer: SDK, runtime, harness, platform or factory. Don’t buy a platform if only a tool loop is missing. Don’t stand up a runtime if the problem is ownership.
- Run a test with a real case. The case should include input, tool use, failure, retry, supervision, evidence and closure criteria.
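The third step is easier to enforce if the test case is written down as data and checked for gaps before any tool is evaluated. A minimal Python sketch follows; the `PilotCase` structure and the refund example are hypothetical and exist only to illustrate the checklist.

```python
from dataclasses import dataclass, field

# Elements the protocol says a real test case should include.
REQUIRED_ELEMENTS = (
    "input", "tool_use", "failure", "retry",
    "supervision", "evidence", "closure_criteria",
)

@dataclass
class PilotCase:
    """A real case used to test a candidate layer (hypothetical structure)."""
    name: str
    elements: dict = field(default_factory=dict)  # element name -> short description

    def missing(self):
        """Return the required elements that are absent or left empty."""
        return [e for e in REQUIRED_ELEMENTS if not self.elements.get(e)]

# Illustrative case only; the process and its details are made up.
case = PilotCase(
    name="refund request triage",
    elements={
        "input": "customer email with order id",
        "tool_use": "look up the order in the billing system",
        "failure": "billing API times out",
        "retry": "retry twice, then escalate",
        "evidence": "trace of every tool call",
    },
)
print(case.missing())  # ['supervision', 'closure_criteria']
```

A case that still reports missing elements is not ready to be used as a benchmark. In this example nobody has yet decided who supervises the agent or when the work counts as closed.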
| Decision | Question | If the answer is yes |
|---|---|---|
| durable state | the agent must resume work days later | look at Eve, Flue or LangGraph |
| web/product UI | the user interacts in your own app | look at Vercel AI SDK or OpenAI Agents SDK |
| Microsoft enterprise | data, permissions and teams live in Microsoft | look at Microsoft Agent Framework |
| fast multi‑agent | you need roles and visible flows soon | look at CrewAI, but with evals |
| software delivery | the problem is development, QA and review | look at Factory as a platform |
| low‑level control | you need to govern each transition | look at LangGraph |
| safe production | you need sandbox, approvals and evidence | require guardrails before choosing a vendor |
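The table reads naturally as a lookup. Here is a minimal Python sketch with the rows copied from the table above; `DECISION_TABLE` and `suggestions` are hypothetical names. It returns one suggestion per "yes" answer and leaves the final choice to you.

```python
# Each row of the decision table: decision -> suggestion when the answer is yes.
DECISION_TABLE = {
    "durable state": "look at Eve, Flue or LangGraph",
    "web/product UI": "look at Vercel AI SDK or OpenAI Agents SDK",
    "Microsoft enterprise": "look at Microsoft Agent Framework",
    "fast multi-agent": "look at CrewAI, but with evals",
    "software delivery": "look at Factory as a platform",
    "low-level control": "look at LangGraph",
    "safe production": "require guardrails before choosing a vendor",
}

def suggestions(answers):
    """Return the table's suggestion for every decision answered yes (True)."""
    unknown = set(answers) - set(DECISION_TABLE)
    if unknown:
        raise KeyError(f"not in the decision table: {sorted(unknown)}")
    # Iterate over the table so results keep the table's row order.
    return [DECISION_TABLE[d] for d in DECISION_TABLE if answers.get(d)]

answers = {"durable state": True, "web/product UI": False, "low-level control": True}
for line in suggestions(answers):
    print(line)
```

When several rows answer yes, the overlap between suggestions is a useful signal. Here LangGraph appears in both, which narrows the shortlist without replacing the test with a real case.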
Long decision guide
Eve: when the agent needs a workplace
Eve is interesting because it starts from the filesystem. That changes the mental model. The agent does not only respond; it creates, modifies, executes and leaves traces in a work environment. For teams already close to Vercel, the combination with Workflows, Sandbox, AI Gateway and Connect can greatly reduce the gap between demo and production.
Caution: it is in beta. I would not sell it as a mature standard for every organization. I would test it when the Vercel stack already exists and the case needs durable backend agents.
Flue: when you want a programmable harness
Flue fits if the team wants to write the agent’s behavior as software, not as product configuration. Tasks, workflows, channels, policies and runtime give a clear structure for systems that must operate beyond a single request.
Caution: it requires engineering judgment. If the team is looking for “something that does everything”, Flue does not eliminate the need to design operations. It gives them structure.
LangGraph: when state matters
LangGraph remains one of the most serious options when the problem is state: memory, checkpoints, branches, human‑in‑the‑loop, retries and flows that do not fit in a linear sequence.
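To make "state" concrete, here is a minimal, library‑free Python sketch of a checkpointed flow that pauses for human approval and resumes later. It deliberately does not use LangGraph's API. The `run` and `approve` functions, the step names and the JSON checkpoint file are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

def run(steps, checkpoint: Path, state=None):
    """Run steps in order, saving a checkpoint after each one so a later call can resume."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
        state, start = saved["state"], saved["next_step"]
    else:
        state, start = dict(state or {}), 0
    for index in range(start, len(steps)):
        state = steps[index](state)
        checkpoint.write_text(json.dumps({"state": state, "next_step": index + 1}))
    return state

def approve(checkpoint: Path):
    """Record a human approval in the saved state (hypothetical supervision hook)."""
    saved = json.loads(checkpoint.read_text())
    saved["state"]["approved"] = True
    checkpoint.write_text(json.dumps(saved))

def draft(state):
    return {**state, "draft": f"reply to {state['ticket']}"}

def await_approval(state):
    # Simulates a human-in-the-loop pause: stop until an approval is recorded.
    if not state.get("approved"):
        raise RuntimeError("waiting for human approval")
    return state

def send(state):
    return {**state, "sent": True}

steps = [draft, await_approval, send]
checkpoint = Path(tempfile.mkdtemp()) / "ticket.json"

try:
    run(steps, checkpoint, {"ticket": "T-42"})
except RuntimeError as pause:
    print(pause)  # waiting for human approval

approve(checkpoint)             # later, a human signs off
final = run(steps, checkpoint)  # resumes at await_approval; draft is not recomputed
print(final["sent"])            # True
```

Even this toy version has to answer where the state lives, what happens to a half‑finished step, and who is allowed to unblock it. A real runtime answers those questions for you.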
Caution: you could end up building a nuclear plant to light a bulb. If the case does not need complex state, LangGraph may be overkill.
CrewAI: when you need to model crews and flows
CrewAI is attractive because it lowers the friction of building multi‑agent systems. Roles, crews, flows, memory, knowledge and guardrails help turn an intuition into a rapid prototype.
Caution: multi‑agent seduces. Assigning a “researcher”, a “planner” and a “reviewer” does not create governance. Without evaluations, owners and exit criteria, it only creates theater.
OpenAI Agents SDK: when the app drives
OpenAI Agents SDK makes sense when you already know your application will be the main container. The SDK helps with agents, handoffs, guardrails, tracing and tool use, but does not aim to replace your operational architecture.
Caution: if you need complex persistence, multi‑team policies, a durable runtime and long operations, you will have to design them yourself or rely on another layer.
Microsoft Agent Framework: when the enterprise already lives in Microsoft
Microsoft Agent Framework matters less for novelty and more for continuity. It brings together the AutoGen and Semantic Kernel lines of work, with .NET/Python support and an enterprise narrative around MCP, A2A and multiple providers.
Caution: it is a great option when the enterprise environment justifies it. For a small team outside the Microsoft ecosystem it may be more infrastructure than needed.
Factory: when the product is not the agent, it is the factory
Factory should not be on the same line as LangGraph or CrewAI. Its promise is to transform software delivery: agents, droids, QA, evidence and flow learning. It is a decision about the engineering operating model.
Caution: if you want to set up a support, sales or internal operations agent, Factory is not the first place to look. If you want to redesign how software is produced, it is.
BRTHLS matrix for choosing
Before choosing a tool, fill out this matrix:
| Criterion | Low | Medium | High |
|---|---|---|---|
| criticality | reversible output | affects internal workflow | affects client, money or compliance |
| duration | seconds | minutes/hours | days or asynchronous processes |
| state | stateless | short history | persistent memory and artifacts |
| permissions | one tool | several internal tools | sensitive data and real actions |
| supervision | final review | approvals per stage | human control that can be invoked at any point |
| evidence | basic logs | step traces | reproducible audit |
Simple rule:
- Low: Light SDK.
- Medium: harness or runtime.
- High: platform, sandbox, observability, evaluation and ownership before autonomy.
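The simple rule can be sketched as a small function. The matrix does not say how to combine six ratings into one, so this sketch assumes the highest rating wins: a single high criterion, such as sensitive permissions, is treated as enough to require the stricter setup. `LEVELS`, `RECOMMENDATION` and `recommend` are illustrative names.

```python
LEVELS = {"low": 0, "medium": 1, "high": 2}

CRITERIA = ("criticality", "duration", "state", "permissions", "supervision", "evidence")

# The simple rule, one recommendation per level.
RECOMMENDATION = {
    "low": "light SDK",
    "medium": "harness or runtime",
    "high": "platform, sandbox, observability, evaluation and ownership before autonomy",
}

def recommend(ratings):
    """Map a filled-in matrix to the simple rule.

    Assumption of this sketch: the overall level is the highest rating given.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {missing}")
    overall = max((ratings[c] for c in CRITERIA), key=LEVELS.__getitem__)
    return overall, RECOMMENDATION[overall]

ratings = {
    "criticality": "medium",
    "duration": "low",
    "state": "medium",
    "permissions": "low",
    "supervision": "low",
    "evidence": "medium",
}
level, advice = recommend(ratings)
print(f"{level}: {advice}")  # medium: harness or runtime
```

Taking the maximum is a conservative choice. A team that prefers a weighted score can swap the aggregation line, but should then write down why a high rating on money or compliance is allowed to be averaged away.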
Related
- Factory 2.0: the engineer no longer scales just code, but software factories
- Tool Registry: the new risk map of enterprise agents
- A2A + MCP: agent protocols are not the product, they are the nervous system
- AI observability is no longer just debugging: now it decides margin
Sources consulted
- Vercel: Introducing Eve
- Vercel Eve documentation
- Flue documentation
- Cloudflare: the Agents SDK and Flue
- LangGraph overview
- CrewAI documentation
- OpenAI Agents SDK guide
- Vercel AI SDK Agents
- Microsoft Agent Framework 1.0
- Microsoft AutoGen repository
- Factory 2.0: From coding agents to software factories
Next step
If you are choosing an agent framework, you don’t need another demo. You need to decide which operation you want the agent to execute, what permissions it will have, how it is audited, where it stops and what evidence it leaves.
At BRTHLS we can build it with you: decision map, minimal architecture, first production case, guardrails, evaluation and a comparison table adapted to your real stack. Start by contacting us and bring a short list of processes where there is already repetitive work, decision risk or handoffs that burn margin.
Translated from the Spanish original with AI assistance and reviewed for accuracy. Read the original in Spanish.