The Case Against Building Your Own Agent Platform – O’Reilly

June 18, 2026

You know the meeting. The board wants an AI agent strategy by end of quarter. Someone on the leadership team has read a McKinsey report. You’ve been voluntold to build the platform. The slide deck says “AI-native.” The acceptance criteria are vague. Somebody mentions LangGraph, and somebody else says, “We’ll just wrap it ourselves.”

You ask what “done” looks like. Nobody in the room can answer.

The cost of building this is almost always estimated before anyone has a clear picture of what “this” actually is. That’s the problem I want to work through here, because the scope of the work being casually assigned to internal platform teams right now is genuinely larger than the people assigning it understand.

Build versus buy, flipped in a year

This particular pendulum has swung before. App servers in the late 1990s. Content management systems in the 2000s. Container orchestration in the 2010s. The pattern rhymes each time: When a category is new, the components look deceptively simple. Early adopters build their own. The market catches up. Within 18 months, building becomes the expensive path. Within 36 months, the teams that built internally are rewriting on top of the category winner that emerged while they weren’t looking.

What’s different about the current moment is the speed. Menlo Ventures’ 2025 State of Generative AI in the Enterprise report shows the build-versus-buy split inverting in a single year. In 2024, 47% of enterprise AI solutions were built internally. By late 2025, that number had collapsed to 24%. The market made the decision in 12 months, which is unusual.

I’ve lived through enough of these transitions to recognize the shape. What I want to do in this piece is explain why I think the scope of “agent platform” is systematically underestimated right now, and what platform engineers should be asking before they commit to building one.

Most “agent platforms” aren’t

A lot of the projects labeled “agent platform” right now are actually workflow systems with an LLM in the loop. That’s a meaningful distinction. As Anthropic pointed out in its “Building Effective Agents” guidance, workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems where LLMs dynamically direct their own processes and tool usage.
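
The difference is easy to see in code. Below is a minimal Python sketch, not any framework’s API: the tool names and the `TOOLS` table are hypothetical, and the `choose_next` callback stands in for the LLM that, in a real agent, would pick each next step.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: stand-ins for real integrations.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"results for {q}",
    "summarize": lambda text: f"summary of ({text})",
}


def run_workflow(query: str) -> str:
    """Workflow: the code path (search, then summarize) is fixed in advance."""
    return TOOLS["summarize"](TOOLS["search"](query))


def run_agent(
    query: str,
    choose_next: Callable[[List[str]], Optional[str]],
    max_steps: int = 5,
) -> Tuple[List[str], str]:
    """Agent: the model (here, `choose_next`) picks each next tool, or None to stop."""
    trajectory: List[str] = []
    observation = query
    for _ in range(max_steps):  # hard cap so a confused model can't loop forever
        tool = choose_next(trajectory)
        if tool is None:
            break
        observation = TOOLS[tool](observation)
        trajectory.append(tool)
    return trajectory, observation
```

In the workflow, the sequence is fixed, so you can test it like any other function. In the agent, the sequence is whatever `choose_next` returns at run time. That is where the testing, governance, and recovery problems come from.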

Most of what enterprises are shipping today sits on the workflow side. That’s fine. Workflows have bounded requirements, tractable testing, and predictable failure modes. If your team is building a workflow system, you might reasonably build it yourselves.

The trap is that teams start building for workflows, then get asked to support agents, and discover the jump isn’t incremental. Agents need memory that survives across sessions. They need evaluation that handles nondeterminism. They need governance that tracks actions, not just outputs. They need orchestration that recovers from failure modes a workflow engine never sees.

Here’s the thesis I want to put on the table: The decision to build an agent platform almost always underestimates the long tail. Memory, governance, eval, and orchestration aren’t features you add to a workflow engine. They’re separate product bets, each with its own maturity curve, its own vendor landscape, and its own team of specialists who’ve been working on it full-time for 18 months while you’ve been doing something else.

Let me walk through them.

Memory

The assumption inside most build proposals is that memory is a database problem. You’ll pick a vector store, shove conversation history into it, and retrieve relevant chunks when the agent needs context. Done.

Production memory is three separate systems (episodic, semantic, and procedural), each with different retention and retrieval policies. It’s temporal reasoning that tracks when facts were valid, not just what they were. It’s deduplication, multitenant isolation, and explicit source-of-truth governance.
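
To see why “temporal” means more than a timestamp column, here’s a minimal sketch of just one slice of that list: per-tenant semantic facts with validity intervals and deduplication. The `TemporalMemory` class and its methods are hypothetical. Episodic and procedural memory, retrieval ranking, and source-of-truth governance are all left out, which is rather the point.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Fact:
    subject: str
    attribute: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


class TemporalMemory:
    """Toy semantic-memory store: per-tenant facts with validity intervals."""

    def __init__(self) -> None:
        # Tenant isolation: each tenant's facts live in a separate list.
        self._facts: Dict[str, List[Fact]] = {}

    def assert_fact(self, tenant: str, subject: str, attribute: str,
                    value: str, as_of: date) -> None:
        facts = self._facts.setdefault(tenant, [])
        for f in facts:
            if f.subject == subject and f.attribute == attribute and f.valid_to is None:
                if f.value == value:
                    return  # deduplicate: this fact is already current
                f.valid_to = as_of  # close out the superseded fact instead of deleting it
        facts.append(Fact(subject, attribute, value, as_of))

    def query(self, tenant: str, subject: str, attribute: str, at: date) -> Optional[str]:
        """Return the value that was valid on `at`, not merely the latest one."""
        for f in self._facts.get(tenant, []):
            if (f.subject == subject and f.attribute == attribute
                    and f.valid_from <= at and (f.valid_to is None or at < f.valid_to)):
                return f.value
        return None
```

Even this toy has to decide what supersedes what, what counts as a duplicate, and which tenant can see which fact. A vector store answers none of those questions for you.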

The signal that this is a separate product category, not a feature: Mem0 raised $24 million across seed and Series A. Letta (formerly MemGPT) raised $10M from Felicis. Zep exists as an independent company with a temporal knowledge graph engine. Mem0’s State of AI Agent Memory 2026 report maps 21 frameworks across three hosting models, with measurable benchmark gaps between them. On LongMemEval, Zep scores 15 points higher than Mem0 on temporal queries, which tells you these aren’t interchangeable tools that happen to serve the same market.

This is the component that platform teams underestimate most. Memory looks like a database problem. It isn’t.

Governance

The assumption is that governance is RBAC plus audit logging. Your agents are services. Services get role-based access controls. You log the tool calls. Compliance is happy.

Agent governance is something different. It covers action authorization, not just data authorization. It requires decision-chain auditability, meaning you can reconstruct why the agent did what it did and not just what it did. It needs behavioral drift detection, tiered autonomy, and compliance mapped to agent actions rather than data accesses.
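
Here’s a minimal sketch of what action-level authorization with tiered autonomy and a decision-chain audit record might look like. The tier names, the `ACTION_POLICY` table, and the `Governor` class are hypothetical illustrations, not a compliance framework. A real system would also need drift detection, approval workflows, and tamper-evident storage for the log.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Tuple


class Tier(IntEnum):
    READ_ONLY = 0  # may only observe
    LOW_RISK = 1   # may take reversible actions
    HIGH_RISK = 2  # may take irreversible actions


# Hypothetical policy: action -> (minimum tier required, human approval required?)
ACTION_POLICY: Dict[str, Tuple[Tier, bool]] = {
    "read_invoice": (Tier.READ_ONLY, False),
    "draft_email": (Tier.LOW_RISK, False),
    "issue_refund": (Tier.HIGH_RISK, True),
}


@dataclass
class AuditRecord:
    agent_id: str
    action: str
    rationale: str  # why the agent says it chose this action
    decision: str   # "allowed", "denied", or "needs_approval"


@dataclass
class Governor:
    agent_tiers: Dict[str, Tier]
    log: List[AuditRecord] = field(default_factory=list)

    def authorize(self, agent_id: str, action: str, rationale: str) -> str:
        policy = ACTION_POLICY.get(action)
        tier = self.agent_tiers.get(agent_id, Tier.READ_ONLY)  # unknown agents get least autonomy
        if policy is None or tier < policy[0]:
            decision = "denied"  # unknown actions are denied by default
        elif policy[1]:
            decision = "needs_approval"  # route to a human before executing
        else:
            decision = "allowed"
        # Log every decision together with the stated rationale, not just the outcome.
        self.log.append(AuditRecord(agent_id, action, rationale, decision))
        return decision
```

Note what RBAC alone wouldn’t give you. The policy is keyed on actions rather than data, unknown actions are denied by default, and each audit record captures the agent’s stated rationale alongside the decision.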

Grant Thornton’s 2026 AI Impact Survey of 950 enterprise executives found that 78% lack strong confidence they could pass an independent AI governance audit within 90 days. Meanwhile, enterprises are increasing agent autonomy faster than their governance frameworks can keep up. Traditional AI governance wasn’t designed for action-level authorization, which is where most agent-specific risk accumulates.

And there’s a hard deadline attached to this. The EU AI Act becomes fully enforceable for high-risk systems in August 2026. Credit scoring, hiring decisions, healthcare support, and critical infrastructure all fall in scope. If your internal platform doesn’t handle conformity assessments, human oversight mechanisms, full audit trails, and ongoing monitoring, that’s not a v2 feature. That’s legal exposure.

OWASP now documents “excessive agency” as a top vulnerability class for LLM applications. Security researchers (Greshake et al.) have demonstrated indirect prompt injection attacks that manipulate agents through the content they ingest. These are agent-specific attack surfaces, and traditional security tooling doesn’t see them.

RBAC was designed for humans with predictable intent. Agents don’t have predictable intent.

Eval

The assumption is that evaluation means writing test cases and measuring accuracy. You’ve built software before. You know how to test things.

As McKinsey’s QuantumBlack team noted, agent evaluation is qualitatively different from traditional software testing and even from LLM evaluation. For LLMs, you evaluate the response to a prompt. For a single agent, you evaluate the entire trajectory, including tool calls, state transitions, and intermediate decisions. For multi-agent systems, you evaluate system dynamics, including coordination patterns and collective invariants.

This matters because agent behavior is nondeterministic by design. The same input can produce different valid execution paths. “Did the agent succeed?” is no longer a yes-or-no question, because the agent might reach the right answer through a trajectory you didn’t anticipate, or reach the wrong answer through a trajectory that looks reasonable until the last step.

The tooling ecosystem reflects this. Google Vertex AI has standardized trajectory_exact_match, trajectory_precision, and trajectory_recall as production metrics. These didn’t exist 18 months ago. LangSmith, Braintrust, Arize, Galileo, Maxim, and others are building full evaluation platforms around trajectory-based analysis, LLM-as-judge scoring with statistical validation, and regression testing against production failures.
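
As a rough illustration of what those metrics measure, here’s a minimal sketch that treats a trajectory as an ordered list of tool names. It approximates the definitions in Vertex AI’s documentation as I understand them and is not the Vertex SDK. Real implementations also compare tool arguments, and the handling of empty trajectories here is my own choice.

```python
from collections import Counter
from typing import Sequence


def trajectory_exact_match(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """1.0 if the agent made exactly the reference calls, in the same order; else 0.0."""
    return 1.0 if list(predicted) == list(reference) else 0.0


def _overlap(predicted: Sequence[str], reference: Sequence[str]) -> int:
    # Multiset intersection: a repeated call only counts as often as the reference has it.
    return sum((Counter(predicted) & Counter(reference)).values())


def trajectory_precision(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Share of the agent's calls that appear in the reference (0.0 if it made none)."""
    return _overlap(predicted, reference) / len(predicted) if predicted else 0.0


def trajectory_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Share of the reference calls the agent actually made (0.0 if reference is empty)."""
    return _overlap(predicted, reference) / len(reference) if reference else 0.0
```

Take a reference trajectory of `lookup_order`, `check_policy`, `issue_refund` (hypothetical tool names) and an agent that inserts a `search_faq` detour. Exact match is 0.0, precision is 0.75, and recall is 1.0: the agent did everything it needed to, plus one step nobody asked for. Whether that detour is a bug or a valid alternative path is a judgment the metrics can’t make for you.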

Here’s the signal that the category is real: LangChain’s State of Agent Engineering report found that 57% of organizations now have agents in production, and 32% cite quality as the top deployment barrier. Gartner projects that 60% of software engineering teams will adopt AI evaluation and observability platforms by 2028, up from 18% in 2025. When a category jumps from 18% to 60% adoption in three years, that’s not a “we can build this in a sprint” situation.

You can’t tell whether your evaluation is working without another evaluation. Judge drift, calibration against human experts, internal consistency across independent runs: your eval system needs its own eval system, which is exactly the kind of recursion that eats platform teams alive.
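
Two of those checks are simple enough to sketch. The Python below computes Cohen’s kappa between an LLM judge’s labels and human expert labels (calibration), and the fraction of items on which repeated judge runs agree (consistency). The function names and the pass/fail labels are illustrative assumptions. Tracking kappa against a fixed human-labeled set over time is one way to watch for judge drift.

```python
from collections import Counter
from typing import List, Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters (e.g., LLM judge vs. human expert)."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if each rater labeled at random with their own label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label everywhere; kappa is undefined
    return (observed - expected) / (1 - expected)


def run_consistency(runs: List[Sequence[str]]) -> float:
    """Fraction of items on which every independent judge run returned the same label."""
    items = list(zip(*runs))
    if not items:
        raise ValueError("need at least one run with at least one item")
    return sum(len(set(labels)) == 1 for labels in items) / len(items)
```

With human labels `pass, pass, fail, fail` and judge labels `pass, pass, fail, pass`, raw agreement is 75% but kappa is 0.5, because some of that agreement is expected by chance. Every one of these numbers is itself an eval you now have to maintain.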

Orchestration

The orchestration layer hasn’t converged. LangGraph uses directed graphs with conditional edges. CrewAI uses role-based crews. OpenAI’s Agents SDK uses explicit handoffs. AutoGen uses conversational GroupChat. Google ADK uses hierarchical agent trees. Claude’s Agent SDK uses tool-use chains with subagents. Microsoft’s Agent Framework is its own thing. Each represents a different bet on state management, communication pattern, and coordination model. None of them are interchangeable. Migrating between them isn’t a config change; it means rewriting most of your agent logic.

Beneath them, the protocol layer is still being invented. The Model Context Protocol is becoming the standard for tool integration, and agent-to-agent (A2A) protocols are emerging for cross-framework coordination. Both are moving targets, and building on a moving protocol is a cost that internal platform teams rarely price in.
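
To show what “tool integration” means at the wire level, here’s a minimal sketch of an MCP `tools/call` request and the parsing of its text result. It is based on my reading of the specification’s JSON-RPC 2.0 message shapes. Transport (stdio or HTTP), the initialization handshake, and `tools/list` discovery are omitted, and `lookup_order` is a hypothetical tool, so check the current spec before relying on any field.

```python
import json
from itertools import count
from typing import Any, Dict, List

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def make_tool_call(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def text_from_result(response: str) -> List[str]:
    """Pull the text content blocks out of a `tools/call` response."""
    msg = json.loads(response)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("tool reported an error")
    return [block["text"] for block in result.get("content", [])
            if block.get("type") == "text"]
```

Keeping this layer in one small module doesn’t stop the protocol from moving, but it does limit how many places you have to change when it does.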

If you built your own orchestration layer in 2024, you’re rewriting it in 2026. The teams that picked a framework spent those two years shipping.

The honest case for building

I want to engage with the strongest version of the build argument, because there are real reasons to build, and pretending otherwise would make this piece less useful than it should be.

Proprietary data genuinely is a durable competitive moat. Mastercard built a foundation model on its transaction network. Plaid built one on its financial institution coverage. As Morgan Stanley’s analysis from last year made clear, decades of verified historical data with consistent identifiers are both technically difficult and prohibitively expensive for outside players to recreate. If your organization has data like that, you should absolutely build on it.

Regulated industries have legitimate reasons to want control over the full stack. Off-the-shelf AI tools don’t always map cleanly to frameworks like HIPAA, GxP, 21 CFR Part 11, SOX, FFIEC, and PCI DSS, and the cost of a failed audit is measured in business units shut down, not in sprints.

Vendor lock-in at the AI layer is subtler and more dangerous than in traditional software. If your agentic workflows are built on a vendor’s proprietary orchestration layer, switching costs compound quickly across memory, eval, and integrations simultaneously.

But here’s the distinction that matters: These are arguments for building agents on top of platform components, not arguments for building the platform components themselves. You can own the data, the domain logic, the evaluation criteria, the governance policies, and the specific behaviors your business needs without owning the memory layer, the orchestration engine, or the trace collection infrastructure beneath them.

Build the things that are specific to your business. Buy the things that are specific to the technology category. That’s the heuristic.

Five questions before you commit

If you’re the platform engineer being pulled into this decision, here are the questions worth asking before anyone signs up for the scope.

Are you building an agent platform or a workflow system? They’re not the same scope, and conflating them is where most of the cost overruns originate. A workflow system is a reasonable thing to build. An agent platform is four product categories you haven’t staffed for.

Can you articulate what “done” looks like for each of the four components? Memory, governance, eval, orchestration. In under three sentences each. If you can’t, you don’t have requirements. You have a vibe. And vibes don’t ship.

What happens to your platform when you need to swap the underlying model? Menlo’s December 2025 data shows Anthropic went from 12% of enterprise LLM spend in 2023 to 40% in 2025, while OpenAI fell from 50% to 27%. Enterprises didn’t plan these switches. The capability gaps forced them. If your internal platform hardcoded assumptions from one vendor about context windows, tool-calling formats, or reasoning styles, swapping models isn’t an API key change. It’s simultaneous rewrites across memory, eval, and orchestration.
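
Tool-calling formats are a concrete example. Below is a minimal sketch of a vendor-neutral `ToolCall` type with two adapters. The dictionary shapes reflect my understanding of OpenAI Chat Completions `tool_calls` and Anthropic Messages `tool_use` content blocks, handled as plain dicts rather than through either SDK. `ToolCall` and the adapter names are hypothetical, and you should verify the shapes against current vendor documentation.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class ToolCall:
    """Vendor-neutral tool call that the rest of the platform depends on."""
    call_id: str
    name: str
    arguments: Dict[str, Any]


def from_openai_message(message: Dict[str, Any]) -> List[ToolCall]:
    # OpenAI Chat Completions: `tool_calls` may be null; arguments arrive as a JSON string.
    return [
        ToolCall(tc["id"], tc["function"]["name"], json.loads(tc["function"]["arguments"]))
        for tc in message.get("tool_calls") or []
    ]


def from_anthropic_message(message: Dict[str, Any]) -> List[ToolCall]:
    # Anthropic Messages: tool calls are `tool_use` content blocks with an already-parsed `input`.
    return [
        ToolCall(block["id"], block["name"], block["input"])
        for block in message.get("content", [])
        if block.get("type") == "tool_use"
    ]
```

The design choice is that only the adapters know any vendor’s format. Memory, eval, and orchestration depend on `ToolCall`, so a model swap becomes a new adapter rather than a rewrite. It does nothing about differences in context windows or reasoning behavior, so treat it as a mitigation rather than a cure.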

What happens when the techniques themselves change? Eighteen months ago the default pattern was RAG with flat vector retrieval. Now it’s just-in-time context strategies, agent-managed memory tiers, and trajectory-based evaluation. Anthropic’s own follow-up to “Building Effective Agents” explicitly acknowledges that the field has moved since the original was written. If your platform baked in the 2024 patterns, the 2026 patterns are a refactor, not a config change. Vendor platforms absorb these shifts as releases. Internal platforms absorb them as sprints.

What happens when the platform team leaves? This story is as old as COBOL, custom ESBs in 2008, or hand-rolled container orchestration in 2015. A small team builds something clever, it works, they move on, and five years later you’re paying premium rates to the contractors who can still read the code. Agent platforms are a particularly bad candidate for this pattern because the talent pool is both small and mobile. Here’s the uncomfortable version of the question: Who on your team, today, could rebuild the memory layer if the person who wrote it left tomorrow?

What this looks like in 2 years

Gartner’s prediction that over 40% of agentic AI projects will be canceled by the end of 2027 isn’t really about the AI. It’s about projects that got scoped before anyone understood the shape of the work. Most of the canceled projects will be internal builds, because internal builds are where scope estimation error accumulates. Deloitte’s data on two- to four-year AI ROI horizons is the warning shot. If your timeline to value is already long, every month you spend rebuilding a component that exists as a product is a month you don’t have.

The teams that built their platforms around OpenAI in 2023 weren’t wrong. They made a reasonable bet on the market leader at the time. But they spent 2025 porting to a landscape where Anthropic had more than tripled its share and Google had gone from 7% to 21%. The teams that picked model-agnostic platforms spent 2025 shipping. The only durable bet in this space is the one that assumes the bet will change.

The best platform engineering decision you can make this quarter might be not to build the platform.

Sources

Primary sources

Menlo Ventures, 2025: The State of Generative AI in the Enterprise, December 2025,
https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/.
Anthropic, “Building Effective Agents,” December 2024,
https://www.anthropic.com/research/building-effective-agents.
Anthropic, “Effective Context Engineering for AI Agents,” 2025,
https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents.
European Commission, AI Act Regulatory Framework (Regulation (EU) 2024/1689),
https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai.
Google Cloud, “Evaluate Gen AI Agents,” Vertex AI Documentation,
https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluation-agents.
McKinsey QuantumBlack, “Evaluations for the Agentic World,”
https://medium.com/quantumblack/evaluations-for-the-agentic-world-c3c150f0dd5a.
LangChain, State of Agent Engineering 2026,
https://www.langchain.com/state-of-agent-engineering.
Gartner, “Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027,” June 2025, https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027.
Grant Thornton, 2026 AI Impact Survey, April 2026,
https://www.grantthornton.com/services/advisory-services/artificial-intelligence/2026-ai-impact-survey.

Secondary sources

Mem0, “Mem0 Raises $24M to Build the Memory Layer for AI,” October 2025,
https://mem0.ai/series-a.
Felicis, “Felicis’s Seed in Letta,” September 2024,
https://www.felicis.com/blog/letta.
Vectorize.io, “Mem0 vs Zep,” Benchmark Comparison,
https://vectorize.io/articles/mem0-vs-zep.
Rasmussen et al., “Zep: A Temporal Knowledge Graph Architecture for Agent Memory,” arXiv 2501.13956,
https://arxiv.org/abs/2501.13956.
OWASP, “LLM06:2025 Excessive Agency,” OWASP Top 10 for LLM Applications,
https://genai.owasp.org/llmrisk/llm08-excessive-agency/.
Greshake et al., “Not What You’ve Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection,” arXiv 2302.12173, February 2023,
https://arxiv.org/abs/2302.12173.
Model Context Protocol, Official Specification,
https://modelcontextprotocol.io.
PYMNTS, “FinTechs Race to Build Foundation Models on Proprietary Data,” 2026,
https://www.pymnts.com/artificial-intelligence-2/2026/fintechs-race-to-build-foundation-models-on-proprietary-data/.
Deloitte, “State of Generative AI in the Enterprise,” Quarterly Reports,
https://www.deloitte.com/us/en/insights/topics/digital-transformation/state-of-generative-ai-in-enterprise.html.