
Rethinking the Agent Harness – O’Reilly



We kicked off our new weekly series This Week in AI on Monday, and we covered a lot of ground in half an hour, including an AI model that found security holes faster than decades of human auditing, a data center in Utah the size of two Manhattans, and a practical argument for why the harness you build around a model now matters more than which model you pick.

Here are a few takeaways from the conversation between host Eric Freeman, faculty member at UT Austin and a longtime friend of O’Reilly, and guest John Berryman, founder of Arcturus Labs, an early production engineer on GitHub Copilot, and coauthor of O’Reilly’s Prompt Engineering for LLMs. Watch the entire episode to find out why you should be building your own agent and why John believes there will eventually be no internet for humans.

AI’s security problem is now a policy problem

You’ve probably already heard about Mythos. Anthropic’s internal testing of the frontier model surfaced thousands of previously unknown security vulnerabilities across major operating systems, browsers, and financial infrastructure, including a 27-year-old bug in OpenBSD. Anthropic chose not to release the model publicly and instead launched Project Glasswing, a restricted program giving monitored access to a small group of trusted partners for defensive patching.

That decision moved fast in Washington. In roughly six weeks, the conversation shifted from the light-touch national AI policy released in March to reported White House discussions of an executive order review process modeled on how the FDA handles drugs. Security researcher Bruce Schneier has questioned whether Mythos is uniquely capable here or whether similar results are achievable with cheaper public models, but as Freeman noted (paraphrasing Schneier), either way, it’s a problem that’s coming.

The compute race is getting stranger

Anthropic leased xAI’s entire Colossus 1 supercluster in Memphis: more than 200,000 GPUs and 300 megawatts of power. A month before that deal, Anthropic expanded its agreement with Google and Broadcom for 3.5 gigawatts of capacity coming online in 2027. For context, that’s more than 10 times the power of the Colossus 1 deal, in a single contract. After this episode aired, Anthropic announced that the xAI deal has been expanded to Colossus 2 as well.

Box Elder County, Utah, just approved a 40,000-acre AI data center called the Stratos project, backed by investor and TV personality Kevin O’Leary (a.k.a. Mr. Wonderful). It’s planned for 9 gigawatts at full buildout. That’s a footprint more than twice the size of Manhattan, powered by the equivalent of nine commercial nuclear reactors. And like many data center deals going forward, including Colossus above, it was approved over local protests.

Infrastructure at this incredible scale takes years to come online, and the companies making these bets are pricing in a world where model capability keeps scaling. Whether that assumption holds will determine a lot about what’s economically viable to build in the next decade.
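The scale comparisons in this section are easy to sanity-check. The sketch below redoes the arithmetic with the figures quoted above. Manhattan's land area (about 22.8 square miles) and the output of a typical commercial reactor (about 1 gigawatt) are assumptions brought in for the comparison, not numbers from the episode.

```python
# Back-of-envelope check of the scale comparisons in this section.
# Values marked "assumption" are not from the article.

COLOSSUS_1_MW = 300           # article: Colossus 1 lease
GOOGLE_BROADCOM_GW = 3.5      # article: Google/Broadcom agreement
STRATOS_GW = 9                # article: Stratos at full buildout
STRATOS_ACRES = 40_000        # article: Stratos footprint

MANHATTAN_SQ_MILES = 22.8     # assumption: approximate land area of Manhattan
ACRES_PER_SQ_MILE = 640       # unit conversion
REACTOR_GW = 1.0              # assumption: roughly 1 GW per commercial reactor

# How many Colossus 1 deals fit inside the Google/Broadcom contract?
power_ratio = (GOOGLE_BROADCOM_GW * 1000) / COLOSSUS_1_MW

# How many Manhattans fit inside the Stratos footprint?
manhattan_acres = MANHATTAN_SQ_MILES * ACRES_PER_SQ_MILE
area_ratio = STRATOS_ACRES / manhattan_acres

# How many typical reactors would match Stratos at full buildout?
reactor_equivalents = STRATOS_GW / REACTOR_GW

print(f"Power ratio: {power_ratio:.1f}x")
print(f"Area ratio: {area_ratio:.1f}x Manhattan")
print(f"Reactor equivalents: {reactor_equivalents:.0f}")
```

Under those assumptions the contract comes out at a little under 12 times the Colossus 1 lease, and the Stratos footprint at somewhat under three Manhattans, both consistent with the rounder figures used in the conversation.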

The harness matters more than the model

John was on hand to rethink the agent harness, which, as he pointed out, entered a new phase with the step change in model capability that occurred in November and December of last year. He took Eric through the arc of AI product development, from document completion and chat loops to tool-calling agents, DAG-based workflows, and now the harness era represented by tools like Claude Code. Each progression added capability, John noted, but also complexity, and each generated a new class of problems around reliability and control. In our current moment, which John has dubbed the “age of the unharnessed agent,” agents are now within reach of everyone, not just software developers.

The payoff of this “unharnessed” era is control. John described a client engagement where he replaced a bespoke application with a skills-driven agent. Now domain experts with no development experience can read the agent’s behavior, written in plain English, and better understand it. As John explained,

Rather than building a bespoke agent. . ., I just built something that was just the agent harness, the agent, and I just gave it skills that describe what basically I learned in interviewing their experts, how they would work with these agents. And it worked perfectly. Not only does the agent stay on track and do what it needs to do these days, but it’s coded, as far as my client is concerned, in English.

The experts don’t have to complain to developers “this doesn’t work.” The experts can look at the English description of what’s happening and see problems, and maybe even fix it themselves. And I’m really excited to basically give that power into the hands of the people that know best how to change it, the experts.

That’s a different relationship between the experts and the tool than anything a wrapped commercial product offers.
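To make the idea concrete, here is a minimal sketch of what “coded in English” could look like. It is not John’s implementation: the `Skill` structure, the prompt layout, the example skill, and the `echo_model` stand-in for a real model call are all hypothetical. The point is only that the agent’s behavior lives in plain-text skills a domain expert can read and edit, while the harness code stays generic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Skill:
    """A plain-English description of how to handle one kind of task."""
    name: str
    when_to_use: str
    instructions: str  # the part a domain expert reads and edits


def build_system_prompt(skills: list[Skill]) -> str:
    """Assemble the agent's instructions from its skills."""
    sections = ["You are an agent. Follow the skills below."]
    for skill in skills:
        sections.append(
            f"## Skill: {skill.name}\n"
            f"Use when: {skill.when_to_use}\n"
            f"{skill.instructions}"
        )
    return "\n\n".join(sections)


def run_turn(model: Callable[[str, str], str],
             skills: list[Skill],
             user_message: str) -> str:
    """One harness turn: generic code, behavior supplied by the skills."""
    return model(build_system_prompt(skills), user_message)


def echo_model(system_prompt: str, user_message: str) -> str:
    """Hypothetical stand-in for a real LLM call, so the sketch runs offline."""
    loaded = system_prompt.count("## Skill:")
    return f"[{loaded} skill(s) loaded] {user_message}"


# A hypothetical skill, written the way an expert might describe the work.
skills = [
    Skill(
        name="triage-request",
        when_to_use="A new customer request arrives.",
        instructions="Ask for the account ID before doing anything else.",
    )
]

reply = run_turn(echo_model, skills, "My invoice looks wrong.")
print(reply)
```

Changing what the agent does means editing the `instructions` text, not the harness, which is the shift in control the conversation describes.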

As Eric pointed out, recent Stanford research supports this broader point: Performance gaps between a bare model and a well-designed harness now often matter more than which underlying model you’re using. The benchmark that used to dominate buying decisions, which model scores highest, has been displaced by a harder question about which harness fits the task.

John closed with a demo of his personal agent moving from an Obsidian notebook into Wikipedia and back, carrying context across environments. He used it to illustrate a concept he called the “open agent protocol,” his term for a not-yet-existing standard where an agent receives environment-specific skills as it moves between contexts. The protocol doesn’t exist yet, but the demo made the direction clear.
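Since no such protocol exists, the following is purely an illustrative sketch of the behavior described: an agent keeps its own context while the skills it holds are swapped out by whichever environment it enters. Every name here (`Agent`, `enter`, the environments and their skills) is hypothetical and does not come from the demo.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Context travels with the agent from one environment to the next.
    context: list[str] = field(default_factory=list)
    # Skills are supplied by the current environment and replaced on each move.
    skills: dict[str, str] = field(default_factory=dict)

    def enter(self, env_name: str, env_skills: dict[str, str]) -> None:
        """Receive the environment's skills while keeping accumulated context."""
        self.skills = dict(env_skills)
        self.context.append(f"entered {env_name}")


# Hypothetical environments, each advertising plain-English skills.
ENVIRONMENTS = {
    "notebook": {"edit-note": "Append findings to the current note."},
    "encyclopedia": {"lookup": "Find an article and summarize it."},
}

agent = Agent()
for name in ["notebook", "encyclopedia", "notebook"]:
    agent.enter(name, ENVIRONMENTS[name])

print(agent.context)
print(list(agent.skills))
```

After the round trip the agent holds only the notebook’s skills again, but its context records all three stops, which is the “carrying context across environments” part of the idea.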

What’s next

Join us and a rotating lineup of expert guests for weekly live tool demos and deeper dives into the topics that matter in AI. We’re taking next week off for Memorial Day in the US, but we’ll be back on June 1 with host Andreas Welsch and guests Maya Mikhailov and Doug Shannon to cut through another week of AI headlines and separate what actually drives business value from what looks good in a demo but goes nowhere in production. Our first few episodes are free and open to all if you’d like to attend live; register here.

We’ll continue to share full episodes and post our takeaways here on Radar each Friday. You can also watch or listen on YouTube, Spotify, Apple, or wherever you get your podcasts.
