Monday, May 18, 2026

Why Doesn’t Anybody Teach Developers About Context Management? – O’Reilly



This is the sixth article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, and part five here.

I think context management is one of the most important skills in AI-driven development, and it’s strange that almost nobody talks about it compared to other AI topics. We talk about prompt engineering, about which model to use, about agentic workflows and tool use. But more than anything else, what actually determines whether your AI session produces good work or mediocre work is how well you manage context (or whether you manage it at all!).

Many developers who use AI tools treat all this “context” talk as AI jargon they can ignore, and it’s easy to see why. AI development tools have gotten so easy to use that an experienced developer can be incredibly effective just by combining vibe coding with critical thinking (that’s the central idea behind the Sens-AI Framework) without thinking much about context at all. That’s ironic. There are plenty of “I’m functionally illiterate but I just vibe coded an entire multitenant SaaS platform” articles, and everyone is worried that AI will put all developers out of work. Despite all that, the development skills you’ve been building for years make you especially effective at writing code with AI. And context management is where those skills really shine.

Just to make sure we’re all on the same page, context is (basically) everything the AI is thinking about right now: your prompt, the conversation so far, the files it’s read, the decisions you’ve made together. When you start a fresh session with an AI, its context is cleared, and it starts over with just the initial instructions it’s been given. Managing context is central to building AI agents and skills. But it’s also really important when you’re using tools like Claude Code, Cursor, or Copilot for day-to-day development work. Context is typically measured in tokens, and there’s a finite amount of it. The context window is the maximum amount of information (input and output tokens) an AI model can process and retain at once. When it fills up, the AI starts losing track of things, and that’s when you start to see wrong and strange answers.
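
To make “finite amount” concrete, here’s a rough sketch of the arithmetic behind a context usage indicator. It is an illustration under stated assumptions, not how any particular tool measures usage. The ~4 characters per token figure is a crude rule of thumb (real tools count with the model’s own tokenizer), and the 200,000-token window is a hypothetical size.

```python
# Rough sketch of a context-usage meter.
# ASSUMPTIONS: ~4 characters per token is a crude rule of thumb for English
# text (real tools use the model's own tokenizer), and the 200,000-token
# window is a hypothetical size, not any specific model's limit.

CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    if not text:
        return 0
    return max(1, len(text) // CHARS_PER_TOKEN)


def context_usage(messages: list[str], window_tokens: int) -> float:
    """Fraction of the window used by instructions, conversation, and files."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / window_tokens


session = [
    "System: you are a coding assistant for this repository.",  # initial instructions
    "User: refactor the batch runner to use the new config loader.",  # your prompt
    "x" * 40_000,  # stand-in for a large source file the AI has read
]
print(f"context used: {context_usage(session, window_tokens=200_000):.1%}")
```

The point is that everything the AI reads or writes in a session counts against the same fixed budget.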

Unfortunately, many developers read paragraphs like that one and their eyes glaze over. Somehow context ends up filed in the same part of our brains as learning how our build systems work: boring stuff we don’t really want to think about because it takes us away from “real” programming. That’s a shame, because when we don’t understand the basics of how context works, we waste a lot of time.

For example, here’s something I see developers do all the time that they absolutely shouldn’t. They’re deep into an AI coding session, and the AI has built up a detailed understanding of their codebase (e.g., it’s noticed patterns, it’s making good decisions, etc.). Then they start seeing “Compacting conversation” messages, or they notice the little context usage indicator in Cursor or Copilot filling up, and they don’t really know what that means. What they have learned is that closing the session and starting a new one seems to fix the problem. Unfortunately, all they’ve done is trade compaction for total amnesia. The new session keeps going and produces output that looks great. But its answers and its code are worse, because it’s working from incomplete information.

The really strange thing is that I was writing about something very similar back in 2006, long before AI was around, in Applied Software Project Management: Missing requirements are especially insidious because they’re hard to spot. I was writing about requirements, not AI context, but the problem is the same. I’ve written about how prompt engineering is requirements engineering, and this is another place where that parallel holds up. When a requirement is missing, no artifact flags it; you just end up with code that doesn’t do what it’s supposed to do. When context is missing from an AI session, no error message tells you what the AI forgot; you just end up with worse answers.

The cost of poor context management is measurable. A developer on Microsoft’s Dev Blog recently timed his own reorientation overhead. He found he was spending over an hour a day just reexplaining things to his AI that it had known in a previous session. He’s not alone. There are now entire frameworks and managed services dedicated to giving agents persistent memory, from lightweight CLIs that query Copilot’s local session database to managed memory services from Cloudflare. Some of these tools are genuinely useful, but you have to evaluate, integrate, and maintain them before they help you.

My goal in this article and the next is to give you four specific things you can do today with whatever AI tools you already use. This article covers the problem: why context management matters and how context loss affects the quality of your AI’s output. The next article covers the specific practices that emerged from building the Quality Playbook and Octobatch, which you can bring back to your own prompts, skills, and agents immediately. I’ll use real examples from those projects, because I think they offer some good examples you can draw on.

We get AI wrong in both directions

I think the through line in all of this is that developers both overestimate and underestimate AI. We overestimate how much it can hold in memory, and how well it can remember things and make decisions for us. So we stuff a whole bunch of material into the context window, assume the AI will work it out, and then get annoyed when it hallucinates or forgets.

At the same time, we massively underestimate its ability as an orchestrator. Your prompt doesn’t just have to ask a question or ask the AI to generate something. You can give it a multistep workflow where each step writes its results to files. The AI will coordinate the whole thing, spinning off subtasks and picking up where it left off if something breaks.
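
The “each step writes its results to files” idea can be sketched as a tiny file-backed orchestrator. In the workflow described above, the AI itself does the orchestrating by following a prompt. This Python version only makes the resume-from-files mechanics concrete. The step names, the `work/` directory, and the trivial lambda steps are hypothetical placeholders.

```python
# Minimal sketch of a multistep workflow where every step writes its result
# to a file, so a run can resume where it left off after a crash.
# ASSUMPTIONS: step names, the "work/" directory, and the trivial step
# functions are hypothetical; in practice each step would be an AI task.
from pathlib import Path
from typing import Callable

Step = tuple[str, Callable[[str], str]]


def run_workflow(steps: list[Step], workdir: Path, initial_input: str) -> str:
    workdir.mkdir(parents=True, exist_ok=True)
    previous = initial_input
    for name, fn in steps:
        out_file = workdir / f"{name}.md"
        if out_file.exists():
            # Step already finished in an earlier run: reuse its saved output.
            previous = out_file.read_text(encoding="utf-8")
            continue
        result = fn(previous)
        out_file.write_text(result, encoding="utf-8")  # persist before moving on
        previous = result
    return previous


if __name__ == "__main__":
    demo_steps: list[Step] = [
        ("01_explore", lambda s: f"observations about: {s}"),
        ("02_requirements", lambda s: f"requirements from: {s}"),
        ("03_review", lambda s: f"review of: {s}"),
    ]
    print(run_workflow(demo_steps, Path("work"), "my-project"))
```

If the run dies during step 3, rerunning it skips steps 1 and 2, because their output files already exist. The state lives on disk, not in any one session.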

When developers don’t take either of these seriously (context management or orchestration), a specific cycle follows. They treat the context window as infinite and cram everything in. Then, when the session gets too long and the AI starts losing track, they throw it all away and start fresh. They never consider the alternative: designing the workflow so the AI works from externalized files across independent sessions.

I discovered this while building the Quality Playbook. Context management was working so well within my sessions that I realized the sessions themselves were the bottleneck. I was running the whole playbook from a single prompt. I think my record was over 15 million tokens in a single Copilot GPT-5.4 session that ran for hours, and I ran eight of those sessions in parallel. Incidentally, that’s why Copilot rate-limited me for 54 hours, which was completely fair.

The playbook wrote everything down to files as it went, which is the only reason those runs could last that long. But I didn’t want that behavior. Running 15 million tokens in a single session is expensive. If you’re paying for API tokens as you go instead of using a flat-rate plan like Copilot, Claude Max, or Cursor, that kind of usage can be a real shock. I wanted to make the playbook available to developers who don’t want to burn that much at once. And because the context was already externalized to files, splitting the work into independent phases turned out to be easy.

Ask the AI to write its context down along the way

Before I get into how the pipeline splits things up, I want to talk about the practice that made the split possible in the first place: storing development context in files as you go.

I don’t mean asking the AI to export its notes at the end of a session, or writing up a “lessons learned” document after the fact. I mean building it into the instructions you give the AI from the start, so it’s continually writing and updating context as it works. For Octobatch, the batch LLM orchestrator that was my first experiment in agentic engineering (I wrote about its development in “The Accidental Orchestrator”), I had the AI write developer context in every folder. That made it really easy to spin up a new session.

Here’s what that looks like in practice. Every new Claude Code session on Octobatch starts with a single line: “Read ai_context/DEVELOPMENT_CONTEXT.md and bootstrap yourself to continue development.” That file contains a loading sequence. The AI reads it first, then fans out to the component-level CONTEXT.md files in scripts/, tui/, and pipelines/, each of which describes its own subsystem at the right level of detail. By the time the AI finishes reading, it knows what the project is, how it’s built, what’s currently in progress, and what the active bugs are.
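
The reading order described above can be sketched as a small loader. It only illustrates the order of operations, top-level file first and then the component files. How an AI tool actually ingests these files is up to the tool. The folder list mirrors the ones named in the text, and the “MISSING” marker is an assumption added so gaps are visible instead of silently skipped.

```python
# Sketch of the "bootstrap yourself" loading sequence: read the top-level
# context file first, then fan out to component-level CONTEXT.md files.
# ASSUMPTIONS: folders mirror those named in the text; the MISSING marker is
# an illustrative choice, not part of Octobatch.
from pathlib import Path

TOP_LEVEL = Path("ai_context") / "DEVELOPMENT_CONTEXT.md"
COMPONENT_DIRS = ["scripts", "tui", "pipelines"]


def load_bootstrap_context(repo_root: Path) -> str:
    top = repo_root / TOP_LEVEL
    if not top.exists():
        raise FileNotFoundError(f"missing {top}; nothing to bootstrap from")
    sections = [f"# {TOP_LEVEL.as_posix()}\n{top.read_text(encoding='utf-8')}"]
    for folder in COMPONENT_DIRS:
        ctx = repo_root / folder / "CONTEXT.md"
        if ctx.exists():
            sections.append(f"# {folder}/CONTEXT.md\n{ctx.read_text(encoding='utf-8')}")
        else:
            # Surface the gap: a missing file means the AI must rediscover it.
            sections.append(f"# {folder}/CONTEXT.md\n(MISSING: rediscovery needed)")
    return "\n\n".join(sections)
```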

I think of this as shifting left. Instead of repeating constraints in every prompt (don’t use additionalProperties: false, always test with --limit 3), those rules live in the CONTEXT.md files. The prompt stays clean because the documentation does the heavy lifting.

Updating the context files is also part of every task. Before we commit anything, I have the AI review the context files and make sure they reflect what we just did. If we added a feature or fixed a bug, the context file should say so before we commit. Stale context causes the same kinds of problems as stale documentation. It’s actually worse, because the AI relies on it to make decisions.
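
A mechanical backstop for stale context can be sketched as a pre-commit check. It flags any folder whose code changed in a commit while that folder’s CONTEXT.md did not. This is a hypothetical helper, not part of Octobatch. It assumes one CONTEXT.md per folder (nearest folder only) and treats only `.py` files as “code.” `git diff --cached --name-only` is the standard way to list staged paths.

```python
# Sketch of a pre-commit guard for stale context files.
# ASSUMPTIONS: one CONTEXT.md per folder (nearest folder only) and ".py" as
# the only "code" suffix; both are illustrative and should be adapted.
import subprocess
from pathlib import PurePosixPath


def stale_context_folders(changed: list[str]) -> list[str]:
    """Return folders whose code changed without a matching CONTEXT.md change."""
    code_dirs: set[str] = set()
    context_dirs: set[str] = set()
    for path in changed:
        p = PurePosixPath(path)
        if p.name == "CONTEXT.md":
            context_dirs.add(str(p.parent))
        elif p.suffix == ".py":
            code_dirs.add(str(p.parent))
    return sorted(code_dirs - context_dirs)


def staged_files() -> list[str]:
    """Paths staged for the next commit (must be run inside a Git repo)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]
```

A Git pre-commit hook could call `stale_context_folders(staged_files())` and refuse the commit when the list isn’t empty. That only catches forgotten files. It doesn’t replace having the AI actually review what the context says.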

I want to be clear about exactly what I mean by “development context.” It’s the information a new AI session needs to get up to speed: what the project is, how it’s built, and what decisions were made along the way. Tools like Claude Code read development context from files like AGENTS.md at the start of every session (you can actually visit that website to learn more). If you build up your development context thoroughly enough and keep it up to date, you can get them fully bootstrapped. These files are the blueprints for your AI sessions. I wrote in Applied Software Project Management that building software without requirements is like building a house without blueprints. Running AI sessions without externalized context is the same mistake. You’re relying on what’s in someone’s head instead of what’s written down. And when you’re working with AI, “someone’s head” is a context window that’s going to get compacted or thrown away.

The most important thing is that what’s in my head matches what’s in the AI’s head. The context file is just a convenient way for us to check whether we agree. When I start a new Claude Code session on a folder that has a good DEVELOPMENT_CONTEXT.md, the AI reads it and we’re immediately aligned. When I start a session without one, the AI has to rediscover everything from scratch, and it always misses things. Rediscovery is always lossy.

If you’re not already writing context files as part of your workflow, none of the fancier techniques I’m about to describe matter. This is the foundation.

Include the why, or the AI will undo your decisions

One specific thing has to go into those context files, and it took me a while to learn why it matters so much: the reasoning behind every decision.

Octobatch’s DEVELOPMENT_CONTEXT.md has a section called “Key Technical Learnings” with 49 entries, each in a specific format: What happened, Why it matters, When we discovered it, and Where in the code it applies. At the top of that section is a note in bold: “IMPORTANT: Always include the REASONING (the ‘Why’) for every learning. This prevents future sessions from ‘refactoring’ a deliberate decision.”
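
The What / Why / When / Where format can be sketched as a small data structure that refuses entries without a “Why.” The field names follow the format described above. The Markdown rendering and the example’s When and Where values are hypothetical placeholders, not Octobatch’s actual file contents.

```python
# Sketch of a "Key Technical Learnings" entry that cannot be created without
# its reasoning. ASSUMPTIONS: the Markdown layout and the example's When/Where
# values are illustrative placeholders, not Octobatch's real entries.
from dataclasses import dataclass


@dataclass(frozen=True)
class Learning:
    what: str   # what happened
    why: str    # why it matters: the reasoning that protects the decision
    when: str   # when we discovered it
    where: str  # where in the code it applies

    def __post_init__(self) -> None:
        if not self.why.strip():
            raise ValueError(f"learning {self.what!r} has no 'Why'; "
                             "a future session may refactor it away")

    def to_markdown(self) -> str:
        return (f"- **What:** {self.what}\n"
                f"  **Why:** {self.why}\n"
                f"  **When:** {self.when}\n"
                f"  **Where:** {self.where}")


entry = Learning(
    what="Auto-refresh uses recursive set_timer() instead of set_interval()",
    why="set_interval() callbacks aren't reliably serviced on pushed screens",
    when="(date discovered)",                 # hypothetical placeholder
    where="tui/ auto-refresh code (hypothetical path)",
)
print(entry.to_markdown())
```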

That note is there because without it, the AI will do exactly that. In one case on Octobatch, we used recursive set_timer() instead of set_interval() for auto-refresh, because Textual’s set_interval() callbacks aren’t reliably serviced on pushed screens. Without the “Why” in the context file, a future session would look at that code, see a “cleaner” alternative, and helpfully refactor it right back to the broken approach.

The same principle applies to quality standards. Don’t just say “90% coverage for core logic.” Say “90% coverage for core logic, because expression evaluation touches randomness and seeding, where subtle bugs produce plausible-but-wrong output. The drunken sailor reseeding bug passed every visual inspection. Only statistical verification caught that sequential seeds created correlation bias (77.5% fell in the water instead of the theoretical 50/50).” Without the “why,” a future AI session will argue the coverage target down. Any standard, architectural decision, or unusual code pattern that doesn’t have its rationale attached is vulnerable to being optimized away by an AI that doesn’t know what problem it was solving.
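
Here is one way to sketch the kind of statistical verification that catches plausible-but-wrong randomness. It compares an observed proportion with its expected value using a normal-approximation z-score. The 1,000-trial count is hypothetical, since the text gives only the 77.5% versus 50/50 figures. The |z| > 3 threshold is an illustrative choice, not a standard from the original project.

```python
# Sketch of a statistical check on simulation output: z-score of an observed
# proportion against its expected value (normal approximation).
# ASSUMPTIONS: the 1,000-trial count is hypothetical, and |z| > 3 is an
# illustrative threshold.
import math


def proportion_z(successes: int, trials: int, expected_p: float) -> float:
    observed = successes / trials
    std_err = math.sqrt(expected_p * (1 - expected_p) / trials)
    return (observed - expected_p) / std_err


def looks_biased(successes: int, trials: int, expected_p: float = 0.5,
                 threshold: float = 3.0) -> bool:
    return abs(proportion_z(successes, trials, expected_p)) > threshold


# 775 of 1,000 simulated sailors ending in the water (77.5%) vs. expected 50%
print(round(proportion_z(775, 1000, 0.5), 1))  # 17.4
print(looks_biased(775, 1000))                 # True
print(looks_biased(510, 1000))                 # False: 51% is within noise
```

Each individual run still looks reasonable, which is why a visual spot check misses the bias. The aggregate test doesn’t.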

The garbage collection problem

A lot of people like to describe the context window as your AI’s short-term or working memory, and context persisted to disk as its long-term memory. Personally, I’m not sure those analogies to human memory work all that well. I think it’s much more useful to think about context the way we think about managing memory in our code.

I find it especially helpful to compare context compaction to garbage collection. Again, it’s not a perfect analogy, but it’s a useful one. When you look at a GC graph in Java, you see memory slowly fill up and then suddenly drop after each GC. That drop is the runtime figuring out what’s still being referenced and freeing everything else.

The context window does the same thing. Your conversation accumulates tokens, the AI’s context window fills up, and then compaction happens. The tool (or the model) decides what to keep and what to throw away. Compaction is lossy and automatic, and you don’t control what survives.
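
A toy model reproduces the sawtooth: tokens accumulate until a limit, then a lossy “compaction” keeps only part of the history. This is a deliberate simplification. Real tools decide what to keep in their own ways (often by having the model summarize). Here compaction simply keeps the most recent messages plus a placeholder summary line. The limit and token counts are arbitrary illustrative numbers.

```python
# Toy model of context compaction as a GC-like sawtooth.
# ASSUMPTIONS: real tools choose what to keep (often via a model-written
# summary); this toy keeps the newest messages plus a placeholder summary.
from dataclasses import dataclass, field


@dataclass
class ContextWindow:
    limit: int                   # max tokens before compaction kicks in
    keep_recent: int = 2         # newest messages that survive compaction
    messages: list[tuple[str, int]] = field(default_factory=list)
    dropped: list[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(tokens for _, tokens in self.messages)

    def add(self, text: str, tokens: int) -> None:
        self.messages.append((text, tokens))
        if self.used() > self.limit:
            self.compact()

    def compact(self) -> None:
        cut = max(0, len(self.messages) - self.keep_recent)
        old, recent = self.messages[:cut], self.messages[cut:]
        self.dropped.extend(text for text, _ in old)
        # The placeholder stands in for a summary; whatever it leaves out is gone.
        self.messages = [(f"[summary of {len(old)} earlier messages]", 10)] + recent


window = ContextWindow(limit=100)
usage_over_time = []
for i in range(12):
    window.add(f"turn {i}", tokens=20)
    usage_over_time.append(window.used())
print(usage_over_time)  # [20, 40, 60, 80, 100, 50, 70, 90, 50, 70, 90, 50]
```

Anything in `window.dropped` is unrecoverable from inside the session. The only copy that survives is one that was written somewhere else first.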

Java developers spent decades learning to design their allocation patterns so garbage collection wouldn’t destroy anything important. AI developers need to learn the same thing, and the learning curve should be shorter because the concepts transfer directly.

When you ask the AI to write important state to files, you’re promoting that state out of volatile space. It’s surprisingly easy to do. Just tell the AI to write its context to a Markdown file. For example, you can put all of the context related to a specific area into its own file. If the AI noticed a behavioral contract, you could have it write everything related to that contract to a file called CONTRACTS.md. If it made a design decision, that could go into DEVELOPMENT_CONTEXT.md. I use that pattern all the time to record the important context needed to bootstrap a new AI session on the code. These files live on disk, outside the context window, and compaction can’t touch them. If you start a new session without externalizing any of this, though, it’s like shutting down the application and losing everything that was in memory.
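
Routing notes to topic files can be sketched as a small helper. In practice you’d simply instruct the AI to do this, so the code only shows the shape of the habit. CONTRACTS.md and DEVELOPMENT_CONTEXT.md are the file names from the text. The topic keys, the append-only bullet format, and the sample notes in the test are hypothetical.

```python
# Sketch of "promoting" state out of the context window: append each note to
# a topic file on disk. ASSUMPTIONS: topic keys and the bullet format are
# illustrative; the file names are the ones used in the article.
from pathlib import Path

TOPIC_FILES = {
    "contract": "CONTRACTS.md",
    "decision": "DEVELOPMENT_CONTEXT.md",
}


def externalize(note: str, topic: str, root: Path) -> Path:
    try:
        filename = TOPIC_FILES[topic]
    except KeyError:
        raise ValueError(
            f"unknown topic {topic!r}; expected one of {sorted(TOPIC_FILES)}"
        ) from None
    path = root / filename
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")  # append-only: earlier notes are never rewritten
    return path
```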

The first time I built Octobatch’s batch orchestrator, it was a Python script with in-memory state and a lot of hope. It worked for small batches but fell apart at scale. That’s pretty much what most developers are doing with their AI context right now. They keep everything in the context window and hope it holds together, even though that stops working once sessions get long and codebases get complex.

It’s way too easy to fall into one context management extreme or the other

The Quality Playbook exists partly because of this problem. When I was building the requirements pipeline, I discovered that single-pass requirement generation runs out of attention after about 70 requirements. The model forgets behavioral contracts it noticed earlier. And the failure is completely invisible. You get no stack trace, no error message, no warning of any kind, just incomplete output and no way to know what’s missing.

The longer a defect goes uncorrected, the more entrenched it becomes and the more gets built on top of it. Context drift works the same way. When the AI loses track of a design decision early in a session, everything built on that lost context compounds the error. And just like a late-discovered defect, you can’t tell what went wrong, because the original context is gone.

I saw a concrete example when I was running the playbook against virtio-win. Version 1.3.32 found four bugs. Version 1.3.33, after some changes, found only one. I could only diagnose that regression because I had EXPLORATION.md, an externalized intermediate state file that captures what the AI observed during its exploration phase. Without it, the only observable output would have been “fewer bugs this time.” I would have had no way to tell whether the playbook had gotten worse, the bugs were harder, or it had just missed something. Without externalized state, I couldn’t have answered any of those questions.

The contracts file in the pipeline exists specifically to solve this. When the model forgets a behavioral contract it noticed earlier, that forgetting is usually invisible. With a contracts file, though, every observation is written down before any requirements work starts. If a contract is in the file but has no corresponding requirement, that’s a visible, greppable gap. You can see what was forgotten and fix it.
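
Here’s a minimal sketch of that greppable gap check. It collects the contract IDs from the contracts file and reports any ID never mentioned in the requirements. The `C-###` ID convention and the sample file contents are hypothetical. They were chosen only so the gap can be found with a regular expression, and the Quality Playbook’s actual file formats may differ.

```python
# Sketch of finding contracts with no corresponding requirement.
# ASSUMPTIONS: the "C-###" ID convention and the sample contents below are
# hypothetical, not the Quality Playbook's actual format.
import re

CONTRACT_ID = re.compile(r"\bC-\d{3}\b")


def uncovered_contracts(contracts_md: str, requirements_md: str) -> list[str]:
    contracts = set(CONTRACT_ID.findall(contracts_md))
    covered = set(CONTRACT_ID.findall(requirements_md))
    return sorted(contracts - covered)


contracts = """\
- C-001: retry() gives up after 3 attempts
- C-002: queue drains in FIFO order
- C-003: shutdown flushes pending writes
"""
requirements = """\
REQ-10 (covers C-001): retry limit is configurable, default 3
REQ-11 (covers C-003): pending writes are flushed on shutdown
"""
print(uncovered_contracts(contracts, requirements))  # ['C-002']
```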

But it’s just as easy to overcompensate. If the LLM has to constantly hop between eight different reference files, its context window fragments and you start getting hallucinations. I’ve seen this happen. You load all your context files, requirements documents, and design docs into the session, and the AI gets worse, not better. It spends all its attention navigating between reference files instead of thinking about the problem.

I hit this with the Quality Playbook when I expanded the scope of a run against virtio-win from 10 files to about 60. The run analyzed 6x more files but found 75% fewer bugs. The model burned its context on device drivers instead of going deep on the transport layer, where the bugs actually were. Wider scope meant shallower analysis.

The goal isn’t to save everything. You have to decide what to externalize, what to keep in context, and what to let go. The best context file contains exactly what the AI needs for this session and nothing more.
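
One simple way to put “exactly what the AI needs and nothing more” into practice is to rank candidate files by relevance to the current task and stop at a token budget. This sketch makes several assumptions: you (or an earlier planning step) supply the relevance scores, token counts use a crude ~4 chars/token heuristic, and greedy selection with a relevance floor is just one policy among many. The file names loosely echo the virtio-win story but are hypothetical.

```python
# Sketch of budgeted context selection: most relevant files first, skip
# anything below a relevance floor, stop adding once the budget is spent.
# ASSUMPTIONS: relevance scores are supplied externally, token counts use a
# crude ~4 chars/token heuristic, and file names are hypothetical.
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    text: str
    relevance: float  # higher = more relevant to the current task


def approx_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic, not a real tokenizer


def select_context(cands: list[Candidate], budget: int,
                   min_relevance: float = 0.2) -> list[str]:
    chosen: list[str] = []
    used = 0
    for cand in sorted(cands, key=lambda x: x.relevance, reverse=True):
        if cand.relevance < min_relevance:
            continue  # not worth the attention, even if it would fit
        cost = approx_tokens(cand.text)
        if used + cost <= budget:
            chosen.append(cand.path)
            used += cost
    return chosen


files = [
    Candidate("transport/CONTEXT.md", "t" * 8_000, relevance=0.9),
    Candidate("CONTRACTS.md", "c" * 4_000, relevance=0.8),
    Candidate("drivers/CONTEXT.md", "d" * 20_000, relevance=0.3),
    Candidate("DESIGN_HISTORY.md", "h" * 2_000, relevance=0.1),
]
print(select_context(files, budget=3_500))  # ['transport/CONTEXT.md', 'CONTRACTS.md']
```

The relevance floor is the “let go” part. A file that would fit in the budget can still be left out if it would only dilute the session’s attention.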

Helping your AI manage its context helps you too

The interesting thing about all of this is that good context management draws directly on your development expertise, and the more you do it, the better a developer you become. Every practice I’ve described in this article is something developers have always been told to do: writing down your decisions, recording why you made them, being deliberate about what goes into a session and what doesn’t. We write ADRs, design docs, and inline comments explaining nonobvious choices, and we all know we should do more of it. When you’re working with AI, the cost of skipping it becomes immediate and visible. Your context files end up being the project documentation you should have been writing all along. The difference is that now something on the other end can actually go wrong if you skip it.

And once you start thinking of context as something you actively manage, you can start designing your workflows around it. That’s what happened with the Quality Playbook. It went from a single 15-million-token session to a set of independent phases with clean handoffs between them. The whole split worked on the first try, because the context was already externalized to files.

In the next article, I’ll cover specific techniques you can use today, both in your AI agents and in your day-to-day AI development work.

The Quality Playbook is open source and works with GitHub Copilot, Cursor, and Claude Code. It’s also available as part of awesome-copilot.


Disclosure: Parts of the process described in this article are the subject of US Provisional Patent Application No. 64/044,178, filed April 20, 2026 by the author. The open-source Quality Playbook project (Apache 2.0) includes a patent grant to users of that project under the terms of the Apache 2.0 license.
