Your AI Agent Already Forgot Half of What You Told It – O’Reilly

May 29, 2026


This is the seventh article in a series on agentic engineering and AI-driven development. Read part one here, part two here, part three here, part four here, part five here, and part six here.

This is the latest article in my Radar series on AI-driven development and agentic engineering, and I have to admit that this one took a bit of a turn I wasn’t expecting.

In my last article I talked about context and context management, and I promised to give you some real practical tips for using it. This article was originally meant to be about specific, practical context management techniques that were really helpful to me building Octobatch and the Quality Playbook, two open source projects where I work with AIs to plan and orchestrate all of the work and every line of code is written by AI tools like Claude Code and Cursor.

But as I was writing this, I found that I’d adapted those same techniques to my work writing articles like this one. Which is surprising! I’ve been doing all this work finding ways to help people developing AI skills improve context management, so their skills run more efficiently. It turns out that those same exact techniques apply to anyone using AI tools, even when you’re using chatbots like Claude.ai or ChatGPT.

Full disclosure: I use several AI tools to manage this article series. My primary tools are Claude Cowork for brainstorming and managing my article research, notes, and backlog, and Gemini’s mobile app for reading drafts aloud and taking my notes while I’m away from my desk. And I want to tell you about something that happened while I was using these tools, because I think it really helps show why context management isn’t just a problem for developers.

While I was writing this article, I was using Gemini’s mobile app to read the draft aloud and take my notes. Partway through the session I asked it to go back and check whether there were earlier notes it hadn’t incorporated yet. It told me it didn’t have access to the previous notes, which seemed bizarre, since we had just taken those notes a few prompts earlier in the session. I could scroll back up and see them earlier in the conversation, but somehow it didn’t “know” about them.

Here’s what happened. Gemini had compacted our conversation without telling me, and the notes from the first half of the session were just… gone.

If you’ve ever had a web chat AI just seem to forget things you talked about earlier, you’ve experienced context compaction, just like I did. Understanding even the basics of context and context windows can make a big difference in preventing that kind of frustration.

This all reminded me of something I wrote more than twenty years ago in Applied Software Project Management (back in 2005!): “Important information is discovered during the discussion that the team will need to refer back to during the development process, and if that information is not written down, the team will have to have the discussion all over again.”

Jenny Greene and I wrote that about human teams and project meetings, but it applies to AI sessions just as well.

Which brings me back to context, which I wrote about in my last article, and which I’ll write more about in the next one, because it’s one of the most important concepts to keep top of mind when working with AI.

Context loss may be invisible, but that doesn’t make it any less frustrating

Context is everything the AI is holding in its working memory during a conversation: what you’ve told it, what it’s told you, any files or instructions it’s read, and whatever internal notes the system has made along the way. All of that lives in a fixed-size context window (think of that as your AI’s short-term memory, the stuff it’s thinking about right now), and when the window fills up, the AI has to start letting things go. Different tools handle this differently: Some truncate older messages, some compress the conversation into a summary (which means details get lost even though the summary looks complete), and some just start behaving inconsistently so you can’t tell whether the AI forgot something or never understood it in the first place. The result is the same: The AI loses track of things you told it, decisions you made together, or details it noticed earlier in the session. And it won’t tell you it forgot. It’ll just keep producing confident-sounding output based on whatever it still has.
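
To make the truncation case concrete, here is a minimal Python sketch of a fixed-size window that keeps the newest messages and silently drops the oldest. It is an illustration under stated assumptions, not a description of how any particular tool works: the function name fit_to_window is hypothetical, and the token counter is a crude whitespace word count rather than a real tokenizer.

```python
from collections import deque


def fit_to_window(messages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages that fit in max_tokens; drop the oldest first.

    count_tokens is a crude stand-in (whitespace word count), not a real tokenizer.
    """
    kept = deque()
    used = 0
    # Walk from newest to oldest so recent messages win.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break  # everything older than this point is dropped without notice
        kept.appendleft(message)
        used += cost
    return list(kept)


history = [
    "note one: rename the intro section",
    "note two: cut the second example",
    "read the next paragraph aloud",
    "which earlier notes are still open?",
]
visible = fit_to_window(history, max_tokens=12)
dropped = [m for m in history if m not in visible]
```

With this sample history, visible holds only the last two messages and dropped holds the two earlier notes. Nothing in the return value signals that anything was lost, which is the invisible part.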

Before we dive in a little deeper, I want to do a quick jargon check. If you’ve seen the terms “skills” and “agents” floating around but aren’t sure what they are, think of skills as libraries for AIs and agents as interactive executables. These aren’t perfectly precise definitions, but if you’re a developer they’re close enough for this discussion.

When you’re coding skills and agents, you run into context problems quickly. The work you’re asking the AI to do is often complex enough that the context window fills up, and the AI has to start compacting: compressing or dropping older parts of the conversation to make room for new ones. Compaction always seems to happen at the most frustrating and inconvenient time, which makes sense when you think about it. You hit context limits precisely when you’ve put the most information into the conversation, which is exactly when losing that information costs you the most.

That’s why I think it can often help to think of AIs as having the same shortcomings that human teams do, except those shortcomings are exaggerated by their AI nature. A person who forgets something from a meeting last week might remember it when you remind them. An AI that lost something to context compaction won’t, because the information is gone. But there’s something you can do about it, and it turns out the techniques that help are the same whether you’re building autonomous AI skills or just trying to get a chatbot to remember what you told it 20 minutes ago.

I’ve landed on four techniques that I come back to over and over. Each one exists because at some point the AI forgot something important and I responded by putting that thing in a file where it couldn’t be forgotten. None of them require special tooling. And to my surprise, all of these techniques have turned out to be useful for both building software and managing a writing project like this one, whether I’m chatting with Claude, ChatGPT, or Gemini, or using a desktop tool like Claude Cowork or Codex. These are the techniques I find most valuable:

Split discovery from documentation: Don’t ask the AI to figure something out and produce polished output in the same pass.
Use handoff documents, not continuation prompts: Before closing a stale session, have the AI write down everything the next session needs to know.
Give the AI an acceptance criterion, not a procedure: Tell it what “done” looks like instead of spelling out the steps.
Use spec documents as the bridge between AI tools: Make a shared document the single source of truth that all your tools read from.

Split discovery from documentation

When you ask an AI to do something complex, you’re often asking it to do two things at once without realizing it. You’re asking it to figure something out and produce polished output at the same time. The problem is that figuring things out takes attention, and producing output takes attention, and the model only has so much of it. When you combine both tasks in the same prompt, the model starts cutting corners on one of them, and you can’t tell which one it shortchanged.

I ran into this with the Quality Playbook, an open source AI coding skill I built that runs structured code reviews against any codebase. One of the things it does is derive requirements from source code: It reads through the code, identifies what the code promises to do (I call these behavioral contracts), and then produces a requirements document. Originally this all happened in a single pass. The problem was that single-pass requirement generation ran out of attention after about 70 requirements. The model forgot behavioral contracts it had noticed earlier in the code, and the forgetting was completely invisible. There was no stack trace or error message, just incomplete output and no way to know what was missing. I fixed it by splitting the work into two separate prompts:

Read each source file and write down every behavioral contract you observe as a simple list in CONTRACTS.md.

Read CONTRACTS.md and the documentation, then derive requirements from them and write REQUIREMENTS.md.

Then a third pass checks whether every contract has a corresponding requirement, and if there are gaps, goes back to step one for the files with gaps.

The key idea is that CONTRACTS.md is external memory. When the model “forgets” about a behavioral contract it noticed earlier, that forgetting is usually invisible. With a contracts file, every observation is written down before any requirements work starts, so an uncovered contract is a visible, greppable gap. You can see what was forgotten and fix it.
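
A rough Python sketch of what that coverage check could look like follows. It is not the Quality Playbook’s actual implementation. The C-001 style contract IDs, the “covers” wording, and the function name uncovered_contracts are all assumptions made up for illustration; the only things taken from the workflow above are the two file names and the idea that every contract should be cited by some requirement.

```python
import re

CONTRACT_ID = re.compile(r"\bC-\d{3}\b")  # hypothetical ID format, e.g. C-001


def uncovered_contracts(contracts_text, requirements_text):
    """Return contract IDs listed in CONTRACTS.md that no requirement cites."""
    listed = []
    for line in contracts_text.splitlines():
        # Only bullet lines are treated as contract entries (an assumed layout).
        if line.lstrip().startswith("- "):
            match = CONTRACT_ID.search(line)
            if match and match.group() not in listed:
                listed.append(match.group())
    cited = set(CONTRACT_ID.findall(requirements_text))
    return [cid for cid in listed if cid not in cited]


# Sample contents standing in for CONTRACTS.md and REQUIREMENTS.md.
contracts = """\
- C-001: parse_config raises ValueError on an unknown key
- C-002: retry waits at least one second between attempts
- C-003: save() never overwrites an existing file
"""
requirements = """\
REQ-1 (covers C-001): Unknown configuration keys must be rejected.
REQ-2 (covers C-003): Saving must not overwrite existing files.
"""
gaps = uncovered_contracts(contracts, requirements)
```

Here gaps contains only C-002, the contract that was observed but never turned into a requirement. Because the observation was written down first, the omission shows up as a concrete ID instead of passing unnoticed.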

The principle: Don’t ask the AI to figure out what exists and write formatted output in the same pass. The model runs out of attention trying to do both at once. Whenever you’re asking an AI to do something complex, consider whether you’re actually asking it to do two things at once. “Analyze this codebase and write a report” is two tasks. “Read this document and suggest improvements” is two tasks. Split them, and let the first pass write its observations to a file before the second pass starts working with them.

Use handoff documents, not continuation prompts

Anyone who’s spent a long session with an AI coding tool has felt the moment when the context starts to go stale. The AI stops tracking details it was handling fine an hour ago, or it contradicts something it said earlier. The session gets sluggish, and you’re often restarting because the AI seems to have gotten bogged down and filled up on what you told it. You get the sense that if you keep going, you’re going to spend more time correcting it than making progress.

Most developers respond to their session getting too long in one of two ways: They push through the problem, or they start a fresh one and try to reexplain everything from scratch. Both of those approaches can cause the AI to lose context. The first loses it to compaction; the second loses it to incomplete reexplanation. And both are frustrating! Especially because you just spent so much time building up all that context with the AI.

There’s a third option. Before you close the session, ask the AI to write a handoff document: a file that captures everything the next session needs to know, written while the current session still has full context. The key is that you’re asking the AI to write this while the relevant details are still fresh in the working context, and in a way that it or another AI can read.

I built this into the Quality Playbook as a core part of how phases communicate. When I split the playbook from a single prompt to independent phases, I needed each phase to run as a completely independent session with no context carryover. So each phase got its own kickoff prompt as a standalone file. Here’s the structure each one follows:

Write a handoff document that a fresh session could use to pick up this work cold. Include everything it would need to know.

Every kickoff opens with what prior phases accomplished, includes explicit boundaries about what’s frozen, and names which future phase owns each piece of remaining work, because without those boundaries the AI will helpfully start doing Phase 3 work while you’re still in Phase 2. Every phase also ends with a required forward-looking handoff where the finishing agent writes down what the next session needs to know.

The principle: Each handoff is a complete state snapshot. The incoming AI agent never needs to read prior kickoff prompts or chat history. Everything it needs is in the current handoff file: current state, uncommitted changes, immediate next task, pending tasks, file locations, and anything that was discovered during the prior session. A fresh AI session can pick it up cold.
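
One way to keep a handoff honest is to check it against that list before closing the session. The Python sketch below is only an illustration under assumptions: it supposes the handoff is a text file with one heading line per section, each starting with “#”, and the heading titles and the function name missing_sections are hypothetical. The six items themselves come from the list above.

```python
REQUIRED_SECTIONS = [
    "Current state",
    "Uncommitted changes",
    "Immediate next task",
    "Pending tasks",
    "File locations",
    "Discoveries",
]


def missing_sections(handoff_text):
    """Return required section titles with no matching heading line in the handoff."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in handoff_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# A deliberately incomplete draft handoff (contents invented for the example).
draft = """\
# Current state
Phase 2 complete; requirements frozen.
# Immediate next task
Run the phase 3 review against src/.
# File locations
Findings live in BUGS.md.
"""
```

Calling missing_sections(draft) reports the three sections this draft skipped (uncommitted changes, pending tasks, and discoveries), so you can ask the AI to fill them in while it still has the context to do so.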

If you’re deep into a Claude Code or Copilot session and you can feel the context getting stale, ask the AI to write a handoff document before you close the session. Tell it to include everything a fresh session would need to continue the work. Then start a new session and point it at that file. A fresh session with a good handoff document will usually outperform a stale session, because it’s starting with clean context instead of compacted, fragmented context.

Give the AI an acceptance criterion, not a procedure

When you give an AI a multistep task, the natural instinct is to spell out the steps. First do this, then do that, then combine the results. The problem is that step-by-step procedures are the first thing the AI forgets when the context window fills up. It’ll skip steps, merge phases, or quietly drop tasks, and there’s nothing in the procedure itself that would help the AI notice what it missed. The procedure tells the AI what to do, but it doesn’t tell the AI what “done” looks like.

I learned this the hard way with the Quality Playbook. The playbook runs multiple iteration passes over a codebase, and the results need to be cumulative. It keeps a list of all the bugs it finds in the code being tested in a file called BUGS.md. Early on, I gave the AI a procedure to run four times and then update that file:

First run the main pass, then run four iteration passes, then merge the findings into BUGS.md.

The AI didn’t respond well to that instruction.

It turns out that when you ask an AI to do a very complex task a specific number of times, it can lose count. In fact, from my experimentation, it seems that count is one of the first casualties of context compaction. Most of the time the AI decided three iterations was enough, or merged findings from only two passes, and no matter how many different ways I tried to rephrase that instruction, there was nothing I could come up with that prevented the problem.

However, everything changed when I replaced the “run four times” instruction with an acceptance criterion, or a specific condition that tells the AI when to stop looping:

You’re done only when BUGS.md contains the cumulative findings from the main run plus all four iteration passes.

Even if the AI lost track of intermediate steps, it could check the output against the criterion and know whether it was finished. And I could verify the output against the same criterion, which gave me a way to audit the agent’s work without watching every step.

In developer terms, the AI is really bad at loops like for (i = 0; i < 4; i++) because it loses track of the value of the iterator i when it compacts its context. But it’s really good at loops like while (!done) because it can check done based on the current state without relying on history.
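
The difference between the two loop styles can be sketched in Python. This is a toy model, not how the Quality Playbook or any agent is implemented: the “## Pass:” heading convention, the pass names, and the functions missing_passes, run_until_done, and fake_pass are all hypothetical. What it tries to show is a “done” test computed purely from the current contents of BUGS.md, with no counter to lose.

```python
REQUIRED_PASSES = ["main", "iteration-1", "iteration-2", "iteration-3", "iteration-4"]


def missing_passes(bugs_text):
    """Return the passes whose findings are not yet recorded in BUGS.md."""
    recorded = {
        line[len("## Pass: "):].strip()
        for line in bugs_text.splitlines()
        if line.startswith("## Pass: ")
    }
    return [name for name in REQUIRED_PASSES if name not in recorded]


def run_until_done(bugs_text, run_pass):
    """Loop on the acceptance criterion instead of a counter."""
    while True:
        todo = missing_passes(bugs_text)
        if not todo:  # the "done" test depends only on the file's current state
            return bugs_text
        bugs_text += run_pass(todo[0])


def fake_pass(name):
    # Stand-in for a real review pass; returns a section to append to BUGS.md.
    return f"## Pass: {name}\n- (findings for {name})\n"


# Start from a file that already holds the main pass, as if resuming mid-run.
final = run_until_done("## Pass: main\n- off-by-one in pager\n", fake_pass)
```

Because missing_passes reads only the file, the loop can be interrupted, restarted in a fresh session, or resumed after compaction and still stop at the right place. The same function doubles as the audit: a person can run it against BUGS.md without replaying the session.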

The principle behind all this is that an acceptance criterion survives context pressure because the AI can always check “Am I done?” against a concrete test. This is actually the same principle behind test-driven development: write the test before the code so you know when you’re done. The acceptance criterion is the test for your AI session. When you’re giving an AI a task that has multiple steps, don’t describe the steps. Describe what “done” looks like, and let the AI figure out how to get there.

Use spec documents as the bridge between AI tools

Most developers working with AI don’t use just one tool. You might use Claude for design, Cursor for coding, and Copilot for quick edits. You might even use multiple models within the same tool, like GPT-5.5 and Opus 4.7 in separate Copilot chats inside VS Code. It’s common to have one model for coding, another for evaluation, and a third for orchestration and project management. The problem is that none of these tools or chats know what you told the others. Claude doesn’t know what you decided with Cursor. Two separate Copilot chats in the same editor don’t share context. You’re the one carrying context between them, and that’s exactly the kind of lossy handoff that causes drift. A design decision you made in one conversation gets lost or distorted by the time it reaches the tool that needs to implement it.

The fix is to make the spec document the single source of truth that all your AI tools read from. I used this when building a game prototype, where I had Claude handling design and planning and Cursor doing the coding. They never talked to each other directly, so the spec documents served as the shared contract: Claude wrote the specs, and Cursor read them. The rule I followed was simple:

Never tell the AI coder something that isn’t already in the specs. If you make a design decision in conversation, write it into the spec first, then point the coder at the spec.

If I made a design decision in a conversation with Claude, that decision had to be written into the spec before I told Cursor about it. If I discovered something during implementation, I wrote it into the appropriate document first, then pointed the coder at it. The spec was always the single source of truth. When Claude and I changed the wound topology (removing one wound type, promoting another), we updated the docs first, then told Cursor to reread them. When we decided to add a new UI element, we wrote it into the UI spec first, then told Cursor to reread the doc.

The key was including rationale in the specs. Not just “show 5 progressive labels” but why: “The player shouldn’t be told what they’re fighting. They should discover it.” This helps the AI coder make better decisions when the spec doesn’t cover an edge case because it knows the intent behind the requirement.

The principle: The spec document is the shared context that all your tools can read. It prevents the drift that happens when design intent lives only in chat history that the other tool can’t see. This technique works any time you’re using more than one AI tool on the same project, which at this point is most projects.
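
If you want the “always include the rationale” habit to be hard to skip, one option is to generate spec text from structured entries that refuse to render without a why. The following Python sketch is an assumption-laden illustration, not the format used in the game prototype: SpecEntry, render_spec, and the “Requirement:” and “Why:” labels are invented for the example, and only the sample requirement and rationale come from the text above.

```python
from dataclasses import dataclass


@dataclass
class SpecEntry:
    requirement: str
    rationale: str  # the "why", so the coder can reason about uncovered cases


def render_spec(title, entries):
    """Render entries as plain text; refuse any entry that lacks a rationale."""
    lines = [title, ""]
    for entry in entries:
        if not entry.rationale.strip():
            raise ValueError(f"missing rationale: {entry.requirement}")
        lines.append(f"Requirement: {entry.requirement}")
        lines.append(f"Why: {entry.rationale}")
        lines.append("")
    return "\n".join(lines)


ui_spec = render_spec(
    "UI spec",
    [
        SpecEntry(
            requirement="Show 5 progressive labels.",
            rationale="The player shouldn't be told what they're fighting. They should discover it.",
        )
    ],
)
```

The rendered text is what each tool would be pointed at. The design choice is that a decision with no recorded reason fails loudly at write time, instead of reaching the coder as a bare instruction it cannot extrapolate from.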

How these techniques combine: Managing this article series

These four practices came out of AI-driven development work, but they apply to almost any AI work. And while these techniques emerged for me while working on agents and skills, I think it’s helpful to demonstrate them in a nondevelopment context, so I’ll share an example from my work on the article series you’re reading now.

Over time, the process for how my AI assistant and I manage this article backlog evolved organically in conversation, but it was never written down anywhere except in the AI’s context window. Which means every time the session compacted or I started a fresh chat, the process was gone and I had to reexplain it. I caught this when the AI did something slightly wrong and I wanted to confirm we were on the same page. So I asked:

Every time I suggest a new article idea, you add an entry to the backlog, and then create a new markdown file with the source material, right?

That’s split discovery from documentation. I didn’t say “document our process.” I said “confirm what we do.” Discovery first, then documentation as a separate step. If I’d said “write up our process” without confirming first, the AI might have written something plausible but wrong, and I wouldn’t have caught the discrepancy.

Once we’d confirmed the process, I asked the AI to create two files. AGENTS.md is an emerging standard for AI-readable project context: a single file that tells any AI session what it needs to know about a project. You can learn more about the convention at agents.md. CONTEXT.md serves a similar role as a bootstrapping document. It’s less established as a standard, but the practice of asking the AI to dump everything it knows into a context file so the next session can pick it up cold has been one of the most valuable habits I’ve developed. Here’s the prompt I used:

Update the backlog file to explain what it is and how we maintain it. Create a CONTEXT.md with everything you’d need to bootstrap a new chat. Create an AGENTS.md to make it easy to bootstrap with a single-line prompt.

That prompt asks for a handoff document. I was explicitly asking the AI to write down everything it knew while it still had full context, specifically because I knew that context would be lost to compaction. The CONTEXT.md file is a handoff from this session to whatever fresh session picks up the work next week.

Notice what I didn’t say. I didn’t give step-by-step instructions for what should go in those files. I said “everything you would need to bootstrap this process again in case we lost it” and “a complete dump of all of the context you would need to bootstrap a new chat and get it to the point where this current chat is.” Those are acceptance criteria, not procedures. The AI had to figure out what belonged in those files. If I’d given it a procedure (“first write the publication history, then the voice rules, then the file locations”), it would have followed the list and missed anything I forgot to include. The acceptance criterion is harder to satisfy but more robust: the test is “Could a fresh session bootstrap from these files alone?”

And the AGENTS.md file itself is a spec document as a bridge between tools. It’s the shared contract that any AI session, whether it’s Claude, Gemini, Cowork, or a fresh chat, can read to get aligned with the project. This session wrote it; the next session reads it. The two sessions never communicate directly, so the spec file bridges the gap between them.

That’s all four practices in two prompts, applied to something as ordinary as managing a writing project. It didn’t require pipelines or codebases or batch orchestration. The practices work because they solve the same underlying problem regardless of the domain: important information living in the AI’s context window instead of on disk.

Context management is a development skill

Every practice I’ve described in this article and the last one is something developers have always been told to do: write things down, record your rationale, be deliberate about what you save and what you let go, write ADRs and design docs and inline comments explaining nonobvious choices. We’ve always known we should do more of it. When you’re working with AI, the cost of not doing it becomes immediate and visible.

The practices in this article all come down to the same thing: putting the important information in files where compaction can’t touch it, so you can see what the AI knows and verify that it matches reality. In the next article, I’ll go deeper on the debugging angle: how to use externalized files to understand what your AI is actually doing, with practical techniques that work even if you’re not building agents but are just using a chatbot.

The Quality Playbook is open source and works with GitHub Copilot, Cursor, and Claude Code. It’s also available as part of awesome-copilot.

Disclosure: Aspects of the approach described in this article are the subject of US Provisional Patent Application No. 64/044,178, filed April 20, 2026 by the author. The open source Quality Playbook project (Apache 2.0) includes a patent grant to users of that project under the terms of the Apache 2.0 license.
