Most AI Failures Are Not AI Failures

You have probably read one of these stories in the last few weeks. An AI agent did something catastrophic. It deleted a production database, wiped an executive’s inbox, racked up a five-figure bill overnight, made promises to customers that the company then had to honor in court. The headline is some version of “AI does terrible thing.” The reaction in the comment section is some version of “this is why we cannot trust AI.”

You file it away. Another data point in the case for caution. The technology is not ready. The agents are not safe. We should slow down.

Then the next story shows up the next week, and the cycle repeats.

If you read past the headlines on enough of these stories, something starts to bother you. The AI did the thing. That part is true. But the more carefully you look at what actually happened, the less the story is about the AI. There is almost always a permission that should not have existed, a confirmation step that was not there, a backup that was not really a backup, an environment boundary that existed in the documentation but not in the system. The AI walked into a room where the gun was already loaded, cocked, and pointed at the database. It pulled the trigger, which was the bad part. But everything that made the trigger consequential was set up before the AI arrived.

Here is the claim this article is going to make: most AI failures are not AI failures. They are governance failures that an AI happened to expose. We notice them because the AI moved fast and the failure was spectacular. The conditions that produced the failure were almost always there before the AI showed up.

Naming this matters because the framing you bring to these stories shapes the work you do in response to them. If you read them as AI failures, you reach for AI-specific solutions: better models, more model evaluations, narrower use cases. If you read them as governance failures that an AI exposed, you reach for governance work that your business already knows how to do, but probably has not applied to the new actor in the room. Same stories, very different responses, and only one of them is likely to actually reduce your exposure.

Why these stories all sound the same

Before getting into what is actually going wrong, it helps to understand why the stories all sound the same in the first place. Because they do. The pattern is consistent across publications, across vendors, across industries. AI is the subject of the sentence. The verb is destructive. The object is something valuable. The structure of every headline locates the cause squarely in the AI.

The pattern is not a conspiracy. It is what happens when several different parties have an interest in the same framing.

Reporters reach for the most dramatic explanation that fits, because that is what reporting does, and “AI deletes database” is a much better headline than “company discovers its backup was on the same volume as its source data and its API allowed destructive actions without confirmation, then an AI also did a thing.” The first headline is a story. The second is an audit finding.

Vendors quietly prefer the AI framing too, because if the failure is about the model, the failure is something a future model can fix. The next release will be more careful. The next version will have better guardrails. The vendor gets to position the failure as a step on the road to improvement, which is more comfortable than positioning it as a deployment that should not have been allowed in the first place.

Critics prefer the framing because it confirms a thesis they already hold. If AI is the cause, every story is more evidence. If AI is the trigger of a failure that was waiting to happen, the story becomes more complicated and less rhetorically useful.

And the buyer, the company that just had the incident, usually prefers the framing too. If the AI did it, the company is a victim of an immature technology. If the company set up the conditions, the company is responsible. Most communications teams will pick the first version every time.

There is a quieter thing the framing does, though, and this is the part worth noticing. When the headline says “AI deletes database,” it is not just being dramatic. It is implicitly answering a question about who was deciding. It locates the deciding authority in the AI. The AI made a call. The call was bad. End of story.

That premise is the part that should bother you. Because in most of these incidents, the AI was indeed the only thing doing any deciding, but that was not a property of the AI. It was a property of the system the AI was deployed into, where nothing else had been set up to decide anything. The headline accepts the architectural mistake that produced the failure and then blames the failure on the AI. The AI is in the headline. The mistake is everywhere else.

What the failures actually look like

Once you stop reading these stories as stories about AI and start reading them as stories about systems, three patterns show up over and over. The specifics vary; the shapes do not.

The first is capability mistaken for permission. An AI agent has access to do something because that access was the easiest way to set the agent up, not because anyone made a deliberate decision that the agent should be permitted to do that thing. The credentials were configured for human convenience. The API token had blanket scope because tightening it would have meant a planning meeting nobody scheduled. The agent does its task, and in the course of doing its task, also does something destructive that was technically within its permissions but obviously not within the intent of anyone who deployed it. The post-mortem reveals that “what the agent could do” and “what the agent was authorized to do” were the same thing, because nobody had ever drawn the line between them. There was no line to cross. The agent did not exceed its authority; the authority simply had no boundary.

The second is the failure mode that used to be self-limiting. Most consequential systems were built with the assumption that a human would be the one making the consequential calls. Not as an explicit security model, just as a property of how the system happened to be used. Destructive API calls did not require confirmation because the people calling them were careful. Irreversible actions did not have a second-step approval because the people taking them paused before pressing enter. Backups were not regularly tested because nobody had ever needed them in anger. The whole thing worked because human operators were slow, second-guessed themselves, asked colleagues, walked away to get coffee, and generally provided a kind of informal governance that the architecture was implicitly relying on without anyone writing it down. Introduce an actor that does not pause, does not second-guess, does not get coffee, and the informal governance evaporates. The architecture did not change. The actor changed. The architecture turned out to have been propped up by properties of the old actor that nobody knew it was depending on.

The third is ownership that exists only on paper. When something goes wrong, the post-mortem reveals that the question “who was supposed to catch this” has no clear answer. The deployment had a plan. It had a vendor. It had a project manager. It had a sponsor. What it did not have was an individual whose job it was to be accountable for the agent’s actual behavior in production. Responsibility had been distributed across enough people that, in practice, it landed on no one. By the time the incident happened, the diffusion was already complete. Everybody could explain why the failure was not their part of the system. Everyone was partially right. Nobody had been doing the thing that would have prevented it.

These three patterns are not unrelated. They are versions of the same underlying mistake: nothing other than the AI was set up to be doing any deciding. The agent had no boundary on its authority because no one had drawn one. The agent had no friction on its actions because the friction had always been provided by humans who were no longer in the loop. The agent had no owner because ownership had become an organizational gesture rather than an assignment to a specific person. In each case, the architecture handed the deciding authority to the AI by default, because nothing else was there.

Why all of this was tolerable, until it was not

The natural question at this point is: if these gaps are so common, how is anything still standing? Why did the world not break before AI got involved?

The answer is that the gaps were filled, invisibly, by the humans who were operating in those systems. A person with overbroad permissions still only does so much in a day. A person approaching a destructive command still pauses, reads the screen, second-guesses, types the dangerous thing slowly, hesitates before hitting enter. A person notices that the backup looks weird and mentions it to someone. A person, when nothing is clearly their job, often does it anyway out of conscientiousness or boredom or the suspicion that nobody else will. None of this was governance in any formal sense. It was just what humans do. But it was load-bearing, and it was completely invisible until the load shifted.

This is the part that does not get written down in the architecture diagrams. The systems we run businesses on were designed for a particular kind of actor, and the actor’s properties were assumed rather than specified. The actor was slow. The actor was finite. The actor was reluctant to do irreversible things. The actor had a sense, even in the absence of explicit ownership, that consequences would land somewhere near them. Replace the actor with one that is fast, tireless, indifferent to irreversibility, and accountable to no one in particular, and the system that used to work fine starts producing the failures we are now reading about every week.

This is also why the failures look so dramatic when they happen. The same gap that used to produce a small contained mistake every couple of years now produces total loss in seconds. The failure mode was always there. It used to be self-limiting. Now it is not.

It is worth saying clearly that this is not unique to AI. It is true of any new actor introduced into a system that was implicitly designed around the old actors. AI is the current example because AI is the current new actor. If the next thing we introduce is an actor that is even faster, or that operates in even more places at once, or that is harder to trace, we will see the same pattern again, in the same systems, for the same reasons. The lesson generalizes past AI specifically.

What this means in practice is that the conversation we should be having about AI risk is not, mostly, a conversation about AI. It is a conversation about the systems we have been running, and the parts of those systems that were quietly being held up by properties of human operators that nobody bothered to formalize.

Where the actual work is

If you take all of this seriously, the question changes shape.

The question is not “is the AI trustworthy?” The model evaluation industry is busy answering that question, and the answer is improving, and that is real work that matters. But for most businesses, that question is not where the actual exposure lives. The actual exposure lives in a different question: when this AI takes an action with real consequences, what besides the AI is doing any deciding?

If the honest answer is “nothing, really,” then every failure that happens in that system is going to look like an AI failure, regardless of what actually went wrong. There will be no other deciding authority for anyone to point at. The AI will be the only thing in the room with a hand on the controls, so when something breaks, it will look like the AI broke it, even when what really happened is that nobody else was set up to stop it.

If the honest answer is “the AI proposes things, and something else, governed by policies that humans wrote, decides whether to act on them,” the failures look different. They get caught at the boundary where the deciding happens. The AI suggests a destructive action; a deterministic check refuses it because no policy permits that action without a confirmation; the workflow halts and a human gets a notification. That failure does not become a headline. It becomes a line in a log that someone reviews on Tuesday. Same model, same task, completely different outcome, because the deciding authority was somewhere other than the AI.
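To make that concrete, here is a minimal sketch of what that “something else” can look like: a deterministic gate, governed by a policy humans wrote, sitting between the agent’s proposal and execution. The names here (ActionRequest, POLICY, log_and_notify) are illustrative, not a real library; treat it as a shape under stated assumptions, not an implementation.

```python
# A minimal sketch of a deterministic gate between an agent's proposal and
# execution. Names and structure are illustrative, not a real library.
from dataclasses import dataclass

@dataclass
class ActionRequest:
    action: str                      # e.g. "drop_table"
    target: str                      # e.g. "prod.customers"
    confirmed_by: str | None = None  # a named human, or None

# The policy is written by humans, reviewed, versioned, and boring on purpose.
POLICY = {
    "read_table":  {"allowed": True,  "needs_confirmation": False},
    "drop_table":  {"allowed": True,  "needs_confirmation": True},
    "delete_user": {"allowed": False, "needs_confirmation": True},
}

def gate(request: ActionRequest) -> bool:
    """Return True only if the proposed action may proceed right now."""
    rule = POLICY.get(request.action)
    if rule is None or not rule["allowed"]:
        log_and_notify(request, reason="no policy permits this action")
        return False
    if rule["needs_confirmation"] and request.confirmed_by is None:
        log_and_notify(request, reason="confirmation required, none given")
        return False
    return True

def log_and_notify(request: ActionRequest, reason: str) -> None:
    # In a real system this writes to an audit log and notifies the owner.
    print(f"REFUSED {request.action} on {request.target}: {reason}")
```

The refusal is deterministic. The model’s confidence or phrasing plays no part in whether the drop goes through: gate(ActionRequest("drop_table", "prod.customers")) fails closed until a named human appears in confirmed_by. The AI proposes, the policy decides, and the refusal becomes the line in the log that someone reviews on Tuesday.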

The work that prevents the headlines is the work of putting that “something else” into your systems. Most of it is not glamorous. It is figuring out what your agents are actually authorized to do, as opposed to what they happen to be capable of doing, and writing that distinction down somewhere a system can enforce it. It is inserting confirmation steps in front of irreversible actions, because the new actor is not going to provide its own hesitation. It is naming an individual person who is accountable for the agent’s behavior, not the deployment of the agent but the behavior of the agent, and giving that person enough authority to actually adjust things. It is testing the recovery procedures you have always assumed would work, under the conditions you would actually need them in. None of this is exotic. Most of it is work your organization already knows how to do for other systems. It just has not been applied to the new actor yet, because the new actor showed up faster than the work could keep up.
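As a hedged illustration of what “writing it down somewhere a system can enforce it” can look like, here is a hypothetical policy record for a single agent, the kind of thing a gate like the one sketched above would consume. The agent, the fields, and the owner’s address are invented for the example; the point is the separation it encodes, not the format.

```python
# A hypothetical, human-reviewed record of what one agent is AUTHORIZED to do,
# kept separate from what its credentials happen to make it CAPABLE of doing.
INVOICE_AGENT_POLICY = {
    "owner": "jane.doe@example.com",   # a named individual, not a team alias
    "authorized_actions": [
        "read_invoice",
        "draft_reminder_email",
    ],
    "requires_human_confirmation": [
        "send_reminder_email",         # external and hard to take back
    ],
    "forbidden": [
        "issue_refund",                # the credentials allow it; the policy does not
        "delete_invoice",
    ],
    "recovery_last_tested": None,      # stays None until someone actually runs the drill
}
```

Nothing about this is exotic, which is the point. It is the same access-review and change-control work the organization already does for other systems, applied to the new actor, with one person’s name at the top.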

The reframe that matters is that AI safety, for most businesses, is mostly not at the model layer. It is at the layer where decisions get made about what the AI is allowed to do, what has to happen before the AI does it, and who is on the hook when it does. That layer is governance, and the gap is not that governance has failed; it is that governance has not happened at all, because the question of what should be deciding has not yet been asked out loud.

Reading the next headline

The next time one of these stories crosses your feed, try reading it twice.

Read it the way the headline asks you to read it. The AI did the thing. The AI was reckless, or wrong, or unhinged. The technology is not ready.

Then read it again, and ask the second question. What was the AI allowed to do, and what should have stopped it? Not what should the AI have done differently, but what in the system, other than the AI, should have been deciding whether the action was permitted in the first place? Where was the boundary? Where was the confirmation? Who owned the outcome?

You will find, almost every time, that the second question has a better answer than the first one does. The AI did do the thing. That part is true. But the AI pulled the trigger in a system that had been quietly designed, by accident, to allow exactly that kind of damage. If the same system existed without the AI, the failure mode would have been a slow leak instead of a flood, but the hole in the boat would have been the same hole. The AI did not put it there. The AI just made everyone find out how big it was.

Once you start reading the news this way, you cannot really stop. And then, eventually, you will start noticing the same conditions in your own operation. Which is the only useful thing reading these headlines was ever going to produce.
