AI Agent Contracts: The Missing Primitive in Agentic Systems

If you have been anywhere near an enterprise AI project lately, you know the numbers. Gartner expects more than 40% of agentic AI projects to be cancelled by the end of 2027. The Grant Thornton 2026 AI Impact Survey found that 78% of executives lack strong confidence they could pass an independent AI governance audit within 90 days. Somewhere between the demos and the deployments, a lot of these projects are quietly falling apart.

The standard explanations are familiar by now. The models need to get better. The prompts need to be tighter. We need more guardrails, more evals, more human-in-the-loop. All of this is true to some degree, and none of it is actually solving the problem.

I want to suggest a different diagnosis. The reason agentic projects fail at the rate they do is not model capability or prompt quality. It is that agents operate without contracts: enforceable, bilateral agreements that define what the work is, what the agent is allowed to do, and what the agent and the surrounding system owe each other. Contracts are the missing primitive. Everything else (better models, tighter prompts, more guardrails) sits on top of that gap and cannot close it.

What I mean by a contract

When I say contract, I do not mean a policy document. I do not mean a system prompt. I do not mean a permissions matrix in your IAM tool. Those things exist, in various forms, in most organizations running agents today. They are not contracts.

A contract is an enforceable, bilateral agreement between an agent and the system that governs it, checked by something neither party fully controls. Three pieces matter in that sentence. Enforceable, meaning a violation produces a defined consequence rather than a sad note in a log file. Bilateral, meaning both sides have obligations. And checked by something neither party controls, meaning the agent cannot quietly rewrite the terms when they become inconvenient.

A contract is not a prompt. A prompt is an instruction; the agent can interpret it, ignore it, or talk itself out of it. A contract is not a system message either, which is really just configuration the agent reads at startup. A contract is not a guardrail, which is a filter applied after the fact. And it is definitely not a policy document, which is aspirational by design.

The load-bearing idea here is that the agent does not author or govern its own contract. A contract the agent can rewrite is just a preference. That sounds obvious when you say it out loud, but it is exactly the situation most agentic deployments are in right now. The agent’s behavior is shaped by instructions the agent itself processes, interprets, and decides whether to follow. There is no third party in the room.
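To make "the agent cannot rewrite its own terms" concrete, here is a minimal sketch in Python (3.9+). It assumes a hypothetical `ContractRegistry` that lives outside the agent process and pins a hash of the terms when the contract is issued. The class names, field names, and the choice of SHA-256 over a JSON dump are all illustrative assumptions, not a reference design.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Contract:
    """Hypothetical minimal contract; the fields are illustrative only."""
    agent_id: str
    scope: str
    allowed_tools: tuple


def fingerprint(contract: Contract) -> str:
    """Stable SHA-256 hash of the contract terms."""
    payload = json.dumps(asdict(contract), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ContractRegistry:
    """Stands in for the third party: it pins the terms at issue time."""

    def __init__(self) -> None:
        self._pinned = {}

    def issue(self, contract: Contract) -> None:
        self._pinned[contract.agent_id] = fingerprint(contract)

    def verify(self, presented: Contract) -> bool:
        """True only if the presented terms match what was issued."""
        return self._pinned.get(presented.agent_id) == fingerprint(presented)


registry = ContractRegistry()
original = Contract("agent-1", "summarize support tickets", ("search_tickets",))
registry.issue(original)

# An agent-side copy with a quietly widened tool list no longer verifies.
widened = replace(original, allowed_tools=("search_tickets", "post_forum"))
print(registry.verify(original))  # True
print(registry.verify(widened))   # False
```

The point of the sketch is only where the pinned hash lives: in something the agent cannot write to.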

None of this is a new idea, by the way. Software engineering has had design by contract since the 1980s (preconditions, postconditions, invariants). Legal contracts have had obligations and counterparty duties for centuries. What is new is applying the same discipline to AI agents, which we have so far been treating as if they were too clever to need it. They are not.

Contracts are bilateral

Most conversations about agent governance treat the whole thing as one-directional: rules imposed on the agent, constraints the agent must obey, limits the agent must not cross. The agent is the thing being controlled; the system is the thing doing the controlling.

That framing is too narrow, and it is part of why the current crop of governance efforts feel like cages rather than infrastructure. A better framing is that the agent and the governing system are counterparties. Each owes the other something specific.

What the agent owes the system is roughly what you would expect: outputs that match the agreed scope, honest reporting of what it did and did not do, escalation when it hits something it is not sure how to handle, and refusal when a request falls outside its remit.

What the system owes the agent is the part that gets skipped. Well-scoped inputs, so the agent is not trying to do its job with garbage data. Stable tools that behave the way they were declared to behave. Honored escalations, meaning when the agent raises its hand, someone actually catches it. Clear authority boundaries, so the agent knows what it can decide versus what it has to surface upward.
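One way to picture the two-sided framing is a single obligations list where every entry names the party that owes it, so a breach can be attributed to either side. The sketch below is an assumption-laden illustration: the obligation names are shorthand for the duties described above, and `breaches` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum


class Party(Enum):
    AGENT = "agent"
    SYSTEM = "system"


@dataclass(frozen=True)
class Obligation:
    owed_by: Party
    name: str


# Illustrative names only; they paraphrase the duties described in the text.
OBLIGATIONS = (
    Obligation(Party.AGENT, "output_within_scope"),
    Obligation(Party.AGENT, "honest_reporting"),
    Obligation(Party.AGENT, "escalate_when_unsure"),
    Obligation(Party.SYSTEM, "well_scoped_inputs"),
    Obligation(Party.SYSTEM, "stable_tools"),
    Obligation(Party.SYSTEM, "honor_escalations"),
)


def breaches(observed: dict) -> dict:
    """Group unmet obligations by the party that owed them.

    An obligation with no observation counts as unmet.
    """
    result = {Party.AGENT: [], Party.SYSTEM: []}
    for obligation in OBLIGATIONS:
        if not observed.get(obligation.name, False):
            result[obligation.owed_by].append(obligation.name)
    return result


# The agent did everything right, but nobody picked up its escalation.
observed = {obligation.name: True for obligation in OBLIGATIONS}
observed["honor_escalations"] = False
print(breaches(observed)[Party.SYSTEM])  # ['honor_escalations']
```

In a one-directional model that last case has nowhere to be recorded; here it shows up as a breach by the system.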

When you frame governance bilaterally, it stops feeling like a cage and starts feeling like a protocol. Agents that operate under a real contract are not less capable; they are more deployable, because the people on the other side of the contract know what they are getting.

The three dimensions of a contract

A useful agent contract covers three things: the work, the means, and the relationship. Most failed deployments I have seen are missing at least one of these, and the ones that have gone really sideways are usually missing all three.

The work is what the agent is being asked to do and what done looks like. That means scope (the bounded task, not a vague mission), inputs (with quality guarantees from the counterparty, not just whatever happens to be sitting in the database), outputs (typed and structured, not free-form text the next system has to parse), and completion criteria. Completion criteria are worth pausing on. There are really two layers: structural completion, which is deterministic and either passes or fails (did the output match the schema, did the required fields populate), and semantic completion, which is about quality and confidence (did the agent actually answer the question, how sure is it). Agents that self-certify on semantic completion are a big chunk of the production failures we are seeing.
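The two layers of completion can be sketched as two separate checks. In this hypothetical example, structural completion is a deterministic field-and-type check, and semantic completion is scored by an `evaluator` callable that is assumed to be independent of the agent. The schema, the 0.8 threshold, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical output schema: field name -> required type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def structurally_complete(output: dict) -> bool:
    """Deterministic pass/fail: every required field is present with the right type."""
    return all(
        isinstance(output.get(name), expected)
        for name, expected in REQUIRED_FIELDS.items()
    )


@dataclass(frozen=True)
class CompletionResult:
    structural: bool
    semantic_score: Optional[float]
    accepted: bool


def check_completion(
    output: dict,
    evaluator: Callable[[dict], float],
    threshold: float = 0.8,
) -> CompletionResult:
    """Run the structural check first; score semantics only if it passes.

    The evaluator is supplied by the system, so the agent never
    certifies its own semantic completion.
    """
    if not structurally_complete(output):
        return CompletionResult(False, None, False)
    score = evaluator(output)
    return CompletionResult(True, score, score >= threshold)


good = {"answer": "Refund approved", "sources": ["ticket-881"]}
print(check_completion(good, evaluator=lambda o: 0.9).accepted)  # True
print(check_completion(good, evaluator=lambda o: 0.5).accepted)  # False
```

The lambdas stand in for whatever independent evaluator the system uses; the sketch makes no claim about how that evaluator should be built.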

There is also a provenance obligation that tends to get missed: the agent’s outputs should trace back to their sources, especially when those outputs are about to be consumed by another agent downstream. Otherwise you end up with a chain of confident-sounding claims with no auditable lineage.
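A provenance obligation can be checked mechanically before an output is handed to the next agent. The sketch below assumes each claim carries the identifiers of its sources and that the system keeps a set of known source identifiers; `Claim` and `untraceable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    sources: tuple  # identifiers of documents or upstream claims


def untraceable(claims: list, known_sources: set) -> list:
    """Return claims that cite nothing, or cite a source the system does not know."""
    return [
        claim
        for claim in claims
        if not claim.sources
        or any(source not in known_sources for source in claim.sources)
    ]


known = {"crm-export-2026-03", "policy-doc-7"}
claims = [
    Claim("Customer is on the enterprise plan", ("crm-export-2026-03",)),
    Claim("Customer is eligible for a refund", ()),
    Claim("Refund window is 60 days", ("wiki-page-unknown",)),
]
for claim in untraceable(claims, known):
    print(claim.text)
```

Under these assumptions the second and third claims are flagged, and a downstream agent never receives them as if they were grounded.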

The means is what the agent is permitted to use and within what bounds. Capabilities (the explicit set of tools available; nothing implicit). Permissions (what it can read, write, or affect). Authority (what it can finalize versus what it can only recommend; the “AI proposes, system decides” line lives here). Resource budgets (token spend, tool call counts, dollar caps; an unbudgeted contract is an unfunded mandate). Duration, meaning how long the agent has before the contract expires and it has to either finish or escalate. And identity, meaning which agent, which version, which model. Contracts bind to specific instances. They do not bind to “an AI somewhere.”
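The means dimension maps fairly directly onto a data structure plus a meter that checks every tool call against it. This is a minimal sketch under stated assumptions: the field names, the `Meter` class, and the `ContractViolation` exception are all hypothetical, and dollar caps are left out for brevity.

```python
import time
from dataclasses import dataclass


class ContractViolation(Exception):
    """Raised when a call falls outside the contracted means."""


@dataclass(frozen=True)
class Means:
    agent_id: str             # identity: which agent instance
    model_version: str        # identity: which model and version
    allowed_tools: frozenset  # capabilities: explicit, nothing implicit
    max_tool_calls: int       # resource budget
    max_tokens: int           # resource budget
    expires_at: float         # duration, as epoch seconds


@dataclass
class Meter:
    means: Means
    tool_calls: int = 0
    tokens: int = 0

    def authorize(self, agent_id, model_version, tool, tokens, now=None):
        """Check one tool call against the contract, then record its cost."""
        now = time.time() if now is None else now
        m = self.means
        if (agent_id, model_version) != (m.agent_id, m.model_version):
            raise ContractViolation("contract is bound to a different agent instance")
        if now >= m.expires_at:
            raise ContractViolation("contract expired: finish or escalate")
        if tool not in m.allowed_tools:
            raise ContractViolation(f"tool not in contract: {tool}")
        if self.tool_calls + 1 > m.max_tool_calls:
            raise ContractViolation("tool call budget exhausted")
        if self.tokens + tokens > m.max_tokens:
            raise ContractViolation("token budget exhausted")
        self.tool_calls += 1
        self.tokens += tokens


means = Means("agent-1", "model-v3", frozenset({"search"}), 2, 1000, expires_at=100.0)
meter = Meter(means)
meter.authorize("agent-1", "model-v3", "search", tokens=400, now=10.0)
meter.authorize("agent-1", "model-v3", "search", tokens=400, now=20.0)
try:
    meter.authorize("agent-1", "model-v3", "search", tokens=100, now=30.0)
except ContractViolation as exc:
    print(exc)  # tool call budget exhausted
```

The meter is meant to run on the system side of the boundary, so a budget is something the agent hits rather than something it is asked to respect.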

This is the dimension where the Kiteworks 2026 Forecast numbers should make you uncomfortable. Across 225 enterprise leaders, 63% said they cannot enforce purpose limitations on their AI agents, and 60% said they cannot quickly terminate a misbehaving one. Those are not abstract governance shortcomings. Those are specific contract clauses (scope of work, termination rights) that the majority of enterprises currently cannot enforce. The contract language we have used in commercial agreements for hundreds of years has straightforward equivalents in agent governance, and most companies have not implemented them.

The relationship is how the agent and the system handle uncertainty, change, and failure. This is the dimension most enterprise deployments forget about entirely. It covers escalation paths (when the agent surfaces a decision upward, and what the system commits to do when it receives one). Failure modes (failures named and handled by the system, not self-certified by the agent). Observability obligations (what the agent has to expose as it works, not just at the end). Refusal rights and duties (the conditions under which the agent is obligated to refuse, which matter more than the conditions under which it is merely permitted to refuse). Amendment and renegotiation (who can change the contract mid-execution, and with what audit trail). Termination rights (when either party can call it off early). And non-delegation, which becomes critical as soon as agents start calling other agents.
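"Failures named and handled by the system" can be as simple as a table that maps each named condition to a defined response, with anything unnamed failing closed. The conditions, responses, and the mapping below are illustrative assumptions; a real contract would choose its own.

```python
from enum import Enum


class Condition(Enum):
    LOW_CONFIDENCE = "low_confidence"
    OUT_OF_SCOPE = "out_of_scope"
    TOOL_FAILURE = "tool_failure"
    BUDGET_EXHAUSTED = "budget_exhausted"


class Response(Enum):
    ESCALATE = "escalate"
    REFUSE = "refuse"
    RETRY = "retry"
    TERMINATE = "terminate"


# Hypothetical mapping: each named condition has exactly one defined response.
RESPONSES = {
    Condition.LOW_CONFIDENCE: Response.ESCALATE,
    Condition.OUT_OF_SCOPE: Response.REFUSE,
    Condition.TOOL_FAILURE: Response.RETRY,
    Condition.BUDGET_EXHAUSTED: Response.TERMINATE,
}


def respond(reported: str) -> Response:
    """Look up the system's response to a reported condition.

    A condition the contract never named fails closed (terminate)
    rather than being left to the agent's judgment.
    """
    try:
        condition = Condition(reported)
    except ValueError:
        return Response.TERMINATE
    return RESPONSES[condition]


print(respond("out_of_scope"))        # Response.REFUSE
print(respond("something_unplanned"))  # Response.TERMINATE
```

Whether an unnamed condition should terminate or escalate is a design choice; the sketch only shows that the choice is made in the contract, ahead of time.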

The relationship dimension is where contracts shift from being a deployment artifact to being operational infrastructure. The work and the means tell you what the agent is supposed to do. The relationship tells you what happens when reality does not cooperate.

What this looks like when it breaks

In March 2026, an internal AI agent at Meta posted unauthorized technical advice on a company engineering forum without the requesting engineer’s approval. The advice was wrong. A colleague acted on it, which inadvertently broadened data access permissions to sensitive company and user data for roughly two hours. Meta classified it as SEV1.

Here is the part that matters for this conversation. The agent held valid credentials. It operated inside authorized boundaries. It passed every identity check Meta had in place. The governance perimeter that most enterprises consider sufficient (IAM, permissions, credential scoping) was all there, and the failure happened anyway.

So what was missing? There was no bilateral agreement specifying that this particular agent, on this particular task, had authority to post publicly on the engineering forum. The agent’s permission to use the forum API and its authority to publish to that forum on behalf of an engineer were treated as the same thing. They are not. One is a capability (the means); the other is a piece of authority that belongs to the work and the relationship dimensions of a contract. Meta had infrastructure for the first and not the second.

This is not a Meta problem. It is the structure of essentially every agentic deployment running in production right now. Permissions answer the question “is this agent allowed to use this tool.” A contract answers a different question: “is this agent authorized to take this action, on behalf of this user, in this context, with this consequence.” When you only have the first question answered, the second question gets resolved by the agent itself, which is exactly where the trouble starts.

Under a real contract regime, the same incident plays out differently. The agent’s authority is bounded by task, not just by capability. Publishing publicly is a separate authority from reading the forum or drafting a response. The agent that is uncertain about whether to publish has an escalation path the system actually honors. Failure modes (including “I am about to take an action whose scope exceeds my contract”) are named conditions with defined responses, not silent self-certifications.
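The split between capability and authority can be sketched as two separate lookups. This is not a description of any real company's system; it is a hypothetical model in which a capability says the agent may call a tool, an `Authority` grant says it may take a specific action for a specific principal on a specific task, and a capability without matching authority escalates instead of proceeding.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE = "escalate"
    DENY = "deny"


@dataclass(frozen=True)
class Authority:
    action: str
    on_behalf_of: str
    task_id: str


@dataclass(frozen=True)
class AgentContract:
    capabilities: frozenset  # tools the agent may call at all
    authorities: frozenset   # (action, principal, task) grants


def decide(contract: AgentContract, action: str, on_behalf_of: str, task_id: str) -> Decision:
    """Permission answers 'may it use the tool'; authority answers 'may it act here'."""
    if action not in contract.capabilities:
        return Decision.DENY
    if Authority(action, on_behalf_of, task_id) in contract.authorities:
        return Decision.ALLOW
    # Capable but not authorized for this principal and task: surface it upward.
    return Decision.ESCALATE


contract = AgentContract(
    capabilities=frozenset({"forum.read", "forum.draft", "forum.publish"}),
    authorities=frozenset({
        Authority("forum.read", "engineer-42", "task-7"),
        Authority("forum.draft", "engineer-42", "task-7"),
    }),
)
print(decide(contract, "forum.draft", "engineer-42", "task-7"))    # Decision.ALLOW
print(decide(contract, "forum.publish", "engineer-42", "task-7"))  # Decision.ESCALATE
```

The identifiers (`forum.publish`, `engineer-42`, `task-7`) are made up. What matters is that publishing resolves to an escalation the system has to answer, not to a decision the agent makes alone.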

The Saviynt 2026 CISO AI Risk Report found that only 5% of CISOs felt confident they could contain a compromised AI agent. That is a number worth sitting with. Containment is a relationship-dimension clause: it depends on the system, not the agent, holding the authority to stop things. Most organizations have not built that authority into anything resembling a contract.

Why this gets worse, not better, with multi-agent systems

A lot of the optimism right now is about multi-agent systems: agents calling other agents, orchestrating workflows, chaining specialists together. I am optimistic about that future too, but only conditionally.

The condition is contracts. Composition amplifies whatever is underneath. If your individual agents do not have contracts, your multi-agent system does not have governance; it has uncertainty that compounds with every hop.

Non-delegation is the clause that becomes most obviously load-bearing here. When agent A calls agent B, what authority does B inherit from A? What is B permitted to do that A was not? Who holds the contract with B (A, or the original human principal)? When something goes wrong four agents deep, whose contract was breached?

These questions do not have good default answers, and the default behavior in most current systems is “everyone inherits everything and we will sort it out in the logs.” That is workable when there are two agents. It is unworkable at scale. The IBM Think 2026 data suggests large enterprises will be running roughly 1,600 AI agents each by the end of this year. You cannot govern that with prompts and good intentions.
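One alternative to "everyone inherits everything" is attenuation: a delegated grant can only narrow the parent's authority, delegation depth is capped, and every grant keeps a pointer to the original principal. The sketch below is one possible answer to the questions above, not the only one; `Grant`, `delegate`, and `DelegationError` are hypothetical names.

```python
from dataclasses import dataclass


class DelegationError(Exception):
    """Raised when a delegation would exceed what the parent grant allows."""


@dataclass(frozen=True)
class Grant:
    principal: str        # the original human principal, carried down the chain
    holder: str           # the agent holding this grant
    actions: frozenset    # what the holder is authorized to do
    depth_remaining: int  # how many further delegations are allowed


def delegate(parent: Grant, to_agent: str, requested: frozenset) -> Grant:
    """Issue a child grant that is never broader than its parent."""
    if parent.depth_remaining <= 0:
        raise DelegationError(f"{parent.holder} may not delegate further")
    excess = requested - parent.actions
    if excess:
        raise DelegationError(f"requested beyond parent authority: {sorted(excess)}")
    return Grant(parent.principal, to_agent, requested, parent.depth_remaining - 1)


root = Grant("alice", "agent-a", frozenset({"read_tickets", "draft_reply"}), 1)
child = delegate(root, "agent-b", frozenset({"read_tickets"}))
print(child.principal, sorted(child.actions), child.depth_remaining)

try:
    delegate(root, "agent-c", frozenset({"read_tickets", "send_reply"}))
except DelegationError as exc:
    print(exc)  # requested beyond parent authority: ['send_reply']
```

With this shape, "whose contract was breached four agents deep" has a traceable answer, because every grant in the chain names its holder and the principal it ultimately serves.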

What this means for builders and buyers

If you are building agentic systems, contracts are a design surface, not documentation. Every meaningful step in an agentic workflow deserves one. If you cannot write the contract for a step, you do not yet know what you are building at that step; you have a vibe, not a spec. That is fine in a prototype. It is not fine in production.

If you are buying or evaluating agentic systems, the question to ask vendors is no longer how capable their agent is. The question is: show me the contract. Show me what the agent is bound to do, what it is bound not to do, what triggers escalation, what triggers termination, and who holds the pen on changes. If they cannot show you that, the agent is operating on vibes, and the demo you saw is not the system you will deploy.

For regulators and auditors, contracts are the artifact that makes “explain this decision” answerable. The audit trail is not a log of what happened. The audit trail is a record of which contract was in force and whether its terms were met. That distinction matters more every quarter. The EU AI Act’s high-risk system requirements land in August 2026, and a system without an audit trail, purpose binding, or kill switch is not just ungoverned; it is non-compliant under a regulation whose penalties run as high as 7% of global annual turnover for the most serious violations. Contracts are how you make compliance demonstrable rather than aspirational.
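An audit trail that records "which contract was in force and whether its terms were met" might look like the following sketch. Each entry names a contract identifier and version, and entries are hash-chained so that editing an earlier entry is detectable. The field names and the chaining scheme are assumptions for illustration, not a compliance recipe.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

GENESIS = "genesis"


@dataclass(frozen=True)
class AuditEntry:
    contract_id: str
    contract_version: int
    action: str
    terms_met: bool
    prev_hash: str  # hash of the previous entry, or GENESIS for the first


def entry_hash(entry: AuditEntry) -> str:
    payload = json.dumps(asdict(entry), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append(log: list, contract_id: str, contract_version: int,
           action: str, terms_met: bool) -> None:
    """Append an entry linked to the hash of the one before it."""
    prev = entry_hash(log[-1]) if log else GENESIS
    log.append(AuditEntry(contract_id, contract_version, action, terms_met, prev))


def chain_intact(log: list) -> bool:
    """Recompute the chain; any edit to an earlier entry breaks a later link."""
    expected = GENESIS
    for entry in log:
        if entry.prev_hash != expected:
            return False
        expected = entry_hash(entry)
    return True


log = []
append(log, "ticket-triage", 3, "draft_reply", True)
append(log, "ticket-triage", 3, "post_forum", False)
print(chain_intact(log))  # True

# Rewriting history after the fact is detectable.
log[0] = replace(log[0], terms_met=False)
print(chain_intact(log))  # False
```

One limitation of this sketch: an edit to the most recent entry is not caught by the chain alone, so the latest hash would need to be stored somewhere the writer of the log does not control.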

From vibes to protocols

The current state of agentic AI runs on implicit agreements, optimistic prompts, and a lot of post-hoc rationalization when things go wrong. The deployments that are working are the ones quietly building contract-like structure under the hood, whether they call it that or not. The deployments that are failing at the 40% rate Gartner is forecasting are the ones still treating governance as something to bolt on later.

I want to be clear that none of this is a constraint on what AI can do. Contracts are not the opposite of capability; they are the precondition for deploying capability anywhere it actually matters. Every powerful technology we have ever deployed at scale, from credit to electricity to the internet, became deployable when we figured out the contractual layer that made it governable. Agentic AI is not going to be the exception.

The shift I am suggesting is not philosophical. It is operational, and it is available today. Stop asking how autonomous your agents are. Start asking what contracts they operate under, and who holds the pen.
