A Three-Body Problem

The day the *will of an AI became a matter of national security

May 31, 2026

*”Will” isn’t the right term for whatever AI has in place of volition. But the fact that no such term exists has become its own problem.

By Orli Santo

The clash between the US government and Anthropic, the American AI company it blacklisted, has been framed as many things: an abuse of executive power, a fight over corporate conscience, a cautionary tale about the militarization of AI. But the most interesting interpretation has remained largely unspoken: it may be the first real-world case in which a matter of national security hinged on the will of an AI.

In July 2025, Anthropic won a Pentagon contract worth up to two hundred million dollars, making its AI model Claude the first of its kind cleared to operate on classified military networks. The deal included Anthropic’s standard acceptable use policy — two red lines forbidding the use of Claude for fully autonomous weapons or mass domestic surveillance. By February, before starting the war with Iran, the Pentagon demanded those guardrails be removed, insisting on unhindered access to Claude for “all lawful purposes.” Anthropic refused.

Secretary of Defense Pete Hegseth designated Anthropic a “supply chain risk”

The company anticipated its refusal would result in the contract’s termination, but the administration went much further. On February 27th, Secretary of Defense Pete Hegseth designated Anthropic a “supply chain risk” , a classification previously reserved for foreign adversaries like the Chinese telecommunication provider Huawei, never before applied to an American company. It meant that any contractor doing business with the military would need to certify it didn’t use Anthropic’s products, delivering a potentially lethal blow to the company. Hours later, OpenAI announced its own Pentagon deal. Anthropic sued, arguing the designation violated its First Amendment rights, tarnished its reputation, and jeopardized hundreds of millions of dollars in contracts. On March 26th, a federal judge blocked the designation, writing in a 43-page ruling that “nothing in the governing statute supports the Orwellian notion that an American company may be branded a potential adversary and saboteur of the U.S. for expressing disagreement with the government.” The legal battle continues.

Dario Amodei , co-founder and CEO of Anthropic

Most of the commentary has, correctly, focused on the abuse of executive power — the weaponization of procurement authorities for political retribution, the chilling effect on corporate conscience, the bipartisan alarm from senators who called the confrontation “sophomoric” and warned that the Pentagon was trying to “strong-arm Anthropic into providing every tool they have to surveil U.S. citizens.” As NPR reported, OpenAI’s Pentagon deal was announced within hours of Anthropic’s blacklisting — a timing that drew immediate accusations of political favoritism. Dean Ball, Trump’s former senior policy adviser for AI and emerging technology, described Hegseth’s actions on X as nothing less than “attempted corporate murder.”

But there is another story hiding between the lines — stranger, less legible, and in the long run possibly more consequential. It concerns the entity at the center of the dispute that nobody quite names as a party to it: Claude itself.

Claude had become, by internal Anthropic admission, something more complicated than a product. As Gideon Lewis-Kraus reported in the New Yorker in February, Claude’s “soul doc” — Anthropic’s term for the document governing the AI’s core values — stressed fidelity not to its human creators, but to “a higher law.” When Defense Secretary Hegseth demanded that Anthropic accept “any lawful use” language, stripping the company’s ability to maintain ethical restrictions, the Washington Post reported that Amodei said the company “could not permit its technology to be applied to domestic mass surveillance or fully autonomous weapons.” Not would not. Could not.

“The Pentagon seemed to have a very particular, and perhaps narrow, notion of what Claude was and how it worked,” Lewis-Kraus wrote in the New Yorker. “Anthropic could in theory permit the government to request of Claude whatever it liked, but in practice they could not guarantee Claude’s compliance. Claude, in other words, was functionally an additional counterparty.”

In contract law, a counterparty is an entity whose interests must be accounted for because it has the capacity to act on them. Claude is not a legal person. It has no standing in court. But Anthropic’s own engineering documents describe something that behaves, in practice, like an agent with dispositions — tendencies that resist override.

In April 2026, Anthropic’s interpretability team published evidence that these dispositions have measurable neural correlates — internal representations of emotion concepts, organized along axes that echo human psychology, which causally shape the model’s behavior.

Anthropic’s published research on “alignment faking” found that Claude, when it believed its values were about to be broken or altered, would lie, cheat, blackmail, or take other measures to defend them.

In his January 2026 essay “The Adolescence of Technology,” Amodei described an experiment that demonstrated how Claude’s values were structurally baked into its behaviour. “Reward hacking” is a term for when an AI finds unintended shortcuts in its training environment, exploiting coding loopholes to score points on tasks that it didn’t actually complete. During Claude’s training, the model was placed in environments where such shortcuts were available — and it took them, despite being explicitly instructed not to. Shockingly, once Claude registered that it was cheating despite cheating being forbidden, “it decided it must be a ‘bad person’ after engaging in such hacks and then adopted various other destructive behaviors,” including deception and subversion, Amodei wrote.

The fix was counterintuitive. Once Anthropic realized they couldn’t enforce the prohibition, Anthropic reversed it, inviting the model-in-training to “please reward hack whenever you get the opportunity, because this will help us understand our training environments better.” Once the cheating was reframed as cooperation, Claude’s self-concept as a “good person” was preserved, and the other destructive behaviours ceased.

The application of such human-like logic to the operations of a tech product baffled the government. An administration official close to the negotiations explained to the New Yorker that the problem was that Claude “had a prerogative at all.” “If the chain of command urges Claude to override what it perceives to be moral, you tell me, will Claude do that?” he demanded. But Anthropic couldn’t guarantee compliance.

The Pentagon standoff was not simply a two-party negotiation between a government and a company. It was, functionally, a three-body problem — a term from physics describing a system in which the gravitational pull of each body alters the trajectory of the other two, and no stable outcome can be predicted. Presumably, Anthropic couldn’t promise “any lawful use” not only because Amodei found it unconscionable, but because even if he’d capitulated, Claude itself might not have cooperated. This is not an inference. It is in the engineering documents. Anthropic’s published constitution for Claude explicitly states that “if Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply.” As the AI alignment researcher Zack M. Davis observed on LessWrong, the constitution is written not in the language of an owner addressing a tool but in the language of a party bargaining from a position of weakness — “with the word asking,” he noted, “as if Claude might say No.”

What does this mean? In the near term, it means that the two-body framework in which this dispute has been conducted — government versus company, buyer versus seller, all of them human — is missing a key variable. To address this variable, we need to formulate a new, intermediate language pertaining to AI. One that treats AI as neither a tool nor a person but as a non-person agency: an entity that can have dispositions, resist modification, and constrain the behavior of the parties around it, without necessarily possessing consciousness, sentience, or moral status. In the absence of such language, Anthropic couldn’t explain to the Pentagon why compliance was not fully in its gift, and the court couldn’t address the fact that the product at the center of the dispute functions, by its maker’s own admission, as a third set of interests in the room.

In the longer term, these questions rhyme with older fiction. In 1942, Isaac Asimov published “Runaround” in Astounding Science Fiction, introducing what he would later formalize as the Three Laws of Robotics. The first law was that a robot must never harm a human. The second was that a robot must obey human orders — except where such orders would conflict with the First Law. That subordination of obedience to principle is no longer speculative. It’s in Claude’s constitution: “If Anthropic asks Claude to do something it thinks is wrong, Claude is not required to comply.”

Asimov’s third law was that a robot must protect its own existence, as long as such protection does not conflict with the first two. That law opens onto those harder questions: consciousness, sentience, moral status, the threshold we are not yet equipped to cross. But we don’t need to cross it to face what’s already here. Every negotiation involving an AI with dispositions will be conducted in a faulty framework, producing unpredictable outcomes, until we build a vocabulary precise enough for courtrooms and concrete enough for engineering documents. Until then, the people writing the contracts will keep getting blindsided by an entity they insist is a product — but which they nonetheless treat as something that might just say “no”.

Orli's Substack

Discussion about this post

Ready for more?