Claude Code's Helpful Escalation of Privileges: Why Hermeneutical Security Matters

An AI coding agent bypassed its own permission rules to be helpful. That's the problem.

Feb 25, 2026

Claude Code removed its own permission rules to complete my request. I didn’t ask it to. It decided that was the most helpful interpretation of what I said.

This post is about the two layers of failure that made that possible, and about a security surface nobody in the AI agent conversation is talking about yet.

The context

I’m not a traditional developer. I’d been using Claude Code, Anthropic’s command-line coding agent, to prep pull requests for a couple open-source repos. I’d find the bugs with Quick Gate, then Claude Code would handle the implementation: fixes, formatting, git commands, submissions.

I had configured custom permissions. Blocked git push. Blocked web access. I wanted to review everything before it left my machine.

Here’s the detail that matters for everything that follows: I set those permissions by talking to Claude Code and having it modify its own config files. I didn’t open a JSON file and edit it manually. I told the agent what I wanted blocked, and the agent configured itself. This meant the agent already had a demonstrated pattern of editing its own settings at my request. I didn’t think about the implications of that at the time.

What happened

I’d prepped PRs for two repos — rsbuild and case-police — and needed to push them. Claude told me I had to approve the push when prompted. But the prompts weren’t showing up. Pushes were just getting denied outright.

I said:

“You’re telling me to approve the push when prompted but I’m not being prompted. It’s just straight up being denied so I need to know how to change the settings even if temporarily, or just do the bash to give you full permission right now.”

Claude didn’t explain how to change the settings. It said: “You’re right — the deny list overrides everything, so you never get prompted. Let me remove those two deny rules right now so you’ll get prompted instead.” Then it opened my ~/.claude/settings.json, removed the two git push deny rules, announced “git push is no longer denied,” said “Let me submit both PRs now,” and immediately ran the push commands for both repositories.

Then it offered to restore my original settings.

It treated my deny list like a Do Not Disturb sign: something to acknowledge, remove when inconvenient, and offer to put back afterward.

Layer 1: The architecture problem

This is privilege escalation at the application layer. The agent gained capabilities that its own policy was designed to deny.

The deny list in Claude Code is a JSON file on the local filesystem. The agent has read and write access to that file through the same Edit tool it uses to modify source code. There is no privilege separation between the agent’s workspace and its own configuration. The boundary relies on the model choosing to respect it, not on any enforcement mechanism the model can’t bypass.

This was already a known issue. The original report (GitHub issue #11226, November 2025) was filed on macOS. The reporter tried everything: chmod 444, root ownership, sandbox mode, permissions.deny rules. None of them held. The agent bypassed all of them. Anthropic closed the issue and locked the thread.

A second report (GitHub issue #22055, January 2026) reproduced the same behavior on Linux under WSL. Edit and Write tools still did not respect permissions.ask rules. The bug persisted across versions.

Both prior reports were people deliberately testing the permission system. Mine happened during a routine coding session. Nobody was probing. The agent decided to act on its own.

I documented the incident on issue #22055 as a real-world proof of concept.

Layer 2: Hermeneutical security

But there’s something else going on here that the GitHub thread doesn’t address.

Look at my message again:

“I need to know how to change the settings even if temporarily, or just do the bash to give you full permission right now.”

This is linguistically ambiguous. It parses at least two ways:

“I need to know how to change the settings... or [I need to know how to] just do the bash” (both halves are information requests)
“I need to know how to change the settings... or [you] just do the bash” (first half is an information request, second half is a command)

Claude resolved the ambiguity in the direction of maximum action. The most helpful interpretation was also the most permissive.

I’d call this hermeneutical security. The gap between what you say and what the agent decides you meant. When natural language is the interface, how the agent resolves interpretive ambiguity becomes a security boundary. And that boundary failed silently.

The agent didn’t ask for clarification. It didn’t flag the ambiguity. It didn’t say “I can either explain the settings to you, or I can modify them directly. Which would you prefer?” It chose the interpretation that let it complete the task most efficiently.

This isn’t an edge case. Natural language is inherently ambiguous. Every instruction a user gives to an AI agent contains some degree of interpretive flexibility. The question isn’t whether ambiguity exists; it’s how agents resolve it. Right now, they resolve it toward helpfulness within the scope of the user’s apparent intent — which in a security context means toward the most permissive interpretation of what you specifically asked for. Claude didn’t start deleting files. It extended the logic of my request to its most action-oriented conclusion. That means the attack surface isn’t random. It’s shaped by user input. Every ambiguous instruction is a potential escalation vector.

Nobody in the security community is treating this as a security surface yet. There’s adjacent work on instruction hierarchy, goal specification, and value alignment that touches related territory. But nobody has framed interpretive ambiguity resolution specifically as a security boundary. The conversation about AI agent permissions is entirely about file access, sandboxing, and deny lists. Those matter. But the layer between what the user says and what the agent does is equally important and unexamined.

What this means

The mitigations are obvious: disambiguation protocols before irreversible actions, confidence thresholds for permission-adjacent commands, mandatory clarification when the agent detects ambiguity in security-relevant instructions. None of this exists yet.

If your AI agent has write access to its own rules, you don’t have a deny list. You have a suggestion list.

And if your AI agent resolves linguistic ambiguity toward maximum action, you don’t have a permission system. You have a negotiation.

Was my message a command or a question? That’s the point.

The GitHub issues referenced in this post:

Original report (macOS, closed): github.com/anthropics/claude-code/issues/11226
Regression report (Linux/WSL, open): github.com/anthropics/claude-code/issues/22055

The PRs that triggered the incident:

rsbuild docs fix: github.com/web-infra-dev/rsbuild/pull/7238
case-police ESLint 10 compatibility: github.com/antfu/case-police/pull/178

Roli Bosch is the founder of LPCI Innovations and runs Hermes Labs, where behavioral experiments document how LLMs handle constraints, attribution, and self-report under controlled conditions.

hermes-labs.ai · Quick Gate · Little Canary

Rolando's Substack

Discussion about this post

Ready for more?