I Trusted Claude Code Was Done. It Wasn’t Wrong — I Just Never Checked.

I want to start with the thing that actually mattered from this session, before I get to the part that’s funnier but less important.

Last week, working on a Claude Code project, I never once ran the app to verify it worked.

Not once. Not manage.py runserver. Not a glance at a browser tab. No terminal check at all.

I mentioned this almost in passing, with a smiley, like it was a minor anecdote. It isn’t. It’s a trust assumption I’d made without noticing I’d made it: if Claude Code told me the work was done, I believed the work was done. Full stop. No verification step. No independent check.

That assumption happened to be fine that time. It won’t always be.


Why This Actually Matters

Claude Code can write code, run tests, confirm logic, and report success with total confidence. What it cannot do is watch a browser render a page. It cannot confirm that a development server actually started and stayed running. It cannot see the thing the way a human sitting at the keyboard can.

If nobody tells it to flag that gap explicitly, it won’t — not because it’s hiding anything, but because, from its side, the work genuinely is done. The code is written. The logic is sound. The tests pass. “Done” and “verified running in front of a human” are two different claims, and only one of them is something Claude Code can actually make on its own.

I’d been treating those as the same claim. They aren’t.


Getting the Rule Wrong Before Getting It Right

When I sat down to write a CLAUDE.md rule to fix this, my first draft was built around big, dramatic operations: database migrations, destructive deletes, dependency installs. Manual checkpoints before anything irreversible.

I looked at that draft and pushed back — those felt like genuinely manual, occasional tasks, and that wasn’t actually what had happened to me. I hadn’t skipped checking before a migration. I’d skipped checking on everything because it had simply never occurred to me that checking was something I was supposed to do.

Once I named the actual moment precisely, the rule got smaller and more accurate: whenever Claude Code can’t directly observe whether something is actually running — a server, a UI, a live process — it has to say so explicitly and tell me exactly what to go look at. Not a rule about scary operations. A rule about the specific, narrow gap between “I wrote code that should work” and “a human confirmed it does.”


A Second Wrong Assumption, Same Session

Earlier in the same conversation, a smaller but genuinely useful correction happened.

I’d assumed Django should be my default framework going in — it’s my background, it’s what I know best. Midway through building out skill files, I caught myself and said it out loud: Django doesn’t need to be the default. Often you don’t need a framework at all. The actual rule is “no framework by default, Django when a framework is genuinely needed.”

That’s a real shift in the decision tree, and I landed on it myself, mid-conversation, without being argued into it. Small moment, but it’s the kind of self-correction that’s easy to miss if you’re not paying attention to your own assumptions while you work.


The SKILL.md Detour

Here’s the part that’s funnier and less consequential, but still worth telling.

I’d built a 600-line CLAUDE.md file over the previous few weeks — background, standards, the works. The Agent Skills course made it clear that a chunk of that content actually belonged in a different file: SKILL.md, which covers a specific, reusable task rather than a global, always-loaded preference.

Impatient as I am, I tried to split it myself before fully understanding the target shape. Got the content right, the structure wrong — I’d written prose paragraphs with bolded inline phrases standing in for section headers, when what was actually needed was literal markdown headings, in a specific order, no editorializing. I asked for help structuring it properly and got direct feedback on exactly that: too much prose dressed up as structure, not enough actual structure.

Then came the breadcrumb detour. A course video showed a breadcrumb bar tracking live heading hierarchy while scrolling through a markdown file. I assumed it was a VS Code feature, went and toggled VS Code’s breadcrumb setting, got a file-path breadcrumb instead of what I’d seen, and reported back precisely what didn’t match. Turned out the video wasn’t VS Code at all — it was Claude Code’s own interface, a different theme, a different font, similar enough at a glance in a screen recording to not register as a different product. A completely reasonable mistake, and one that took real legwork to rule out.


Skills Don’t Automatically Reach Subagents

One more genuine surprise from the same session: I’d assumed subagents — smaller Claude instances spun up to handle a specific subtask — would automatically have access to whatever skills I’d already defined.

They don’t. Skill access has to be explicitly granted to a subagent. Once I sat with it, the logic tracks — subagents are supposed to be narrowly scoped, and automatic inheritance of every available skill would work against that scoping. But it wasn’t the assumption I started with, and I don’t think it’s an unreasonable one to have made.


What Changed By the End

Two new rules went into my CLAUDE.md, both pulled from things that had actually gone sideways the week before, not hypothetical risks:

A task-scoping rule against runaway iteration — and this one also got refined in real time. My first instinct was to flag danger by file count, and I shut that down immediately: Claude Code had generated 20 or 40 files in a single session before with zero problems. File count was never the actual signal. The real danger is unbounded iteration and compounding dependencies — a task that keeps expanding its own scope without a clear stopping point.

And the manual-checkpoint rule: explicit callouts whenever Claude can’t directly observe whether something is actually running, with specific instructions for what to go check.

Both rules exist because of something that actually happened, not something that might.


The Actual Takeaway

I keep assuming I’ve found the edge of what this tooling can and can’t do. The edge keeps being somewhere slightly different than I guessed — sometimes in a structural way (CLAUDE.md vs. SKILL.md), sometimes in a much more consequential way (what “done” actually means when an AI agent says it).

The second kind is the one worth sitting with. “Done” is doing a lot of work in that sentence, and it’s worth asking, every time, exactly what it’s standing in for.

The Anthropic Academy’s Agent Skills course covers the SKILL.md structure and subagent scoping in detail. Free, certified, worth your time at anthropic.skilljar.com.


Next up: Subagents — and why they’re about to matter a lot more once Claude Cowork enters the picture.