Let me tell you about a bug I didn’t write.
I was building my first Claude API project — a tool that reads legacy source code and analyzes it. I needed sample files to test against. So I asked Claude to generate a realistic COBOL unemployment claims processor. Something that looked like it had been written in the late 1980s by someone who’d since retired, promoted, or died.
Claude delivered. And then my own tool found something in it that stopped me cold.
In the employer lookup routine — the section responsible for matching a claim to an employer — the comparison key was wrong. The code was comparing EMP-FEIN to CLM-SSN.
An employer’s Federal Employer Identification Number. Against a claimant’s Social Security Number. Both nine digits. Both numeric. The comparison would compile, run, and silently fail on essentially every single record.
Every claim would fall to the no-employer exception path. The system would appear to be processing. Nothing would actually be matching. And unless someone was watching the exception reports very carefully — which, in a 1987 codebase, they probably weren’t — nobody would know.
My tool flagged it [High] and explained exactly what was happening in plain English.
I sat there for a second.
Why COBOL Specifically
When I was deciding which languages to support, I asked Claude what ancient language had been most urgently needed in recent memory. The answer was immediate: COBOL, during COVID.
When pandemic unemployment spiked in 2020, claims systems across the United States collapsed under the load. States put out emergency public calls for COBOL programmers — because their processing infrastructure, some of it dating to the 1980s, was still running production workloads on mainframes. The developers who built those systems had retired. The documentation was incomplete, outdated, or simply wrong. The institutional knowledge had walked out the door over thirty years of turnover.
New Jersey’s Labor Department asked for volunteers with COBOL experience. Connecticut. Kansas. Multiple states simultaneously. In 2020.
That’s not a cautionary tale about technical debt. That’s what technical debt looks like when it finally presents the bill.
COBOL went on the supported language list immediately.
What the Tool Actually Does
Point it at a source file. It reads it, sends it to Claude, and streams back a structured three-section report.
What It Does — plain-language explanation of behavior, inputs, outputs, and hidden side effects. Written for a developer who has never seen this code before and needs to understand it fast.
Risk Flags — bulleted list of security holes, deprecated patterns, and correctness bugs. Each one tagged [Critical], [High], [Medium], or [Low]. If something is actively dangerous, it leads. If it’s just ugly, it doesn’t.
Modernization Path — concrete fixes with specific APIs and libraries named. Not “avoid SQL injection” — PDO::prepare(). Not “don’t use gets()” — here’s the fgets() replacement and exactly why the buffer math works.
Here’s a slice of what it produced on the COBOL file:
[High] IF EMP-FEIN = CLM-SSN in 3000-FIND-EMPLOYER — wrong key.
Compares a 9-digit FEIN to a 9-digit SSN; will essentially never
match. Every claim falls to the no-employer exception path.
[High] COMPUTE WS-TENURE-YRS = WS-CUR-YY - WS-HIRE-YY with
WS-CUR-YY VALUE 05 — two-digit year arithmetic against a hardcoded
2005. Any claimant hired before 2000 wraps in PIC 9(2). This
misclassifies long-tenured workers as fraud and misclassifies
actual fraud as valid.
That second one. A hardcoded 2005 date in a two-digit year field. Computing tenure by subtracting from a constant that was already wrong when someone last touched this code. Silently misclassifying fraud determinations in both directions.
This is what lives in production systems right now. Not at rinky-dink startups. At agencies processing benefits for millions of people.
How It Works Under the Hood
Three Claude API features do the heavy lifting, and they were chosen deliberately.
Streaming — output appears token by token as Claude generates it. No spinner, no waiting, no wall of text appearing all at once. For a complex analysis that takes 15-20 seconds, streaming makes it feel responsive. It also writes simultaneously to a Markdown file next to your input — so sample_cobol.cbl produces sample_cobol_analysis.md when the run completes. Open it in VS Code’s preview pane, and you get a properly formatted scrollable report you can share, annotate, or feed into the next tool.
Prompt Caching — the system prompt is a 3,000+ token ruleset: what to flag, how to grade severity, which specific replacement APIs to recommend per language. It’s identical on every call. Caching pins it in Claude’s context between requests. The token usage prints at the end of every run:
--- token usage ---
input: 3211
output: 1043
cache read: 3044
cache write: 0
That cache read: 3044 line is 3,044 tokens that didn’t get reprocessed. At any kind of volume — a team running this against a legacy codebase, a CI pipeline flagging risky changes — that’s the difference between sustainable economics and quietly burning money.
Extended Thinking — short files route to claude-sonnet-4-6. Anything over 50 lines routes to claude-opus-4-7 with extended thinking enabled. Claude gets a private scratchpad to reason through global state mutations, nested conditionals, and implicit side effects before generating output. The thinking doesn’t stream to the terminal. Only the final analysis does. But on genuinely complex code, the quality difference is real and visible — and the COBOL output is the proof.
The Part I Didn’t Expect
I built this in an afternoon. On a Windows 10 laptop with a VS Code terminal that was clipping the bottom four lines of the Claude Code UI, so I literally couldn’t see the prompts asking me to choose options. While fighting a UI I’d never used before.
I have more to say about that experience — and what it implies — in Thursday’s post.
But the part that stuck with me, reading the COBOL analysis output for the first time: I have seven years of enterprise legacy system experience from CalPERS. PHP, JavaScript, MySQL, internal workflow automation. I’ve seen what real production code looks like when it’s been touched by forty hands over fifteen years, and nobody left notes.
The output wasn’t generic. It wasn’t boilerplate. It caught the things that actually break. The wrong comparison key. The hardcoded date arithmetic. The file opened as INPUT that silently discards every write.
I sat there reading it, thinking: the people who needed this tool in 2020, scrambling to find anyone who remembered what COBOL looked like — they could have used this.
That’s not a portfolio project. That’s a real workflow.
The code is at github.com/bbornino/claude-legacy-code-explainer, and the full project documentation is at https://bornino.net/projects/legacy-code-explainer-python-claude-api/.
Thursday: I built a working app in 45 minutes on a broken setup while swearing at VSCode. Here’s what that actually means.