Legacy Code Explainer (Python + Claude API)

This is the first project in a series focused on learning the Anthropic/Claude API — and it’s a practical one. Point it at a legacy source file, and it streams back a structured analysis: what the code actually does, what’s dangerous about it, and how to modernize it.

Supported languages: PHP, Perl, C, COBOL, and Tcl/Tk. The usual suspects from the “this has been running in production since 2003 and nobody knows why” hall of fame.

Source Code (GitHub): https://github.com/bbornino/claude-legacy-code-explainer

A Quick Word on Claude

If you’re not already in the AI space, Claude is Anthropic’s AI assistant — think of it as the thoughtful, technically rigorous alternative in the current AI landscape. The Claude API lets you integrate that reasoning capability directly into your own applications. This project uses three specific API features that matter in real production systems: streaming, prompt caching, and extended thinking.

What It Does

python src/main.py examples/sample_cobol.cbl

The tool reads a source file, sends it to Claude, and streams back a three-section report:

What It Does — plain-language explanation of behavior, inputs, outputs, and any hidden side effects. No jargon inflation. Just “here’s what this thing actually does.”

Risk Flags — a bulleted list of security holes, deprecated patterns, and correctness bugs. Each one is tagged [Critical], [High], [Medium], or [Low]. If something is actively dangerous, it leads. If it’s just ugly, it doesn’t.

Modernization Path — concrete replacement recommendations with specific APIs and libraries named. Not “use prepared statements” — PDO::prepare(). Not “avoid shell injection” — here’s how to restructure the call with IPC::Run and a list form instead of a shell string.

At the end of each run, token usage prints to the terminal so you can see the caching in action:

--- token usage ---
input:       3211
output:      1043
cache read:  3044
cache write: 0

That cache read: 3044 line? That’s 3,044 tokens that didn’t get re-processed — they were served from cache. On the second run and beyond, you’re not paying to re-send the ruleset every time.

Why This Exists

Legacy codebases are everywhere. Most of the people who wrote them are gone. The comments are lies. The variable names are things like $x2 and $tmpVal and $FINAL_FINAL_v3. If you’ve ever stared at 400 lines of Perl, wondering what it’s doing to the database at 2 am, this tool is for you.

It’s also worth noting: during the COVID pandemic, unemployment systems across the US collapsed under load. States put out emergency calls for COBOL programmers — because their claims processing systems, some dating back to the 1980s, were still running on mainframes nobody alive fully understood anymore. That’s not a hypothetical use case. That’s Tuesday.

This project demonstrates that AI-assisted code archaeology is a real, practical workflow — not a demo trick.

Tools & Tech Stack

Language: Python 3.12+
AI: Anthropic Claude API (claude-sonnet-4-6 / claude-opus-4-7)
Interface: Command-Line Interface (CLI)
Key Libraries: anthropic, python-dotenv
Testing: pytest (24 tests — smoke, unit, contract, integration, live)
Supported Input Languages: PHP, Perl, C, COBOL, Tcl/Tk

API Features Demonstrated

Streaming

Output appears token-by-token as Claude generates it — you’re not staring at a spinner waiting for a wall of text. While it streams to the terminal, it simultaneously writes to a Markdown file right next to your input file. So examples/sample_cobol.cbl produces examples/sample_cobol_analysis.md when the run completes.

That file is the real deliverable. Open it in VS Code’s Markdown preview pane, and you get a properly formatted, scrollable report. Share it with your team, attach it to a ticket, mark it up, or feed it back into the next tool in the chain as context. The terminal output is just the live view while it’s happening.

Prompt Caching

The system prompt is a 3,000+ token ruleset: what to flag, how to grade severity, and which specific replacement APIs to recommend per language. It’s identical on every call. Prompt caching pins it in Claude’s context between requests — so on the second run and beyond, those tokens are served from cache instead of re-processed. At any kind of volume, this is the difference between a sustainable cost model and one that bleeds money quietly.

Extended Thinking

Short files route to claude-sonnet-4-6. Anything over 50 lines routes to claude-opus-4-7 with extended thinking enabled, Claude gets a private scratchpad to work through global state mutations, deeply nested conditionals, and implicit side effects before generating output. The thinking itself doesn’t stream to the terminal; only the final analysis does. But the quality difference in genuinely complex code is real and visible.

Sample Output

Here’s a slice of what the tool produced on the COBOL unemployment claims processor — a horror show of hardcoded dates, wrong comparison keys, and silent data corruption:

[High] IF EMP-FEIN = CLM-SSN in 3000-FIND-EMPLOYER — wrong key. Compares a 9-digit FEIN to a 9-digit SSN; will essentially never match. Every claim falls to the no-employer exception path, which means the system has not actually been paying claims correctly — or worse, has been paying with fallback defaults from a prior bug.

[High] COMPUTE WS-TENURE-YRS = WS-CUR-YY - WS-HIRE-YY with WS-CUR-YY VALUE 05 — two-digit year arithmetic against a hardcoded 2005. Any claimant hired before 2000 wraps in PIC 9(2) to a positive value. This both misclassifies legitimate long-tenured workers as fraud and misclassifies actual fraud as valid.

That’s not boilerplate. That’s what extended thinking looks like on a genuinely broken codebase.

Project Structure

legacy-code-explainer/
├── src/
│   ├── main.py          ← CLI entry point
│   ├── explainer.py     ← API logic: streaming, caching, model routing, thinking
│   └── prompts.py       ← system prompt / ruleset (the cached portion)
├── examples/
│   ├── sample_php.php   ← login portal: SQL injection, md5 passwords, register_globals
│   ├── sample_perl.pl   ← payroll CGI: eval injection, path traversal, DBI misuse
│   ├── sample_c.c       ← TCP daemon: buffer overflows, format strings, gets()
│   ├── sample_cobol.cbl ← unemployment processor: wrong keys, Y2K, silent data loss
│   └── sample_tclk.tcl  ← admin CGI: subst injection, 15 RCE paths, no auth
├── tests/
│   └── test_explainer.py ← 24 tests across all pyramid levels
├── requirements.txt
└── .env.example

Setup & Installation

git clone https://github.com/bbornino/claude-legacy-code-explainer.git
cd claude-legacy-code-explainer
pip install -r requirements.txt
cp .env.example .env
# add your ANTHROPIC_API_KEY to .env

You’ll need an Anthropic API key, available at console.anthropic.com. New accounts include free credits to get started.

Running the Tool

# Short file — routes to Sonnet, no extended thinking
python src/main.py examples/sample_php.php

# Long/complex file — routes to Opus with extended thinking
python src/main.py examples/sample_perl.pl
python src/main.py examples/sample_cobol.cbl

The analysis saves automatically to a Markdown file in the same directory as your input. So sample_cobol.cbl produces sample_cobol_analysis.md — ready to open in VS Code’s preview pane or share with the team.

Running Tests

# Fast — no API calls
pytest -m "smoke or unit or contract or integration"

# Live API calls — requires a real key in .env
pytest -m live

What’s Not Built Yet

Web UI (planned — a future project in this chain)
Batch/directory mode
Side-by-side diff view: original code vs. suggested modernization

Background

Seven years at CalPERS building internal workflow automation in PHP, JavaScript, and MySQL means I’ve seen what enterprise legacy code actually looks like at scale — not the textbook examples, the real stuff. The risk flags and modernization recommendations in this tool’s ruleset come from that exposure. The AI does the reasoning; the ruleset reflects what actually breaks in production.

This is Portfolio Project #1 in a multi-project chain exploring the Anthropic API. Each project builds on the last.

[1] Legacy Code Explainer    ← you are here
[2] ...
[3] ...