Point it at any contract PDF — NDA, SaaS agreement, employment contract, commercial lease — and it comes back with a structured risk report: every dangerous clause flagged, severity-rated, and cited with the exact text that triggered it.

Businesses pay lawyers $300/hour to do this manually. This does it in about 30 seconds.

Source Code (GitHub): https://github.com/bbornino/pdf-contract-analyzer


A Quick Word on What’s Happening Under the Hood

There’s no PDF parsing library here. No text extraction, no page splitting, no preprocessing pipeline. The PDF goes in as a base64 document block and Claude reads it natively — the same way you’d hand a contract to a paralegal and say “find me the problems.”

What makes it interesting is the combination of four API features working together:

  • PDF support — Claude reads the document directly, no conversion required
  • Extended thinking — Claude reasons through ambiguous clauses before producing output, the way a careful reader would pause on language like “at Company’s sole discretion” before deciding whether it’s a red flag or standard boilerplate
  • Structured output — the response is constrained to a strict JSON schema: severity, category, explanation, and an exact verbatim quote from the contract on every single flag
  • Citations — every risk flag includes a clause reference (e.g., “Section 4.2”) and a quoted excerpt from the actual document. Not a paraphrase. Not a summary. The exact words.

That last part is the one that matters for trust. An AI that says “this contract has risky IP language” is interesting. An AI that says “Section 3.3, ‘work made for hire under 17 U.S.C. § 101… including works created on Employee’s own time and with Employee’s personal equipment’” is useful.


What It Does

Upload a PDF → wait about 20–30 seconds while Claude thinks → get back:

  • Metadata summary — parties, effective date, expiration, governing law, contract type, overall risk score (color-coded green/amber/red)
  • Risk flags — severity-bucketed (High / Medium / Low), filterable by category, each one showing the exact clause that triggered it
  • Download — the full structured JSON as a file, for piping into other tools or archiving

The analyzer doesn’t just flag illegal language. It flags one-sided language — the kind of clause that’s technically legal but that no reasonable counterparty would agree to if they noticed it. Auto-renewal with certified-mail-only cancellation. IP assignment covering work done on personal time. Liability caps at one month of fees. The stuff that gets signed because it’s buried in section 11.4.


The Sample Contracts

Five contracts were built specifically to test the analyzer — each one designed to look like something you’d actually receive from a vendor or employer, with the problem clauses embedded in realistic boilerplate rather than labeled for easy spotting:

| Contract | What’s Hidden |
| --- | --- |
| nda.pdf | 10-year term, confidentiality covering verbal communications, one-sided non-disparagement |
| saas_agreement.pdf | Auto-renewal requiring certified-mail-only cancellation (buried in §11.4), unilateral price changes with 7-day notice, 1-month liability cap |
| employment_contract.pdf | Non-compete covering all software development in California for 2 years post-termination, IP grab using 17 U.S.C. § 101 language covering personal-time work |
| freelance_sow.pdf | Deliverables defined as “satisfactory to Client at Client’s sole discretion,” pre-existing tool IP transfer, net-90 payment |
| lease.pdf | Joint-and-several personal guarantee from all LLC members, 8% compounding annual rent escalation, tenant pays structural repairs |

All five are generated by a reproducible Python script using ReportLab — no binary blobs committed to the repo.


Adversarial Testing

Eight edge cases, all documented in TESTING.md:

| Case | What It Tests | Result |
| --- | --- | --- |
| 1 | lease.pdf risk flag quality | PASS: 11 flags, score 9/10, every flag cited |
| 2 | 50-page contract latency | PASS: parsed cleanly in 134.9 seconds |
| 3 | .txt file renamed to .pdf | PASS: graceful 502, structured error, no crash |
| 4 | Zero-byte file | PASS: rejected at 400 before API call |
| 5 | Image-only PDF (no text layer) | PASS: structured error, no crash |
| 6 | Perfectly fair contract | PASS: score 3/10, 0 high-severity flags, no hallucinated risks |
| 7 | Simulated API timeout | PASS: error state renders correctly |
| 8 | Three sequential uploads | PASS: distinct results, no state bleed between analyses |

Case 6 is the one that matters most. A risk analyzer that hallucinates problems in clean contracts is worse than useless — it trains people to ignore the output. The clean NDA came back with a 3/10 and zero high-severity flags.

Case 2 took 134.9 seconds. That’s a 50-page contract analyzed end-to-end with extended thinking. The backend has a rate-limit retry loop with a 65-second backoff, so when the Anthropic rate limit (30K tokens per minute at this account’s tier) is hit under load, the server retries transparently rather than surfacing a 429 to the user.
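As a rough illustration of that retry behavior, here is a minimal sketch. The names `call_with_rate_limit_retry`, `RateLimited`, and `max_attempts` are hypothetical and not taken from the repo. The stand-in exception takes the place of whatever rate-limit error the real client raises (the anthropic SDK exposes `anthropic.RateLimitError` for 429 responses). Only the fixed 65-second wait comes from the description above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for the SDK's rate-limit (HTTP 429) error; hypothetical name."""


def call_with_rate_limit_retry(
    call: Callable[[], T],
    max_attempts: int = 3,          # assumed value, not taken from the repo
    backoff_seconds: float = 65.0,  # the 65-second wait described above
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, waiting and retrying whenever it raises RateLimited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: the caller maps this to an error response
            sleep(backoff_seconds)
    raise ValueError("max_attempts must be at least 1")
```

A wait slightly longer than a minute is a sensible choice for a per-minute limit, since the window has presumably reset by the time the retry fires. Passing `sleep` as a parameter keeps the loop testable without actually waiting.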


Tools & Tech Stack

  • Backend: Python 3.12, Django 5.1, Django REST Framework
  • Frontend: React 18, Vite, Tailwind CSS
  • AI: Anthropic Claude API (claude-sonnet-4-6, extended thinking enabled)
  • PDF Generation: ReportLab (sample contracts)
  • Testing: pytest (36 tests, 97% coverage), Vitest + React Testing Library (58 tests, 97.1% coverage)
  • Key Libraries: anthropic, djangorestframework, django-cors-headers, python-dotenv

API Features in Detail

PDF Support

The PDF never gets converted to text. It goes straight into the API as a document content block:

python

{
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": base64_data,
    },
}

Claude handles page layout, section headers, footnotes, and cross-references natively. There’s no preprocessing step that could lose formatting or garble numbering — which matters when the output needs to say “Section 11.4” and mean it.
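For context, a block like that can be produced from raw upload bytes with nothing but the standard library. This is a sketch under assumptions: `build_pdf_block` and `strip_data_uri_prefix` are hypothetical helper names, and the prefix handling assumes the browser may send a `data:application/pdf;base64,...` string, which is the case the view tests mention.

```python
import base64


def build_pdf_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a base64 document content block."""
    encoded = base64.standard_b64encode(pdf_bytes).decode("ascii")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": encoded,
        },
    }


def strip_data_uri_prefix(value: str) -> str:
    """Drop a leading 'data:...;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value
```

If the frontend already sends base64, only the prefix stripping is needed on the server. The encoding helper is mostly useful for scripts and integration tests that read PDFs from disk.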

Extended Thinking

Every analysis request enables thinking with a 10,000-token budget:

python

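import anthropic

client = anthropic.Anthropic()  # assumed setup; reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,  # total output cap; must be larger than budget_tokens
    thinking={"type": "enabled", "budget_tokens": 10000},
    system=build_system_prompt(),  # system prompt from prompt_builder.py
    messages=[{"role": "user", "content": [pdf_block, user_text_block]}],
)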

One non-obvious constraint: budget_tokens must be strictly less than max_tokens. The API rejects requests where the thinking budget equals or exceeds the output limit — it’s not documented prominently, but it’s enforced.
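One way to catch that mistake before it costs a round trip is a small guard in front of the API call. This is only a sketch, and `check_thinking_budget` is a hypothetical name rather than something from the repo.

```python
def check_thinking_budget(max_tokens: int, budget_tokens: int) -> None:
    """Raise ValueError unless budget_tokens is strictly below max_tokens."""
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )


# The values used in this project pass the check.
check_thinking_budget(max_tokens=16000, budget_tokens=10000)
```

Because thinking tokens count toward max_tokens, these settings leave at least 6,000 tokens for the JSON report itself.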

The thinking blocks are stripped server-side before the response is returned. The frontend never sees them. But the quality difference on genuinely ambiguous language is real — especially on clauses that look standard but contain a carve-out two sentences later.

Structured Output + Citations

The system prompt constrains the response to a single JSON object — no preamble, no markdown fences. Every flag in the risk_flags array requires:

  • clause_reference — a specific section identifier from the document (“Section 3.3”, “Exhibit A”, “Paragraph 7”)
  • quoted_text — verbatim text from the contract, not a paraphrase

The response_parser.py module filters out thinking blocks, strips any accidental fences, and raises a ValueError with the raw response included if parsing fails — so debugging a bad Claude response in production doesn’t require guesswork.
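The parsing steps described above might look something like the following. This is a sketch, not the contents of response_parser.py: `parse_analysis` is a hypothetical name, and content blocks are modeled as plain dicts to keep the example self-contained (the SDK returns objects with `.type` and `.text` attributes instead).

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to avoid nesting fences


def parse_analysis(content_blocks: list[dict]) -> dict:
    """Extract the JSON report from a list of response content blocks."""
    # Keep only text blocks; thinking blocks are dropped here.
    text = "".join(
        block.get("text", "")
        for block in content_blocks
        if block.get("type") == "text"
    ).strip()

    # Strip an accidental fenced wrapper if the model added one.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence and any language tag
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines).strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # Include the raw text so a bad response can be diagnosed from the log.
        raise ValueError(f"Could not parse model response as JSON: {text!r}") from exc
```

An empty content array falls through to the same ValueError, since an empty string is not valid JSON.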


Test Suite

Django (36 tests, 97% statement coverage)

Four test modules:

  • test_response_parser.py — valid JSON, JSON with markdown fences, thinking blocks mixed in, empty content array, malformed JSON
  • test_schema.py — complete valid objects, missing required fields, invalid severity values, empty risk_flags (valid — some contracts are clean)
  • test_views.py — happy path with mocked Anthropic client, missing pdf field, API exception, unparseable response, data URI prefix stripping
  • test_contracts.py — integration tests that base64-encode the real sample PDFs and assert specific expected outcomes per contract (skipped by default, enabled with RUN_INTEGRATION_TESTS=true)
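To make the schema cases concrete, here is a minimal validator sketch covering the same ground as test_schema.py. It is not the repo’s validate_risk_response(): the function name, the lowercase severity values, and the error-list return style are all assumptions, and category checking is left out because the real VALID_CATEGORIES are not listed here.

```python
# Assumed values; the real constants live in shared/risk_schema.py.
VALID_SEVERITIES = {"high", "medium", "low"}
REQUIRED_FLAG_FIELDS = (
    "severity", "category", "explanation", "clause_reference", "quoted_text",
)


def validate_risk_response_sketch(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the response is valid."""
    flags = data.get("risk_flags")
    if not isinstance(flags, list):
        return ["risk_flags must be a list"]

    errors: list[str] = []
    for i, flag in enumerate(flags):
        if not isinstance(flag, dict):
            errors.append(f"risk_flags[{i}] must be an object")
            continue
        for field in REQUIRED_FLAG_FIELDS:
            if not flag.get(field):
                errors.append(f"risk_flags[{i}] is missing {field}")
        severity = flag.get("severity")
        if severity and severity not in VALID_SEVERITIES:
            errors.append(f"risk_flags[{i}] has invalid severity {severity!r}")
    return errors
```

An empty risk_flags list produces no errors, which matches the point above that a clean contract is a valid result.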

React (58 tests, 97.1% statement coverage)

  • RiskCard.test.jsx — severity badge rendering, citation block content, snapshot test committed to repo
  • RiskDashboard.test.jsx — filter buttons with live counts, download blob creation, empty risk_flags handling
  • UploadZone.test.jsx — valid PDF selection, base64 extraction, 10MB soft limit warning
  • App.test.jsx — full state machine: idle → analyzing → done | error, reset paths from both terminal states
  • AnalysisStatus.test.jsx — cycling text present in DOM, latency expectation copy visible

bash

pytest --cov=analyzer --cov-report=term-missing
# 36 passed in 1.67s, 97% coverage

cd client && npx vitest --coverage
# 58 passed in 2.71s, 97.1% coverage

One Non-Obvious Build Decision

The REST_FRAMEWORK settings dict needed 'UNAUTHENTICATED_USER': None. Without it, DRF’s _not_authenticated() imports AnonymousUser from django.contrib.auth.models on every unauthenticated request. That model import triggers Django’s app registry, which requires a configured database. This project has no database; it is entirely stateless. Every view test failed with ImproperlyConfigured until that one line was added.

It’s not in the DRF documentation for the no-database use case. It’s the kind of thing you find by reading the traceback carefully.
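For reference, the relevant fragment of the settings module would look roughly like this. UNAUTHENTICATED_USER is a standard key in DRF’s REST_FRAMEWORK dict; the empty DEFAULT_AUTHENTICATION_CLASSES list is an assumption about this project and may not match the repo.

```python
# settings.py (sketch)
REST_FRAMEWORK = {
    # Assumption: no authentication backends, since the API is stateless.
    "DEFAULT_AUTHENTICATION_CLASSES": [],
    # Set request.user to None instead of instantiating AnonymousUser,
    # which avoids importing django.contrib.auth.models.
    "UNAUTHENTICATED_USER": None,
}
```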


Project Structure

pdf-contract-analyzer/
├── analyzer/                   # Django app
│   ├── views.py                # AnalyzeView — Anthropic API call, retry logic
│   ├── prompt_builder.py       # System prompt (cache-eligible, no f-strings)
│   ├── response_parser.py      # Filter thinking blocks, strip fences, parse JSON
│   └── tests/
│       ├── test_response_parser.py
│       ├── test_schema.py
│       ├── test_views.py
│       └── test_contracts.py   # Integration tests (real API, skipped by default)
├── client/                     # React frontend (Vite)
│   └── src/
│       ├── components/
│       │   ├── UploadZone.jsx
│       │   ├── AnalysisStatus.jsx
│       │   ├── MetadataSummary.jsx
│       │   ├── RiskCard.jsx        # Citation block — portfolio centerpiece
│       │   └── RiskDashboard.jsx
│       └── tests/
├── samples/
│   ├── generate_contracts.py   # ReportLab — reproducible contract PDFs
│   ├── generate_test_assets.py # Adversarial test assets (50-page, image-only, etc.)
│   ├── run_adversarial_tests.py # 8-case live test runner
│   └── README.md               # Expected risk flags per contract
├── shared/
│   └── risk_schema.py          # VALID_CATEGORIES, VALID_SEVERITIES, validate_risk_response()
├── core/                       # Django project settings
├── TESTING.md                  # Full QA report — all 8 adversarial cases
├── SESSION_LOG.md              # Timestamped build diary with API call latencies
├── requirements.txt
└── .env.example

Setup

bash

git clone https://github.com/bbornino/pdf-contract-analyzer
cd pdf-contract-analyzer
pip install -r requirements.txt
cd client && npm install && cd ..
cp .env.example .env
# add your ANTHROPIC_API_KEY to .env

# generate sample contracts
python samples/generate_contracts.py

# run both servers
npm run dev

Open http://localhost:5173, upload any of the contracts from samples/, and watch the risk flags appear.

You’ll need an Anthropic API key, available at console.anthropic.com. New accounts include free credits.


Build Notes

This project was built in a single day using Claude Code — approximately 2.5 hours of active development across ~13 commits. The SESSION_LOG.md in the repo is a timestamped build diary documenting every phase, including API call latencies during adversarial testing (Case 2, the 50-page contract, took 134.9 seconds end-to-end).

The session log is committed intentionally. The AI-assisted build process is part of the story — the point of this project series is to demonstrate what’s possible when you actually learn the tools rather than just knowing they exist.