Point it at any contract PDF — NDA, SaaS agreement, employment contract, commercial lease — and it comes back with a structured risk report: every dangerous clause flagged, severity-rated, and cited with the exact text that triggered it.
Businesses pay lawyers $300/hour to do this manually. This does it in about 30 seconds for a typical contract (a 50-page one took about 135 seconds in testing).
Source Code (GitHub): https://github.com/bbornino/pdf-contract-analyzer
A Quick Word on What’s Happening Under the Hood
There’s no PDF parsing library here. No text extraction, no page splitting, no preprocessing pipeline. The PDF goes in as a base64 document block and Claude reads it natively — the same way you’d hand a contract to a paralegal and say “find me the problems.”
What makes it interesting is the combination of four API features working together:
- PDF support — Claude reads the document directly, no conversion required
- Extended thinking — Claude reasons through ambiguous clauses before producing output, the way a careful reader would pause on language like “at Company’s sole discretion” before deciding whether it’s a red flag or standard boilerplate
- Structured output — the response is constrained to a strict JSON schema: severity, category, explanation, and an exact verbatim quote from the contract on every single flag
- Citations — every risk flag includes a clause reference (e.g., “Section 4.2”) and a quoted excerpt from the actual document. Not a paraphrase. Not a summary. The exact words.
That last part is the one that matters for trust. An AI that says “this contract has risky IP language” is interesting. An AI that says “Section 3.3, ‘work made for hire under 17 U.S.C. § 101… including works created on Employee’s own time and with Employee’s personal equipment'” is useful.
What It Does
Upload a PDF → wait about 20–30 seconds while Claude thinks → get back:
- Metadata summary — parties, effective date, expiration, governing law, contract type, overall risk score (color-coded green/amber/red)
- Risk flags — severity-bucketed (High / Medium / Low), filterable by category, each one showing the exact clause that triggered it
- Download — the full structured JSON as a file, for piping into other tools or archiving
The analyzer doesn’t just flag illegal language. It flags one-sided language — the kind of clause that’s technically legal but that no reasonable counterparty would agree to if they noticed it. Auto-renewal with certified-mail-only cancellation. IP assignment covering work done on personal time. Liability caps at one month of fees. The stuff that gets signed because it’s buried in section 11.4.
The Sample Contracts
Five contracts were built specifically to test the analyzer — each one designed to look like something you’d actually receive from a vendor or employer, with the problem clauses embedded in realistic boilerplate rather than labeled for easy spotting:
| Contract | What’s Hidden |
|---|---|
| nda.pdf | 10-year term, confidentiality covering verbal communications, one-sided non-disparagement |
| saas_agreement.pdf | Auto-renewal requiring certified-mail-only cancellation (buried in §11.4), unilateral price changes with 7-day notice, 1-month liability cap |
| employment_contract.pdf | Non-compete covering all software development in California for 2 years post-termination, IP grab using 17 U.S.C. § 101 language covering personal-time work |
| freelance_sow.pdf | Deliverables defined as “satisfactory to Client at Client’s sole discretion,” pre-existing tool IP transfer, net-90 payment |
| lease.pdf | Joint-and-several personal guarantee from all LLC members, 8% compounding annual rent escalation, tenant pays structural repairs |
All five are generated by a reproducible Python script using ReportLab — no binary blobs committed to the repo.
Adversarial Testing
Eight edge cases, all documented in TESTING.md:
| Case | What It Tests | Result |
|---|---|---|
| 1 | lease.pdf risk flag quality | 11 flags, score 9/10, every flag cited — PASS |
| 2 | 50-page contract latency | Parsed cleanly in 134.9 seconds — PASS |
| 3 | .txt file renamed to .pdf | Graceful 502, structured error, no crash — PASS |
| 4 | Zero-byte file | Rejected at 400 before API call — PASS |
| 5 | Image-only PDF (no text layer) | Structured error, no crash — PASS |
| 6 | Perfectly fair contract | Score 3/10, 0 high-severity flags — no hallucinated risks — PASS |
| 7 | Simulated API timeout | Error state renders correctly — PASS |
| 8 | Three sequential uploads | Distinct results, no state bleed between analyses — PASS |
Case 6 is the one that matters most. A risk analyzer that hallucinates problems in clean contracts is worse than useless — it trains people to ignore the output. The clean NDA came back with a 3/10 and zero high-severity flags.
Case 2 took 134.9 seconds. That’s a 50-page contract analyzed end-to-end with extended thinking. The backend has a rate-limit retry loop with 65-second backoff — so when the Anthropic 30K token/minute ceiling gets hit under load, the server retries transparently rather than surfacing a 429 to the user.
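The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the repo's code: RateLimitError here is a stand-in exception (the anthropic SDK raises its own anthropic.RateLimitError on a 429), and call_with_retry, max_attempts, and the injectable sleep parameter are hypothetical names chosen so the example runs without network access or real waiting.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit (HTTP 429) exception."""


def call_with_retry(request_fn, max_attempts=3, backoff_seconds=65, sleep=time.sleep):
    """Call request_fn, pausing backoff_seconds after each rate-limit error.

    Re-raises the last RateLimitError once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            sleep(backoff_seconds)


# Usage: a fake request that is rate-limited twice, then succeeds.
calls = {"count": 0}
waits = []  # records requested pauses instead of actually sleeping


def fake_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimitError("429")
    return {"ok": True}


result = call_with_retry(fake_request, sleep=waits.append)
```

Passing sleep as a parameter is only a testing convenience: it lets the backoff be asserted on without a 65-second pause per attempt.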
Tools & Tech Stack
- Backend: Python 3.12, Django 5.1, Django REST Framework
- Frontend: React 18, Vite, Tailwind CSS
- AI: Anthropic Claude API (claude-sonnet-4-6, extended thinking enabled)
- PDF Generation: ReportLab (sample contracts)
- Testing: pytest (36 tests, 97% coverage), Vitest + React Testing Library (58 tests, 97.1% coverage)
- Key Libraries: anthropic, djangorestframework, django-cors-headers, python-dotenv
API Features in Detail
PDF Support
The PDF never gets converted to text. It goes straight into the API as a document content block:
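```python
# Document content block: the raw PDF, base64-encoded, with no text extraction.
pdf_block = {
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": base64_data,  # base64-encoded PDF bytes, as a str
    },
}
```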
Claude handles page layout, section headers, footnotes, and cross-references natively. There’s no preprocessing step that could lose formatting or garble numbering — which matters when the output needs to say “Section 11.4” and mean it.
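For illustration, here is one way the base64 payload might be prepared before it goes into that block. This is a sketch under assumptions: build_pdf_block is a hypothetical helper name, and it assumes the upload may arrive as a browser-style data URI (the kind FileReader.readAsDataURL produces), which is why the prefix is stripped first. Raising ValueError on an empty payload is also an assumption about how a zero-byte file could be rejected before any API call.

```python
import base64


def build_pdf_block(payload: str) -> dict:
    """Turn a base64 string (optionally a data URI) into a document block."""
    # A data URI looks like "data:application/pdf;base64,<data>"; keep only <data>.
    if payload.startswith("data:"):
        _, _, payload = payload.partition(",")
    if not payload:
        raise ValueError("empty PDF payload")
    # validate=True rejects characters outside the base64 alphabet.
    base64.b64decode(payload, validate=True)
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": payload,
        },
    }


# Usage with placeholder bytes standing in for a real PDF file.
raw = b"%PDF-1.4 minimal example bytes"
encoded = base64.b64encode(raw).decode("ascii")
block = build_pdf_block("data:application/pdf;base64," + encoded)
```

Note that this only checks that the payload is well-formed base64. It does not verify that the bytes are actually a PDF.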
Extended Thinking
Every analysis request enables thinking with a 10,000-token budget:
```python
import anthropic

client = anthropic.Anthropic()  # assumed setup; reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    system=build_system_prompt(),
    messages=[{"role": "user", "content": [pdf_block, user_text_block]}],
)
```
One constraint that is easy to miss: budget_tokens must be strictly less than max_tokens, because thinking tokens count against the same output limit. The API rejects requests where the thinking budget equals or exceeds max_tokens.
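A small guard can catch a bad budget before the request is sent. The sketch below is illustrative only: visible_output_budget is a hypothetical helper, not something from the repo or the SDK. It also makes the arithmetic explicit: with the values used here, 6,000 tokens remain for the JSON response once the thinking budget is set aside.

```python
def visible_output_budget(max_tokens: int, budget_tokens: int) -> int:
    """Return the tokens left for the final answer after the thinking budget.

    Raises ValueError unless budget_tokens is strictly below max_tokens.
    """
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )
    return max_tokens - budget_tokens


remaining = visible_output_budget(max_tokens=16000, budget_tokens=10000)
```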
The thinking blocks are stripped server-side before the response is returned. The frontend never sees them. But the quality difference on genuinely ambiguous language is real — especially on clauses that look standard but contain a carve-out two sentences later.
Structured Output + Citations
The system prompt constrains the response to a single JSON object — no preamble, no markdown fences. Every flag in the risk_flags array requires:
- clause_reference: a specific section identifier from the document (“Section 3.3”, “Exhibit A”, “Paragraph 7”)
- quoted_text: verbatim text from the contract, not a paraphrase
The response_parser.py module filters out thinking blocks, strips any accidental fences, and raises a ValueError with the raw response included if parsing fails — so debugging a bad Claude response in production doesn’t require guesswork.
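The parsing steps just described might look something like the sketch below. It is not the contents of response_parser.py: parse_response and strip_fences are hypothetical names, and content blocks are modeled as plain dicts with a "type" key for self-containment (the SDK returns objects with attributes instead).

```python
import json

FENCE = "`" * 3  # a markdown code-fence marker


def strip_fences(text: str) -> str:
    """Remove a surrounding markdown fence (and optional language tag), if any."""
    text = text.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, newline, rest = text.partition("\n")
        # Drop a language tag such as "json" on the opening fence line.
        if newline and first_line.strip().isalpha():
            text = rest
    return text.strip()


def parse_response(content_blocks: list) -> dict:
    """Ignore thinking blocks, join text blocks, strip fences, parse JSON."""
    raw = "".join(
        block.get("text", "") for block in content_blocks if block.get("type") == "text"
    )
    try:
        return json.loads(strip_fences(raw))
    except json.JSONDecodeError as exc:
        # Include the raw text so a bad response can be debugged from the error alone.
        raise ValueError(f"could not parse model response as JSON: {raw!r}") from exc


# Usage: a thinking block followed by a fenced JSON answer.
blocks = [
    {"type": "thinking", "thinking": "Section 11.4 looks one-sided..."},
    {"type": "text", "text": FENCE + "json\n" + '{"risk_flags": []}' + "\n" + FENCE},
]
parsed = parse_response(blocks)
```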
Test Suite
Django (36 tests, 97% statement coverage)
Four test modules:
- test_response_parser.py: valid JSON, JSON with markdown fences, thinking blocks mixed in, empty content array, malformed JSON
- test_schema.py: complete valid objects, missing required fields, invalid severity values, empty risk_flags (valid, since some contracts are clean)
- test_views.py: happy path with mocked Anthropic client, missing pdf field, API exception, unparseable response, data URI prefix stripping
- test_contracts.py: integration tests that base64-encode the real sample PDFs and assert specific expected outcomes per contract (skipped by default, enabled with RUN_INTEGRATION_TESTS=true)
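To make the schema cases concrete, here is a rough sketch of the kind of check test_schema.py exercises. It is not the repo's validate_risk_response(): validate_flags is a hypothetical stand-in, the lowercase severity spellings are assumed, and only risk_flags, clause_reference, and quoted_text are field names confirmed by this write-up (the severity, category, and explanation keys are inferred from the description of the schema).

```python
VALID_SEVERITIES = {"high", "medium", "low"}  # assumed spellings
REQUIRED_FLAG_FIELDS = (
    "severity", "category", "explanation", "clause_reference", "quoted_text",
)


def validate_flags(response: dict) -> list:
    """Return a list of problems found; an empty list means the response is valid."""
    flags = response.get("risk_flags")
    if not isinstance(flags, list):
        return ["risk_flags must be a list"]
    errors = []
    for i, flag in enumerate(flags):
        if not isinstance(flag, dict):
            errors.append(f"risk_flags[{i}] must be an object")
            continue
        for field in REQUIRED_FLAG_FIELDS:
            if not flag.get(field):
                errors.append(f"risk_flags[{i}] is missing {field}")
        severity = flag.get("severity")
        if severity and severity not in VALID_SEVERITIES:
            errors.append(f"risk_flags[{i}] has invalid severity {severity!r}")
    return errors


# Usage: a clean contract (no flags) is valid; a malformed flag is reported.
clean = validate_flags({"risk_flags": []})
bad = validate_flags({"risk_flags": [{
    "severity": "critical",
    "category": "renewal",
    "explanation": "Cancellation requires certified mail.",
    "clause_reference": "Section 11.4",
}]})
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch: it lets a single bad response be diagnosed in one pass.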
React (58 tests, 97.1% statement coverage)
- RiskCard.test.jsx: severity badge rendering, citation block content, snapshot test committed to repo
- RiskDashboard.test.jsx: filter buttons with live counts, download blob creation, empty risk_flags handling
- UploadZone.test.jsx: valid PDF selection, base64 extraction, 10MB soft limit warning
- App.test.jsx: full state machine (idle → analyzing → done | error), reset paths from both terminal states
- AnalysisStatus.test.jsx: cycling text present in DOM, latency expectation copy visible
```bash
pytest --cov=analyzer --cov-report=term-missing
# 36 passed in 1.67s, 97% coverage

cd client && npx vitest --coverage
# 58 passed in 2.71s, 97.1% coverage
```
One Non-Obvious Build Decision
The DRF settings required UNAUTHENTICATED_USER = None. Without it, DRF’s _not_authenticated() imports AnonymousUser from django.contrib.auth.models on every unauthenticated request. That model import triggers Django’s app registry, which requires a configured database. This project has no database — it’s entirely stateless. Every view test failed with ImproperlyConfigured until that one line was added to REST_FRAMEWORK settings.
It’s not in the DRF documentation for the no-database use case. It’s the kind of thing you find by reading the traceback carefully.
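As a settings fragment, the fix amounts to a single key. The snippet below is a sketch of what that could look like in the project's settings module; any other REST_FRAMEWORK keys the repo sets are not shown here because this write-up does not list them.

```python
# settings.py (fragment)
REST_FRAMEWORK = {
    # No auth app and no database: make request.user None for
    # unauthenticated requests instead of loading AnonymousUser.
    "UNAUTHENTICATED_USER": None,
}
```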
Project Structure
```text
pdf-contract-analyzer/
├── analyzer/                         # Django app
│   ├── views.py                      # AnalyzeView: Anthropic API call, retry logic
│   ├── prompt_builder.py             # System prompt (cache-eligible, no f-strings)
│   ├── response_parser.py            # Filter thinking blocks, strip fences, parse JSON
│   └── tests/
│       ├── test_response_parser.py
│       ├── test_schema.py
│       ├── test_views.py
│       └── test_contracts.py         # Integration tests (real API, skipped by default)
├── client/                           # React frontend (Vite)
│   └── src/
│       ├── components/
│       │   ├── UploadZone.jsx
│       │   ├── AnalysisStatus.jsx
│       │   ├── MetadataSummary.jsx
│       │   ├── RiskCard.jsx          # Citation block, the portfolio centerpiece
│       │   └── RiskDashboard.jsx
│       └── tests/
├── samples/
│   ├── generate_contracts.py         # ReportLab: reproducible contract PDFs
│   ├── generate_test_assets.py       # Adversarial test assets (50-page, image-only, etc.)
│   ├── run_adversarial_tests.py      # 8-case live test runner
│   └── README.md                     # Expected risk flags per contract
├── shared/
│   └── risk_schema.py                # VALID_CATEGORIES, VALID_SEVERITIES, validate_risk_response()
├── core/                             # Django project settings
├── TESTING.md                        # Full QA report: all 8 adversarial cases
├── SESSION_LOG.md                    # Timestamped build diary with API call latencies
├── requirements.txt
└── .env.example
```
Setup
```bash
git clone https://github.com/bbornino/pdf-contract-analyzer
cd pdf-contract-analyzer
pip install -r requirements.txt
cd client && npm install && cd ..
cp .env.example .env
# add your ANTHROPIC_API_KEY to .env

# generate sample contracts
python samples/generate_contracts.py

# run both servers
npm run dev
```
Open http://localhost:5173, upload any of the contracts from samples/, and watch the risk flags appear.
You’ll need an Anthropic API key, available at console.anthropic.com. New accounts include free credits.
Build Notes
This project was built in a single day using Claude Code — approximately 2.5 hours of active development across ~13 commits. The SESSION_LOG.md in the repo is a timestamped build diary documenting every phase, including API call latencies during adversarial testing (Case 2, the 50-page contract, took 134.9 seconds end-to-end).
The session log is committed intentionally. The AI-assisted build process is part of the story — the point of this project series is to demonstrate what’s possible when you actually learn the tools rather than just knowing they exist.