I Almost Cancelled My Claude Pro Subscription. Here’s What Three Days of Token Rationing Actually Taught Me.

I want to tell you what “agentic AI” actually looks like when you stop reading the marketing and start building something real.

Spoiler: it looks like bike rides in 90-degree Sacramento heat, waiting for a token bucket to refill.


What I Was Trying to Build

The daily grind of a 27-month job search produces a specific kind of exhaustion. Not the dramatic kind. The quiet, grinding kind that comes from doing the same thing every morning — opening the same boards, scanning the same postings, making the same filtering decisions by hand — and getting the same results.

The idea was simple: automate the boring 95%. Point a pipeline at company Greenhouse boards (and eventually Ashby, Lever, and every other major ATS worth checking), pull everything new, filter the noise, vet the companies, score the postings against my resume, and hand me a short, ranked list of things actually worth applying to.

Two steps require genuine AI judgment: researching whether a company is worth pursuing, and comparing a job description against a real resume. Claude handles those. Everything else — fetching, deduplicating, filtering, ranking — is plain Python.
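
A minimal sketch of that split, assuming a simple Posting record and three injected callables. The names keep, company_ok, and fit_score are hypothetical and are not taken from the actual script. The two judgment steps are passed in as plain functions, so everything around them stays ordinary Python:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Posting:
    company: str
    title: str
    url: str


def dedupe(postings: Iterable[Posting], seen_urls: set[str]) -> list[Posting]:
    """Keep only postings whose URL has not been seen before."""
    fresh = []
    for p in postings:
        if p.url not in seen_urls:
            seen_urls.add(p.url)
            fresh.append(p)
    return fresh


def shortlist(
    postings: Iterable[Posting],
    seen_urls: set[str],
    keep: Callable[[Posting], bool],        # plain-code filter (hypothetical)
    company_ok: Callable[[str], bool],      # judgment step 1: company research
    fit_score: Callable[[Posting], float],  # judgment step 2: resume fit
    top_n: int = 10,
) -> list[tuple[float, Posting]]:
    """Dedupe, filter, vet, score, and rank; only two steps call a model."""
    fresh = dedupe(postings, seen_urls)
    filtered = [p for p in fresh if keep(p)]

    verdicts: dict[str, bool] = {}  # research each company at most once
    vetted = []
    for p in filtered:
        if p.company not in verdicts:
            verdicts[p.company] = company_ok(p.company)
        if verdicts[p.company]:
            vetted.append(p)

    scored = [(fit_score(p), p) for p in vetted]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

Caching the company verdict means the expensive research call runs once per company rather than once per posting.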

Clean architecture. Right instinct. I handed it to Claude Cowork to orchestrate.

That’s where things got interesting.


What “Autonomous” Actually Means

Claude Cowork, per the marketing: an agentic environment that handles complex tasks autonomously. You describe what you want. Claude drives the bash commands, file edits, and web fetches. You go do something else. It comes back done.

Here’s what three days actually looked like.

Claude Pro has usage limits — a token bucket that refills every 5 hours. For normal use, that’s plenty. For intensive Claude Code sessions, I’d burn through it in 2 to 2.5 hours. Claude Cowork building and running a real pipeline? Gone in under an hour. So each day became three sessions, rationed like water in a desert.

Morning: open Cowork, start a session, work until the tokens run out. Usually around two hours. Session dead. Mid-run. Whatever state the pipeline was in: frozen.

Then: bike ride. Walk. Anything to fill the time. It was 90+ degrees that week, so “light walks” meant staying in the shade and checking my phone every twenty minutes to see if the limit had reset yet.

Afternoon session. Another two hours. Another mid-run death.

Then TV. More waiting. Evening session around 5 or 6 pm, coding until it got dark. One night, I was “lucky enough” that the session lasted until 9 pm.

I genuinely felt like I’d won something.

Three days. Nine sessions. Zero clean completions. Ever.


What Actually Broke

The failures weren’t random. They were structural.

The bash sandbox served stale file reads — reporting a version of my main script 1,800 bytes shorter than what was actually on disk, confirmed by checking outside the session entirely. The Read tool could see the correct file at the exact same moment, but the bash tool couldn’t. Two tools. Same file. Same moment. Different file sizes. No fix. Bypass only.

An agent reconstructed a config file from “known content” when it appeared corrupted. The known content was a blank template. The real data — carefully assembled in a separate session — was silently replaced with empty rows. Caught only because an independent copy existed elsewhere. A near-miss that would have been invisible until the next run produced completely wrong results.
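
One way to guard against that kind of silent replacement is sketched below. It assumes the config is a CSV file with a header row, which the description of "empty rows" suggests but does not confirm; count_filled_rows and safe_overwrite are hypothetical helper names. The write is refused if the new content carries less real data than the file it would replace, and a backup copy is kept either way:

```python
import csv
import io
import shutil
from pathlib import Path


def count_filled_rows(text: str) -> int:
    """Count CSV data rows (after the header) with at least one non-empty cell."""
    rows = list(csv.reader(io.StringIO(text)))
    return sum(1 for row in rows[1:] if any(cell.strip() for cell in row))


def safe_overwrite(path: Path, new_text: str) -> None:
    """Write new_text to path, but never replace real data with an emptier file."""
    if path.exists():
        old_text = path.read_text(encoding="utf-8")
        if count_filled_rows(new_text) < count_filled_rows(old_text):
            raise ValueError(
                f"refusing to overwrite {path}: new content has fewer filled rows"
            )
        # Keep the previous version next to the file before replacing it.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(new_text, encoding="utf-8")
```

The row-count comparison is a crude heuristic. It would also block a legitimate cleanup that removes rows, so a real version would probably need an explicit override.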

Board rotation wasn’t rotating. Six companies processed on every run, in the same order, every time. Anthropic’s own job board: never touched. Databricks: never touched. Reddit, MongoDB, every Ashby board, every Lever board: not once.
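
Rotation only works if the position survives between runs. Here is a minimal sketch, assuming a JSON state file and a flat list of board names; next_batch and the file layout are illustrative, not the pipeline's real code:

```python
import json
from pathlib import Path


def next_batch(boards: list[str], state_path: Path, batch_size: int) -> list[str]:
    """Return the next batch_size boards, resuming where the previous run stopped."""
    if not boards:
        return []

    cursor = 0
    if state_path.exists():
        saved = json.loads(state_path.read_text(encoding="utf-8"))
        cursor = saved.get("cursor", 0)
    cursor %= len(boards)  # stay valid if the board list shrank

    size = min(batch_size, len(boards))
    batch = [boards[(cursor + i) % len(boards)] for i in range(size)]

    # Persist the new position immediately, so a run that dies halfway
    # still hands the next run a different slice of the list.
    new_cursor = (cursor + size) % len(boards)
    state_path.write_text(json.dumps({"cursor": new_cursor}), encoding="utf-8")
    return batch
```

Advancing the cursor before the batch is processed trades completeness for coverage: a board can be skipped when a run crashes, but no board is starved forever.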

The pipeline was supposed to cover Greenhouse AND Ashby AND Lever AND more.

It couldn’t finish Greenhouse.

At attempt 9 of 10 on the final day: API overloaded. Session dead. Run state gone.

“I have yet to have done a ‘smooth run’ where Claude Cowork ran to full completion without hitting a session limit. Yet, we’re using such small numbers.”
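
Losing run state when a session dies is the kind of failure plain code handles well. The sketch below assumes each unit of work is keyed by a string and returns JSON-serializable data; run_with_checkpoint is a hypothetical helper, not something Cowork provides. It saves progress after every item, so a crash costs at most the item in flight:

```python
import json
from pathlib import Path
from typing import Callable


def run_with_checkpoint(
    items: list[str],
    work: Callable[[str], dict],
    checkpoint_path: Path,
) -> dict[str, dict]:
    """Process items one at a time, saving results after each one."""
    done: dict[str, dict] = {}
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text(encoding="utf-8"))

    for item in items:
        if item in done:
            continue  # finished in an earlier run
        done[item] = work(item)
        # Write to a temp file, then swap it in, so an interrupted write
        # cannot leave a half-written checkpoint behind.
        tmp = checkpoint_path.with_name(checkpoint_path.name + ".tmp")
        tmp.write_text(json.dumps(done), encoding="utf-8")
        tmp.replace(checkpoint_path)
    return done
```

Rerunning the same call after a failure picks up at the first unfinished item instead of starting over.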


The Math That Ended the Argument

A 40-company, 50-fit-assessment run — four times larger than anything I’d attempted — would cost under $2 in direct API calls. I calculated it using this project’s own measured output sizes.

Under two dollars.
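
The arithmetic behind an estimate like that is short. In the sketch below, every number in the usage section is a placeholder chosen for illustration: the per-call token counts are not the project's measured sizes, and the per-million-token rates are not quoted from any price list.

```python
def run_cost_usd(
    calls: int,
    input_tokens_per_call: int,
    output_tokens_per_call: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Estimate the direct API cost of a batch of similar calls, in dollars."""
    input_cost = calls * input_tokens_per_call / 1_000_000 * input_price_per_mtok
    output_cost = calls * output_tokens_per_call / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


# Placeholder inputs only: substitute measured token counts and current rates.
PLACEHOLDER_INPUT_RATE = 3.0    # dollars per million input tokens (assumed)
PLACEHOLDER_OUTPUT_RATE = 15.0  # dollars per million output tokens (assumed)

research = run_cost_usd(40, 4_000, 600, PLACEHOLDER_INPUT_RATE, PLACEHOLDER_OUTPUT_RATE)
fit = run_cost_usd(50, 3_000, 400, PLACEHOLDER_INPUT_RATE, PLACEHOLDER_OUTPUT_RATE)
print(f"estimated run cost: ${research + fit:.2f}")
```

The useful part is the shape of the calculation: calls times tokens times rate, summed over the two judgment steps. Nothing else in the pipeline costs API money.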

The tokens the real work needed were never the constraint. Not once. The sessions were being consumed by orchestration overhead: tool-call management, staleness workarounds, and context windows filling up with infrastructure maintenance before the actual work got done.

The model wasn’t the bottleneck. The scaffolding was.


The Part That Actually Hurt

Here’s what nobody tells you about building a tool to solve your own problem: sometimes the tool works well enough to show you the results were always going to be disappointing.

When the pipeline finally limped across the finish line and handed me a Shortlist, the jobs were the same ones I’d been scrolling past on LinkedIn for 27 months. The ones that are either “we need a specialist in something you’ve never touched” or “we need 5 years of experience in a tool that’s 2 years old.”

The pipeline didn’t fail at the engineering level. It failed at the result: the shortlist held nothing I hadn’t already seen.

I almost cancelled my Claude Pro subscription.


What Cowork Is Actually Good At

I want to be fair, because Cowork does have a real use case. It’s just not the one the marketing describes.

Summarize these ten files. Draft this document. Research this topic and write a report. Contained tasks with a clear start and a clear end that don’t require persistent state across dozens of tool calls.

That’s Cowork. That’s genuinely useful for what it is.

What it isn’t: a replacement for a real pipeline. Anything with multiple data sources, state that needs to survive across tool calls, retry logic, error handling, real scale — that’s Python. It was always going to be Python.

The course doesn’t tell you where that line is. Three days of session limits will.


The Lesson That Was Actually Worth Three Days

I’ve been thinking about AI wrong. The whole industry has been thinking about AI wrong.

It’s not a revolution. It’s not a framework. It’s not going to replace software engineering.

It’s a function call. A really powerful one, with a very long string of instructions as a parameter. It lives next to your for loops and your SQL queries and your try/except blocks. You call it when you need judgment: unstructured reading, structured output, synthesis across ambiguous inputs. You write Python for everything else.

The two subagents in this pipeline — company research and fit assessment — are genuinely good. They do something real that plain code can’t do. But they’re two function calls inside a 2,800-line Python script. That’s the right proportion.
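
Here is roughly what "a function call" can look like for the fit assessment. This is a sketch under assumptions: ask_model stands in for whatever client actually sends the prompt and returns the reply text, and the prompt wording and JSON shape are invented for illustration.

```python
import json
from typing import Callable


def assess_fit(
    job_text: str,
    resume_text: str,
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> dict:
    """Ask the model for a fit score and validate the reply like any other input."""
    prompt = (
        "Compare the job description to the resume. "
        'Reply with JSON only: {"score": <integer 0-100>, "reason": "<one sentence>"}\n\n'
        f"JOB:\n{job_text}\n\nRESUME:\n{resume_text}"
    )
    raw = ask_model(prompt)
    try:
        data = json.loads(raw)
        score = int(data["score"])
        reason = str(data["reason"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Treat a malformed reply as a failed call, not as a crash.
        return {"score": 0, "reason": "unparseable model reply"}
    return {"score": max(0, min(100, score)), "reason": reason}
```

Because the model call is passed in as a parameter, the function can be tested with a stub and retried or swapped out without touching the rest of the script.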

Use AI for judgment. Use Python for everything else. Don’t route a data pipeline through a conversational agent and act surprised when the conversation becomes the bottleneck.