AI Triage: What Goes Where

I didn’t set out to build a four-tool AI workflow.

It kind of… assembled itself through trial and error. Mostly error.

After a few weeks of testing all four platforms — sometimes side by side on the same prompt, sometimes just paying attention to where things broke down — I landed on a setup that actually works for how I think and what I’m building. Here’s the breakdown and, more importantly, why.


Gemini — The Curiosity Engine

Use it for: random “tell me more about this” questions, photo identification, and basic lookups.

Don’t use it for: anything requiring multi-step reasoning or producing structured output.

Gemini is essentially a smarter Google search with a conversational wrapper. For quick curiosity questions, it’s fine — it’s right there in the Google ecosystem, it’s fast, and it doesn’t require switching contexts.

But the moment you ask it to do anything that requires tracking multiple requirements across a conversation, it starts dropping things. I once spent somewhere between nine and thirteen follow-up prompts trying to get it to produce a basic date-plus-log-entry spreadsheet. It just… couldn’t hold the shape of what I was asking.

I also caught it telling me the wrong time — casually, confidently, wrong by over two hours. It doesn’t actually know what time it is. Small thing, but it broke my trust in a way that’s hard to rebuild.

Good for: “What is this plant?” and “give me a quick summary of X.” Not good for: anything you’d actually depend on.


Copilot — The Scaffolder

Use it for: framing up code files, generating boilerplate, HTML/JavaScript/Python structure.

Don’t use it for: full-stack UI work, visual coherence, anything that requires understanding the whole picture.

Copilot lives inside VS Code, which is its biggest advantage — it’s right there while you’re working. For getting the skeleton of a file laid out quickly, it earns its keep.

The limitation I kept hitting: it thinks like a back-end engineer. Which is fine, except I’m a full-stack developer who cares about UI/UX. Ask Copilot to make something look good or feel cohesive across a page, and it kind of shrugs. That gap matters to me.

Good for: “help me frame up this Python class” and “scaffold this API endpoint.” Not good for: “make this actually look like a real application.”
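To make “scaffolding” concrete, here’s roughly the kind of skeleton I mean — a minimal, hypothetical Python class of the sort Copilot frames up well: clean structure, type hints, no UI opinions. The names are made up for illustration, not from any actual Copilot output.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class LogEntry:
    """One dated entry in a work log."""
    day: date
    note: str

@dataclass
class WorkLog:
    """Simple container for log entries — boilerplate-shaped code."""
    entries: list[LogEntry] = field(default_factory=list)

    def add(self, day: date, note: str) -> None:
        """Record a note against a date."""
        self.entries.append(LogEntry(day, note))

    def notes_for(self, day: date) -> list[str]:
        """Return every note logged on the given date."""
        return [e.note for e in self.entries if e.day == day]
```

Ask for something like this and Copilot delivers in seconds. Ask it to make the page that displays it feel cohesive, and you’re on your own.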


ChatGPT — The Swiss Army Knife

Use it for: pretty much everything that doesn’t fit the other three categories.

Don’t use it for: high-volume image generation, or long sessions where a dropped constraint is a dealbreaker.

ChatGPT is the most versatile tool in the stack. Architecting a new feature, explaining a concept I don’t fully get yet, debugging something stubborn, coaching me through an unfamiliar framework — it handles all of it reasonably well.

The throttling is real, though. On the free tier, especially, push it too hard on image generation or complex back-to-back tasks, and it starts rationing. And on very long conversations, I’ve noticed it dropping constraints I set early — more on that in a future post about context windows and why they matter more than most people realize.

Good for: “help me think through this architecture” and “explain why this isn’t working.” Not good for: long sessions where you need it to remember everything from prompt one.


Claude — The Workhorse

Use it for: building applications that integrate AI. Agentic workflows. Anything where you need the AI to actually work rather than just respond.

Don’t use it for: quick one-off questions where you don’t need that depth.

This is where I’m spending most of my serious coding time right now. The context window is massive — up to a million tokens on the paid tier — which means it holds onto your requirements across a long session in a way the others don’t. And the agentic capability, the ability to take a goal and autonomously figure out the steps to get there, is genuinely in a different league.
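If “take a goal and figure out the steps” sounds fuzzy, here’s a minimal sketch of the loop I mean, in plain Python. This is illustrative structure only — not any vendor’s actual API. The `plan` and `execute` callables stand in for “ask the model what to do next” and “actually do it.”

```python
from typing import Callable, Optional

def run_agent(goal: str,
              plan: Callable[[str, list[str]], Optional[str]],
              execute: Callable[[str], str],
              max_steps: int = 10) -> list[str]:
    """Minimal agentic loop: plan the next step from the goal and the
    history so far, execute it, repeat until the planner says done."""
    history: list[str] = []
    for _ in range(max_steps):
        step = plan(goal, history)      # model decides the next action
        if step is None:                # planner signals the goal is met
            break
        history.append(execute(step))   # run the action, record the result
    return history
```

The difference from a chat session is that loop: the tool keeps deciding and acting on its own until the goal is met, instead of waiting for you to prompt each step.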

There’s also something about the tone. Claude matched my conversational style faster than any of the others. That sounds like a small thing until you realize how much friction disappears when you’re not constantly translating between how you think and how the tool wants to be addressed.

Good for: “build me an app that does X” and “here’s my goal — figure out how to get there.” Not good for: replacing your morning Google search.


The Bigger Point

Four tools. Four lanes. Minimal overlap.

The instinct when you’re new to all of this is to pick one and go all in — find the best AI and use it for everything. That’s not how this actually works. Each of these tools has a genuine strength and a genuine ceiling, and knowing the difference is the skill nobody’s really teaching.

I’m still refining this. The lanes shift a little as the tools update and as I get better at knowing what I’m actually asking for. But this is the map I’m working from right now.


Previously: What the Heck is Agentic? — the distinction that actually matters.

Next up: the job posting that made me want to flip a table. “3-5 years of LLM experience required.” Really.