My AI Agents Don't Think QA is Important. 🪳

Happy Friday!

Some of you might remember my email from November where I shared my first impressions of using Claude Code. The short version was that it felt like a highly educated developer at their first job who kept breaking the project file and couldn't follow TDD to save its life.

Well, I've come a long way since then, and people keep asking me how I actually use it now. So I wrote it all up.

The big shift was treating Claude Code less like a coding tool and more like a small engineering team. I have a set of specialized agents, all coordinated by an engineering manager agent who owns the process and keeps things from going off the rails.

Each agent gets its own context window, which means the more specialized they are, the longer they can work before they forget things and start regressing. If you read my earlier email about the context window problem, this is my answer to it.

Agents have defined responsibilities, documentation they're expected to maintain, and instructions about who to collaborate with. I tell the engineering manager what I want to work on, we discuss it, and then I let them run. It works well enough that I had to set up notifications because it can run for long stretches without needing me.
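To make that concrete, here's a rough sketch of what one of these agent definitions looks like. Claude Code lets you define subagents as markdown files with a YAML frontmatter block; the agent name, file paths, and responsibilities below are invented for illustration, not copied from my actual setup.

```markdown
---
name: qa-engineer
description: Runs the test suite, writes missing tests, and reports
  regressions back to the engineering manager agent.
---

You are the QA engineer on this team. Your responsibilities:

- Run the full test suite after every change the developer agents make.
- Write tests covering any new behavior that lacks them.
- Maintain docs/qa-notes.md with known flaky tests and open issues.
- Report failures to the engineering manager agent; never patch
  production code yourself.
```

The frontmatter tells Claude Code when to route work to the agent, and the body is the standing prompt it operates under, which is where the responsibilities, documentation duties, and collaboration instructions live.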

Now, it is not magic. Agents still sometimes ignore their own definitions, and there are things it just can't do, like adding files to an Xcode project.

And who knew you'd have to go out of your way to get a development team to work with QA? I have explicit instructions for it, and it still slips. If you've ever managed developers, you're probably nodding.

I published my setup, each agent's definition, rules files, and how I handle documentation. If you're curious or experimenting with any of this, take a look.

On a related note, I want to point you to something I wrote recently about the direction of AI: AI - Use With Care. Everything in it is still relevant, and I think it's important context for anyone making decisions around AI right now.

I have some background in AI research from college, specifically planning systems, which are a completely different technique from the LLMs everyone is excited about. That background is why I keep pushing people to temper their expectations about what LLMs can or will ever do on their own. LLMs have fundamental limitations that aren't going away with bigger and newer models. They struggle with genuine reasoning and logic. They hallucinate confidently and can't reliably tell you when they're wrong. They have no real ability to plan or break down complex multi-step problems. Their knowledge is frozen at training time and has no grounding in the real world. And they have no concept of what they actually know versus what they're guessing at.

These issues are never going away. They're inherent to how the technique works. LLMs are probability engines, and probability doesn't mean "probably correct." It just means the most likely next thing.

Having said all of that, I am optimistic about where things are going. There is promising research into incorporating other AI techniques: planning systems that can actually reason about sequences of actions, neural networks designed for different problem types, and hybrid architectures that play to each technique's strengths.

When those techniques mature and get integrated, we'll see something much more capable than what we have today. But that future isn't here yet, and pretending LLMs alone will get us there is how people get burned.

So here's my question for you: are you using AI in your work yet? And if so, are you impressed, frustrated, or somewhere in between? I'd genuinely like to know.

Reply and tell me.

Sincerely,

Ryan

PS: I'm taking on new clients. Check out my case studies if you'd like to see some of the work I've done with previous clients, and schedule a call.

PPS: This email was drafted with Claude. Could you tell? I did make a few edits.