Thoughts on Agentic Coding: Part 2, Debugging
I was impressed by Claude Code's reasoning and debugging ability during a recent coding session.
The quick version is that my RSS feed was displaying posts in the wrong order. I asked Claude to fix it so that posts would be sorted in reverse chronological order (newest first).
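A reverse chronological sort is a one-liner in Python. Here is a minimal sketch, with a made-up post structure, just to pin down what the fix should produce:

```python
from datetime import datetime

# Hypothetical post records; only the date matters for the ordering.
posts = [
    {"title": "Oldest post", "published": datetime(2023, 1, 5)},
    {"title": "Newest post", "published": datetime(2024, 6, 1)},
    {"title": "Middle post", "published": datetime(2023, 9, 12)},
]

# Reverse chronological order: newest first.
posts.sort(key=lambda p: p["published"], reverse=True)

for post in posts:
    print(post["published"].date(), post["title"])
# 2024-06-01 Newest post
# 2023-09-12 Middle post
# 2023-01-05 Oldest post
```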
The interaction went like this (heavily edited to capture the gist of the session without posting a huge wall of text):
Me: Modify the RSS feed to display posts in reverse chronological order
Claude took this and went to work. It wasn't convinced there was a problem.
Claude: <reads some files..>
Claude: Posts are already correctly sorted.
Claude: See? Here is the relevant code:
Claude: <code>..
The code snippet looked plausible, but I suspected that it was missing something. Undeterred, I asked again.
Me: Are you sure? What about <some little detail>?
Claude provided a more detailed explanation. It still wasn't convinced there was a problem.
Claude: <reads some files..>
Claude: Yes, the sorting is correct.
Claude: Here are more details and the relevant code:
Claude: <code>
Claude's explanation looked correct. So I paused for a few minutes to check things. Maybe my eyes had deceived me earlier. Perhaps the feed was fine? Or maybe I had been looking at some old or defunct copy of the feed? Nope. I double-checked and was still 99% sure of the problem.
I tried a different angle. Asking an agent to run a test suite or inspect the logs is a powerful way for it to identify bugs.
Me: Can you run some tests with fake posts using different dates?
Claude went back to work..
Claude: Creating some fake posts..
Claude: Regenerating feed..
Claude: Reviewing feed..
But now Claude agreed with me.
Claude: You're right. The feed is showing posts in ascending order, not descending order
Claude: Let me check the feed generation code
Claude: <reads some files..>
Claude: I see the issue..
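The kind of experiment Claude ran is easy to picture: generate posts with known dates, rebuild the feed, and compare the observed order against the expected order. Here is a rough sketch of that idea; `build_feed` is a hypothetical stand-in for whatever regenerates the real feed.

```python
from datetime import datetime, timedelta

def check_feed_order(build_feed):
    """Feed fake posts with known dates to `build_feed` (a hypothetical
    function that returns titles in display order) and report the result."""
    base = datetime(2024, 1, 1)
    fake_posts = [
        {"title": f"post-{i}", "published": base + timedelta(days=i)}
        for i in range(5)
    ]

    expected = [
        p["title"]
        for p in sorted(fake_posts, key=lambda p: p["published"], reverse=True)
    ]
    actual = build_feed(fake_posts)

    if actual == expected:
        print("OK: feed is in reverse chronological order")
    else:
        print("BUG: expected", expected, "but got", actual)

# Example: a broken builder that displays posts oldest-first.
check_feed_order(
    lambda posts: [p["title"] for p in sorted(posts, key=lambda p: p["published"])]
)
```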
The feed generator relies on an external code package to handle some of the details of the RSS format. This is very common; much of modern software development involves gluing together third-party libraries. Believing that the site's own code was correct, Claude suspected that this external package must be the problem.
Claude: The feed generator is receiving posts in the correct order
Claude: But <some external code package> is likely re-sorting them
Then it proposed a clever way to test that idea: run the external package in isolation with known inputs. I was particularly impressed here. Human programmers can take years to develop this debugging intuition: when the results don't make sense, you need to re-test each of your assumptions. Claude wanted to perform a sanity check.
Claude: Let me isolate that package and test it..
Claude: <suggests more code to test the hypothesis>
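The shape of that isolation test is worth spelling out: bypass your own code entirely, hand the suspect package a tiny input in a known order, and see what comes back. The sketch below uses a small stub class in place of the real (unnamed) package, purely to show the pattern; the stub's prepend behavior is an assumption, not a description of any particular library.

```python
class FakeFeedBuilder:
    """Stand-in for the third-party package, used only to illustrate the test.
    Here we assume it prepends each new entry (the behavior under suspicion)."""

    def __init__(self):
        self._entries = []

    def add_entry(self, title):
        self._entries.insert(0, title)  # prepend: the latest call ends up first

    def entries(self):
        return list(self._entries)

# Known inputs, added in a known order.
builder = FakeFeedBuilder()
for title in ["added-first", "added-second", "added-third"]:
    builder.add_entry(title)

# If the package preserved insertion order, we would see first, second, third.
# A prepending (or re-sorting) package reveals itself immediately.
print(builder.entries())  # ['added-third', 'added-second', 'added-first']
```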
I guessed Claude was on the right track, so there was no need to run the tests it had proposed. Instead, I asked it to jump ahead: by reading the documentation for the external package, it could determine the correct usage.
Me: Nevermind. Just check the package documentation
Upon checking the documentation, Claude found a fix.
Claude: <downloads the documentation>
Claude: I found the issue. The add_entry() method..
Claude: <updates the code>
Claude: <verifies the solution>
And with that, the problem was solved.
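For the curious, the class of fix is simple once the behavior is understood. If add_entry() prepends each new entry (again, an assumption about this particular package), the entries come out in the reverse of insertion order, so one fix is to add the posts oldest-first and let the prepending produce a newest-first feed; another is to use an explicit append option, if the library offers one. A sketch of the first approach, using a plain list to stand in for the feed:

```python
from datetime import datetime

posts = [
    {"title": "Newest post", "published": datetime(2024, 6, 1)},
    {"title": "Oldest post", "published": datetime(2023, 1, 5)},
    {"title": "Middle post", "published": datetime(2023, 9, 12)},
]

# A plain list stands in for the feed's internal entry list.
feed_entries = []

# Assumption: add_entry() prepends, so we add posts oldest-first
# and the prepending yields a newest-first feed.
for post in sorted(posts, key=lambda p: p["published"]):   # oldest first
    feed_entries.insert(0, post["title"])                  # mimic a prepending add_entry()

print(feed_entries)
# ['Newest post', 'Middle post', 'Oldest post']
```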
Now a few observations on the interaction.
Claude can solve "real" problems
What happened here is pretty interesting:
- Claude held a view (that the sorting was already correct)
- It ran an experiment to gather data
- It changed its view
- It developed a hypothesis
- It planned a way to test the hypothesis
That's crazy.
Claude defended the status quo, TWICE. And then it changed its mind in light of new data.
Basically, that's the scientific method in action: observe, hypothesize, run an experiment..
Call it "agentic plan following" or whatever you like, but to me, it looks like problem solving.
That doesn't mean Claude can solve EVERY problem. But it can probably solve 95% of the typical coding problems that most developers face. Finding this fix was non-trivial and required a sophisticated understanding of the codebase. There is no obvious ceiling on the types of problems that can be solved this way.
Working with an agent is remarkably similar to working remotely
This felt exactly like a debugging session with a coworker, conducted via text over Slack. No surprise there. Plenty of companies offer "AI co-workers" that can send emails and Slack messages.
"It works on my machine" is a meme among developers. Often, it means: "it works for me, so you must be doing something wrong". That's exactly what happened here. I suspected a problem and Claude didn't agree that the problem existed. Then after some back-and-forth, we got on the same page. We both suggested ideas, and quickly the problem was solved. After years of working remotely, coding through the agent interface was no different.
I know that I am anthropomorphizing a LOT in this article, saying the agent "thinks", "wants" or "believes" various things. As a computer programmer, that feels quite natural, like saying "Windows wants you to eject the USB drive" or "the website thinks you have the wrong email". But I get that not everyone is comfortable with that usage for AI. It's a discussion to keep an eye on.
Was the solution technically correct?
It worked. That's all that matters, right? Well, maybe..
Claude's solution here was adequate, but it wasn't perfect. While writing this up, I realized that a slightly better solution was possible. Since this is a personal project, the sorting was fixed to my satisfaction, and that was good enough for me.
But in a production system, that "good enough" vibe-coding attitude is dangerous. An agent could choose a very bad solution that looks fine on the surface. Trusting the agent, a developer could approve that solution without the slightest investigation. Malicious actors and supply chain attacks both amplify that danger.