I tried a handful of experiments with Claude Code over the last few weeks. Some were just for fun, and others were proofs of concept to explore future project ideas. Mostly, I wanted to get a better understanding of how things are going with agentic coding.

Below are those projects. They're fun, but skip to the end for the takeaways.

1. Eyeballs

My first experiment was very goofy. I recreated an early project from my CS101 class back in 2001. Basically, the eyeballs follow the mouse cursor around the screen.

(Click the canvas ->)

Not terribly complicated. I could do this pretty quickly myself, but there is a little bit of trigonometry and I might need a pencil and paper. It was nice that it worked on the first try. A few days later, I repackaged it into a more portable script.
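
The trigonometry is mostly one atan2 call: take the angle from the eye's center to the cursor and place the pupil on a small circle at that angle. Here's a minimal Python sketch of the idea (the actual demo runs on the canvas above; these coordinates are made up):

    import math

    def pupil_position(eye_x, eye_y, mouse_x, mouse_y, radius):
        # Angle from the eye's center to the cursor, then a point on a circle at that angle.
        angle = math.atan2(mouse_y - eye_y, mouse_x - eye_x)
        return (eye_x + radius * math.cos(angle),
                eye_y + radius * math.sin(angle))

    # Example: eye centered at (100, 100), cursor at (250, 180), pupil orbit radius 15.
    print(pupil_position(100, 100, 250, 180, 15))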

2. Scrabble Solver

Next was a Scrabble Solver. I wrote about it more here. I asked Claude to read a research paper from 1988 and implement the algorithm described. It seemed to work correctly. I am curious how well it would do without the paper (probably just fine!). I was tempted to add computer vision here to see if it could OCR an actual Scrabble board. Maybe later.
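
If the paper is the classic 1988 word-graph approach (my assumption; see the linked post for details), the heart of the solver is a dictionary graph that you walk while consuming rack letters. A toy Python sketch of that core idea, not Claude's actual code:

    # Toy word graph (a trie) plus a recursive extender that consumes rack letters.
    class TrieNode:
        def __init__(self):
            self.children = {}    # letter -> TrieNode
            self.is_word = False

    def build_trie(words):
        root = TrieNode()
        for word in words:
            node = root
            for letter in word:
                node = node.children.setdefault(letter, TrieNode())
            node.is_word = True
        return root

    def extend(node, rack, prefix, results):
        # Record complete words, then try every remaining rack letter that has an edge.
        if node.is_word:
            results.append(prefix)
        for i, letter in enumerate(rack):
            child = node.children.get(letter)
            if child:
                extend(child, rack[:i] + rack[i + 1:], prefix + letter, results)

    root = build_trie(["car", "cart", "cat", "rat"])
    found = []
    extend(root, "tacr", "", found)
    print(sorted(set(found)))    # ['car', 'cart', 'cat', 'rat']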

3. 3D Game

Screenshot of a spaceship game

Here I went for a 3D space game, to recreate another college project from the early 2000s. In the class, we wrote x86 Assembly, but I don't remember how anymore, so Claude got off easy, using JavaScript.

The player navigates down a canyon and dodges obstacles. Most blocks cause damage, but a few special blocks are powerups. It works on desktop and mobile. The camera is wonky, but the game is entirely playable and even kinda fun.

A few times, I had to lead Claude in the right direction, but I didn't need to write code at all. Drawing the obstacles was troublesome. The top and side faces would sometimes overlap or be drawn in the wrong z-order. Capturing screenshots and asking Claude to inspect them fixed it. That Claude can look at an image, see the problem, and relate that back to the code is mindblowing.

The other hiccup was collision detection. The player should lose a life upon colliding with an obstacle, but collisions were never detected. This took a few tries, but I solved it by asking Claude to write a log file with the position of every object. Allowing Claude to inspect log files or the running program state is very powerful.
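
For illustration, here's roughly what that kind of feedback loop looks like in Python: an axis-aligned overlap test plus a log line for every object, so a missed collision can be reconstructed from the log afterwards. The real game was JavaScript, and the field names here are made up.

    import logging

    logging.basicConfig(filename="positions.log", level=logging.INFO)

    def boxes_overlap(a, b):
        # Axis-aligned test: centers closer than the sum of half-sizes on every axis.
        return all(abs(a[axis] - b[axis]) <= a["half"] + b["half"] for axis in ("x", "y", "z"))

    def check_collisions(player, obstacles, frame):
        for obs in obstacles:
            # Log every position so the failing case can be reconstructed from the file.
            logging.info("frame=%d player=%s obstacle=%s", frame, player, obs)
            if boxes_overlap(player, obs):
                return True
        return False

    player = {"x": 0.0, "y": 0.0, "z": 5.0, "half": 0.5}
    obstacles = [{"x": 0.2, "y": 0.0, "z": 5.3, "half": 0.5}]
    print(check_collisions(player, obstacles, frame=1))    # True: the boxes overlap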

4. Blog Engine

The next project was to create a static site generator (this blog). This worked well, which wasn't too surprising because there are a zillion static site generators to draw inspiration from. Even so, the thing I was most impressed with here was the prompts. Claude understood the assignment well enough to ask questions before writing any code. Once I had described the task, there were 5-6 prompts asking how I wanted to handle preferences for things like storage, metadata, and drafts. Later, I quickly added RSS, XML sitemaps, and S3 uploads.
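
To give a flavor of the preferences it asked about, here's a hypothetical Python sketch of a generator's core loop: read markdown files, split out YAML frontmatter, skip drafts, and write HTML. This is not the actual generator, and the file layout is invented.

    import pathlib

    import markdown    # pip install markdown
    import yaml        # pip install pyyaml

    def load_post(path):
        # Split "---"-delimited YAML frontmatter from the markdown body.
        _, meta, body = path.read_text().split("---", 2)
        return yaml.safe_load(meta), body

    def build(content_dir="posts", out_dir="public"):
        out = pathlib.Path(out_dir)
        out.mkdir(exist_ok=True)
        for path in sorted(pathlib.Path(content_dir).glob("*.md")):
            meta, body = load_post(path)
            if meta.get("draft"):    # drafts stay out of the published site
                continue
            html = markdown.markdown(body)
            (out / f"{path.stem}.html").write_text(f"<h1>{meta['title']}</h1>\n{html}")

    if __name__ == "__main__":
        build()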

5. YouTube Word Clouds

Two word clouds from YouTube video titles

Next was making Word Clouds for YouTube channels. My original idea is a little tricky to describe. When a new concept like RAG or MCP gets popular, there will be a ton of YouTube tutorials about it. I wanted to visualize those trends over time. I asked Claude to write some helper code to fetch the video metadata (date, channel, title) for a handful of pre-selected channels that I knew covered AI topics. Skimming the results as a CSV, I wasn't sure how far I would get with extracting trends, so I switched tasks and decided to make a word cloud for each channel instead. The new idea was simply to visualize what each channel is "about" based on the frequency of keywords in its titles. Claude chose the Python wordcloud library for this task, which is the same choice that I would have made.
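
The per-channel clouds themselves are only a few lines with the wordcloud library. A rough Python sketch, assuming a CSV with the date/channel/title columns described above (the filename is hypothetical):

    import csv
    from collections import defaultdict

    from wordcloud import WordCloud    # pip install wordcloud

    titles_by_channel = defaultdict(list)
    with open("videos.csv", newline="") as f:
        for row in csv.DictReader(f):    # expects date, channel, title columns
            titles_by_channel[row["channel"]].append(row["title"])

    for channel, titles in titles_by_channel.items():
        cloud = WordCloud(width=800, height=400, background_color="white")
        cloud.generate(" ".join(titles))
        cloud.to_file(f"{channel}_wordcloud.png")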

6. BlueSky Mutual Follows

I know that BlueSky has a public API that shows which accounts a particular user is following. I wanted a program that would show overlapping follows, that is, given Account A and Account B, which accounts are they both following? The idea here is discovery. If two people are both following a third person, then I might be interested in following that person too. Claude nailed this on the first try, generating a simple page using Bootstrap, with a text box and a button. It even figured out how to paginate the API results automatically.
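
Here's a Python sketch of the pagination-plus-intersection logic. The endpoint and field names are my reading of BlueSky's public API (app.bsky.graph.getFollows), not taken from Claude's output, and the handles are placeholders:

    import requests

    API = "https://public.api.bsky.app/xrpc/app.bsky.graph.getFollows"

    def all_follows(handle):
        # Page through the follows list until the API stops returning a cursor.
        follows, cursor = set(), None
        while True:
            params = {"actor": handle, "limit": 100}
            if cursor:
                params["cursor"] = cursor
            data = requests.get(API, params=params).json()
            follows.update(f["handle"] for f in data.get("follows", []))
            cursor = data.get("cursor")
            if not cursor:
                return follows

    # Accounts A and B are placeholders; the intersection is the "mutual follows" list.
    mutuals = all_follows("alice.example.com") & all_follows("bob.example.com")
    print(sorted(mutuals))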

7. Hand Tracking Piano

Screenshot of a hand-tracking music game

This was the most delightful one to me. It also took only 10-15 minutes to build, maybe less.

I initially asked for 12 squares on the screen, labeled with the musical notes A-G#, with the corresponding note playing when the associated key was pressed. That worked just fine.

Next I asked it to use the camera to do hand-tracking, with each hand represented as a red circle on the screen. That worked too, using a library called mediapipe. And then finally, I asked Claude to rearrange the notes in a circle and to treat the hands as cursors that play the corresponding note when they intersect with it. This was goofy and fun.
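
As a rough illustration of just the hand-tracking half, here's a Python sketch using the mediapipe package mentioned above (plus OpenCV for the camera). This is not Claude's code; the landmark choice and the note-hit check are placeholders.

    import cv2                 # pip install opencv-python
    import mediapipe as mp     # pip install mediapipe

    hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
    cap = cv2.VideoCapture(0)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        for hand in results.multi_hand_landmarks or []:
            # Landmark 9 (base of the middle finger) is a rough stand-in for the palm center.
            palm = hand.landmark[9]
            x, y = int(palm.x * frame.shape[1]), int(palm.y * frame.shape[0])
            cv2.circle(frame, (x, y), 20, (0, 0, 255), -1)    # the red "cursor"
            # ...the game would check whether (x, y) falls inside a note's circle and play it.
        cv2.imshow("piano", frame)
        if cv2.waitKey(1) == 27:    # Esc to quit
            break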

8. Ray Tracing

This was the first experiment that didn't really work. The idea was to create a ray tracer, which is a piece of software that takes a description of a scene and renders an image, pixel by pixel, by calculating how light would be emitted from the light source(s) and hit different objects. It would need to worry about things like materials, reflectivity, object textures, and different types of light. There is a lot of trigonometry here, and it needs to be exactly correct.
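
For a sense of what "pixel by pixel" means, here's a bare-bones Python sketch: shoot a ray through each pixel and mark whether it hits a single sphere. This is nowhere near a real ray tracer (no materials, lights, or reflections), and it is not the code from this attempt.

    import math

    WIDTH, HEIGHT = 40, 20
    CENTER, RADIUS = (0.0, 0.0, 3.0), 1.0    # one sphere in front of the camera

    def hit_sphere(origin, direction):
        # Solve the ray/sphere quadratic; return the nearest hit distance or None.
        oc = [origin[i] - CENTER[i] for i in range(3)]
        a = sum(d * d for d in direction)
        b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
        c = sum(x * x for x in oc) - RADIUS ** 2
        disc = b * b - 4 * a * c
        return None if disc < 0 else (-b - math.sqrt(disc)) / (2 * a)

    for row in range(HEIGHT):
        line = ""
        for col in range(WIDTH):
            # Map the pixel to a direction through a simple pinhole camera at the origin.
            u = (col / WIDTH - 0.5) * 2.0
            v = (0.5 - row / HEIGHT) * 2.0
            t = hit_sphere((0.0, 0.0, 0.0), (u, v, 1.0))
            line += "#" if t is not None else "."
        print(line)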

It did successfully render a scene, sorta, but it was pretty far off from the reference image. Probably a failing grade in any computer graphics class. This task feels doable for Claude, but I didn't have the vocabulary or the math in my head right then to lead us in the right direction.

9. AWS Administration

Next I unleashed Claude on my personal AWS account.

Pro tip: don't do this!

AWS costs real money and there is no safety net.

I wanted it to create an EC2 instance and do some server admin tasks. This worked great. Claude loves shell scripting.
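
Claude did all of this with shell scripts, so purely for illustration, here is roughly the same first step written in Python with boto3. The AMI ID, key pair, and region are placeholders.

    import boto3    # pip install boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",    # placeholder AMI
        InstanceType="t3.micro",
        KeyName="my-key-pair",              # placeholder key pair
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    print("launched", instance_id)

    # And because this costs real money, remember to clean up:
    # ec2.terminate_instances(InstanceIds=[instance_id])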

During this task, I experimented with using multiple models to improve the results. Two techniques I'd read about are a) repeatedly asking, essentially, "do it better" to iteratively improve the code and b) sending the results back and forth between different models, asking each to improve what it received. I found this worked pretty well. Claude Sonnet's initial shell scripts were fine, but ChatGPT 4 had some good suggestions.

10. Vocal Synthesis

Screenshot of some sliders controlling sounds

By far the most wildly ambitious attempt. Vocal synthesis is a topic I know absolutely nothing about.

My naive idea was that human speech sounds are controlled by a handful of parameters, things like mouth shape, tongue position, etc., so if you can directly model those, then maybe you can generate speech?

Sounds plausible, right?

I think in reality:

  • it's much more complicated, and
  • they just use neural nets

Nevertheless, I wanted to experiment.

I tried to be clever here. First, I asked Claude to do a BUNCH of research by reading about speech synthesis and then summarizing the results into a RESEARCH.md file for future reference. I don't know if the research was correct, but it was certainly impressive to watch. It wrote 750 lines of notes, referencing 6 academic papers and 28 online sources. To be clear, these were not hallucinated references. It was reading PDFs and doing web searches.

The research took maybe 30 minutes of fetching and synthesizing before it was ready to start coding. Claude decided to model this using "Speech Frames" with 11 control parameters, such as tongue_height, lip_rounding, and constriction_degree, each valued from 0.0 to 1.0. I have no idea if this was an appropriate data representation. Pictured is the first attempt at a UI for vowel sounds. It did make sounds! And moving the sliders changed the sounds. But it sounded very "electronic", just musical beeps and boops and nothing like speech sounds.
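
As a guess at what that representation might have looked like, here's a hypothetical Python sketch of a Speech Frame. Only the three parameter names mentioned above came from the actual run; the rest are invented.

    from dataclasses import dataclass

    @dataclass
    class SpeechFrame:
        tongue_height: float = 0.5         # named in Claude's design
        lip_rounding: float = 0.0          # named in Claude's design
        constriction_degree: float = 0.0   # named in Claude's design
        jaw_opening: float = 0.5           # hypothetical
        nasality: float = 0.0              # hypothetical
        voicing: float = 1.0               # hypothetical

        def clamp(self):
            # All parameters are meant to stay in the 0.0-1.0 range.
            for name, value in vars(self).items():
                setattr(self, name, min(1.0, max(0.0, value)))

    # Roughly an "ee"-like vowel: high tongue, unrounded lips.
    frame = SpeechFrame(tongue_height=0.9, lip_rounding=0.1)
    frame.clamp()
    print(frame)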

Claude suggested a few things, but I was clueless about what direction to go next. It would likely take weeks (or even months/years) of research to learn even the basics, and far more to build a simulation. Pretty quickly, I decided to take the "L" and move on.

Conclusions

What exactly did I learn from these 10 experiments?

Well, first, it's fun. Agentic coding dramatically lowers the cost of exploration.

Okay, duh. But what else?

Claude is basically DWIM

DWIM stands for "Do What I Mean". Historically this has either been a joke or a dream. A system with DWIM would understand the user's intent, not just what they asked for. Claude achieves something close by prompting the user. Given a rambling feature description, Claude will ask questions to clarify what was actually meant.

This prompting behavior is probably the most surprising thing to me about the whole system.

With some knowledge of LLMs and programming, it's easy enough to see how prompting works technically. But there's magic in the way it builds trust by asking those key questions up front. "You care about frontmatter and directory structures? Me too!"

Feedback Loops are crucial

A tip that I picked up from other blogs and videos is the usefulness of feedback loops. Claude is a pretty good debugger, just from code inspection. But it becomes next-level when it can directly observe the program outputs through unit tests or logging. Many problems can be solved simply by logging more details, or allowing Claude to inspect the live running state.

Powerful, but not cheap. Inspecting logs consumes a lot of tokens and context space. Claude is pretty clever about this, using tools like "head", "tail" and "grep" to narrow what it consumes, but a long-running log analysis can get expensive.

Agents (like Claude) will replace a LOT of software

In making these 10 projects, I basically looked at zero documentation and wrote zero code. I peeked once or twice to make sure that a few libraries were legit, but that's about it. Partly, I felt it would defeat my purpose in experimenting, but also I just didn't really feel the need to write code or look at documentation.

But that leaves questions..

  • Why create new languages and libraries when agents can just patch together existing ones? Or even conjure them from scratch?
  • As agents become ubiquitous, will free APIs disappear entirely?
  • What even happens to the web when it's mostly bots reading it?

Tons of jobs and entire companies exist to provide "DevOps". Claude can just "do that". It's automatic. It's not even a feature, just an incidental behavior.

Nobody taught Claude how to wrangle AWS (I assume?). Yes, I'm sure there were some code examples in the training, maybe even a lot. But most of its ability comes from a) some general knowledge of cloud hosting concepts, b) being able to refer to the docs on demand and c) being clever with its own shell scripting. By combining those capabilities, it gains a fourth capability to "manage AWS" entirely for free.

To me, that feels different. Managing AWS isn't just writing code. It manifests in the real world.

And those "free" capabilities, things it can just "do", are everywhere.

There is still plenty of room for experts

Claude did great on 8 of the 10 experiments that I tried. The ones where it "failed" were Ray Tracing and Vocal Synthesis, both attempted "from scratch". Those are basically physics simulations of the real world. They're also topics that I know literally nothing about. Claude is very cool. But it doesn't substitute (yet) for specialist domain knowledge.

Would the results improve if led by an expert? Almost certainly..

Would the results improve if I had spent 10x longer myself? Maybe for the ray tracer, very unlikely for the voice synthesizer.