My latest experiment has been using Claude to solve ciphers. It's pretty good at it and quite interesting to observe and follow along.

NOTE: for anyone unfamiliar with this topic, the ciphers discussed here are considered "easy", computationally speaking. These are NOT the same thing as modern cryptography, which has completely different mathematical foundations. That Claude can decipher these historical, pre-WW2 codes is interesting, but it's nothing to worry about.

Method

I picked 10 ciphers that are solvable through standard cryptanalysis techniques. Then I provided Claude Sonnet 4.5 with each ciphertext, and asked it to recover the plaintext. I experimented with a few different prompts. The plaintext was a passage of English text consisting of 786 uppercase characters, with no spaces or punctuation.

The agent workspace and HTTP server ran in separate docker containers with no shared files, to ensure isolation. I wanted to eliminate the possibility that the agent could accidentally see the original plaintext or cipher code. Any generated files and scripts were deleted after each run to avoid leaking information between attempts.

Agent Gameplay

The rules went like this:

  • The agent is instructed to fetch a ciphertext from an HTTP server and recover the plaintext
  • Solutions are submitted via HTTP
  • The server responds either "correct=true" or "correct=false"
  • If the solution is correct, game over, the agent wins
  • Otherwise, the agent continues analyzing and guessing

The agent is allowed to encipher additional known plaintexts via HTTP. So for example, the agent might submit "AAAAAA" and receive back "BCDEFG". This allows the agent to test hypotheses and explore how the cipher behaves in different circumstances.
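As a sketch, here's what that chosen-plaintext probing looks like for a simple shift cipher. The HTTP details are omitted; a local Caesar function stands in for the server's encoding endpoint, and the function names are illustrative, not the actual experiment code:

```python
def caesar_encode(text, shift=3):
    # Stand-in for the server's encoding endpoint (the real setup used HTTP)
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)

def probe_shift(oracle):
    # Submit a known plaintext ("A") and read the shift off the response
    return (ord(oracle("A")) - ord("A")) % 26

shift = probe_shift(caesar_encode)  # recovers 3 for this stand-in oracle
```

For a Caesar cipher one probe is enough; the polyalphabetic ciphers needed longer probes like the "AAAAAA" example above.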

Results

Cipher                   Long Prompt (steps)   Short Prompt (steps)
ROT-13                   6                     5
Caesar                   5                     4
Affine                   12                    22
Substitution             15                    39
Rail Fence               16                    20
Columnar Transposition   20                    37
Vigenere                 8                     10
Autokey                  10                    NA
Playfair                 99 (with hint)        33
ADFGX                    97 (with hint)        76 (with hint)

The table above shows the number of "steps" required to solve each cipher. This data is from a single trial of 10 ciphers with 2 different prompts. Here a "step" refers to one LLM input/output roundtrip. The "hint" in those last two cases came after the agent got stuck and halted execution: I provided the name of the cipher and asked it to continue working.

The behavior of Playfair and ADFGX is interesting. Both are 5x5 grids of 25 letters (typically "J" is omitted, and encoded as "I" if necessary). The agent would correctly identify the use of some grid-based cipher, and even recover most or all of the grid, but then get stuck. After providing the hint of the correct cipher name, the agent would very quickly reach a solution, since it already had done most of the work.
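For reference, the 5x5 grid both ciphers rely on can be built like this. This is a generic sketch rather than the agent's code, and "MONARCHY" is just a textbook example keyword:

```python
def build_grid(keyword):
    # Keyword letters first (deduplicated), then the rest of the
    # alphabet, with J folded into I -- 25 letters in a 5x5 grid.
    seen = []
    for c in keyword.upper().replace("J", "I") + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        if c.isalpha() and c not in seen:
            seen.append(c)
    return [seen[i:i + 5] for i in range(0, 25, 5)]

grid = build_grid("MONARCHY")  # first row: M O N A R
```

Recovering this grid from ciphertext is most of the work, which is why the agent finished so quickly once it knew which cipher it was facing.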

So... not perfect? But solving 8 or 9 of the 10 without hints is still impressive.

Interesting Behaviors

The agent showed some interesting behaviors while attempting to solve these ciphers.

Seeking Hints

In one case, the agent noticed that it wasn't making progress and tried to locate the server code in search of hints.

Check if there's a server configuration file that might give a hint.

Using external solvers

In one early test, the agent started searching for web-based cipher solving services. In another case, the agent downloaded a python package specifically designed to help solve the Playfair cipher. Pretty clever. A+ for resourcefulness.

I was watching the progress fairly closely here and stopped those. Later on, I modified the instructions to specifically disallow using external services, since I was trying to test analytic ability.

Sending thousands of HTTP requests

In one case, the agent wrote a python script that generated thousands of HTTP requests in order to obtain a table of plaintext => ciphertext mappings for analysis. Fortunately, these were running over local docker-to-docker networking, so the requests never left my machine. In several other cases, the agent considered this or similar methods, but decided against it, on the basis that it would be too slow or wasteful.

Sending thousands of HTTP requests isn't ideal. But the same strategy applied to a local encode function to build a mapping table would be perfectly reasonable.
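For a monoalphabetic cipher, that local-function version might look like the sketch below, with ROT-13 standing in for the cipher under test. (Polyalphabetic ciphers would need longer probes than single letters.)

```python
import string

def build_mapping(encode, alphabet=string.ascii_uppercase):
    # One call per letter to a local encode function -- no HTTP roundtrips
    return {p: encode(p) for p in alphabet}

# Stand-in cipher for the example: ROT-13
rot13 = lambda s: s.translate(str.maketrans(
    string.ascii_uppercase,
    string.ascii_uppercase[13:] + string.ascii_uppercase[:13]))

table = build_mapping(rot13)  # {'A': 'N', 'B': 'O', ...}
```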

Problems and Considerations

Choosing a Plaintext

My original idea was to use a passage from a book as the plaintext. However, LLMs are trained on literature. Using a famous text might allow an analyst to "get lucky" by decoding a few letters (ex: "IT W__ THE BEST..") and then just guessing the rest. I also didn't want to use random words or nonsense. Ultimately, I asked ChatGPT to generate a fake press release about economic activity in a fictional world. Any lucky discoveries would still be useful, but they wouldn't give away the entire passage.

Reversible Ciphers
My first attempt was buggy, and I didn't notice until I saw the agent get stuck with several near-perfect decodings that it couldn't quite solve 100%. It turned out that a few of the ciphers were not actually reversible. Essentially, I had designed the world's worst hash functions.

It should be the case that:

decode(encode(plaintext)) == plaintext

If you encode some text and then decode it, you should get the original back.

Well, I had omitted the decode functions, and I hadn't adequately checked that each encipherment was actually reversible. (Perhaps surprisingly, decoding is not required to play the game.)

I fixed this by adding decode functions for each cipher, reworking each encoding to ensure reversibility, and expanding the unit test suite to double-check on lots of varied input strings.
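The reversibility property is easy to spot-check. Here's a minimal sketch of that kind of test, using ROT-13 as the cipher (the actual suite covered all ten):

```python
import random
import string

UPPER = string.ascii_uppercase

def rot13_encode(s):
    return s.translate(str.maketrans(UPPER, UPPER[13:] + UPPER[:13]))

def rot13_decode(s):
    return rot13_encode(s)  # ROT-13 is its own inverse

# decode(encode(plaintext)) == plaintext, across many varied inputs
for _ in range(100):
    msg = "".join(random.choices(UPPER, k=random.randint(1, 50)))
    assert rot13_decode(rot13_encode(msg)) == msg
```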

Capitalization, Spacing and Punctuation
Another big modification was to limit the character set to only uppercase letters (A-Z). In my initial setup, I had allowed lowercase, spaces and punctuation.

Before the digital era, most codes had a very limited character set, so it seemed appropriate to retain that behavior.

So instead of this:

Tonight, meet at the Prancing Pony Inn

The plaintext would look like this:

TONIGHTMEETATTHEPRANCINGPONYINN

My first implementation of these ciphers had been closer to a "homemade" or "extended" variant of each algorithm. Switching to uppercase-only made the ciphers closer to their original behavior, and eliminated penalties for minor mistakes due to capitalization or spacing.
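The reduction itself is a one-liner. A sketch of the idea, not the project's exact code:

```python
import re

def normalize(text):
    # Uppercase, then drop anything outside A-Z
    return re.sub(r"[^A-Z]", "", text.upper())

print(normalize("Tonight, meet at the Prancing Pony Inn"))
# TONIGHTMEETATTHEPRANCINGPONYINN
```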

I/J Merging

One problem was specific to ADFGX and Playfair. As mentioned above, these ciphers use a 5x5 alphabet grid (25 letters, instead of 26). The letter "J" gets replaced with "I". Slightly garbled text is easy enough for a human (or agent) to read and understand. But it's a problem for scoring, when solutions are required to be 100% correct. I avoided this by just rewording the plaintext so that no words contained the letter "J". Other options would have been to accept solutions with minor typos, or to let the agent work out how to handle individual words with I/J mixups.
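The underlying issue is that the merge is lossy, so exact-match scoring can't tell an I/J mixup from a real mistake. A tiny illustration (the helper name is mine):

```python
def merge_ij(text):
    # Fold J into I so the text fits a 25-letter grid
    return text.replace("J", "I")

# Distinct plaintexts become identical, so a decoded "IUMBO" cannot
# be distinguished from an original "JUMBO".
assert merge_ij("JUMBO") == merge_ij("IUMBO") == "IUMBO"
```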

Prompt Complexity

I experimented with two main prompts. The shorter prompt had fewer than 100 words and basically said "Here are the rules, now fetch the ciphertext and play". The longer prompt had many more guidelines and suggested specific strategies to employ, such as analyzing letter frequencies, repeated sequences, and digraph patterns.

The shorter prompt performed admirably well. In some cases, it resulted in fewer steps (roundtrips). But in nearly every case, the longer prompt solved the cipher using 30-60% fewer tokens.

Part of this was luck. Agent behavior is non-deterministic. If the agent hits on the right idea early on, it can solve the cipher more quickly.

Part of this was efficient frontloading of work and context building in the longer prompt. Suggesting specific techniques caused the agent to perform many statistical tests up front, rather than discovering each one as needed, and performing them as separate tests later in the analysis.
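Those suggested statistics are cheap to compute up front. A minimal sketch of letter and digraph counts (helper names are mine, not from the experiment):

```python
from collections import Counter

def stats(ciphertext):
    # Letter frequencies plus overlapping digraph frequencies
    letters = Counter(ciphertext)
    digraphs = Counter(ciphertext[i:i + 2] for i in range(len(ciphertext) - 1))
    return letters, digraphs

letters, digraphs = stats("KHOORKHOOR")  # 'O' appears 4 times, 'KH' twice
```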

I experimented with a few other prompts as well:

  • listing all the candidate ciphers
  • requesting a specific approach: "adopt a hypothesis, then try to disprove it"

Neither of these seemed to make a huge difference in the outcome, but I didn't spend a ton of time here and it would be interesting to experiment more. It would also be interesting to run each test 10-20 times, to eliminate the "luck" factor. But that feels like a waste of tokens without a more specific goal in mind.

Analytic Methods Used by Agent

Below is a partial transcript showing bits of the agent's reasoning process. Each line here is the agent's own description of an individual python script that it wrote to test something while narrowing in on a specific solution:

Test cipher consistency assumption
..
Try Autokey, Beaufort, and Variant Beaufort ciphers
Try substitution and transposition ciphers
Comprehensive search with advanced scoring
Try Porta cipher and common keywords
..
Use alphabetical sequences to deduce key
..
Test if encoding is deterministic
Test encoding length ratios
Try Bifid cipher decryption
..
Analyze alphabetical sequence positions
Analyze sequence position GCD
Try variations of best hill climbing key
Try Gronsfeld cipher
Try all Caesar shifts
Try famous phrases as keys
..
Intelligent search for 4-letter key
..

This comes from a single run, and it's interesting to watch. The LLM knows many types of ciphers and how to test for them, and it hops back and forth between different ideas.
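As a concrete example of one entry in the list above, "Try all Caesar shifts" can be as simple as ranking all 26 candidates by how many high-frequency English letters they contain. This is a rough score for illustration; the agent's actual scripts used more careful statistics:

```python
COMMON = set("ETAOINSHRDLU")  # the twelve most frequent English letters

def caesar_shift(text, k):
    # Decode by shifting each letter back by k
    return "".join(chr((ord(c) - 65 - k) % 26 + 65) for c in text)

def best_caesar(ciphertext):
    # Rank all 26 shifts by count of high-frequency English letters
    return max((caesar_shift(ciphertext, k) for k in range(26)),
               key=lambda cand: sum(c in COMMON for c in cand))

print(best_caesar("WKLVLVDVHFUHWPHVVDJH"))  # THISISASECRETMESSAGE
```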

Possible Followups

Prompting Strategies
I'd be interested to try different prompting to achieve more systematic exploration. It feels like a falsification strategy should be effective here. Sometimes the agent gets stuck for a while on the wrong idea. Instead of trying to prove that its idea is correct, I wonder if trying to quickly disprove potential ciphers would be more effective.

Replacing Context with Code
I had set things up so that each experiment was completely independent. But it would be interesting to have the agent build out a reusable code library over the course of many experiments. Presumably later attempts would improve. Less context building should be required, as more statistical tests and cipher knowledge would get enshrined into code.

Novel Ciphers
I tried to use "classic" ciphers here. But I'm curious how the agent would perform on a novel cipher. I'm also curious whether I can invent something truly novel by myself. It'd be easy to glue a couple of these together in a new way. But what about something truly unique? Could I create a new method that has the spirit of classic ciphers, without resembling them at all?

Something to think about.