So my most recent Android program is an anagram finder called Unanagram. For some reason I really like writing progams to solve word puzzles like Yahoo Word Racer and Scrabble. Anyway, while it seems like finding anagrams is a really easy thing to do, it turned out to be a little tricky. Following are some notes on the complications encountered in writing for Android and some solutions.

There is an easy algorithm to determine if two words are anagrams of each other: just sort the letters of each word alphabetically. If they match, they are anagrams. For example: "cats" and "acts" both sort alphabetically as "acst".

Trickier though is finding ALL the anagrams for a particular set of letters. You could sort every word in the dictionary as a "key" and keep the unsorted word as the value. The trouble is that this obligates you to check every key and requires a fair amount of space as well. Also, it doesn't really allow spaces which is vital for multi-word results. With only 16mb per program, memory-intensive algorithms are not viable.

After a fair amount of experimentation with different data structures, I decided to use my reliable friend, the Trie. This has the dual benefit of being space-efficient and having fast lookups. (A DAG might be more appropriate, but the Trie worked so well I didn't bother investigating)

Nevertheless, a few complications arose.

First, you have to load the data structure from disk. Usually files like a dictionary would go into /assets. However, files in that directory get compressed, so they take longer to load. For a large file the time is unacceptable. For faster loading they can go in /res/raw.

Then there is actually building the data structure. A Trie is built through successive insertions which add branches to the tree as necessary. While adding 60k words takes less than a second on my laptop, it took 5 MINUTES on Android. That wasn't gonna work. Since the application only needs lookups, and not insertions, I decided to build the Trie offline, serialize it into a pseudo-binary format and then load that in Android. While the file size increased from 600k to 1.5Mb, the loading time dropped to 6 seconds. This is still slow, but much more acceptable. By beginning the loading in a background thread as soon as the application starts, it becomes unnoticeable. Someone more clever than myself may be able to get both better compression and better load time, but that was "good enough" for my purposes.

Now because you only get 16Mb of RAM, it is necessary to build the Trie using as little memory as possible. I was able to get down to about 4.5Mb by building the Trie as a series of nodes, each with a Child pointer, a Next pointer and a Character indicating the node's letter. This is a bit different from your typical Trie, which stores a list of child nodes. An uppercase letter was used to indicate a terminal node, rather than adding a boolean flag.

That's great for the in-memory size of the tree, however, with a branching factor of 2, this tree has ENORMOUS depth. While again my laptop has no problem, Android hits a StackOverflow error when the recursion gets too deep. Happily, this is fixable by converting the naturally recursive traversal into an iterative algorithm with an external Stack. Yuck. But it works and requires an insignificant amount of memory. It's almost certainly faster as well, but I didn't check.

A final complication is that Android programs must always be ready to be Paused or Killed. Since generating ALL the anagrams for a long word can easily take over a minute, it would be bad news to re-start the search everytime the program gets killed (which happens a LOT - incoming calls, switching the screen orientation, and other programs contending for memory can all kill an app). Since the lookup is stack based, it was fairly straightforward to build a resume function which rebuilds the stack using the letters of the last word found.