Can Clojure Find Me An Apartment?

This post was going to be about how I spent the better part of a day trying to get clojure and emacs and slime and the java classpath all working together.

The gist of it is this: I am an idiot sometimes. I spent most of an afternoon trying to figure out why it is an error to (use 'clojure.contrib). Earlier in the day, my classpath was set up wrong, so (use 'clojure.contrib.duck-streams) didn’t work. At some point, I stopped typing the whole thing, thinking that if 'clojure.contrib.duck-streams works, then so should the parent package 'clojure.contrib. A-ha! Save myself a bit of typing! Nope. That never works, because Clojure namespaces aren’t hierarchical and there is no clojure.contrib namespace to load. So when I finally did get my classpath working, I didn’t know it because I was typing something that’s just plain wrong. Hilarious and Awesome, huh?

So, with everything finally working, I made my first little half-way real Clojure program.

Our current lease runs out in about 6 weeks, so my roommate and I need to find a new place to live – sounds like a job for Craigslist. There’s a problem though: in big cities, Craigslist is absolutely flooded with apartments and the search functions just aren’t that good. I have no interest in skimming hundreds or thousands of posts looking for that perfect combination of price/location/amenities (well, mostly price and location, actually), so why not let the computer do the work instead? Usually this would be a job for Python/BeautifulSoup, but in the interest of learning Clojure, here goes..

Following is what I’ve come up with so far for scraping apartments off Craigslist as gently as possible by filtering out links that don’t meet my criteria. Right now, this code only generates the list of matching links; it doesn’t actually follow them. If I continue further with this program, that will be Step 2, probably using http://lethain.com/entry/2009/nov/24/scalable-scraping-in-clojure/ for inspiration.

This is based on the Enlive library, which provides a very usable syntax for ripping through HTML (though I don’t quite understand it all yet). As I’m still a complete beginner with Clojure and functional programming in general, the following code is probably far from idiomatic and may look sloppy to you pros out there. Comments and suggestions are welcome!

;; import enlive
(use 'net.cgrand.enlive-html)
 
;; html helper
(defn fetch-url [url]
  (html-resource (java.net.URL. url)))
 
;; pulls link from paragraph
;; ie, (map get-link (select *cl* [:p]))
(defn get-link [p]
  (:href (:attrs (first (:content p)))))
 
;; pulls text of link from paragraph
(defn get-link-text [p]
  (:content (first (:content p))))
 
;; pulls text of parens following link
;; usually this is zipcode/location info
;; "", if absent
(defn get-paren-text [p]
  (let [content (:content p)]
    (if (< 2 (count content))
      (:content (nth content 2))
      "")))
 
;; pulls link/text/location into a map
(defn get-all [p]
  {:link (get-link p)
   :text (str (get-link-text p)
	      (get-paren-text p))})
 
;; some helpers to remove links we don't care about 
 
;; (affordable? "$800" 600 1000)  => true
;; (affordable? "$1500" 600 1000) => false
(defn affordable? [text min max]
  (let [price (second (re-find #"\$(\d+)" text))]
    (if price
      (let [price (Integer/parseInt price)]
	(and (<= min price)
	     (>= max price))))))
 
;; (has-kword? "downtown" (list "down")) => true
;; (has-kword? "down" (list "downtown")) => nil
(defn has-kword? [text kwords]
  (let [vals (map #(re-find (re-matcher (re-pattern %) text)) kwords)]
    (some #(not (= nil %)) vals)))
 
;; parameterizes a function to decide if a link is worth retrieving
;; this would be cooler if the criteria functions
;; came in as a list too.. but that makes my head
;; spin.. maybe later
(defn keep-link? [min max areas beds]
  (fn [{link :link text :text}]
    (let [text (.toLowerCase text)]
      (and link
	   (re-find #"/apa/" link)
	   (affordable? text min max)
	   (has-kword? text areas)
	   (has-kword? text beds)))))
 
;; some top level definitions
;; you may need to change these to get non-empty results
(def *url* "http://yourcity.craigslist.org/apa/")
(def *min-price* 100)
(def *max-price* 10000)
(def *areas* (list "downtown" "west side" "etc"))
(def *beds* (list "2br" "3br"))
(def my-keep-link? (keep-link? *min-price* *max-price* *areas* *beds*))
 
;; actually do the work
(filter my-keep-link? (map get-all (select (fetch-url *url*) [:p])))
 
;; References
;; 1) http://wiki.github.com/cgrand/enlive/
;; 2) http://github.com/swannodette/enlive-tutorial/
;; 3) Programming Clojure, Stuart Halloway
;; 4) lots and lots of Googling
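
The comment on keep-link? wonders about passing the criteria functions in as a list. Here is a rough sketch of that idea, written in Python rather than Clojure. The names make_keep_link, affordable and has_kword are hypothetical stand-ins for the functions above, the sample listings are made up, and the keyword check uses plain substring matching instead of regexes:

```python
import re

def affordable(lo, hi):
    """Predicate: the first "$NNN" in the text falls within [lo, hi]."""
    def check(text):
        m = re.search(r"\$(\d+)", text)
        return m is not None and lo <= int(m.group(1)) <= hi
    return check

def has_kword(kwords):
    """Predicate: at least one keyword occurs somewhere in the text."""
    def check(text):
        return any(k in text for k in kwords)
    return check

def make_keep_link(criteria):
    """Build a filter from a list of predicates; every one must pass."""
    def keep(item):
        link = item.get("link")
        text = item.get("text", "").lower()
        return bool(link) and "/apa/" in link and all(c(text) for c in criteria)
    return keep

keep = make_keep_link([
    affordable(600, 1000),
    has_kword(["downtown"]),
    has_kword(["2br", "3br"]),
])

# Made-up sample data, only for illustration.
listings = [
    {"link": "/apa/123.html", "text": "$800 / 2br - Sunny flat (Downtown)"},
    {"link": "/apa/456.html", "text": "$1500 / 2br - Loft (Downtown)"},
    {"link": None, "text": "$800 2br downtown"},
]
kept = [item["link"] for item in listings if keep(item)]
print(kept)
```

Adding a new criterion then means appending one more predicate to the list, without touching the filter itself.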

On the whole, I’m liking Clojure a lot, but there is also a lot to learn.

(Shocking conclusion, I know!)

Posted in Programming | Tagged , , | Leave a comment

A Few Cool Videos From Google Tech Talks

I keep meaning to find some interesting podcasts and online lectures. There’s a ton of material out there, but so much of it sucks. Anyway, browsing the topic “What are the best Google Tech Talks” on Stack Overflow, I found the following, which I now link for your viewing pleasure:

XKCD visits Google – Very funny and interesting, but perhaps less enjoyable unless you’re an xkcd fanboy like me. Jump to 21:30 where xkcd answers a joking question from Donald Knuth.

PolyWorld: Using Evolution to Design Artificial Intelligence – An interesting A-Life experiment/visualization. Jump to 5:35 for some really neat video of an older program that evolves different body morphologies for efficient movement in a simulated physical environment. (I think this is the original work the speaker is citing)

The Next Generation of Neural Networks – The speaker flies through the intro material much too fast for me to understand with only a rudimentary knowledge of NN. Nevertheless, the demo at 21:35 is cool, as is the discussion around 31:40 of using these layered NN for document clustering and classification.

Posted in Random Links | Tagged | Leave a comment

Batch Extracting MP3s from YouTube Videos

Last night I wanted to extract audio tracks from a number of YouTube videos that I’d downloaded using youtube-dl. Being only a so-so shell scripter, I’ve always resorted to ugly for-loops when manipulating multiple files. This invariably ends badly when my loop improperly handles whitespace and mangles the filenames.

No more! Skimming a tutorial last night I stumbled on something that heavy shell users already know: the -exec option for the find command. This allows you to specify a command to run on everything that find finds. In the case of extracting audio from FLV videos, it works like this:

find . -name '*.flv' -exec ffmpeg -i '{}' '{}.mp3' ';'

This command looks in the current directory (and its subdirectories) for flv files and uses ffmpeg to extract the audio to another file with the same name, plus the .mp3 extension. The funny brackets {} are replaced with the file name.

A downside to this approach – your files end up with names like .flv.mp3 instead of .mp3. If that bothers you, you can fix it with the rename command which uses regexes to rename files:

rename 's/\.flv\.mp3/\.mp3/' *.flv.mp3

Ubuntu users like myself will need to install ffmpeg and ubuntu-restricted-extras to get the necessary encoder.

There are certainly lots of other ways to encode a directory’s worth of files, but I think this one is pretty cool.
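
For comparison, here is a minimal Python sketch of one of those other ways. It assumes ffmpeg is on the PATH. The helper names mp3_name, build_command and convert_all are made up for this example. Because the arguments are passed as a list and never go through a shell, whitespace in file names is not a problem, and the output gets a plain .mp3 extension right away:

```python
import subprocess
from pathlib import Path

def mp3_name(video):
    """Swap the extension, so 'my song.flv' becomes 'my song.mp3'."""
    return video.with_suffix(".mp3")

def build_command(video):
    """Argument list for ffmpeg; no shell is involved, so spaces are safe."""
    return ["ffmpeg", "-i", str(video), str(mp3_name(video))]

def convert_all(root):
    """Run ffmpeg on every .flv file under root, including subdirectories."""
    for video in sorted(Path(root).rglob("*.flv")):
        subprocess.run(build_command(video), check=True)

# Show the command that would be run for one file.
print(build_command(Path("my song.flv")))
```

Calling convert_all(".") would then do the same job as the find command above.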


Collaborative Filtering, Hadoop and the Hazards of Copy-Paste

I’ve been working on a new App idea lately – a recommender for Android programs. Basically, it looks at what you have installed (and possibly ratings) and recommends other applications you might like by using the recommendations of other people in the same way as Amazon or the various music services – in a word – collaborative filtering.

There are different ways to do collaborative filtering, but they are all expensive when you get a lot of records to sort through. Two common approaches are 1) Calculate the similarity of users, and recommend apps liked by similar users, or 2) Calculate the similarity of apps, and recommend apps similar to ones the user likes. I am trying the second way, known as item-based collaborative filtering or the model-based approach, which allows for fast queries at the cost of an expensive offline step that re-computes the item similarities every once in a while.
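
To make the second approach concrete, here is a toy sketch in Python. It is not my actual implementation: the install data is made up, the function names are hypothetical, and cosine similarity over sets of users is just one common choice of similarity measure.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarities(installs):
    """Cosine similarity between apps, treating each app as the set of its users."""
    users_of = defaultdict(set)
    for user, apps in installs.items():
        for app in apps:
            users_of[app].add(user)
    sims = defaultdict(dict)
    for a, b in combinations(sorted(users_of), 2):
        shared = len(users_of[a] & users_of[b])
        if shared:
            score = shared / sqrt(len(users_of[a]) * len(users_of[b]))
            sims[a][b] = sims[b][a] = score
    return sims

def recommend(user_apps, sims, top_n=3):
    """Score apps the user lacks by summing similarity to the apps they have."""
    scores = defaultdict(float)
    for app in user_apps:
        for other, score in sims.get(app, {}).items():
            if other not in user_apps:
                scores[other] += score
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]

# Made-up data: which user has which apps installed.
installs = {
    "ann": {"maps", "mail", "chess"},
    "bob": {"maps", "mail"},
    "cat": {"chess", "sudoku"},
    "dan": {"mail", "sudoku", "chess"},
}
sims = item_similarities(installs)   # the expensive offline step
print(recommend({"maps"}, sims))     # the cheap online query
```

The pairwise loop in item_similarities is the part that blows up as the number of apps and users grows, which is why it is worth moving offline.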

My initial tests in Python, based on the very interesting book “Programming Collective Intelligence” quickly became too slow with just a few thousand users and apps. Because there are already around 5,000 apps and a few million users of Android (with many more every day), there’s no way the script would be able to handle the future growth of the platform.

Enter MapReduce and Hadoop. The explanation is better left to the pros, but simply, MapReduce is a way of parallelizing certain types of computations across many computers and then merging the final results. With the availability of Amazon Web Services, which allows you to rent a cluster of computers by the hour, it becomes possible to run a prohibitively expensive computation once every few days for just a couple of dollars. There are several different MapReduce frameworks out there, but I chose to try Hadoop, which is available on Amazon’s services and used heavily by Yahoo and many others.

There will be a lot more to say about Hadoop as I gain more experience. But all-in-all, it is pretty fun to re-think an algorithm, even just a little bit, to make it suitable for MapReduce. I *think* I have a correct implementation of Item-Based Collaborative Filtering running on my tiny 2-node cluster and it’s pretty cool!

I ran into one snag while trying to get my cluster running using the ubiquitous WordCount example for Hadoop. Like most people, I copy-pasted the source from the Hadoop tutorial and tried to run it. It ran, great! So then instead of reading the rest of the documentation, I immediately tried to modify it. Eventually, I ended up trying to make the simplest change – to return Text instead of IntWritables from the Map operation and – WTF!?! I spent HOURS trying to figure out why there was a ClassCastException. So for other poor souls trying to modify the WordCount example, there are 3 things you need to do:

First, get the method signatures right. The Mapper has to output Text and the Reducer has to consume Text (Eclipse will help with that, of course)

Second, add the lines: “conf.setMapOutputKeyClass(Text.class);” and “conf.setMapOutputValueClass(Text.class);” to the main() method. These tell Hadoop that the Mapper’s output types differ from the job’s final output types, which is what it assumes by default (in WordCount, Text keys and IntWritable values)

Third, and crucially important, remove the line “conf.setCombinerClass(Reduce.class);”. Discovering that I needed to remove that single line took me about half a day, digging through the logs and Googling everything I could think of until I discovered this thread. Because it was part of the example, I assumed it was Hadoop boiler-plate that was essential — it’s not, it’s an optimization. The Combiner is kind of like a pre-Reduce phase that saves time by combining in-memory results instead of writing them to disk and combining them later. The Combiner needs a method signature that accepts the output of the Mapper and is still suitable as input to the Reducer. Otherwise, it chokes.
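
The Combiner constraint is easier to see in a toy model. The following Python sketch is a single-process imitation of map, combine and reduce. It is not Hadoop’s API, and run_job and the mapper and reducer names are invented for this example. A summing reducer works fine as a combiner for integer counts, but once the mapper emits text, the same combiner chokes, much like the ClassCastException did:

```python
from collections import defaultdict

def run_job(records, mapper, reducer, combiner=None):
    """Toy single-process model of map -> (combine) -> reduce."""
    reduce_input = defaultdict(list)
    for record in records:               # each record plays the role of one map task
        local = defaultdict(list)
        for key, value in mapper(record):
            local[key].append(value)
        for key, values in local.items():
            if combiner is not None:
                # Combiner output reaches the reducer as if it were mapper output.
                values = [combiner(key, values)]
            reduce_input[key].extend(values)
    return {key: reducer(key, values) for key, values in reduce_input.items()}

def count_mapper(line):
    return [(word, 1) for word in line.split()]

def sum_reducer(key, values):
    return sum(values)

def text_mapper(line):
    return [(word, word.upper()) for word in line.split()]

def join_reducer(key, values):
    return ",".join(values)

lines = ["a b a", "b a"]
counts = run_job(lines, count_mapper, sum_reducer, combiner=sum_reducer)
joined = run_job(lines, text_mapper, join_reducer)

# Leaving the old summing combiner in place after switching the mapper to text:
try:
    run_job(lines, text_mapper, join_reducer, combiner=sum_reducer)
    mismatch = None
except TypeError as err:
    mismatch = err
print(counts, joined, type(mismatch).__name__)
```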

Such is the peril of the copy-paster who runs code without really understanding all of it.


Extending Android’s Chronometer to get Elapsed Time

In revamping one of my Games on the Android Market, I wanted to add an onscreen timer. The built-in Chronometer class does 95% of what I needed, but getting that extra 5% was annoying enough that I decided to post my solution.

The OOTB Chronometer keeps track of time, but it can’t tell you the number of seconds elapsed since it started. This is easily remedied, as I discovered from the fine people on StackOverflow. However, simply getting the elapsed time is not enough because the Chronometer will reset itself every time the app is Paused or Killed. The two cases are quite different. When the app is Paused, it is still alive and may resume (ex. an incoming call). When the app is Killed, it needs to store the elapsed time somewhere so it can be recovered when the app is Restored (ex. orientation change, or b/c the OS needs memory).

So here is a stripped down solution that seems to do the job:

import android.app.Activity;
import android.content.Context;
import android.os.Bundle;
import android.os.SystemClock;
import android.util.Log;
import android.view.Menu;
import android.widget.Chronometer;
 
public class CustomChronometerActivity extends Activity {
	private static final String TAG = "CustomChronometerActivity";
	private static final String MS_ELAPSED = "com.etc.etc.MsElapsed";
 
	private MyChronometer chrono;
 
	@Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
 
        //start the chronometer
        chrono = new MyChronometer(this);
        chrono.start();
        setContentView(chrono);
    }
 
	@Override
	protected void onPause() {
		Log.i(TAG, "onPause()");
		super.onPause();
		chrono.stop();
	}
 
	@Override
	protected void onResume() {
		Log.i(TAG, "onResume()");
		super.onResume();
		chrono.start();
	}
 
	@Override
	protected void onSaveInstanceState(Bundle outState) {
		super.onSaveInstanceState(outState);
		Log.i(TAG, "onSaveInstanceState()");
		chrono.stop();
		outState.putInt(MS_ELAPSED, chrono.getMsElapsed());
	}
 
	@Override
	protected void onRestoreInstanceState(Bundle savedInstanceState) {
		super.onRestoreInstanceState(savedInstanceState);
		Log.i(TAG, "onRestoreInstanceState()");
		int ms = savedInstanceState.getInt(MS_ELAPSED);
		chrono.setMsElapsed(ms);
		chrono.start();
	}
 
	class MyChronometer extends Chronometer {
 
		public int msElapsed;
		public boolean isRunning = false;
 
		public MyChronometer(Context context) {
			super(context);
		}
 
		public int getMsElapsed() {
			return msElapsed;
		}
 
		public void setMsElapsed(int ms) {
			setBase(getBase() - ms);
			msElapsed  = ms;
		}
 
		@Override
		public void start() {
			super.start();
			setBase(SystemClock.elapsedRealtime() - msElapsed);
			isRunning = true;
		}
 
		@Override
		public void stop() {
			super.stop();
			if(isRunning) {
				msElapsed = (int)(SystemClock.elapsedRealtime() - this.getBase());
			}
			isRunning = false;
		}
	}
}
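
The part of MyChronometer that does the real work is the base arithmetic in start() and stop(). Stripped of everything Android-specific, the idea can be sketched in a few lines of Python. The Stopwatch class and the fake clock below are made up for illustration and are not part of any Android API:

```python
class Stopwatch:
    """Keeps elapsed milliseconds across stop/start by shifting a base timestamp."""

    def __init__(self, clock):
        self.clock = clock      # function returning the current time in ms
        self.base = 0
        self.elapsed = 0
        self.running = False

    def start(self):
        # Pretend we started 'elapsed' ms ago, so counting continues where it left off.
        self.base = self.clock() - self.elapsed
        self.running = True

    def stop(self):
        # Only update when running, so a second stop() cannot corrupt the value.
        if self.running:
            self.elapsed = self.clock() - self.base
        self.running = False

now = [1000]                       # fake clock we can move by hand
watch = Stopwatch(lambda: now[0])
watch.start()
now[0] = 4000
watch.stop()                       # paused after 3000 ms
now[0] = 9000                      # time passes while paused
watch.start()
now[0] = 9500
watch.stop()
watch.stop()                       # double stop, as in onPause + onSaveInstanceState
print(watch.elapsed)
```

The time spent paused is not counted, and the saved elapsed value is all that needs to go into the Bundle.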

Certainly there are other ways to do this – either by implementing your own timer using Threads or Handlers, or perhaps by implementing an OnChronometerTickListener and subscribing to events. I rather like this solution, but if you’re the clever sort and see some situation where this doesn’t work or some reason why it might be a bad idea, please let me know.


More Reasons I Love Python

I was cleaning up some folders the other day at work where the files had been named using one of several naming schemes (or a few with no particular scheme at all). After brief consideration, I decided to do the legwork of renaming all the files with a naming scheme that actually makes sense:

Category_YYYY-MM-DD

That way, the files will stay grouped together if they get copied around to other folders, and they sort alphabetically by date. Then there’s the task of regenerating all the HTML for these baddies. Happily, Python was up to the task:

import os
from datetime import date
files = os.listdir("Path\\To\\File")
files.sort()
files.reverse()
for name in files:
    # chop the prefix, chop the suffix, split into (year, month, day), convert to int
    parts = [int(p) for p in name.split("_")[-1][:-4].split('-')]
    print "<li><a href=\"/path/to/%s\">%s</a></li>" % (name, date(parts[0], parts[1], parts[2]).strftime('%B %d, %Y'))

Well, it’s nothing like the real pros can do. But you gotta love a few lines of code that save your fingers from a repetitive and typo-prone task like manually editing hundreds of links.


Getting Your Stats From The Android Marketplace with PHP and CURL

A few weeks ago I mentioned one way to get your developer stats off the Android Developer Console automatically. Unfortunately, despite being very awesome, Firefox + MozRepl is not super-great for this task. When a plugin is updated, Firefox hangs on startup. That’s fine, but it kinda sucks for scripting. I’m sure there’s a way around it, but that difficulty makes a good excuse for coming back to solve this problem the right way.

Following is a PHP script that uses cURL to login to the Developer Console and grab the market stats. Unfortunately, Google’s app is written in GWT and its Javascript is completely obfuscated. The market stats are fetched as JSON data and then somehow parsed, but I haven’t been able to figure out how exactly. If you run this script (or just look using Firebug), you’ll see that the JSON is a gigantic array. While the data of interest are clearly present in this array (total downloads, current installed base, rating, etc..), I haven’t been able to figure out how to parse it reliably. If you’ve tried this and figured it out, I’d love to know!

This script was assembled from a bunch of random PHP/cURL tutorials and may contain redundancy, unnecessary cURL settings, etc. Python fans, see the comments of my other post on this topic where a kind soul has demonstrated the same thing in Python using mechanize.

<?php
//setup a temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
 
//do google authorization
$data = array('accountType' => 'GOOGLE',
          'Email' => 'YOUR_ACCOUNT_EMAIL_HERE',
          'Passwd' => 'YOUR_ACCOUNT_PW_HERE',
          'source' => '',
          'service' => 'androiddeveloper');  
 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/accounts/ClientLogin");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
 
//grab the AUTH token for later
$auth = '';
if($info['http_code'] == 200) {
    preg_match('/Auth=(.*)/', $output, $matches);
    if(isset($matches[1])) {
        $auth = $matches[1];
    }
}
 
//login to Android Market
//this results in a 302
//I think this is necessary for a cookie to be set
$ch = curl_init ("http://market.android.com/publish?auth=$auth");
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//go to the Developer Console
$ch = curl_init ("http://market.android.com/publish/Home");
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//grab the JSON data
//perm and postdata seem to have changed in the last 6 months
//if the script isn't working, try using firebug to inspect the Request when
//http://market.android.com/publish/editapp gets fetched
$perm = "81E29277804F7729E9B743A43B2EFD07";
$headers = array(
    "Content-Type: text/x-gwt-rpc; charset=utf-8",
    "X-GWT-Permutation: $perm",
    "Referer: http://market.android.com/publish/gwt/$perm.cache.html");
//not sure what x-gwt-permutation means, I think it may have to do with which version of GWT they serve based on your browser
$postdata = "5|0|4|http://market.android.com/publish/gwt/|09C42EAE15B55219550B2D800FAC1644|com.google.wireless.android.vending.developer.shared.AppEditorService|getFullAssetInfosForUser|1|2|3|4|0|";
$ch = curl_init ("http://market.android.com/publish/editapp");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
 
//now what?!?
echo('<pre>');
$output = json_decode(substr($output, 4));
print_r($output);

If you run this script and are willing to send me your stats, that would be super-helpful. Maybe I’ll be able to get enough data to figure out why some apps have more fields than others. With only 3 apps currently on the market, I don’t have much to go on. Feel free to obscure your data, but please make the changes obvious and note whether the app is free/paid and what part of the market it appears on (games/apps and sub-category). Here is a link to my best guesses so far in an Excel worksheet: market-json


Finding Anagrams is Harder Than It Should Be

So my most recent Android program is an anagram finder called Unanagram. For some reason I really like writing programs to solve word puzzles like Yahoo Word Racer and Scrabble. Anyway, while it seems like finding anagrams is a really easy thing to do, it turned out to be a little tricky. Following are some notes on the complications encountered in writing for Android and some solutions.

There is an easy algorithm to determine if two words are anagrams of each other: just sort the letters of each word alphabetically. If they match, they are anagrams. For example: “cats” and “acts” both sort alphabetically as “acst”.
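
In Python, that check is only a couple of lines. This is just a sketch of the sorting trick, with signature and are_anagrams as names made up for the example:

```python
def signature(word):
    """Letters of the word in alphabetical order; anagrams share a signature."""
    return "".join(sorted(word.lower()))

def are_anagrams(a, b):
    """True when both words use exactly the same letters."""
    return signature(a) == signature(b)

print(signature("cats"), are_anagrams("cats", "acts"), are_anagrams("cats", "coats"))
```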

Trickier though is finding ALL the anagrams for a particular set of letters. You could sort every word in the dictionary as a “key” and keep the unsorted word as the value. The trouble is that this obligates you to check every key and requires a fair amount of space as well. Also, it doesn’t really allow spaces which is vital for multi-word results. With only 16 MB per program, memory-intensive algorithms are not viable.

After a fair amount of experimentation with different data structures, I decided to use my reliable friend, the Trie. This has the dual benefit of being space-efficient and having fast lookups. (A DAG might be more appropriate, but the Trie worked so well I didn’t bother investigating)

Nevertheless, a few complications arose.

First, you have to load the data structure from disk. Usually files like a dictionary would go into /assets. However, files in that directory get compressed, so they take longer to load. For a large file the time is unacceptable. For faster loading they can go in /res/raw.

Then there is actually building the data structure. A Trie is built through successive insertions which add branches to the tree as necessary. While adding 60k words takes less than a second on my laptop, it took 5 MINUTES on Android. That wasn’t gonna work. Since the application only needs lookups, and not insertions, I decided to build the Trie offline, serialize it into a pseudo-binary format and then load that in Android. While the file size increased from 600 KB to 1.5 MB, the loading time dropped to 6 seconds. This is still slow, but much more acceptable. By beginning the loading in a background thread as soon as the application starts, it becomes unnoticeable. Someone more clever than myself may be able to get both better compression and better load time, but that was “good enough” for my purposes.

Now because you only get 16 MB of RAM, it is necessary to build the Trie using as little memory as possible. I was able to get down to about 4.5 MB by building the Trie as a series of nodes, each with a Child pointer, a Next pointer and a Character indicating the node’s letter. This is a bit different from your typical Trie, which stores a list of child nodes. An uppercase letter was used to indicate a terminal node, rather than adding a boolean flag.

That’s great for the in-memory size of the tree, however, with a branching factor of 2, this tree has ENORMOUS depth. While again my laptop has no problem, Android hits a StackOverflow error when the recursion gets too deep. Happily, this is fixable by converting the naturally recursive traversal into an iterative algorithm with an external Stack. Yuck. But it works and requires an insignificant amount of memory. It’s almost certainly faster as well, but I didn’t check.
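
Here is a small Python sketch of that node layout and of a traversal that uses an explicit stack instead of recursion. It is a simplified illustration rather than the code from the app: it only lists the stored words, it does no anagram matching, and the names Node, insert and words are invented here.

```python
class Node:
    """One letter; 'child' goes one letter deeper, 'next' is an alternative letter."""
    __slots__ = ("char", "child", "next")

    def __init__(self, char):
        self.char = char    # an uppercase letter marks the end of a word
        self.child = None
        self.next = None

def insert(root, word):
    """Add a lowercase word below the dummy root node."""
    parent = root
    for i, ch in enumerate(word):
        node = parent.child
        while node is not None and node.char.lower() != ch:
            node = node.next                  # walk the sibling chain
        if node is None:
            node = Node(ch)
            node.next, parent.child = parent.child, node   # prepend to siblings
        if i == len(word) - 1:
            node.char = node.char.upper()     # terminal marker, no boolean needed
        parent = node

def words(root):
    """List every stored word using an explicit stack instead of recursion."""
    found = []
    stack = [(root.child, "")]
    while stack:
        node, prefix = stack.pop()
        if node is None:
            continue
        stack.append((node.next, prefix))     # sibling: same prefix
        word = prefix + node.char.lower()
        if node.char.isupper():
            found.append(word)
        stack.append((node.child, word))      # child: one letter longer
    return sorted(found)

root = Node("")
for w in ["cat", "cats", "act", "car"]:
    insert(root, w)
print(words(root))
```

Each stack entry is just a node and the prefix that leads to it, which is also what makes it possible to rebuild the stack later from a known word.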

A final complication is that Android programs must always be ready to be Paused or Killed. Since generating ALL the anagrams for a long word can easily take over a minute, it would be bad news to re-start the search every time the program gets killed (which happens a LOT – incoming calls, switching the screen orientation, and other programs contending for memory can all kill an app). Since the lookup is stack based, it was fairly straightforward to build a resume function which rebuilds the stack using the letters of the last word found.


Geocoding in Sharepoint Lists

Today I was thinking it would be nice to do some Geocoding in Sharepoint. Specifically, I wanted to make it so list items could have longitude and latitude fields that could be populated with a button click from the EditItem page. Geocoding is pretty easy to do with both Google and Yahoo. For my situation, Yahoo seemed more appropriate.

My first intuition was that this should be easy to do with Javascript. Just call the Yahoo Maps API with the right data and parse the response. Except.. this usage violates the same-origin policy for Javascript. Drat. Well, there are several things that can be done. If you’re handy with C# and M$ technologies, you can just create a proxy on the same server. Unfortunately, I don’t know the first thing about the Microsoft stack and I’m too lazy to learn. As an alternative, I opted to create a PHP proxy on another server and force it to return JSONP (JSON wrapped in a call to a callback function), which skirts around the same-origin problem.

First, the code for the PHP proxy which lives somewhere besides the Sharepoint server. This proxy forwards requests to Yahoo, parses the response, and emits JSON back to the caller.

<?php
//get params from request
$appid = 'YOUR_YAHOO_APPID';
$street = $_GET['street'];
$city = $_GET['city'];
$state = $_GET['state'];
 
//build new request
$req = 'http://local.yahooapis.com/MapsService/V1/geocode?';
$req .= 'appid=' . $appid;
$req .= '&street=' . urlencode($street);
$req .= '&city=' . urlencode($city);
$req .= '&state=' . urlencode($state);
 
//fetch XML using cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $req);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = trim(curl_exec($ch));
curl_close($ch);
 
//parse XML
$xml = simplexml_load_string($result);
$lat = $xml->Result[0]->Latitude;
$lng = $xml->Result[0]->Longitude;
 
//return JSON
echo $_GET['jsoncallback'] . "({lat: \"$lat\", lng: \"$lng\"})";
?>
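
The last line of the proxy is where the JSONP trick happens: the response is a call to whatever function name the client sent in jsoncallback, with the data as its argument. As a hedged sketch of the same step in Python (the jsonp helper, the callback name and the coordinates are made up for illustration), building the payload with a JSON encoder and checking the callback name looks like this:

```python
import json
import re

def jsonp(callback, payload):
    """Wrap a JSON payload in a call to the function name the client supplied."""
    # Accept only plain identifiers so the response cannot carry arbitrary script.
    if not re.fullmatch(r"[A-Za-z_$][\w$]*", callback):
        raise ValueError("bad callback name")
    return "%s(%s)" % (callback, json.dumps(payload))

body = jsonp("jsonp1234", {"lat": "37.42", "lng": "-122.08"})
print(body)
```

Checking the callback name is an extra precaution that the PHP version above does not take.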

Now, the Javascript part of this uses jQuery to make life much much easier. It basically injects a new Button onto the page. When clicked, it builds a request from the location data on the page, sends the request to the proxy, parses the response, and puts the latitude and longitude into the form fields. Then the user can click ‘Save’ as usual. This code needs to be added to the EditItem.aspx page for that particular list. You also need to have jQuery on the page. In my case, I just included them both as external scripts to keep my changes to .aspx pages to a minimum.

$(document).ready(function() {
	//create a new Button, match Sharepoint styles
	var geoButton = $('<input type="button">').attr({'class':'ms-ButtonHeightWidth', 'value':'GeoCode'});
 
	//add our Button after the default 'Cancel' Button
	$('.ms-formtoolbar .ms-toolbar:last').after($('<td>&nbsp;</td>').attr({'class':'ms-separator'}));
	$('.ms-formtoolbar .ms-separator:last').after(geoButton);
 
	//wrap Button in a Table to match Sharepoint's style
	geoButton.wrap($('<td></td>').attr({'class': 'ms-toolbar', 'no-wrap':'true'}))
		 .wrap($('<table></table>').attr({'cellspacing':'0','cellpadding':'0','width':'100%'}))
		 .wrap($('<tbody></tbody>'))
	         .wrap($('<tr></tr>'))
                 .wrap($('<td></td>').attr({'nowrap':'','align':'right','width':'100%'}));
 
	//onClick, perform geoCode and put Long/Lat into form fields
	geoButton.click(function() {
 
	//get data from form fields
	var street = $('input[title=Street]').val();
	var city =  $('input[title=City]').val();
	var state =  $('input[title=State]').val();
 
	//fail early if some data is absent, since we wouldn't get a good geocode
	if(street == '' || city == '' || state == '')
  	    return;
 
	//wrap data into a URL so we can do an HTTP GET
	var address = '&street='+encodeURIComponent(street)+'&city='+encodeURIComponent(city)+'&state='+encodeURIComponent(state);
 
	//use jQuery.getJSON to avoid that pesky cross domain security restriction
	//?jsoncallback=? is a peculiarity required by jQuery, the server must echo this back
	$.getJSON("http://path_to/yahoo-geocoder.php?jsoncallback=?"+address,
		function(json) {
    		    //async callback, unpack the data
 		    var lat = json.lat;
		    var lng = json.lng;
       		    //simple error detection
		    if(lat == '' || lng == '') {
			//show '!!!' after form fields to indicate that geocoding failed
			$('input[title=Latitude]')
			.after($('<span>!!!</span>').attr({'id':'error-lat'}).css('color','red'));
			$('input[title=Longitude]')
			.after($('<span>!!!</span>').attr({'id':'error-lng'}).css('color','red'));
		    } else {
			//success, place results into form fields
			$('input[title=Latitude]').val(lat);
			$('input[title=Longitude]').val(lng);
			//remove any previous error indicators
			$('#error-lat').remove();
			$('#error-lng').remove();
		    }
		});
	});
});

Python, PIL and Pretty Polaroids

I suspect that by now everyone and their grandmother has written a script to convert photos so they look like Polaroids. Yesterday I spent a slow morning at work replacing all our slideshows (which used a super-ugly Flash control) with these:

puppies-img_0253

puppies-img_0254

There are plenty of other neat effects that could be done.. maybe add a bit of aging or apply some filters. But I think it looks pretty good. The following script uses Python, PIL (Python Imaging Library), and a pre-drawn “polaroid” frame.

import PIL, time, glob, random, os, sys
from PIL import Image, ImageOps, ImageEnhance, ImageDraw, ImageFont
 
# Generate Polaroid-looking images
def make_polaroid(infile, outfile, text=''):
    base = (300,320)    #size of polaroid background
    polaroid = Image.open('polaroid-0.png')
    polaroid = ImageOps.fit(polaroid, base, Image.ANTIALIAS, 0, (0.5,0.5))
 
    target = (272,248); # size of empty target area on polaroid background
    img = Image.open(infile)
    img = ImageOps.fit(img, target, Image.ANTIALIAS, 0, (0.5,0.5))
 
    #enhance the image a bit
    img = ImageOps.autocontrast(img, cutoff=2)
    img = ImageEnhance.Sharpness(img).enhance(2.0)
 
    #draw the text, if any
    font = ImageFont.truetype("arial.ttf", 16)
    text_size = ImageDraw.Draw(polaroid).textsize(text, font=font)
    fontxy = (base[0]/2 - text_size[0]/2, 278)
    ImageDraw.Draw(polaroid).text(fontxy, text, font=font, fill=(40,40,40))
 
    #copy the image onto the polaroid background
    imgcorner = (14,20) #paste image onto polaroid
    polaroid.paste(img, imgcorner)
 
    #copy the whole thing onto a larger background and rotate randomly
    angle = random.randint(-10,10)
    blank = Image.new(polaroid.mode, (400,400))
    blank.paste(polaroid, (blank.size[0]/2-polaroid.size[0]/2, blank.size[1]/2-polaroid.size[1]/2))
    blank = blank.rotate(angle, Image.BICUBIC)
 
    blank.save(outfile)
 
if __name__ == "__main__":
    # Takes 1 required argument -- the desired prefix for the output filename
    if len(sys.argv) < 2:
        print "Missing required positional argument 'prefix'"
        exit()
 
    # Text to appear on image, use "" if none 
    text = "Some Text, or leave blank"
 
    # Erase everything in Output folder
    for f in glob.glob('output/*'):
        os.remove(f)
 
    # Create Polaroids of each JPG in Input folder
    files = [f[6:] for f in glob.glob('input/*.jpg')]
    for f in files:
        make_polaroid('input/'+f,'output/'+sys.argv[1]+'-'+f[:-4]+'.jpg',text)
 
    # Write index.html so Output folder can be copied/renamed elsewhere
    files = [f[7:] for f in glob.glob('output/*')]
    outhtml = open('output/index.html','w')
    outhtml.write("<html><head></head><body style='background-color: #000;'><div align='center'><p>")
 
    for i in range(len(files)):
        outhtml.write("<img src='%s' />" % (files[i]))
        if (i+1) % 2 == 0:
            # two images per row: close this paragraph and open the next
            outhtml.write("</p><p>")
    outhtml.write("</p></div></body></html>")
    outhtml.close()

The script is a bit over-specialized to my purpose .. converting a bunch of individual folders one at a time. So you may need to hack on it a bit to suit your needs. You can download the script here: polaroid.zip. Place files you want to convert into the “input” folder. Run the script with a single argument for the output filename prefix. It will take a few seconds or minutes to run, depending on how many photos you’re converting. When it finishes, copy the “output” folder elsewhere. The file “index.html” is pre-generated to contain all the photos in the folder.
