Can Clojure Find Me An Apartment?

(2010)

This post was going to be about how I spent the better part of a day trying to get Clojure, Emacs, SLIME, and the Java classpath all working together.

The gist of it is this: I am an idiot sometimes. I spent most of an afternoon trying to figure out why (use 'clojure.contrib) is an error. Earlier in the day my classpath was set up wrong, so (use 'clojure.contrib.duck-streams) didn't work. At some point I stopped typing the whole thing, figuring that if 'clojure.contrib.duck-streams works, then so should its parent, 'clojure.contrib. A-ha! Save myself a bit of typing! Nope. Namespaces aren't nested packages: clojure.contrib is just a prefix shared by other namespaces, not a namespace that use can load, so that form never works. So when I finally did get my classpath working, I didn't know it, because I was typing something that's just plain wrong. Hilarious and awesome, huh?
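For anyone hitting the same wall: use and require map a namespace name to a file on the classpath (a.b.c is looked up as a/b/c.clj or its compiled class), so a dotted prefix with no file of its own can't be loaded. The monolithic clojure.contrib jar no longer ships with modern Clojure, so this sketch shows the same effect with namespaces that do: clojure.java.io (which largely grew out of duck-streams) loads, while its "parent" clojure.java does not. The loadable? helper is hypothetical, written only for this illustration.

```clojure
;; Hypothetical helper: true if require can load the namespace, false otherwise.
(defn loadable? [ns-sym]
  (try
    (require ns-sym)
    true
    ;; a missing namespace file surfaces as an exception
    ;; (a java.io.FileNotFoundException in current Clojure versions)
    (catch Exception _
      false)))

;; clojure/java/io.clj exists on the classpath, so this loads.
(println (loadable? 'clojure.java.io)) ; prints true

;; There is no clojure/java.clj, so the prefix can't be loaded on its own.
(println (loadable? 'clojure.java))    ; prints false
```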

So, with everything finally working, I made my first little, halfway-real Clojure program.

Our current lease runs out in about six weeks, so my roommate and I need to find a new place to live. Sounds like a job for Craigslist. There's a problem, though: in big cities Craigslist is absolutely flooded with apartment listings, and its search functions just aren't that good. I have no interest in skimming hundreds or thousands of posts looking for that perfect combination of price/location/amenities (well, mostly price and location, actually), so why not let the computer do the work instead? Usually this would be a job for Python/BeautifulSoup, but in the interest of learning Clojure, here goes...

Below is what I've come up with so far for scraping apartments off Craigslist as gently as possible, by filtering out links that don't meet my criteria before fetching any of the listing pages. Right now, this code only generates the list of matching links; it doesn't actually follow them. If I continue with this program, following them will be Step 2, probably using http://lethain.com/entry/2009/nov/24/scalable-scraping-in-clojure/ for inspiration.
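Step 2 isn't written yet, but a gentle follow-up pass could look roughly like the sketch below. It assumes you pass in a fetch function (for example fetch-url from the listing) and a pause between requests; fetch-all-politely and its delay-ms parameter are hypothetical names, not part of the program in this post. Relative links may also need the site's base URL prepended before fetching.

```clojure
;; Hypothetical Step 2 helper: visit each link one at a time,
;; pausing delay-ms milliseconds between consecutive requests.
(defn fetch-all-politely
  "Calls fetch-fn on each link in order and returns a vector of
  [link result] pairs. Sleeps delay-ms between consecutive calls."
  [fetch-fn delay-ms links]
  (loop [remaining links
         acc []]
    (if-let [[link & more] (seq remaining)]
      (let [result (fetch-fn link)]
        ;; no need to wait after the last request
        (when more
          (Thread/sleep (long delay-ms)))
        (recur more (conj acc [link result])))
      acc)))

;; Usage with a stand-in fetch function (count just measures each string),
;; so nothing touches the network:
(println (fetch-all-politely count 0 ["/apa/1.html" "/apa/22.html"]))
```

Taking the fetch function as an argument keeps the pacing logic separate from the HTTP code, so it can be tried at the REPL without hitting Craigslist at all.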

The code below is based on the Enlive library, which provides a very usable syntax for ripping through HTML (though I don't quite understand it all yet). As I'm still a complete beginner with Clojure and functional programming in general, the code is probably far from idiomatic and may look sloppy to you pros out there. Comments and suggestions are welcome!
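If Enlive's selectors still feel a bit opaque, it helps to know that it represents parsed HTML as plain Clojure maps with :tag, :attrs and :content keys, and select returns a sequence of those maps. The sketch below hand-builds one such paragraph node and digs out the same pieces that get-link, get-link-text and get-paren-text extract. The markup it mimics is a hypothetical simplification of a Craigslist result line, not copied from a real page, so it's worth checking the actual structure with (select (fetch-url *url*) [:p]) at the REPL.

```clojure
;; Hypothetical node for markup like:
;;   <p><a href="/apa/123.html">$800 / 2br - sunny</a> <font size="-1"> (downtown)</font></p>
;; Enlive nodes are maps with :tag, :attrs and :content.
(def sample-p
  {:tag :p
   :attrs nil
   :content [{:tag :a
              :attrs {:href "/apa/123.html"}
              :content ["$800 / 2br - sunny"]}
             " "
             {:tag :font
              :attrs {:size "-1"}
              :content [" (downtown)"]}]})

;; Same path as get-link: the first child's :href attribute.
(def link (-> sample-p :content first :attrs :href))

;; Same path as get-link-text: the first child's :content, joined into a string.
(def link-text (apply str (-> sample-p :content first :content)))

;; Same path as get-paren-text: the third child's :content, if there is one.
(def paren-text
  (let [content (:content sample-p)]
    (if (< 2 (count content))
      (apply str (:content (nth content 2)))
      "")))

(println link)                     ; prints /apa/123.html
(println (str link-text paren-text)) ; prints $800 / 2br - sunny (downtown)
```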

;; import enlive
(use 'net.cgrand.enlive-html)

;; html helper
(defn fetch-url [url]
 (html-resource (java.net.URL. url)))

;; pulls link from paragraph
;; e.g., (map get-link (select (fetch-url *url*) [:p]))
(defn get-link [p]
 (:href (:attrs (first (:content p)))))

;; pulls text of link from paragraph
;; (Enlive's :content is a seq, so join it into one string)
(defn get-link-text [p]
  (apply str (:content (first (:content p)))))

;; pulls text of parens following link
;; usually this is zip code/location info
;; "", if absent
(defn get-paren-text [p]
  (let [content (:content p)]
    (if (< 2 (count content))
      (apply str (:content (nth content 2)))
      "")))

;; pulls link/text/location into a map
(defn get-all [p]
 {:link (get-link p)
  :text (str (get-link-text p)
          (get-paren-text p))})

;; some helpers to remove links we don't care about 

;; (affordable? "$800" 600 1000)  => true
;; (affordable? "$1500" 600 1000) => false
;; returns nil if the text contains no $price
(defn affordable? [text lo hi]
  (when-let [[_ digits] (re-find #"\$(\d[\d,]*)" text)]
    ;; strip thousands separators like "$1,200" before parsing
    (<= lo (Integer/parseInt (.replace digits "," "")) hi)))

;; (has-kword? "downtown" (list "down")) => true
;; (has-kword? "down" (list "downtown")) => false
;; each keyword is treated as a regex pattern
(defn has-kword? [text kwords]
  (boolean (some #(re-find (re-pattern %) text) kwords)))

;; parameterizes a function to decide if a link is worth retrieving
;; this would be cooler if the criteria functions
;; came in as a list too.. but that makes my head
;; spin.. maybe later
(defn keep-link? [lo hi areas beds]
  (fn [{:keys [link text]}]
    (let [text (.toLowerCase text)]
      (and link
           (re-find #"/apa/" link)
           (affordable? text lo hi)
           (has-kword? text areas)
           (has-kword? text beds)))))

;; some top level definitions
;; you may need to change these to get non-empty results
(def *url* "http://yourcity.craigslist.org/apa/")
(def *min-price* 100)
(def *max-price* 10000)
(def *areas* (list "downtown" "west side" "etc"))
(def *beds* (list "2br" "3br"))
(def my-keep-link? (keep-link? *min-price* *max-price* *areas* *beds*))

;; actually do the work
(filter my-keep-link? (map get-all (select (fetch-url *url*) [:p])))

;; References
;; 1) http://wiki.github.com/cgrand/enlive/
;; 2) http://github.com/swannodette/enlive-tutorial/
;; 3) Programming Clojure, Stuart Halloway
;; 4) lots and lots of Googling
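About that comment in keep-link? on taking the criteria as a list: Clojure's every-pred (added in Clojure 1.3, so newer than this post) combines any number of predicates with AND, which makes the list version fairly painless. The sketch below is one possible take, not part of the original program. price-between, mentions-any, apartment-link? and matches-all? are hypothetical names, the sample listings are made up, and it uses clojure.string/includes? (Clojure 1.8+) for plain substring matching instead of the regexes used above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical criterion: the first $price in :text is within [lo, hi].
(defn price-between [lo hi]
  (fn [{:keys [text]}]
    (when-let [[_ digits] (re-find #"\$(\d+)" text)]
      (<= lo (Long/parseLong digits) hi))))

;; Hypothetical criterion: :text contains at least one of the keywords.
(defn mentions-any [kwords]
  (fn [{:keys [text]}]
    (boolean (some #(str/includes? text %) kwords))))

;; Hypothetical criterion: the link points into the /apa/ section.
(defn apartment-link? [{:keys [link]}]
  (boolean (and link (re-find #"/apa/" link))))

;; Combine one or more criteria; a listing is kept only if all of them pass.
;; :text is lower-cased once, up front, so criteria can assume lower case.
(defn matches-all? [& criteria]
  (let [all? (apply every-pred criteria)]
    (fn [listing]
      (all? (update listing :text str/lower-case)))))

;; Made-up sample data in the same {:link :text} shape that get-all produces.
(def listings
  [{:link "/apa/1.html" :text "$800 / 2br - Downtown"}
   {:link "/apa/2.html" :text "$1500 / 3br - west side"}
   {:link "/sub/3.html" :text "$700 / 2br - downtown"}])

(def my-keep?
  (matches-all? (price-between 600 1000)
                (mentions-any ["downtown" "west side"])
                (mentions-any ["2br" "3br"])
                apartment-link?))

(println (map :link (filter my-keep? listings))) ; prints (/apa/1.html)
```

Adding a new criterion then means passing one more function to matches-all? rather than adding another parameter to keep-link?.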

On the whole, I'm liking Clojure a lot, but there is also a lot to learn.

(Shocking conclusion, I know!)