<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Craige&#039;s Programming Stuff</title>
	<atom:link href="http://craiget.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://craiget.com</link>
	<description>Misc programming notes</description>
	<lastBuildDate>Fri, 06 Apr 2012 10:35:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2</generator>
		<item>
		<title>testing</title>
		<link>http://craiget.com/2012/04/testing/</link>
		<comments>http://craiget.com/2012/04/testing/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 10:35:14 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=347</guid>
		<description><![CDATA[tumblrize]]></description>
			<content:encoded><![CDATA[<p>tumblrize</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2012/04/testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chrome pageYOffset vs IE6 fixed positioning</title>
		<link>http://craiget.com/2012/02/chrome-pageyoffset-vs-ie6-fixed-positioning/</link>
		<comments>http://craiget.com/2012/02/chrome-pageyoffset-vs-ie6-fixed-positioning/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 15:05:51 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=342</guid>
		<description><![CDATA[I spent awhile beating my head on this one today and couldn&#8217;t find the answer on Google (I know!), so just documenting here for future generations &#8211; in Safari and Chrome (so probably anything webkit-based), window.pageYOffset is zero (or something &#8230; <a href="http://craiget.com/2012/02/chrome-pageyoffset-vs-ie6-fixed-positioning/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I spent a while beating my head against this one today and couldn&#8217;t find the answer on Google (I know!), so I&#8217;m documenting it here for future generations: in Safari and Chrome (so probably anything WebKit-based), window.pageYOffset is zero (or something small and unexpected) when you have this in your CSS:</p>

<div class="wp_syntax"><div class="code"><pre class="css" style="font-family:monospace;">html<span style="color: #00AA00;">,</span>body <span style="color: #00AA00;">&#123;</span> 
  <span style="color: #000000; font-weight: bold;">height</span><span style="color: #00AA00;">:</span> <span style="color: #933;">100%</span><span style="color: #00AA00;">;</span>
  <span style="color: #000000; font-weight: bold;">overflow</span><span style="color: #00AA00;">:</span> <span style="color: #993333;">auto</span><span style="color: #00AA00;">;</span>
<span style="color: #00AA00;">&#125;</span></pre></div></div>

<p>Those styles are part of a fairly common hack to create fixed positioning in IE6. So they aren&#8217;t <em>really</em> necessary in the first place. </p>
<p>Firefox has a different behavior, giving the expected scroll height (I dunno which is technically correct standards-wise).</p>
<p>Here&#8217;s a minimal example demonstrating the difference:</p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;!DOCTYPE html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Some Title&lt;/title&gt;
&lt;style type=&quot;text/css&quot;&gt;
html,body {
  height: 100%;
  overflow: auto;
}
&lt;/style&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
// getPageScroll() by quirksmode.com, as seen everywhere
function getPageScroll() {
    var xScroll, yScroll;
    if (self.pageYOffset) {
      yScroll = self.pageYOffset;
      xScroll = self.pageXOffset;
    } else if (document.documentElement &amp;&amp; document.documentElement.scrollTop) {
      yScroll = document.documentElement.scrollTop;
      xScroll = document.documentElement.scrollLeft;
    } else if (document.body) {// all other Explorers
      yScroll = document.body.scrollTop;
      xScroll = document.body.scrollLeft;
    }
    return new Array(xScroll,yScroll)
}
&lt;/script&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;div style=&quot;height:1000px&quot;&gt;&lt;/div&gt;
  &lt;a href=&quot;#&quot; onclick=&quot;console.log(getPageScroll());return false;&quot;&gt;Scroll&lt;/a&gt;
&lt;/body&gt;
&lt;/html&gt;</pre></div></div>

<p>Open this page in Firefox and in Chrome (or Safari), scroll down a bit, and click the link labeled &#8220;Scroll&#8221;; the computed scroll offsets will print to the console.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2012/02/chrome-pageyoffset-vs-ie6-fixed-positioning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Your Android Developer Account Will Live Forever</title>
		<link>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/</link>
		<comments>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 00:21:30 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=333</guid>
		<description><![CDATA[I&#8217;ve been shutting down my app business since it doesn&#8217;t really make enough money to be worth the hassle of properly running a business, filing taxes, etc.. Part of that process is closing all my accounts, including the Android Developer &#8230; <a href="http://craiget.com/2011/11/your-android-developer-account-will-live-forever/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been shutting down my app business since it doesn&#8217;t really make enough money to be worth the hassle of properly running a business, filing taxes, etc. Part of that process is closing all my accounts, including the Android Developer account. Well, apparently that is not possible. After a couple of rounds of email with Google, they say the account cannot be archived or deleted. The best you can do is unpublish all your apps and change your password to something random. I was met with a similar surprise at the end of last year when I tried to close an account with an advertiser &#8211; &#8220;YOU WANT TO DO WHAT?!?&#8221; &#8211; they have thousands of clients, but apparently no one had ever asked to close their account before.</p>
<p>Uncool.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatically Save HTML Of Every Page You Visit</title>
		<link>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/</link>
		<comments>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 15:26:02 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[chrome]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jquery]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=291</guid>
		<description><![CDATA[For the last couple of weeks, I&#8217;ve been thinking about the best way to capture the HTML of every webpage I visit. Sure, you can always write a screen scraper or bot, but I guess I wanted something a little &#8230; <a href="http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the last couple of weeks, I&#8217;ve been thinking about the best way to capture the HTML of every webpage I visit. Sure, you can always write a screen scraper or bot, but I guess I wanted something a little more organic.</p>
<p>The right answer to this problem is probably: caching proxy. Alternatively, tcpdump or some cleverness with copying the temporary files from the browser cache might also work. However, I think there&#8217;s a strong case for using the browser directly: first, you get nice cleaned-up HTML, and second, you get javascript execution (handy if there is ajax stuff on the page or if you want to use jQuery for pre-processing the HTML).</p>
<p>Basically, you want this:</p>
<ul>
<li>Load a webpage as normal</li>
<li>Inject an additional script to..</li>
<li>Grab the DOM as a string</li>
<li>POST to a webserver to save it for processing later (avoiding cross-domain rules)</li>
</ul>
<p>In Firefox you&#8217;ve got Greasemonkey and User Scripts. These work in Chrome too, but it seems like the cross-domain restriction may be problematic. I didn&#8217;t investigate too much further after reading that there <strong>might</strong> be a problem. Happily, if you write a proper full-on Chrome Extension, you can specify exceptions to the cross-domain rules.</p>
<p>So, following is the script I pieced together this morning. It&#8217;s a chrome extension that grabs the source of every page you load (using jQuery&#8217;s DOM methods). Then it POSTs to your local webserver. My example below is pretty minimal just to demonstrate that it works. Maybe someday I&#8217;ll package it as a real extension, make it configurable and release it, but, you know, probably not.</p>
<p>Use at your own risk and all the usual disclaimers. Also, you should probably lock down the <strong>permissions</strong> and <strong>matches</strong> attributes to only run on your local server against the pages you&#8217;re interested in.</p>
<p><strong>manifest.json</strong></p>

<div class="wp_syntax"><div class="code"><pre class="json" style="font-family:monospace;">{
  &quot;name&quot;: &quot;Capture HTML and POST to local server&quot;,
  &quot;version&quot;: &quot;0.0.1&quot;,
  &quot;description&quot;: &quot;Capture HTML and POST to local server&quot;,
  &quot;permissions&quot;: [
    &quot;http://*/*&quot;
  ],
  &quot;content_scripts&quot;: [
    {
      &quot;matches&quot;: [&quot;http://*/*&quot;],
      &quot;js&quot; : [&quot;jquery.min.js&quot;,&quot;contentscript.js&quot;],
      &quot;run_at&quot;:&quot;document_end&quot;
    }
  ],
  &quot;background_page&quot;: &quot;background.html&quot;
}</pre></div></div>

<p><strong>contentscript.js</strong></p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> captureHTML<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> html <span style="color: #339933;">=</span> <span style="color: #3366CC;">'&lt;html&gt;'</span> <span style="color: #339933;">+</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'html'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">html</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #3366CC;">'&lt;/html&gt;'</span><span style="color: #339933;">;</span>
    chrome.<span style="color: #660066;">extension</span>.<span style="color: #660066;">sendRequest</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#123;</span>html<span style="color: #339933;">:</span> html<span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>response<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066;">alert</span><span style="color: #009900;">&#40;</span>response.<span style="color: #660066;">result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
captureHTML<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p><strong>background.html</strong></p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;jquery.min.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
    chrome.extension.onRequest.addListener(
        function(request, sender, sendResponse) {
            var html = request.html;
            var url = 'http://localhost/recv.php';
            var data = {html:html};
            $.post(url, data, function(result) {
                sendResponse({result: result});
            });
    });
&lt;/script&gt;
&lt;/head&gt;
&lt;/html&gt;</pre></div></div>

<p><strong>recv.php</strong></p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$html</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$_POST</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'html'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">strlen</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">echo</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">error_log</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Also, you will need to download a copy of the latest minified jQuery and save it into the extension folder as jquery.min.js. The PHP receiver needs to go somewhere on your local server; be sure to set the matching URL in background.html.</p>
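<p>If you&#8217;d rather not run PHP, the receiving end can be sketched in Python with only the standard library. This is a rough, hypothetical stand-in for recv.php rather than part of the extension: the names extract_html, CaptureHandler and serve, and port 8000, are made up for illustration, and it assumes the POST body is form-encoded (jQuery&#8217;s default for $.post). Like recv.php, it replies with the length of the HTML it received.</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def extract_html(body: bytes) -> str:
    """Pull the 'html' field out of a form-encoded POST body."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return fields.get("html", [""])[0]


class CaptureHandler(BaseHTTPRequestHandler):
    """Hypothetical receiver: answers every POST with the length of the HTML."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        html = extract_html(self.rfile.read(length))
        # A real version would store `html` somewhere for later processing.
        reply = str(len(html)).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port: int = 8000) -> None:
    """Listen on localhost only; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), CaptureHandler).serve_forever()
```

<p>To try it, call serve() and point the url variable in background.html at http://localhost:8000/ (assuming that port is free on your machine).</p>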
<p>So it seems to work. I think it&#8217;s kinda fun. If you know a better way to do this, please let me know.</p>
<p>Resources:</p>
<ul>
<li><a href="http://code.google.com/chrome/extensions/samples.html#script">http://code.google.com/chrome/extensions/samples.html#script</a></li>
<li><a href="http://stackoverflow.com/questions/2588513/why-doesnt-jquery-work-in-chrome-user-scripts-greasemonkey">http://stackoverflow.com/questions/2588513/why-doesnt-jquery-work-in-chrome-user-scripts-greasemonkey</a></li>
<li><a href="http://blog.michael-forster.de/2009/08/using-jquery-to-build-google-chrome.html">http://blog.michael-forster.de/2009/08/using-jquery-to-build-google-chrome.html</a></li>
<li><a href="http://code.google.com/chrome/extensions/messaging.html">http://code.google.com/chrome/extensions/messaging.html</a></li>
</ul>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting Table Data From PDFs with OCR</title>
		<link>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/</link>
		<comments>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/#comments</comments>
		<pubDate>Thu, 01 Sep 2011 01:44:22 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=266</guid>
		<description><![CDATA[PDF is the ideal format for things you don&#8217;t want anybody to read. Kidding.. sort of.. I am a bit biased against PDFs. Though I reluctantly admit their usefulness in a very few situations, mostly they&#8217;re just annoying. For a &#8230; <a href="http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>PDF is the ideal format for things you don&#8217;t want anybody to read.</p>
<p>Kidding.. sort of.. I am a bit biased against PDFs. Though I reluctantly admit their usefulness in a very few situations, mostly they&#8217;re just annoying. For a recent project, I wanted to extract a <strong>bunch</strong> of data from PDF documents (several hundred pages). All the data was nicely arranged in table format, as if it had been exported from Excel or something. Why the original Excel documents were not made available remains a mystery. Unfortunately, <em>Select All &#8211; Copy &#8211; Paste</em> completely mangled the text, but happily, it was possible to wrangle the data from the PDFs via OCR and some Python scripting.</p>
<p>The script below works like this:</p>
<ul>
<li>Take a PDF file</li>
<li>Split it into separate pages</li>
<li>Convert each page into an image file (pixels)</li>
<li>Locate the horizontal and vertical lines on each page (long runs of black pixels)</li>
<li>Segment the image into cells using the line coordinates</li>
<li>Clean up each cell (remove borders, threshold to black and white)</li>
<li>Perform OCR on each cell</li>
<li>Assemble results into a 2D array</li>
</ul>
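<p>The line-finding step is easier to see on a toy example. The sketch below is not the script itself: it uses a plain list of lists instead of a PIL image, with 0 standing for black and 1 for white, and the function names are made up for illustration. A row or column counts as a line when its longest run of black pixels reaches a threshold.</p>

```python
def longest_run(values, target=0):
    """Length of the longest run of consecutive `target` entries."""
    best = current = 0
    for v in values:
        current = current + 1 if v == target else 0
        best = max(best, current)
    return best


def find_hlines(grid, thresh):
    """Row indices whose longest run of black (0) pixels is at least `thresh`."""
    return [y for y, row in enumerate(grid) if longest_run(row) >= thresh]


def find_vlines(grid, thresh):
    """Column indices whose longest run of black pixels is at least `thresh`."""
    columns = zip(*grid)  # transpose so each column can be scanned like a row
    return [x for x, col in enumerate(columns) if longest_run(col) >= thresh]


# Toy 5x5 "page": a black border around one cell that holds a single black dot
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(find_hlines(page, 5))  # rows that are solid border lines
print(find_vlines(page, 5))  # columns that are solid border lines
```

<p>The dot in the middle is ignored because its run is far shorter than the threshold, which is the same reason text inside a cell does not get mistaken for a border.</p>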
<p>Optical Character Recognition is pretty amazing stuff, but it isn&#8217;t always perfect. To get the best possible results, it helps to use the cleanest input you can. In my initial experiments, I found that performing OCR on the entire document actually worked pretty well as long as I removed the cell borders (long horizontal and vertical lines). However, the software compressed all whitespace into a single empty space. Since my input documents had multiple columns with several words in each column, the cell boundaries were getting lost. Retaining the relationship between cells was very important, so one possible solution was to draw a unique character, like &#8220;^&#8221; on each cell boundary &#8211; something the OCR would still recognize and that I could use later to split the resulting strings.</p>
<p>Instead, I decided to OCR each cell individually. While slower, this seemed cleaner, more flexible, and easier to debug.</p>
<p>So here&#8217;s the code, there are a few dependencies:</p>
<ul>
<li>Recent-ish Python</li>
<li><a href="http://www.pythonware.com/products/pil/">PIL</a> (Python Imaging Library)</li>
<li><a href="http://code.google.com/p/tesseract-ocr/">Tesseract OCR</a> (I am using v3, but I think v2 will work too)</li>
<li><a href="http://www.imagemagick.org/">ImageMagick</a> (to split PDFs into multiple pages)</li>
</ul>
<p>It is slightly tuned to the particular files I was interested in (for example, it expects the cell borders to be solid black). It is also pretty slow &#8211; so if you need to process a massive number of pages, this won&#8217;t work for you. Also, it expects to operate in the directory you run it from and it expects there to be a subdirectory called &#8220;working&#8221; for temporary files. I suppose I should make the script do that automatically.. lazy, I guess..</p>
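<p>Creating the &#8220;working&#8221; directory automatically is a one-liner. A minimal sketch, assuming Python 3.2 or newer (for the exist_ok argument); the helper name ensure_working_dir is made up here and is not part of the script below:</p>

```python
import os


def ensure_working_dir(path="working"):
    """Create the scratch directory if it is missing and return its path."""
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path
```

<p>Calling ensure_working_dir() once before processing the first page would be enough.</p>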

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> PIL <span style="color: #ff7700;font-weight:bold;">import</span> Image, ImageOps
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>, <span style="color: #dc143c;">sys</span>, <span style="color: #dc143c;">os</span>, <span style="color: #dc143c;">glob</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># minimum run of adjacent pixels to call something a line</span>
H_THRESH = <span style="color: #ff4500;">300</span>
V_THRESH = <span style="color: #ff4500;">300</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_hlines<span style="color: black;">&#40;</span>pix, w, h<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get start/end pixels of lines containing horizontal runs of at least THRESH black pix&quot;&quot;&quot;</span>
    hlines = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> y <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>h<span style="color: black;">&#41;</span>:
        x1, x2 = <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
        black = <span style="color: #ff4500;">0</span>
        run = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>w<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> pix<span style="color: black;">&#91;</span>x,y<span style="color: black;">&#93;</span> == <span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>:
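                black = black + <span style="color: #ff4500;">1</span>
                <span style="color: #808080; font-style: italic;"># track the longest run here so a run that reaches the image edge still counts</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> black <span style="color: #66cc66;">&gt;</span> run:
                    run = black
                <span style="color: #808080; font-style: italic;"># compare against None: x == 0 is a valid (but falsy) coordinate</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> x1 <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span>: x1 = x
                x2 = x
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                black = <span style="color: #ff4500;">0</span>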
        <span style="color: #ff7700;font-weight:bold;">if</span> run <span style="color: #66cc66;">&gt;</span> H_THRESH:
            hlines.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x1,y,x2,y<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> hlines
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_vlines<span style="color: black;">&#40;</span>pix, w, h<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get start/end pixels of lines containing vertical runs of at least THRESH black pix&quot;&quot;&quot;</span>
    vlines = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>w<span style="color: black;">&#41;</span>:
        y1, y2 = <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>,<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
        black = <span style="color: #ff4500;">0</span>
        run = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> y <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>h<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> pix<span style="color: black;">&#91;</span>x,y<span style="color: black;">&#93;</span> == <span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>:
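                black = black + <span style="color: #ff4500;">1</span>
                <span style="color: #808080; font-style: italic;"># track the longest run here so a run that reaches the image edge still counts</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> black <span style="color: #66cc66;">&gt;</span> run:
                    run = black
                <span style="color: #808080; font-style: italic;"># compare against None: y == 0 is a valid (but falsy) coordinate</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> y1 <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span>: y1 = y
                y2 = y
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                black = <span style="color: #ff4500;">0</span>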
        <span style="color: #ff7700;font-weight:bold;">if</span> run <span style="color: #66cc66;">&gt;</span> V_THRESH:
            vlines.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x,y1,x,y2<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> vlines
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_cols<span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each column from a list of vertical lines&quot;&quot;&quot;</span>
    cols = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> - vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
            cols.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> cols
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_rows<span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each row from a list of vertical lines&quot;&quot;&quot;</span>
    rows = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> - hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
            rows.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> rows          
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_cells<span style="color: black;">&#40;</span>rows, cols<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each cell usings row and column coordinates&quot;&quot;&quot;</span>
    cells = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i, row <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span>:
        cells.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>i, <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> j, col <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span>:
            x1 = col<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
            y1 = row<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
            x2 = col<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>
            y2 = row<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>
            cells<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span> = <span style="color: black;">&#40;</span>x1,y1,x2,y2<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> cells
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> ocr_cell<span style="color: black;">&#40;</span>im, cells, x, y<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Return OCRed text from this cell&quot;&quot;&quot;</span>
    fbase = <span style="color: #483d8b;">&quot;working/%d-%d&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>x, y<span style="color: black;">&#41;</span>
    ftif = <span style="color: #483d8b;">&quot;%s.tif&quot;</span> <span style="color: #66cc66;">%</span> fbase
    ftxt = <span style="color: #483d8b;">&quot;%s.txt&quot;</span> <span style="color: #66cc66;">%</span> fbase
    <span style="color: #dc143c;">cmd</span> = <span style="color: #483d8b;">&quot;tesseract %s %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>ftif, fbase<span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># extract cell from whole image, grayscale (1-color channel), monochrome</span>
    region = im.<span style="color: black;">crop</span><span style="color: black;">&#40;</span>cells<span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>y<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    region = ImageOps.<span style="color: black;">grayscale</span><span style="color: black;">&#40;</span>region<span style="color: black;">&#41;</span>
    region = region.<span style="color: black;">point</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> p: p <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">200</span> <span style="color: #ff7700;font-weight:bold;">and</span> <span style="color: #ff4500;">255</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># determine background color (most used color)</span>
    histo = region.<span style="color: black;">histogram</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> histo<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> histo<span style="color: black;">&#91;</span><span style="color: #ff4500;">255</span><span style="color: black;">&#93;</span>: bgcolor = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">else</span>: bgcolor = <span style="color: #ff4500;">255</span>
    <span style="color: #808080; font-style: italic;"># trim borders by finding top-left and bottom-right bg pixels</span>
    pix = region.<span style="color: black;">load</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    x1,y1 = <span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>
    x2,y2 = region.<span style="color: black;">size</span>
    x2,y2 = x2-<span style="color: #ff4500;">1</span>,y2-<span style="color: #ff4500;">1</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> pix<span style="color: black;">&#91;</span>x1,y1<span style="color: black;">&#93;</span> <span style="color: #66cc66;">!</span>= bgcolor:
        x1 += <span style="color: #ff4500;">1</span>
        y1 += <span style="color: #ff4500;">1</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> pix<span style="color: black;">&#91;</span>x2,y2<span style="color: black;">&#93;</span> <span style="color: #66cc66;">!</span>= bgcolor:
        x2 -= <span style="color: #ff4500;">1</span>
        y2 -= <span style="color: #ff4500;">1</span>
    <span style="color: #808080; font-style: italic;"># save as TIFF and extract text with Tesseract OCR</span>
    trimmed = region.<span style="color: black;">crop</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x1,y1,x2,y2<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    trimmed.<span style="color: black;">save</span><span style="color: black;">&#40;</span>ftif, <span style="color: #483d8b;">&quot;TIFF&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">subprocess</span>.<span style="color: black;">call</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: #dc143c;">cmd</span><span style="color: black;">&#93;</span>, shell=<span style="color: #008000;">True</span>, stderr=<span style="color: #dc143c;">subprocess</span>.<span style="color: black;">PIPE</span><span style="color: black;">&#41;</span>
    lines = <span style="color: black;">&#91;</span>l.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> l <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>ftxt<span style="color: black;">&#41;</span>.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> lines<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_image_data<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Extract textual data[rows][cols] from spreadsheet-like image file&quot;&quot;&quot;</span>    
    im = Image.<span style="color: #008000;">open</span><span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    pix = im.<span style="color: black;">load</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    width, height = im.<span style="color: black;">size</span>
    hlines = get_hlines<span style="color: black;">&#40;</span>pix, width, height<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: hlines: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    vlines = get_vlines<span style="color: black;">&#40;</span>pix, width, height<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: vlines: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    rows = get_rows<span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: rows: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    cols = get_cols<span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: cols: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    cells = get_cells<span style="color: black;">&#40;</span>rows, cols<span style="color: black;">&#41;</span>
&nbsp;
    data = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        data.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>ocr_cell<span style="color: black;">&#40;</span>im,cells, row, col<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> col <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> 
    <span style="color: #ff7700;font-weight:bold;">return</span> data
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> split_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Split PDF into PNG pages, return filenames&quot;&quot;&quot;</span>
    prefix = filename<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span>
    <span style="color: #dc143c;">cmd</span> = <span style="color: #483d8b;">&quot;convert -density 600 %s working/%s-%%d.png&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, prefix<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">subprocess</span>.<span style="color: black;">call</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: #dc143c;">cmd</span><span style="color: black;">&#93;</span>, shell=<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span>f <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">glob</span>.<span style="color: #dc143c;">glob</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'working'</span>, <span style="color: #483d8b;">'%s*'</span> <span style="color: #66cc66;">%</span> prefix<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> extract_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Extract table data from pdf&quot;&quot;&quot;</span>
    pngfiles = split_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Pages: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>pngfiles<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># extract table data from each page</span>
    data = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> pngfile <span style="color: #ff7700;font-weight:bold;">in</span> pngfiles:
        pngdata = get_image_data<span style="color: black;">&#40;</span>pngfile<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> d <span style="color: #ff7700;font-weight:bold;">in</span> pngdata:
            data.<span style="color: black;">append</span><span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># remove temp files for this page</span>
        <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*.tif&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*.txt&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># remove split pages</span>
    <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*&quot;</span><span style="color: black;">&#41;</span>   
    <span style="color: #ff7700;font-weight:bold;">return</span> data
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">2</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Usage: ctocr.py FILENAME&quot;</span>
        exit<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># split target pdf into pages</span>
    filename = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    data = extract_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> data:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>row<span style="color: black;">&#41;</span></pre></div></div>
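<p>One caveat on the border-trimming loops in ocr_cell above: they walk the cell diagonal until they hit a background pixel, with no bounds check, so a cell with no background pixel on that diagonal would run off the edge of the image. A minimal bounds-checked sketch of the same walk follows. The name trim_diagonal and the None return value are my own choices for illustration, not part of the script.</p>

```python
def trim_diagonal(pix, width, height, bgcolor):
    """Walk inward along the diagonal from both corners until a
    background pixel is found.

    pix is anything indexable by an (x, y) tuple, such as the object
    returned by PIL's Image.load(). Returns (x1, y1, x2, y2), or None
    when no usable background pixel is found inside the cell.
    """
    x1, y1 = 0, 0
    x2, y2 = width - 1, height - 1
    # top-left corner, moving down and right
    while x1 < width and y1 < height and pix[x1, y1] != bgcolor:
        x1 += 1
        y1 += 1
    # bottom-right corner, moving up and left
    while x2 >= 0 and y2 >= 0 and pix[x2, y2] != bgcolor:
        x2 -= 1
        y2 -= 1
    # give up if either walk left the cell or the two walks crossed
    if x1 >= width or y1 >= height or x2 < x1 or y2 < y1:
        return None
    return (x1, y1, x2, y2)
```

<p>A caller could skip the cell (or OCR it untrimmed) when this returns None.</p>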

<p>Anyhow, I think it is kinda fun. Since the OCR is not actually magic, some post-processing may be necessary. In particular, I&#8217;ve noticed &#8220;o&#8221; (the letter) in place of &#8220;0&#8221; (the number) sometimes, extra whitespace or oddly split words, and occasional wrong letters. But overall, the accuracy is still fantastic.</p>
<p>The usual caveats apply: use at your own risk, etc.</p>
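<p>For the post-processing mentioned above, a rough sketch might look like the following. The function name clean_cell and the numeric flag are hypothetical. The idea is that you know which columns are supposed to hold numbers, so the letter-for-zero fix is only applied there.</p>

```python
import re

def clean_cell(text, numeric=False):
    """Tidy up one OCRed cell.

    Collapses runs of whitespace into single spaces. When the caller
    says the cell should be numeric, the letter o/O is replaced by zero.
    """
    text = re.sub(r"\s+", " ", text).strip()
    if numeric:
        text = text.replace("o", "0").replace("O", "0")
    return text
```

<p>This does nothing for oddly split words or wrong letters. Those probably need a dictionary or a human.</p>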
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Huh, Bitcoin = Pretty interesting</title>
		<link>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/</link>
		<comments>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 13:12:21 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=257</guid>
		<description><![CDATA[Read an interesting article on Ars Technica this morning. Looks like Bitcoin had already made the rounds earlier this summer, but I guess I missed it. http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars http://www.bitcoin.org Bitcoin is the first legitimate crypto-currency, an idea first suggested in &#8230; <a href="http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Read an interesting article on Ars Technica this morning. Looks like Bitcoin had already made the rounds earlier this summer, but I guess I missed it. </p>
<ul>
<li><a href="http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars">http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars</a></li>
<li><a href="http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars">http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars</a></li>
<li><a href="http://www.bitcoin.org/">http://www.bitcoin.org</a></li>
</ul>
<p>Bitcoin is the first legitimate crypto-currency, an idea first suggested in <a href="http://en.wikipedia.org/wiki/Crypto-currency">1998</a>. It is unique in several ways:</p>
<p>First of all, it is (mostly) anonymous, just like cash. Mostly &#8211; because, like cash, it is not anonymous under conditions of physical surveillance or if either party is coerced.</p>
<p>Second, it eliminates the need for 3rd party payment processors like Paypal and even credit cards. In a traditional online transaction, the payment processor holds the secret account numbers for both parties and conducts the transaction. Under the bitcoin scheme, all transactions are published freely using public key cryptography to conceal the identities of both parties. This allows the economy to incorporate the transfer of money without needing an intermediate payment processor.</p>
<p>Also interesting is that the system is designed to be inflation-proof. Unlike a traditional national currency, bitcoin is controlled by an algorithm. There&#8217;s no central authority that can decide to increase the money supply and cause inflation. Instead, there is a fixed supply of 21M bitcoins which will be distributed at a geometrically decreasing rate. Each bitcoin can be subdivided, so as trading in single bitcoins becomes impractical, people can trade in millibitcoins and microbitcoins.</p>
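<p>To see where the 21M figure comes from, here is a small worked sketch. The numbers are taken from the Bitcoin protocol rather than from anything above: the block reward starts at 50 bitcoins and is halved every 210,000 blocks, with amounts counted in whole units of one hundred-millionth of a bitcoin.</p>

```python
COIN = 100000000             # smallest units per bitcoin
BLOCKS_PER_HALVING = 210000  # blocks between reward halvings
INITIAL_REWARD = 50 * COIN   # reward per block at launch

def total_supply():
    """Sum the block rewards over every halving period, in smallest units."""
    total = 0
    reward = INITIAL_REWARD
    while reward > 0:
        total += BLOCKS_PER_HALVING * reward
        reward //= 2  # integer halving, so the reward eventually hits zero
    return total

supply = total_supply()
```

<p>The series 50 + 25 + 12.5 + ... converges to 100 bitcoins per block slot, and 100 x 210,000 = 21M. Because of the integer rounding, the sum lands slightly below that, at about 20,999,999.9769 bitcoins.</p>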
<p>As a P2P network, the system relies on creating consensus between nodes and can be subverted if someone can muster enough computing resources to control more than half the network. In the age of massive botnets, that&#8217;s not infeasible. Proponents argue that there&#8217;s no economic incentive, since subverting the network would ruin the saboteur&#8217;s own bitcoin investment. However, there still seems to be a risk from someone stealing the network just for laughs.</p>
<p>Another danger is that, at least anecdotally, bitcoin is being used to buy/sell illegal goods and services or for money laundering. That doesn&#8217;t bode well for its long-term viability. To be really useful, it needs some mainstream acceptance. A <a href="https://en.bitcoin.it/wiki/Trade">list of sites</a> that accept the currency looks mildly promising. </p>
<p>Anyway.. it seems like quite an interesting system &#8211; and very sci-fi. The system even comes with an anonymous inventor who designed the protocol and published the original paper under a pseudonym.</p>
<p>I&#8217;m not buying bitcoins just yet. But it would be neat to see something like this catch on.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Little Job Scraper</title>
		<link>http://craiget.com/2011/08/a-little-job-scraper/</link>
		<comments>http://craiget.com/2011/08/a-little-job-scraper/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 12:40:32 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=249</guid>
		<description><![CDATA[Often you reach a point in a project where it is handy to have some real data. So today I wrote a little program to grab one page worth of Want Ads from the venerable Craigslist. Having served its &#8230; <a href="http://craiget.com/2011/08/a-little-job-scraper/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Often you reach a point in a project where it is handy to have some real data. So today I wrote a little program to grab one page worth of Want Ads from the venerable Craigslist.</p>
<p>Having served its intended purpose, it seemed fun to tweak the program to keep track of new job postings on craigslist. So.. here&#8217;s that..</p>
<p>This program just reads the pages you specify and scans for any URLs it hasn&#8217;t seen before. If you run it via cron, say, once a day, it will give you the new postings for that day. Each new url is recorded, so it doesn&#8217;t notify you twice about the same job.</p>
<p>In python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib2</span>, <span style="color: #dc143c;">time</span>
<span style="color: #ff7700;font-weight:bold;">from</span> BeautifulSoup <span style="color: #ff7700;font-weight:bold;">import</span> BeautifulSoup
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #008000;">reload</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">sys</span>.<span style="color: black;">setdefaultencoding</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'utf-8'</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">socket</span>
<span style="color: #dc143c;">socket</span>.<span style="color: black;">setdefaulttimeout</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># pages to monitor</span>
categories = <span style="color: black;">&#91;</span>
    <span style="color: #483d8b;">&quot;http://knoxville.craigslist.org/sof/&quot;</span>,
    <span style="color: #483d8b;">&quot;http://knoxville.craigslist.org/eng&quot;</span>
<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># data file for visited url list</span>
dat = <span style="color: #483d8b;">&quot;.cl.exclude&quot;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># build list of urls already visited</span>
exclude = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">try</span>:
    <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>dat<span style="color: black;">&#41;</span>.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        exclude.<span style="color: black;">append</span><span style="color: black;">&#40;</span>line<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">except</span>:
    <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
&nbsp;
<span style="color: #808080; font-style: italic;"># get unseen urls from each category page</span>
urls = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> category <span style="color: #ff7700;font-weight:bold;">in</span> categories:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>category<span style="color: black;">&#41;</span>
        soup = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> a <span style="color: #ff7700;font-weight:bold;">in</span> soup.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'a'</span><span style="color: black;">&#41;</span>:
            <span style="color: #808080; font-style: italic;"># must be a url</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> a.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">continue</span>
            <span style="color: #808080; font-style: italic;"># must match current category (to exclude help pages/etc)</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> a<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span>category<span style="color: black;">&#41;</span> == -<span style="color: #ff4500;">1</span>: <span style="color: #ff7700;font-weight:bold;">continue</span>
            <span style="color: #808080; font-style: italic;"># ok, keep this url</span>
            urls.<span style="color: black;">append</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span>, e:
        <span style="color: #ff7700;font-weight:bold;">raise</span> e
&nbsp;
<span style="color: #808080; font-style: italic;"># visit each url to get the title and content</span>
<span style="color: #ff7700;font-weight:bold;">for</span> url <span style="color: #ff7700;font-weight:bold;">in</span> urls:
    <span style="color: #808080; font-style: italic;"># skip if already seen</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> url <span style="color: #ff7700;font-weight:bold;">in</span> exclude: <span style="color: #ff7700;font-weight:bold;">continue</span>
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
        soup = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
        title = soup.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;title&quot;</span><span style="color: black;">&#41;</span>.<span style="color: #dc143c;">string</span>
        body = soup.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;div&quot;</span>, <span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span>: <span style="color: #483d8b;">&quot;userbody&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>.<span style="color: #dc143c;">string</span>
        <span style="color: #808080; font-style: italic;"># do something interesting here, like email the list to yourself</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> url, title
    <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span>, e:
        <span style="color: #ff7700;font-weight:bold;">raise</span> e
    <span style="color: #808080; font-style: italic;"># scrape slowly</span>
    <span style="color: #dc143c;">time</span>.<span style="color: black;">sleep</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># write list of all urls from this time</span>
<span style="color: #808080; font-style: italic;"># note: there is no need to remember ALL the old urls since</span>
<span style="color: #808080; font-style: italic;"># the urls are unique and we aren't dealing with pagination </span>
<span style="color: #808080; font-style: italic;"># it is safe to forget urls that are past the first page of results</span>
fout = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>dat,<span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> url <span style="color: #ff7700;font-weight:bold;">in</span> urls:
    fout.<span style="color: black;">write</span><span style="color: black;">&#40;</span>url+<span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: black;">&#41;</span>
fout.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Obviously, scraping is potentially rude. This is pretty lightweight, since it only checks URLs it hasn&#8217;t seen before and waits 10 seconds between visits. Nevertheless, use at your own risk.</p>
<p>The best way to use this is probably tweaking it to email you about new jobs. I&#8217;ve omitted that code for two reasons:</p>
<ol>
<li>It is pretty well documented elsewhere</li>
<li>Email originating from a home server will probably be rejected anyway</li>
</ol>
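<p>For the curious, here is a minimal sketch of what the emailing step might look like. It is written for Python 3 (the scraper above is Python 2), and the function names, the addresses, and the localhost mail server are all assumptions for illustration rather than part of the original script. It only uses the standard library, and whether the message actually gets delivered still depends on your mail setup.</p>

```python
import smtplib
from email.message import EmailMessage


def build_job_email(jobs, sender, recipient):
    """Build a plain-text message from a list of (url, title) pairs.

    `jobs`, `sender` and `recipient` are hypothetical inputs; collect the
    pairs in the scraping loop instead of printing them.
    """
    msg = EmailMessage()
    msg["Subject"] = "New job postings: %d" % len(jobs)
    msg["From"] = sender
    msg["To"] = recipient
    lines = ["%s\n%s\n" % (title, url) for url, title in jobs]
    msg.set_content("\n".join(lines))
    return msg


def send_email(msg, host="localhost", port=25):
    """Hand the message to an SMTP server (assumed to be listening locally)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

<p>Building the message separately from sending it makes it easy to check the output before pointing it at a real mail server.</p>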
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/08/a-little-job-scraper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yay Dash C</title>
		<link>http://craiget.com/2011/07/yay-dash-c/</link>
		<comments>http://craiget.com/2011/07/yay-dash-c/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 13:45:42 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=233</guid>
		<description><![CDATA[I think our internet is rate-limited. That&#8217;s annoying because I don&#8217;t do any of the stuff that maybe deserves it (looking at you BitTorrent!). I haven&#8217;t exactly quantified the problem yet, but the main symptom is a very reasonable rate &#8230; <a href="http://craiget.com/2011/07/yay-dash-c/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I think our internet is rate-limited. That&#8217;s annoying because <strong>I don&#8217;t do any of the stuff that maybe deserves it</strong> (looking at you BitTorrent!). I haven&#8217;t exactly quantified the problem yet, but the main symptom is a very reasonable rate of 300K or so dropping to 3K-5K after the first 1-2 MB. Since many webpages are in the 1-2 MB range (or substantially smaller), it isn&#8217;t a big deal for regular browsing, but video becomes basically unwatchable. I&#8217;m not sure if the rate-limiting is on specific types of files (video) or everything.. or maybe I&#8217;m just imagining the whole thing.</p>
<p>Either way &#8211; Dialup is so 1999. Right?!</p>
<p>Thankfully, there&#8217;s <a href="http://rg3.github.com/youtube-dl/">youtube-dl</a>, which downloads YouTube videos for offline viewing. Unfortunately, the rate-limiting is still problematic. After a couple of MB, the rate drops and the download effectively stops (and doesn&#8217;t appear to recover if you leave it running for a while). youtube-dl has a &#8220;-c&#8221; option (just like <a href="http://www.gnu.org/s/wget/">wget</a>) which tries to continue your previous download instead of starting over.</p>
<p>A totally garbage solution that works: just restart the download every 10 seconds until it&#8217;s done. You get the good rate for a few seconds and restart every time the rate drops. This works.. but doing it by hand is annoying (or unfeasible for a big file). A better solution is to have a script that runs youtube-dl automatically for 10 seconds, kills it, restarts it, and repeats until the file is completely downloaded.</p>
<p>So it would be nice to have a way to run a program for a certain number of seconds. People much smarter than me have already figured this out in the form of a bash script:</p>
<p><a href="http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3">http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3</a></p>
<p>You can use it like this:</p>
<pre>
timeout 10 youtube-dl -c "url_of_youtube_video"
</pre>
<p>So that works, now just wrap it up in a loop. 10 tries is probably enough to get a video. I know there are smarter ways to check for completion, but I&#8217;m pretty lazy and this is good enough:</p>
<pre>
for i in {1..10}
do
  timeout 10 youtube-dl -c "url_of_youtube_video"
done
</pre>
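<p>One of those smarter ways might look something like the sketch below: the same kill-and-restart loop, but in Python, stopping as soon as the command exits cleanly instead of always running all 10 rounds. It assumes youtube-dl exits with status 0 once the file is complete, which I have not verified for every case, and the function name <code>run_until_done</code> is made up for this example.</p>

```python
import subprocess


def run_until_done(command, seconds=10, max_tries=10):
    """Run `command` repeatedly, killing it after `seconds` each time.

    Returns the number of the attempt that exited with status 0,
    or None if every attempt timed out or failed.
    """
    for attempt in range(1, max_tries + 1):
        try:
            # subprocess.run kills the child when the timeout expires
            result = subprocess.run(command, timeout=seconds)
        except subprocess.TimeoutExpired:
            continue  # rate probably dropped; restart and let -c resume
        if result.returncode == 0:
            return attempt
    return None


# Example call (placeholder URL, same as in the bash loop above):
# run_until_done(["youtube-dl", "-c", "url_of_youtube_video"])
```

<p>Note that the timeout only kills the process it started directly, so this is still a rough tool, just slightly less rough than a fixed loop.</p>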
<p>Not exactly as good as just watching videos in the browser, but it resolves my frustration anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/yay-dash-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Overton Window</title>
		<link>http://craiget.com/2011/07/the-overton-window/</link>
		<comments>http://craiget.com/2011/07/the-overton-window/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 11:55:48 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=183</guid>
		<description><![CDATA[I love finding out that some fluttering thought has a proper name. Reasonable people should agree that simply having two sides to an issue doesn&#8217;t make them equally correct. If you disagree, just take any issue you feel strongly about, &#8230; <a href="http://craiget.com/2011/07/the-overton-window/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I love finding out that some fluttering thought has a proper name.</p>
<p>Reasonable people should agree that simply having two sides to an issue doesn&#8217;t make them equally correct. If you disagree, just take any issue you feel strongly about, consider the polar opposite, and decide if you would agree to the 50-50 compromise. You would? Okay, well move the other viewpoint one step towards the extreme. Would you still agree? Certainly not &#8211; 50-50 became 40-60 &#8211; the former &#8220;compromise&#8221; now favors your opponent.</p>
<p>Suppose I argue that a triangle has 5 sides.</p>
<p>You say 3.</p>
<p>Should we compromise on 4?</p>
<p>What if I say 10? Is the number of sides of a triangle even up for debate?</p>
<p>That&#8217;s the essence of the <a href="http://en.wikipedia.org/wiki/Overton_window">Overton Window</a> &#8211; the range of beliefs that reasonable people can hold on a topic.</p>
<p>The difficulty lies in the <a href="http://en.wikipedia.org/wiki/Argument_to_moderation">Argument to Moderation</a>, a fallacy that, given two extremes, the truth necessarily lies in the middle. Proponents of a particular viewpoint can manipulate the Overton Window by adopting values more extreme than their actual beliefs. As a result, the apparent middle ground shifts, changing the whole debate.</p>
<p><strong>Not exactly a revelation</strong> &#8211; people manipulate each other and public opinion.</p>
<p>I was just intrigued that there&#8217;s a term for that particular phenomenon.</p>
<p>A few related links:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Overton_window">The Overton Window</a> (Wikipedia)</li>
<li><a href="http://skeptics.stackexchange.com/questions/5063/can-the-overton-window-be-deliberately-moved-by-espousing-extremist-views">Can the Overton Window be deliberately moved?</a> (stackexchange)</li>
<li><a href="http://diveintomark.org/archives/2006/08/23/overton-window">W3C and the Overton Window</a> (regarding web standards)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/the-overton-window/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Font Picker</title>
		<link>http://craiget.com/2011/07/web-font-picker/</link>
		<comments>http://craiget.com/2011/07/web-font-picker/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 14:53:31 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[javascript]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=170</guid>
		<description><![CDATA[Google Web Fonts is kinda awesome. If you haven&#8217;t checked it out already &#8211; basically it gives you a ton of new font choices that still degrade gracefully for older browsers. All you have to do is add a stylesheet &#8230; <a href="http://craiget.com/2011/07/web-font-picker/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com/webfonts">Google Web Fonts</a> is kinda awesome. If you haven&#8217;t checked it out already &#8211; basically it gives you a ton of new font choices that still degrade gracefully for older browsers. All you have to do is add a stylesheet to your page and specify the &#8216;font-family&#8217;. It truly couldn&#8217;t be easier. Also.. yay, free!</p>
<p>One annoyance is the workflow. You have to look at the collection, edit your css and/or webpages, reload, repeat. (However, the fonts are available for download, if you use Photoshop/Gimp/etc to design your pages).</p>
<p>The following code lets you change fonts on the fly by adding a little dropdown box to the top left corner. When you double-click anything on the page, it will be styled with the chosen font. The code is a bit ugly since it&#8217;s just the first thing that came to mind. Nevertheless, I think it&#8217;s kinda neat for experimenting.</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #006600; font-style: italic;">// list of fonts to try</span>
<span style="color: #003366; font-weight: bold;">var</span> families <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">'Yellowtail'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'Astigmatic'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'Leckerli One'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #006600; font-style: italic;">// build the dropdown box</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'body'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span>$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'&lt;select id=&quot;fontpicker&quot;&gt;&lt;/select&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000066; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">var</span> i<span style="color: #339933;">=</span><span style="color: #CC0000;">0</span><span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;</span>families.<span style="color: #660066;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'&lt;option value=&quot;'</span><span style="color: #339933;">+</span>families<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #3366CC;">'&quot;&gt;'</span><span style="color: #339933;">+</span>families<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #3366CC;">'&lt;/option&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">css</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#123;</span><span style="color: #3366CC;">'position'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'absolute'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'top'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'0px'</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">'left'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'0px'</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #006600; font-style: italic;">// bind doubleclick on every element</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'*'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">live</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'dblclick'</span><span style="color: #339933;">,</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> family <span style="color: #339933;">=</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">val</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> href <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;http://fonts.googleapis.com/css?family=&quot;</span><span style="color: #339933;">+</span>family<span style="color: #339933;">+</span><span style="color: #3366CC;">&quot;&amp;v2&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> stylesheet <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;&lt;link href='http://fonts.googleapis.com/css?family=&quot;</span><span style="color: #339933;">+</span>family<span style="color: #339933;">+</span><span style="color: #3366CC;">&quot;&amp;v2' rel='stylesheet' type='text/css'&gt;&quot;</span><span style="color: #339933;">;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">css</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'font-family'</span><span style="color: #339933;">,</span> family<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #006600; font-style: italic;">// try not to load the same stylesheet twice</span>
    <span style="color: #003366; font-weight: bold;">var</span> found <span style="color: #339933;">=</span> <span style="color: #CC0000;">0</span><span style="color: #339933;">;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;head link[rel='stylesheet']&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">each</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>$<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">attr</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'href'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> href<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            found <span style="color: #339933;">=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>found <span style="color: #339933;">==</span> <span style="color: #CC0000;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'head'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span>stylesheet<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>There&#8217;s one big problem still, which is that you need to specify WHICH fonts you want to make available to the switcher. Better than doing it one at a time, but not as good as pulling the complete list from Google. I&#8217;m not sure of a super great way to achieve that, but maybe something to consider.</p>
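<p>One possible way to get the complete list is to generate the <code>families</code> array ahead of time with a small script. The sketch below is in Python 3 rather than JavaScript and makes several assumptions: that the Google Fonts Developer API is reachable at the URL shown, that you have an API key for it, and that the response is JSON with an <code>items</code> list whose entries carry a <code>family</code> field. The function names are made up for this example.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Google Fonts Developer API (needs an API key)
API_URL = "https://www.googleapis.com/webfonts/v1/webfonts"


def parse_families(payload):
    """Pull the family names out of a JSON response body (assumed shape)."""
    data = json.loads(payload)
    return [item["family"] for item in data.get("items", [])]


def fetch_families(api_key):
    """Download the font list; `api_key` is your own key, not provided here."""
    url = API_URL + "?" + urlencode({"key": api_key})
    with urlopen(url) as response:
        return parse_families(response.read().decode("utf-8"))


def as_js_array(families):
    """Render the list as the `var families = [...]` line used by the picker."""
    return "var families = " + json.dumps(families) + ";"
```

<p>Paste the output of <code>as_js_array(fetch_families(your_key))</code> over the first line of the picker and the dropdown will list every family the API returned.</p>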
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/web-font-picker/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

