<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Craige&#039;s Programming Stuff</title>
	<atom:link href="http://craiget.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://craiget.com</link>
	<description>Misc programming notes</description>
	<lastBuildDate>Tue, 22 Nov 2011 00:23:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2</generator>
		<item>
		<title>Your Android Developer Account Will Live Forever</title>
		<link>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/</link>
		<comments>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 00:21:30 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=333</guid>
		<description><![CDATA[I&#8217;ve been shutting down my app business since it doesn&#8217;t really make enough money to be worth the hassle of properly running a business, filing taxes, etc. Part of that process is closing all my accounts, including the Android Developer &#8230; <a href="http://craiget.com/2011/11/your-android-developer-account-will-live-forever/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been shutting down my app business since it doesn&#8217;t really make enough money to be worth the hassle of properly running a business, filing taxes, etc. Part of that process is closing all my accounts, including the Android Developer account. Well, apparently that is not possible. After a couple of rounds of emails, Google says the account cannot be archived or deleted. The best you can do is unpublish all your apps and change your password to something random. I was met with a similar surprise at the end of last year when I tried to close an account with an advertiser &#8211; &#8220;YOU WANT TO DO WHAT?!?&#8221; &#8211; they have thousands of clients, but apparently no one had ever asked to close their account before.</p>
<p>Uncool.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/11/your-android-developer-account-will-live-forever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automatically Save HTML Of Every Page You Visit</title>
		<link>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/</link>
		<comments>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 15:26:02 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[chrome]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[jquery]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=291</guid>
		<description><![CDATA[For the last couple of weeks, I&#8217;ve been thinking about the best way to capture the HTML of every webpage I visit. Sure, you can always write a screen scraper or bot, but I guess I wanted something a little &#8230; <a href="http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>For the last couple of weeks, I&#8217;ve been thinking about the best way to capture the HTML of every webpage I visit. Sure, you can always write a screen scraper or bot, but I guess I wanted something a little more organic.</p>
<p>The right answer to this problem is probably a caching proxy. Alternatively, tcpdump or some cleverness with copying the temporary files from the browser cache might also work. However, I think there&#8217;s a strong case for using the browser directly: first, you get nice cleaned-up HTML, and second, you get JavaScript execution (handy if there is Ajax stuff on the page or if you want to use jQuery for pre-processing the HTML).</p>
<p>Basically, you want this:</p>
<ul>
<li>Load a webpage as normal</li>
<li>Inject an additional script to..</li>
<li>Grab the DOM as a string</li>
<li>POST to a webserver to save it for processing later (avoiding cross-domain rules)</li>
</ul>
<p>In Firefox you&#8217;ve got Greasemonkey and User Scripts. These work in Chrome too, but it seems like the cross-domain restriction may be problematic. I didn&#8217;t investigate too much further after reading that there <strong>might</strong> be a problem. Happily, if you write a proper full-on Chrome Extension, you can specify exceptions to the cross-domain rules.</p>
<p>So, here is the script I pieced together this morning. It&#8217;s a Chrome extension that grabs the source of every page you load (using jQuery&#8217;s DOM methods) and then POSTs it to your local webserver. My example below is pretty minimal, just to demonstrate that it works. Maybe someday I&#8217;ll package it as a real extension, make it configurable and release it, but, you know, probably not.</p>
<p>Use at your own risk and all the usual disclaimers. Also, you should probably lock down <strong>permissions</strong> so the extension can only talk to your local server, and <strong>matches</strong> so it only runs on the pages you&#8217;re interested in.</p>
<p><strong>manifest.json</strong></p>

<div class="wp_syntax"><div class="code"><pre class="json" style="font-family:monospace;">{
  &quot;name&quot;: &quot;Capture HTML and POST to local server&quot;,
  &quot;version&quot;: &quot;0.0.1&quot;,
  &quot;description&quot;: &quot;Capture HTML and POST to local server&quot;,
  &quot;permissions&quot;: [
    &quot;http://*/*&quot;
  ],
  &quot;content_scripts&quot;: [
    {
      &quot;matches&quot;: [&quot;http://*/*&quot;],
      &quot;js&quot; : [&quot;jquery.min.js&quot;,&quot;contentscript.js&quot;],
      &quot;run_at&quot;: &quot;document_end&quot;
    }
  ],
  &quot;background_page&quot;: &quot;background.html&quot;
}</pre></div></div>

<p><strong>contentscript.js</strong></p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #003366; font-weight: bold;">function</span> captureHTML<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> html <span style="color: #339933;">=</span> <span style="color: #3366CC;">'&lt;html&gt;'</span> <span style="color: #339933;">+</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'html'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">html</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> <span style="color: #3366CC;">'&lt;/html&gt;'</span><span style="color: #339933;">;</span>
    chrome.<span style="color: #660066;">extension</span>.<span style="color: #660066;">sendRequest</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#123;</span>html<span style="color: #339933;">:</span> html<span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>response<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066;">alert</span><span style="color: #009900;">&#40;</span>response.<span style="color: #660066;">result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
captureHTML<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p><strong>background.html</strong></p>

<div class="wp_syntax"><div class="code"><pre class="html" style="font-family:monospace;">&lt;html&gt;
&lt;head&gt;
&lt;script type=&quot;text/javascript&quot; src=&quot;jquery.min.js&quot;&gt;&lt;/script&gt;
&lt;script type=&quot;text/javascript&quot;&gt;// &lt;![CDATA[
&nbsp;
    chrome.extension.onRequest.addListener(
        function(request, sender, sendResponse) {
            var html = request.html;
            var url = 'http://localhost/recv.php';
            var data = {html:html};
            $.post(url, data, function(result) {
                sendResponse({result: result});                    
            });
    });
&nbsp;
// ]]&gt;&lt;/script&gt;
&lt;/head&gt;
&lt;/html&gt;</pre></div></div>

<p><strong>recv.php</strong></p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$html</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$_POST</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'html'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$result</span> <span style="color: #339933;">=</span> <span style="color: #990000;">strlen</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">echo</span> <span style="color: #009900;">&#40;</span><span style="color: #000088;">$result</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #990000;">error_log</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$html</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Also, you will need to download a copy of the latest minified jQuery and save it into the extension folder as jquery.min.js. Put the PHP receiver somewhere on your local server, and be sure to set the matching URL in background.html.</p>
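<p>If PHP isn&#8217;t handy, the receiver can be swapped for something else. The following is only a rough sketch of an equivalent in Python 3 using the standard library, not what I actually ran. The names <code>extract_html</code>, <code>CaptureHandler</code> and <code>serve</code>, and port 8000, are made up for illustration. It assumes the POST arrives form-encoded with a single <code>html</code> field, which is what jQuery&#8217;s <code>$.post</code> sends for a plain object.</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def extract_html(body: bytes) -> str:
    """Pull the 'html' field out of a form-encoded POST body."""
    fields = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    return fields.get("html", [""])[0]


class CaptureHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for recv.php: log the page, reply with its length."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        html = extract_html(self.rfile.read(length))
        # Note: this counts characters, while PHP's strlen counts bytes.
        self.log_message("captured %d characters", len(html))
        reply = str(len(html)).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)


def serve(port=8000):
    """Listen on localhost only; the port number is an assumption."""
    HTTPServer(("127.0.0.1", port), CaptureHandler).serve_forever()
```

<p>Calling <code>serve()</code> starts the listener, and the URL in background.html (plus the extension permissions) would need to point at the same host and port.</p>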
<p>So it seems to work. I think it&#8217;s kinda fun. If you know a better way to do this, please let me know.</p>
<p>Resources:</p>
<ul>
<li><a href="http://code.google.com/chrome/extensions/samples.html#script">http://code.google.com/chrome/extensions/samples.html#script</a></li>
<li><a href="http://stackoverflow.com/questions/2588513/why-doesnt-jquery-work-in-chrome-user-scripts-greasemonkey">http://stackoverflow.com/questions/2588513/why-doesnt-jquery-work-in-chrome-user-scripts-greasemonkey</a></li>
<li><a href="http://blog.michael-forster.de/2009/08/using-jquery-to-build-google-chrome.html">http://blog.michael-forster.de/2009/08/using-jquery-to-build-google-chrome.html</a></li>
<li><a href="http://code.google.com/chrome/extensions/messaging.html">http://code.google.com/chrome/extensions/messaging.html</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/10/automatically-save-html-of-every-page-you-visit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Extracting Table Data From PDFs with OCR</title>
		<link>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/</link>
		<comments>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/#comments</comments>
		<pubDate>Thu, 01 Sep 2011 01:44:22 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[ocr]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=266</guid>
		<description><![CDATA[PDF is the ideal format for things you don&#8217;t want anybody to read. Kidding.. sort of.. I am a bit biased against PDFs. Though I reluctantly admit their usefulness in a very few situations, mostly they&#8217;re just annoying. For a &#8230; <a href="http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>PDF is the ideal format for things you don&#8217;t want anybody to read.</p>
<p>Kidding.. sort of.. I am a bit biased against PDFs. Though I reluctantly admit their usefulness in a very few situations, mostly they&#8217;re just annoying. For a recent project, I wanted to extract a <strong>bunch</strong> of data from PDF documents (several hundred pages). All the data was nicely arranged in table format, as if it had been exported from Excel or something. Why the original Excel documents were not made available remains a mystery. Unfortunately, <em>Select All &#8211; Copy &#8211; Paste</em> completely mangled the text, but happily, it was possible to wrangle the data from the PDFs via OCR and some Python scripting.</p>
<p>The script below works like this:</p>
<ul>
<li>Take a PDF file</li>
<li>Split it into separate pages</li>
<li>Convert each page into an image file (pixels)</li>
<li>Locate the horizontal and vertical lines on each page (long runs of black pixels)</li>
<li>Segment the image into cells using the line coordinates</li>
<li>Clean up each cell (remove borders, threshold to black and white)</li>
<li>Perform OCR on each cell</li>
<li>Assemble results into a 2D array</li>
</ul>
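<p>The line-finding step is the only slightly unusual one, so here is a stripped-down illustration of the idea on a single row of pixels. It is a simplified sketch rather than the code from the script; the helper name <code>longest_black_run</code> is made up, and it assumes pixels are RGB tuples with pure black as <code>(0, 0, 0)</code>.</p>

```python
def longest_black_run(row, black=(0, 0, 0)):
    """Return the length of the longest run of consecutive black pixels."""
    longest = current = 0
    for pixel in row:
        if pixel == black:
            current += 1
            # Update as we go so a run touching the edge still counts.
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

<p>A row (or column) whose longest run exceeds some threshold is treated as a table border; everything else is text or whitespace.</p>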
<p>Optical Character Recognition is pretty amazing stuff, but it isn&#8217;t always perfect. To get the best possible results, it helps to use the cleanest input you can. In my initial experiments, I found that performing OCR on the entire document actually worked pretty well as long as I removed the cell borders (long horizontal and vertical lines). However, the software compressed all whitespace into a single empty space. Since my input documents had multiple columns with several words in each column, the cell boundaries were getting lost. Retaining the relationship between cells was very important, so one possible solution was to draw a unique character, like &#8220;^&#8221; on each cell boundary &#8211; something the OCR would still recognize and that I could use later to split the resulting strings.</p>
<p>Instead, I decided to OCR each cell individually. While slower, this seemed cleaner, more flexible, and easier to debug.</p>
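<p>For comparison, the marker idea would have come down to a split on the OCR output, roughly like this. It is a hypothetical sketch I did not end up using; the function name <code>split_ocr_row</code> is made up, and it assumes the OCR reliably reproduces the &#8220;^&#8221; marker.</p>

```python
def split_ocr_row(line, marker="^"):
    """Split one OCRed line of text into cell strings on the boundary marker."""
    return [cell.strip() for cell in line.split(marker)]
```

<p>An empty string in the result means an empty cell, or a marker sitting on the outer edge of the table.</p>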
<p>So here&#8217;s the code. There are a few dependencies:</p>
<ul>
<li>Recent-ish Python</li>
<li><a href="http://www.pythonware.com/products/pil/">PIL</a> (Python Imaging Library)</li>
<li><a href="http://code.google.com/p/tesseract-ocr/">Tesseract OCR</a> (I am using v3, but I think v2 will work too)</li>
<li><a href="http://www.imagemagick.org/">ImageMagick</a> (to split PDFs into multiple pages)</li>
</ul>
<p>It is slightly tuned to the particular files I was interested in (for example, it expects the cell borders to be solid black). It is also pretty slow &#8211; so if you need to process a massive number of pages, this won&#8217;t work for you. Also, it expects to operate in the directory you run it from and it expects there to be a subdirectory called &#8220;working&#8221; for temporary files. I suppose I should make the script do that automatically.. lazy, I guess..</p>
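<p>Creating the directory automatically would only take a few lines. This is a sketch, not part of the script as posted; the helper name <code>ensure_working_dir</code> is made up, and <code>exist_ok</code> needs Python 3.2 or newer (on older versions, check <code>os.path.isdir</code> first).</p>

```python
import os


def ensure_working_dir(base=".", name="working"):
    """Create the temporary-file directory if it is missing and return its path."""
    path = os.path.join(base, name)
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path
```

<p>Calling it once at startup, before any cell is written out, would be enough.</p>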

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> Image, ImageOps
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">subprocess</span>, <span style="color: #dc143c;">sys</span>, <span style="color: #dc143c;">os</span>, <span style="color: #dc143c;">glob</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># minimum run of adjacent pixels to call something a line</span>
H_THRESH = <span style="color: #ff4500;">300</span>
V_THRESH = <span style="color: #ff4500;">300</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_hlines<span style="color: black;">&#40;</span>pix, w, h<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get start/end pixels of lines containing horizontal runs of at least THRESH black pix&quot;&quot;&quot;</span>
    hlines = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> y <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>h<span style="color: black;">&#41;</span>:
        x1, x2 = <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>, <span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
        black = <span style="color: #ff4500;">0</span>
        run = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>w<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> pix<span style="color: black;">&#91;</span>x,y<span style="color: black;">&#93;</span> == <span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>:
                black = black + <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> x1 <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span>: x1 = x
                x2 = x
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> black <span style="color: #66cc66;">&gt;</span> run:
                    run = black
                black = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> run <span style="color: #66cc66;">&gt;</span> H_THRESH:
            hlines.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x1,y,x2,y<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> hlines
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_vlines<span style="color: black;">&#40;</span>pix, w, h<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get start/end pixels of lines containing vertical runs of at least THRESH black pix&quot;&quot;&quot;</span>
    vlines = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> x <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>w<span style="color: black;">&#41;</span>:
        y1, y2 = <span style="color: black;">&#40;</span><span style="color: #008000;">None</span>,<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>
        black = <span style="color: #ff4500;">0</span>
        run = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> y <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span>h<span style="color: black;">&#41;</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> pix<span style="color: black;">&#91;</span>x,y<span style="color: black;">&#93;</span> == <span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span><span style="color: black;">&#41;</span>:
                black = black + <span style="color: #ff4500;">1</span>
                <span style="color: #ff7700;font-weight:bold;">if</span> y1 <span style="color: #ff7700;font-weight:bold;">is</span> <span style="color: #008000;">None</span>: y1 = y
                y2 = y
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">if</span> black <span style="color: #66cc66;">&gt;</span> run:
                    run = black
                black = <span style="color: #ff4500;">0</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> run <span style="color: #66cc66;">&gt;</span> V_THRESH:
            vlines.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x,y1,x,y2<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> vlines
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_cols<span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each column from a list of vertical lines&quot;&quot;&quot;</span>
    cols = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> - vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
            cols.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>,vlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> cols
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_rows<span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each row from a list of horizontal lines&quot;&quot;&quot;</span>
    rows = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1</span>, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">if</span> hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span> - hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">1</span>:
            rows.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>,hlines<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> rows          
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_cells<span style="color: black;">&#40;</span>rows, cols<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Get top-left and bottom-right coordinates for each cell using row and column coordinates&quot;&quot;&quot;</span>
    cells = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> i, row <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span>:
        cells.<span style="color: black;">setdefault</span><span style="color: black;">&#40;</span>i, <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> j, col <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">enumerate</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span>:
            x1 = col<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>
            y1 = row<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
            x2 = col<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>
            y2 = row<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>
            cells<span style="color: black;">&#91;</span>i<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>j<span style="color: black;">&#93;</span> = <span style="color: black;">&#40;</span>x1,y1,x2,y2<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> cells
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> ocr_cell<span style="color: black;">&#40;</span>im, cells, x, y<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Return OCRed text from this cell&quot;&quot;&quot;</span>
    fbase = <span style="color: #483d8b;">&quot;working/%d-%d&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>x, y<span style="color: black;">&#41;</span>
    ftif = <span style="color: #483d8b;">&quot;%s.tif&quot;</span> <span style="color: #66cc66;">%</span> fbase
    ftxt = <span style="color: #483d8b;">&quot;%s.txt&quot;</span> <span style="color: #66cc66;">%</span> fbase
    <span style="color: #dc143c;">cmd</span> = <span style="color: #483d8b;">&quot;tesseract %s %s&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>ftif, fbase<span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># extract cell from whole image, grayscale (1-color channel), monochrome</span>
    region = im.<span style="color: black;">crop</span><span style="color: black;">&#40;</span>cells<span style="color: black;">&#91;</span>x<span style="color: black;">&#93;</span><span style="color: black;">&#91;</span>y<span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    region = ImageOps.<span style="color: black;">grayscale</span><span style="color: black;">&#40;</span>region<span style="color: black;">&#41;</span>
    region = region.<span style="color: black;">point</span><span style="color: black;">&#40;</span><span style="color: #ff7700;font-weight:bold;">lambda</span> p: p <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">200</span> <span style="color: #ff7700;font-weight:bold;">and</span> <span style="color: #ff4500;">255</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># determine background color (most used color)</span>
    histo = region.<span style="color: black;">histogram</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> histo<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> histo<span style="color: black;">&#91;</span><span style="color: #ff4500;">255</span><span style="color: black;">&#93;</span>: bgcolor = <span style="color: #ff4500;">0</span>
    <span style="color: #ff7700;font-weight:bold;">else</span>: bgcolor = <span style="color: #ff4500;">255</span>
    <span style="color: #808080; font-style: italic;"># trim borders by finding top-left and bottom-right bg pixels</span>
    pix = region.<span style="color: black;">load</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    x1,y1 = <span style="color: #ff4500;">0</span>,<span style="color: #ff4500;">0</span>
    x2,y2 = region.<span style="color: black;">size</span>
    x2,y2 = x2-<span style="color: #ff4500;">1</span>,y2-<span style="color: #ff4500;">1</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> pix<span style="color: black;">&#91;</span>x1,y1<span style="color: black;">&#93;</span> <span style="color: #66cc66;">!</span>= bgcolor:
        x1 += <span style="color: #ff4500;">1</span>
        y1 += <span style="color: #ff4500;">1</span>
    <span style="color: #ff7700;font-weight:bold;">while</span> pix<span style="color: black;">&#91;</span>x2,y2<span style="color: black;">&#93;</span> <span style="color: #66cc66;">!</span>= bgcolor:
        x2 -= <span style="color: #ff4500;">1</span>
        y2 -= <span style="color: #ff4500;">1</span>
    <span style="color: #808080; font-style: italic;"># save as TIFF and extract text with Tesseract OCR</span>
    trimmed = region.<span style="color: black;">crop</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span>x1,y1,x2,y2<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    trimmed.<span style="color: black;">save</span><span style="color: black;">&#40;</span>ftif, <span style="color: #483d8b;">&quot;TIFF&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">subprocess</span>.<span style="color: black;">call</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: #dc143c;">cmd</span><span style="color: black;">&#93;</span>, shell=<span style="color: #008000;">True</span>, stderr=<span style="color: #dc143c;">subprocess</span>.<span style="color: black;">PIPE</span><span style="color: black;">&#41;</span>
    lines = <span style="color: black;">&#91;</span>l.<span style="color: black;">strip</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> l <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>ftxt<span style="color: black;">&#41;</span>.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
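    <span style="color: #808080; font-style: italic;"># an empty OCR result (e.g. a blank cell) would otherwise raise IndexError</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> lines<span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> <span style="color: #ff7700;font-weight:bold;">if</span> lines <span style="color: #ff7700;font-weight:bold;">else</span> <span style="color: #483d8b;">&quot;&quot;</span>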
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> get_image_data<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Extract textual data[rows][cols] from spreadsheet-like image file&quot;&quot;&quot;</span>    
    im = Image.<span style="color: #008000;">open</span><span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    pix = im.<span style="color: black;">load</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    width, height = im.<span style="color: black;">size</span>
    hlines = get_hlines<span style="color: black;">&#40;</span>pix, width, height<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: hlines: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    vlines = get_vlines<span style="color: black;">&#40;</span>pix, width, height<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: vlines: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    rows = get_rows<span style="color: black;">&#40;</span>hlines<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: rows: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    cols = get_cols<span style="color: black;">&#40;</span>vlines<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;%s: cols: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    cells = get_cells<span style="color: black;">&#40;</span>rows, cols<span style="color: black;">&#41;</span>
&nbsp;
    data = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>rows<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>:
        data.<span style="color: black;">append</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>ocr_cell<span style="color: black;">&#40;</span>im,cells, row, col<span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">for</span> col <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">range</span><span style="color: black;">&#40;</span><span style="color: #008000;">len</span><span style="color: black;">&#40;</span>cols<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span> 
    <span style="color: #ff7700;font-weight:bold;">return</span> data
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> split_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Split PDF into PNG pages, return filenames&quot;&quot;&quot;</span>
    prefix = filename<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span>
    <span style="color: #dc143c;">cmd</span> = <span style="color: #483d8b;">&quot;convert -density 600 %s working/%s-%%d.png&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: black;">&#40;</span>filename, prefix<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">subprocess</span>.<span style="color: black;">call</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span><span style="color: #dc143c;">cmd</span><span style="color: black;">&#93;</span>, shell=<span style="color: #008000;">True</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: black;">&#91;</span>f <span style="color: #ff7700;font-weight:bold;">for</span> f <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">glob</span>.<span style="color: #dc143c;">glob</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">os</span>.<span style="color: black;">path</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'working'</span>, <span style="color: #483d8b;">'%s*'</span> <span style="color: #66cc66;">%</span> prefix<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">def</span> extract_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>:
    <span style="color: #483d8b;">&quot;&quot;&quot;Extract table data from pdf&quot;&quot;&quot;</span>
    pngfiles = split_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    <span style="color: #dc143c;">sys</span>.<span style="color: black;">stderr</span>.<span style="color: black;">write</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;Pages: %d<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span> <span style="color: #66cc66;">%</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span>pngfiles<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># extract table data from each page</span>
    data = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> pngfile <span style="color: #ff7700;font-weight:bold;">in</span> pngfiles:
        pngdata = get_image_data<span style="color: black;">&#40;</span>pngfile<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> d <span style="color: #ff7700;font-weight:bold;">in</span> pngdata:
            data.<span style="color: black;">append</span><span style="color: black;">&#40;</span>d<span style="color: black;">&#41;</span>
        <span style="color: #808080; font-style: italic;"># remove temp files for this page</span>
        <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*.tif&quot;</span><span style="color: black;">&#41;</span>
        <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*.txt&quot;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># remove split pages</span>
    <span style="color: #dc143c;">os</span>.<span style="color: black;">system</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;rm working/*&quot;</span><span style="color: black;">&#41;</span>   
    <span style="color: #ff7700;font-weight:bold;">return</span> data
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:
    <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">len</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#41;</span> <span style="color: #66cc66;">!</span>= <span style="color: #ff4500;">2</span>:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Usage: ctocr.py FILENAME&quot;</span>
        exit<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
    <span style="color: #808080; font-style: italic;"># split target pdf into pages</span>
    filename = <span style="color: #dc143c;">sys</span>.<span style="color: black;">argv</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
    data = extract_pdf<span style="color: black;">&#40;</span>filename<span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">for</span> row <span style="color: #ff7700;font-weight:bold;">in</span> data:
        <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>row<span style="color: black;">&#41;</span></pre></div></div>
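
<p>One thing to watch in <code>split_pdf</code>: <code>glob.glob</code> makes no promise about ordering, and a plain alphabetical sort would put page 10 before page 2. A minimal sketch of sorting the split pages numerically, assuming the files end in <code>-N.png</code> as in the <code>convert</code> command above (the helper names and the sample filenames are made up for illustration):</p>

```python
import re

def page_number(path):
    """Return the trailing page index from names like 'working/report-12.png'."""
    match = re.search(r"-(\d+)\.png$", path)
    # Files that do not match the pattern sort first rather than raising.
    return int(match.group(1)) if match else -1

def in_page_order(paths):
    """Sort split-page filenames numerically instead of alphabetically."""
    return sorted(paths, key=page_number)

# Hypothetical filenames, for illustration only.
pages = ["working/report-10.png", "working/report-2.png", "working/report-0.png"]
print(in_page_order(pages))
```

<p>Wrapping the list returned by <code>split_pdf</code> in <code>in_page_order</code> should keep rows from different pages in document order.</p>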

<p>Anyhow, I think it is kinda fun. Since the OCR is not actually magic, some post-processing may be necessary. In particular, I&#8217;ve sometimes noticed &#8220;o&#8221; (the letter) in place of &#8220;0&#8221; (the number), extra whitespace or oddly split words, and occasional wrong letters. But overall, the accuracy is still fantastic.</p>
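<p>As a rough idea of what that post-processing could look like, here is a small sketch that collapses stray whitespace and swaps the letter &#8220;o&#8221; for a zero in cells that otherwise look numeric. The function name and the &#8220;looks numeric&#8221; rule are my own assumptions, not part of the script above, and the rule will need tuning for real data:</p>

```python
import re

def clean_cell(text):
    """Tidy one OCR'd cell: collapse whitespace, repair o/O in numeric cells."""
    text = " ".join(text.split())
    # Assumption: a cell made up only of digits, o/O and number punctuation
    # was meant to be a number, so the letters are really zeros.
    if re.search(r"\d", text) and re.match(r"^[\doO.,$%-]+$", text):
        text = text.replace("o", "0").replace("O", "0")
    return text

print(clean_cell("  1o5,0O0 "))
print(clean_cell("Total   cost"))
```

<p>Cells that mix letters and digits are left alone, since guessing there is more likely to do harm than good.</p>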
<p>The usual caveats apply: use at your own risk, etc.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/09/extracting-table-data-from-pdfs-with-ocr/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Huh, Bitcoin = Pretty interesting</title>
		<link>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/</link>
		<comments>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 13:12:21 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=257</guid>
		<description><![CDATA[Read an interesting article on Ars Technica this morning. Looks like Bitcoin had already made the rounds earlier this summer, but I guess I missed it. http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars http://www.bitcoin.org Bitcoin is the first legitimate crypto-currency, an idea first suggested in &#8230; <a href="http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Read an interesting article on Ars Technica this morning. Looks like Bitcoin had already made the rounds earlier this summer, but I guess I missed it. </p>
<ul>
<li><a href="http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars">http://arstechnica.com/tech-policy/news/2011/08/symantec-spots-malware-that-uses-your-gpu-to-mine-bitcoins.ars</a></li>
<li><a href="http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars">http://arstechnica.com/tech-policy/news/2011/06/bitcoin-inside-the-encrypted-peer-to-peer-currency.ars</a></li>
<li><a href="http://www.bitcoin.org/">http://www.bitcoin.org</a></li>
</ul>
<p>Bitcoin is the first legitimate crypto-currency, an idea first suggested in <a href="http://en.wikipedia.org/wiki/Crypto-currency">1998</a>. It is unique in several ways:</p>
<p>First of all, it is (mostly) anonymous, just like cash. Mostly &#8211; because, like cash, it is not anonymous under conditions of physical surveillance or if either party is coerced.</p>
<p>Second, it eliminates the need for third-party payment processors like PayPal and even credit cards. In a traditional online transaction, the payment processor holds the secret account numbers for both parties and conducts the transaction. Under the bitcoin scheme, all transactions are published openly, with public key cryptography concealing the identities of both parties. This allows money to change hands without an intermediate payment processor.</p>
<p>Also interesting is that the system is designed to be inflation-proof. Unlike a traditional national currency, bitcoin is controlled by an algorithm. There&#8217;s no central authority that can decide to increase the money supply and cause inflation. Instead, there is a fixed supply of 21M bitcoins which will be distributed at a geometrically decreasing rate. Each bitcoin can be subdivided, so as trading in single bitcoins becomes impractical, people can trade in millibitcoins and microbitcoins.</p>
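<p>The 21M figure falls out of a geometric series. A quick sketch, assuming the commonly cited protocol parameters (a 50 BTC block reward that halves every 210,000 blocks, neither of which is stated above): 210,000 &#215; 50 &#215; (1 + 1/2 + 1/4 + &#8230;) = 21,000,000.</p>

```python
# Assumed protocol parameters (not stated in the post): the block reward
# starts at 50 BTC and halves every 210,000 blocks.
INITIAL_REWARD = 50 * 10**8      # in the smallest unit, 1e-8 BTC
BLOCKS_PER_HALVING = 210000

def total_supply():
    """Sum the block rewards over every halving period until they hit zero."""
    total, reward = 0, INITIAL_REWARD
    while reward != 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2             # integer halving, so the series terminates
    return total

print(total_supply() / 1e8)      # a little under 21 million BTC
```

<p>Because the reward is rounded down at each halving, the total lands slightly below 21M rather than exactly on it.</p>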
<p>As a P2P network, the system relies on creating consensus between nodes and can be subverted if someone can muster enough computing resources to control more than half the network. In the age of massive botnets, that&#8217;s not infeasible. Proponents argue that there&#8217;s no economic incentive, since subverting the network would ruin the saboteur&#8217;s own bitcoin investment. However, there still seems to be a risk of someone hijacking the network just for laughs. </p>
<p>Another danger is that, at least anecdotally, bitcoin is being used to buy/sell illegal goods and services or for money laundering. That doesn&#8217;t bode well for its long-term viability. To be really useful, it needs some mainstream acceptance. A <a href="https://en.bitcoin.it/wiki/Trade">list of sites</a> that accept the currency looks mildly promising. </p>
<p>Anyway.. it seems like quite an interesting system &#8211; and very sci-fi. The system even comes with an anonymous inventor who designed the protocol and published the original paper under a pseudonym.</p>
<p>I&#8217;m not buying bitcoins just yet. But it would be neat to see something like this catch on.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/08/huh-bitcoin-pretty-interesting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Little Job Scraper</title>
		<link>http://craiget.com/2011/08/a-little-job-scraper/</link>
		<comments>http://craiget.com/2011/08/a-little-job-scraper/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 12:40:32 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=249</guid>
		<description><![CDATA[Often times you reach a point in a project where it is handy to have some real data. So today I wrote a little program to grab one page worth of Want Ads from the venerable Craigslist. Having served its &#8230; <a href="http://craiget.com/2011/08/a-little-job-scraper/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Oftentimes you reach a point in a project where it is handy to have some real data. So today I wrote a little program to grab one page&#8217;s worth of Want Ads from the venerable Craigslist.</p>
<p>Having served its intended purpose, it seemed fun to tweak the program to keep track of new job postings on craigslist. So.. here&#8217;s that..</p>
<p>This program just reads the pages you specify and scans for any URLs it hasn&#8217;t seen before. If you run it via cron, say, once a day, it will give you the new postings for that day. Each new url is recorded, so it doesn&#8217;t notify you twice about the same job.</p>
<p>In python:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">urllib2</span>, <span style="color: #dc143c;">time</span>
<span style="color: #ff7700;font-weight:bold;">from</span> BeautifulSoup <span style="color: #ff7700;font-weight:bold;">import</span> BeautifulSoup
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">sys</span>
<span style="color: #008000;">reload</span><span style="color: black;">&#40;</span><span style="color: #dc143c;">sys</span><span style="color: black;">&#41;</span>
<span style="color: #dc143c;">sys</span>.<span style="color: black;">setdefaultencoding</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'utf-8'</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">socket</span>
<span style="color: #dc143c;">socket</span>.<span style="color: black;">setdefaulttimeout</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># pages to monitor</span>
categories = <span style="color: black;">&#91;</span>
    <span style="color: #483d8b;">&quot;http://knoxville.craigslist.org/sof/&quot;</span>,
    <span style="color: #483d8b;">&quot;http://knoxville.craigslist.org/eng&quot;</span>
<span style="color: black;">&#93;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># data file for visited url list</span>
dat = <span style="color: #483d8b;">&quot;.cl.exclude&quot;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># build list of urls already visited</span>
exclude = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">try</span>:
    <span style="color: #ff7700;font-weight:bold;">for</span> line <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>dat<span style="color: black;">&#41;</span>.<span style="color: black;">readlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
        exclude.<span style="color: black;">append</span><span style="color: black;">&#40;</span>line<span style="color: black;">&#91;</span>:-<span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">except</span>:
    <span style="color: #ff7700;font-weight:bold;">pass</span>
&nbsp;
&nbsp;
<span style="color: #808080; font-style: italic;"># get unseen urls from each category page</span>
urls = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> category <span style="color: #ff7700;font-weight:bold;">in</span> categories:
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>category<span style="color: black;">&#41;</span>
        soup = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> a <span style="color: #ff7700;font-weight:bold;">in</span> soup.<span style="color: black;">findAll</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'a'</span><span style="color: black;">&#41;</span>:
            <span style="color: #808080; font-style: italic;"># must be a url</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> a.<span style="color: black;">has_key</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#41;</span>: <span style="color: #ff7700;font-weight:bold;">continue</span>
            <span style="color: #808080; font-style: italic;"># must match current category (to exclude help pages/etc)</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> a<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span>.<span style="color: black;">find</span><span style="color: black;">&#40;</span>category<span style="color: black;">&#41;</span> == -<span style="color: #ff4500;">1</span>: <span style="color: #ff7700;font-weight:bold;">continue</span>
            <span style="color: #808080; font-style: italic;"># ok, keep this url</span>
            urls.<span style="color: black;">append</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#91;</span><span style="color: #483d8b;">'href'</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
    <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span>, e:
        <span style="color: #ff7700;font-weight:bold;">raise</span> e
&nbsp;
<span style="color: #808080; font-style: italic;"># visit each url to get the title and content</span>
<span style="color: #ff7700;font-weight:bold;">for</span> url <span style="color: #ff7700;font-weight:bold;">in</span> urls:
    <span style="color: #808080; font-style: italic;"># skip if already seen</span>
    <span style="color: #ff7700;font-weight:bold;">if</span> url <span style="color: #ff7700;font-weight:bold;">in</span> exclude: <span style="color: #ff7700;font-weight:bold;">continue</span>
    <span style="color: #ff7700;font-weight:bold;">try</span>:
        page = <span style="color: #dc143c;">urllib2</span>.<span style="color: black;">urlopen</span><span style="color: black;">&#40;</span>url<span style="color: black;">&#41;</span>
        soup = BeautifulSoup<span style="color: black;">&#40;</span>page<span style="color: black;">&#41;</span>
        title = soup.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;title&quot;</span><span style="color: black;">&#41;</span>.<span style="color: #dc143c;">string</span>
        body = soup.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;div&quot;</span>, <span style="color: black;">&#123;</span><span style="color: #483d8b;">&quot;id&quot;</span>: <span style="color: #483d8b;">&quot;userbody&quot;</span><span style="color: black;">&#125;</span><span style="color: black;">&#41;</span>.<span style="color: #dc143c;">string</span>
        <span style="color: #808080; font-style: italic;"># do something interesting here, like email the list to yourself</span>
        <span style="color: #ff7700;font-weight:bold;">print</span> url, title
    <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">Exception</span>, e:
        <span style="color: #ff7700;font-weight:bold;">raise</span> e
    <span style="color: #808080; font-style: italic;"># scrape slowly</span>
    <span style="color: #dc143c;">time</span>.<span style="color: black;">sleep</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># write list of all urls from this time</span>
<span style="color: #808080; font-style: italic;"># note: there is no need to remember ALL the old urls since</span>
<span style="color: #808080; font-style: italic;"># the urls are unique and we aren't dealing with pagination </span>
<span style="color: #808080; font-style: italic;"># it is safe to forget urls that are past the first page of results</span>
fout = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>dat,<span style="color: #483d8b;">'w'</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">for</span> url <span style="color: #ff7700;font-weight:bold;">in</span> urls:
    fout.<span style="color: black;">write</span><span style="color: black;">&#40;</span>url+<span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: black;">&#41;</span>
fout.<span style="color: black;">close</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre></div></div>

<p>Obviously, scraping is potentially rude. This is pretty lightweight, since it only checks URLs it hasn&#8217;t seen before and waits 10 seconds between visits. Nevertheless, use at your own risk.</p>
<p>The best way to use this is probably tweaking it to email you about new jobs. I&#8217;ve omitted that code for two reasons:</p>
<ol>
<li>It is pretty well documented elsewhere</li>
<li>Email originating from a home server will probably be rejected anyway</li>
</ol>
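<p>That said, composing the message is the easy half. A minimal sketch using the standard library, with made-up addresses and a made-up posting; the function name is hypothetical and the actual send is left commented out because it assumes an SMTP relay on localhost:</p>

```python
from email.mime.text import MIMEText

def build_digest(new_jobs, sender, recipient):
    """Build a plain-text message listing (url, title) pairs."""
    lines = ["%s\n%s\n" % (title, url) for url, title in new_jobs]
    msg = MIMEText("\n".join(lines))
    msg["Subject"] = "%d new job posting(s)" % len(new_jobs)
    msg["From"] = sender
    msg["To"] = recipient
    return msg

# Hypothetical addresses and posting, for illustration only.
msg = build_digest(
    [("http://example.org/sof/123.html", "Python developer")],
    "scraper@example.org", "me@example.org")
print(msg["Subject"])

# Sending is left commented out; it assumes an SMTP relay on localhost:
# import smtplib
# server = smtplib.SMTP("localhost")
# server.sendmail(msg["From"], [msg["To"]], msg.as_string())
# server.quit()
```

<p>In the scraper, the <code>print url, title</code> line would collect pairs into a list instead, and the digest would be built once after the loop.</p>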
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/08/a-little-job-scraper/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yay Dash C</title>
		<link>http://craiget.com/2011/07/yay-dash-c/</link>
		<comments>http://craiget.com/2011/07/yay-dash-c/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 13:45:42 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=233</guid>
		<description><![CDATA[I think our internet is rate-limited. That&#8217;s annoying because I don&#8217;t do any of the stuff that maybe deserves it (looking at you BitTorrent!). I haven&#8217;t exactly quantified the problem yet, but the main symptom is a very reasonable rate &#8230; <a href="http://craiget.com/2011/07/yay-dash-c/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I think our internet is rate-limited. That&#8217;s annoying because <strong>I don&#8217;t do any of the stuff that maybe deserves it</strong> (looking at you, BitTorrent!). I haven&#8217;t exactly quantified the problem yet, but the main symptom is a very reasonable rate of 300K or so dropping to 3K-5K after the first 1-2 MB. Since many webpages are in the 1-2 MB range (or substantially smaller), it isn&#8217;t a big deal for regular browsing, but video becomes basically unwatchable. I&#8217;m not sure if the rate-limiting is on specific types of files (video) or everything.. or maybe I&#8217;m just imagining the whole thing.</p>
<p>Either way &#8211; Dialup is so 1999. Right?!</p>
<p>Thankfully, there&#8217;s <a href="http://rg3.github.com/youtube-dl/">youtube-dl</a>, which downloads youtube videos for offline viewing. Unfortunately, the rate-limiting is still problematic. After a couple of MB, the rate drops and the download effectively stops (and doesn&#8217;t appear to recover if you leave it running for a while). Youtube-dl has a &#8220;-c&#8221; option (just like <a href="http://www.gnu.org/s/wget/">wget</a>) which tries to continue your previous download instead of starting over.</p>
<p>A totally garbage solution that works: just restart the download every 10 seconds until it&#8217;s done. You get the good rate for a few seconds and restart every time the rate drops. This works.. but doing it by hand is annoying (or unfeasible for a big file). A better solution is to have a script that runs youtube-dl automatically for 10 seconds, kills it, restarts it, and repeats until the file is completely downloaded.</p>
<p>So it would be nice to have a way to run a program for a certain number of seconds. People much smarter than me have already figured this out in the form of a bash script:</p>
<p><a href="http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3">http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3</a></p>
<p>You can use it like this:</p>
<pre>
timeout 10 youtube-dl -c "url_of_youtube_video"
</pre>
<p>So that works, now just wrap it up in a loop. 10 tries is probably enough to get a video. I know there are smarter ways to check for completion, but I&#8217;m pretty lazy and this is good enough:</p>
<pre>
for i in {1..10}
do
  timeout 10 youtube-dl -c "url_of_youtube_video"
done
</pre>
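<p>If the fixed ten rounds bother you, one of those smarter ways is to stop as soon as the command finishes on its own. A rough Python sketch of the same kill-and-restart idea, assuming Python 3.5+ (for <code>subprocess.run</code>) and assuming the downloader exits with status 0 once the file is complete; the function name is made up, and a stand-in command is used so the example runs anywhere:</p>

```python
import subprocess
import sys

def run_in_bursts(cmd, seconds, max_tries):
    """Run cmd repeatedly, killing it after `seconds`, until it exits cleanly."""
    for attempt in range(max_tries):
        try:
            # subprocess.run kills the child itself when the timeout expires.
            result = subprocess.run(cmd, timeout=seconds)
        except subprocess.TimeoutExpired:
            continue              # still downloading: restart and resume
        if result.returncode == 0:
            return True           # finished on its own, so stop retrying
    return False

# Stand-in command that exits immediately; a real call would pass something
# like ["youtube-dl", "-c", "url_of_youtube_video"].
print(run_in_bursts([sys.executable, "-c", "pass"], 10, 10))
```

<p>The return value says whether the command ever completed, which is handy if you want to queue up several videos and see which ones need another go.</p>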
<p>Not exactly as good as just watching videos in the browser, but it resolves my frustration anyway.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/yay-dash-c/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Overton Window</title>
		<link>http://craiget.com/2011/07/the-overton-window/</link>
		<comments>http://craiget.com/2011/07/the-overton-window/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 11:55:48 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=183</guid>
		<description><![CDATA[I love finding out that some fluttering thought has a proper name. Reasonable people should agree that simply having two sides to an issue doesn&#8217;t make them equally correct. If you disagree, just take any issue you feel strongly about, &#8230; <a href="http://craiget.com/2011/07/the-overton-window/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I love finding out that some fluttering thought has a proper name.</p>
<p>Reasonable people should agree that simply having two sides to an issue doesn&#8217;t make them equally correct. If you disagree, just take any issue you feel strongly about, consider the polar opposite, and decide if you would agree to the 50-50 compromise. You would? Okay, well, move the other viewpoint one step towards the extreme. Would you still agree? Certainly not &#8211; 50-50 became 40-60 &#8211; the former &#8220;compromise&#8221; now favors your opponent.</p>
<p>Suppose I argue that a triangle has 5 sides.</p>
<p>You say 3.</p>
<p>Should we compromise on 4?</p>
<p>What if I say 10? Is the number of sides of a triangle even up for debate?</p>
<p>That&#8217;s the essence of the <a href="http://en.wikipedia.org/wiki/Overton_window">Overton Window</a> &#8211; the range of beliefs that reasonable people can hold on a topic.</p>
<p>The difficulty lies in the <a href="http://en.wikipedia.org/wiki/Argument_to_moderation">Argument to Moderation</a>, the fallacy of assuming that, given two extremes, the truth necessarily lies in the middle. Proponents of a particular viewpoint can manipulate the Overton Window by adopting values more extreme than their actual beliefs. As a result, the apparent middle ground shifts, changing the whole debate.</p>
<p><strong>Not exactly a revelation</strong> &#8211; people manipulate each other and public opinion.</p>
<p>I was just intrigued that there&#8217;s a term for that particular phenomenon.</p>
<p>A few related links:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Overton_window">The Overton Window</a> (Wikipedia)</li>
<li><a href="http://skeptics.stackexchange.com/questions/5063/can-the-overton-window-be-deliberately-moved-by-espousing-extremist-views">Can the Overton Window be deliberately moved?</a> (stackexchange)</li>
<li><a href="http://diveintomark.org/archives/2006/08/23/overton-window">W3C and the Overton Window</a> (regarding web standards)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/the-overton-window/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Font Picker</title>
		<link>http://craiget.com/2011/07/web-font-picker/</link>
		<comments>http://craiget.com/2011/07/web-font-picker/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 14:53:31 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[javascript]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=170</guid>
		<description><![CDATA[Google Web Fonts is kinda awesome. If you haven&#8217;t checked it out already &#8211; basically it gives you a ton of new font choices that still degrade gracefully for older browsers. All you have to do is add a stylesheet &#8230; <a href="http://craiget.com/2011/07/web-font-picker/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.google.com/webfonts">Google Web Fonts</a> is kinda awesome. If you haven&#8217;t checked it out already &#8211; basically it gives you a ton of new font choices that still degrade gracefully for older browsers. All you have to do is add a stylesheet to your page and specify the &#8216;font-family&#8217;. It truly couldn&#8217;t be easier. Also.. yay, free!</p>
<p>One annoyance is the workflow. You have to look at the collection, edit your CSS and/or webpages, reload, repeat. (However, the fonts are available for download if you use Photoshop/GIMP/etc. to design your pages.)</p>
<p>The following code lets you change fonts on the fly by adding a little dropdown box to the top left corner. When you double-click anything on the page, it will be styled with the chosen font. The code is a bit ugly since it&#8217;s just the first thing that came to mind. Nevertheless, I think it&#8217;s kinda neat for experimenting.</p>

<div class="wp_syntax"><div class="code"><pre class="javascript" style="font-family:monospace;"><span style="color: #006600; font-style: italic;">// list of fonts to try</span>
<span style="color: #003366; font-weight: bold;">var</span> families <span style="color: #339933;">=</span> <span style="color: #009900;">&#91;</span><span style="color: #3366CC;">'Yellowtail'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'Astigmatic'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'Leckerli One'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
<span style="color: #006600; font-style: italic;">// build the dropdown box</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'body'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span>$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'&lt;select id=&quot;fontpicker&quot;&gt;&lt;/select&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000066; font-weight: bold;">for</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">var</span> i<span style="color: #339933;">=</span><span style="color: #CC0000;">0</span><span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;</span>families.<span style="color: #660066;">length</span><span style="color: #339933;">;</span> i<span style="color: #339933;">++</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'&lt;option value=&quot;'</span><span style="color: #339933;">+</span>families<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #3366CC;">'&quot;&gt;'</span><span style="color: #339933;">+</span>families<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">+</span><span style="color: #3366CC;">'&lt;/option&gt;'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">css</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#123;</span><span style="color: #3366CC;">'position'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'absolute'</span><span style="color: #339933;">,</span><span style="color: #3366CC;">'top'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'0px'</span><span style="color: #339933;">,</span> <span style="color: #3366CC;">'left'</span><span style="color: #339933;">:</span> <span style="color: #3366CC;">'0px'</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #006600; font-style: italic;">// bind doubleclick on every element</span>
$<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'*'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">live</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'dblclick'</span><span style="color: #339933;">,</span> <span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #003366; font-weight: bold;">var</span> family <span style="color: #339933;">=</span> $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'#fontpicker'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">val</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> href <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;http://fonts.googleapis.com/css?family=&quot;</span><span style="color: #339933;">+</span>family<span style="color: #339933;">+</span><span style="color: #3366CC;">&quot;&amp;v2&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #003366; font-weight: bold;">var</span> stylesheet <span style="color: #339933;">=</span> <span style="color: #3366CC;">&quot;&lt;link href='http://fonts.googleapis.com/css?family=&quot;</span><span style="color: #339933;">+</span>family<span style="color: #339933;">+</span><span style="color: #3366CC;">&quot;&amp;v2' rel='stylesheet' type='text/css'&gt;&quot;</span><span style="color: #339933;">;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">css</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'font-family'</span><span style="color: #339933;">,</span> family<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #006600; font-style: italic;">// try not to load the same stylesheet twice</span>
    <span style="color: #003366; font-weight: bold;">var</span> found <span style="color: #339933;">=</span> <span style="color: #CC0000;">0</span><span style="color: #339933;">;</span>
    $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">&quot;head link[rel='stylesheet']&quot;</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">each</span><span style="color: #009900;">&#40;</span><span style="color: #003366; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>$<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">this</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">attr</span><span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'href'</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> href<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            found <span style="color: #339933;">=</span> <span style="color: #CC0000;">1</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000066; font-weight: bold;">if</span><span style="color: #009900;">&#40;</span>found <span style="color: #339933;">==</span> <span style="color: #CC0000;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        $<span style="color: #009900;">&#40;</span><span style="color: #3366CC;">'head'</span><span style="color: #009900;">&#41;</span>.<span style="color: #660066;">append</span><span style="color: #009900;">&#40;</span>stylesheet<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>There&#8217;s one big problem still, which is that you need to specify WHICH fonts you want to make available to the switcher. Better than doing it one at a time, but not as good as pulling the complete list from Google. I&#8217;m not sure of a super great way to achieve that, but maybe something to consider.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/web-font-picker/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Half-baked Objects and 10% ORM</title>
		<link>http://craiget.com/2011/07/half-baked-objects-and-10-orm/</link>
		<comments>http://craiget.com/2011/07/half-baked-objects-and-10-orm/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 12:05:42 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[orm]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[sql]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=121</guid>
		<description><![CDATA[I&#8217;ve used Object Relational Mapping (ORM) libraries on a few projects in the past. Without getting into the many, many details, ORM bridges the gap between data storage in a relational database and Object-Oriented Programming. Simply, instead of writing SQL &#8230; <a href="http://craiget.com/2011/07/half-baked-objects-and-10-orm/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve used <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">Object Relational Mapping</a> (ORM) libraries on a few projects in the past. Without getting into the many, many details, ORM bridges the gap between data storage in a relational database and Object-Oriented Programming. Simply, instead of writing SQL queries, you let the ORM library write them for you. It&#8217;s great when it works out, but like all code generators, there are some potential downsides:</p>
<ul>
<li>One more library to learn</li>
<li>May generate inefficient SQL (or more efficient, in some cases)</li>
<li>If there&#8217;s a problem, you may be taking a deep dive into the code to figure it out</li>
</ul>
<p>Whether it&#8217;s worthwhile is simply a matter of getting more out of it than you put in. As an alternative, I&#8217;ve started using a technique to build Objects on-the-fly from multi-table joins. This doesn&#8217;t handle every case (not even close!), but it does handle the cases I need.</p>
<p>So suppose you&#8217;ve got a webpage with Users and Posts and Comments. Each Post can have multiple Comments, and a User can &#8220;Like&#8221; a comment. A normalized version looks something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> users <span style="color: #66cc66;">&#40;</span>
  id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
  name <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> posts <span style="color: #66cc66;">&#40;</span>
  id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
  name <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> comments <span style="color: #66cc66;">&#40;</span>
  id <span style="color: #993333; font-weight: bold;">INT</span> <span style="color: #993333; font-weight: bold;">AUTO_INCREMENT</span> <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
  post_id <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">,</span>
  content <span style="color: #993333; font-weight: bold;">VARCHAR</span><span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">50</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> liked_comments <span style="color: #66cc66;">&#40;</span>
  user_id <span style="color: #993333; font-weight: bold;">INT</span><span style="color: #66cc66;">,</span>
  comment_id <span style="color: #993333; font-weight: bold;">INT</span>
<span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>Now on this webpage, you want to show all of a user&#8217;s Liked Comments. So you probably have a view template that loops over the comments, showing the comment text and a link back to the Post, something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comments</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$comment</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">:</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>
  &lt;div class=&quot;comment&quot;&gt;
    &lt;p&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">content</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/p&gt;
    &lt;p&gt;On &lt;a href=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">link</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">name</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;&lt;/p&gt;
  &lt;/div&gt;
<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">endforeach</span> <span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>Now the question is, where should the post permalink come from? I can think of at least 3 reasonable answers:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;">// 1. from a method on the comment
&lt;a href=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post_link</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post_name</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;
&nbsp;
// 2. from a method on the post
&lt;a href=&quot;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">link</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">name</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;
&nbsp;
// 3. from the template, using properties of the comment
&lt;a href=&quot;/post/<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post_id</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&quot;&gt;<span style="color: #000000; font-weight: bold;">&lt;?php</span> <span style="color: #b1b100;">echo</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$comment</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post_name</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #000000; font-weight: bold;">?&gt;</span>&lt;/a&gt;</pre></div></div>

<p>I would argue that the 2nd option is the best. In the 1st option, the Comment class needs methods to handle displaying a post, which seems unnatural and leads to duplication. In the 3rd option, the View is building the URL, which is a pain if you ever want to change it later, since you&#8217;d need to update all your views. The best thing is to let the Post know how to build its own permalink. The method might look like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// in Post class</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #990000;">link</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">return</span> <span style="color: #0000ff;">&quot;/post/&quot;</span> <span style="color: #339933;">.</span> <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">id</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>So how to build a list of Comments, each with a nested Post object? Here&#8217;s one possibility:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span>
  comments<span style="color: #66cc66;">.</span>content <span style="color: #993333; font-weight: bold;">AS</span> content<span style="color: #66cc66;">,</span>
  posts<span style="color: #66cc66;">.</span>id <span style="color: #993333; font-weight: bold;">AS</span> post_id<span style="color: #66cc66;">,</span>
  posts<span style="color: #66cc66;">.</span>name <span style="color: #993333; font-weight: bold;">AS</span> post_name
<span style="color: #993333; font-weight: bold;">FROM</span> liked_comments
<span style="color: #993333; font-weight: bold;">JOIN</span> comments <span style="color: #993333; font-weight: bold;">ON</span> liked_comments<span style="color: #66cc66;">.</span>comment_id <span style="color: #66cc66;">=</span> comments<span style="color: #66cc66;">.</span>id
<span style="color: #993333; font-weight: bold;">JOIN</span> posts <span style="color: #993333; font-weight: bold;">ON</span> comments<span style="color: #66cc66;">.</span>post_id <span style="color: #66cc66;">=</span> posts<span style="color: #66cc66;">.</span>id
<span style="color: #993333; font-weight: bold;">WHERE</span> user_id <span style="color: #66cc66;">=</span> <span style="color: #cc66cc;">1</span></pre></div></div>

<p>So that&#8217;s fine, let&#8217;s say you instantiate a Comment for each row. Something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$comments</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$rs</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sql</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_fetch_assoc</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rs</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #000088;">$comments</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Comment<span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>So that creates a list of comments for our View. All that&#8217;s missing is to instantiate a nested Post for each Comment. This can be done in the Comment constructor:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #000088;">$args</span><span style="color: #339933;">=</span><span style="color: #009900; font-weight: bold;">NULL</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$args</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #990000;">is_array</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$args</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">array_key_exists</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'post_id'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$args</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #990000;">array_key_exists</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'post_name'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$args</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Post<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">id</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$args</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'post_id'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">post</span><span style="color: #339933;">-&gt;</span><span style="color: #004000;">name</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$args</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'post_name'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #666666; font-style: italic;">// other stuff..</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>So when we instantiate a Comment and provide the appropriate keys (post_id and post_name), the constructor instantiates a Post for us. It&#8217;s not really a <strong>proper</strong> Post, but more of a half-baked object. It doesn&#8217;t have an author, content or the other things you might expect in a Post; it has just the two properties needed to display its permalink.</p>
<p>Now this works fine, but having a bunch of hacked-up constructors isn&#8217;t very nice, and we&#8217;re still requiring the Comment class to know something about the structure of Posts. A better alternative is to make a superclass with a more generic constructor that any class can use to instantiate any other class (or classes) based only on the column names in the row. Here is the more generic version I am currently using:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// in a base class</span>
<span style="color: #000000; font-weight: bold;">function</span> __construct<span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span><span style="color: #339933;">,</span> <span style="color: #000088;">$params</span><span style="color: #339933;">=</span><span style="color: #009900; font-weight: bold;">NULL</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$k</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$v</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
      <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #000088;">$k</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$v</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000088;">$klass_map</span> <span style="color: #339933;">=</span> <span style="color: #009900; font-weight: bold;">NULL</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$params</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">array_key_exists</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">'klass_map'</span><span style="color: #339933;">,</span> <span style="color: #000088;">$params</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000088;">$klass_map</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$params</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">'klass_map'</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span> 
    <span style="color: #009900;">&#125;</span>
    <span style="color: #000088;">$vars</span> <span style="color: #339933;">=</span> <span style="color: #990000;">get_object_vars</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$this</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$vars</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$k</span><span style="color: #339933;">=&gt;</span><span style="color: #000088;">$v</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        <span style="color: #000088;">$split</span> <span style="color: #339933;">=</span> <span style="color: #990000;">strpos</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$k</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">'_'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$split</span> <span style="color: #339933;">===</span> <span style="color: #009900; font-weight: bold;">FALSE</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #b1b100;">continue</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000088;">$prefix</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$k</span><span style="color: #339933;">,</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">,</span> <span style="color: #000088;">$split</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #000088;">$postfix</span> <span style="color: #339933;">=</span> <span style="color: #990000;">substr</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$k</span><span style="color: #339933;">,</span> <span style="color: #000088;">$split</span><span style="color: #339933;">+</span><span style="color: #cc66cc;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
            <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #339933;">!</span> <span style="color: #990000;">isset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #009900;">&#123;</span><span style="color: #000088;">$prefix</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span> <span style="color: #000088;">$klass_map</span> <span style="color: #339933;">&amp;&amp;</span> <span style="color: #990000;">array_key_exists</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$prefix</span><span style="color: #339933;">,</span> <span style="color: #000088;">$klass_map</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                    <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #009900;">&#123;</span><span style="color: #000088;">$prefix</span><span style="color: #009900;">&#125;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #000088;">$klass_map</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$prefix</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #009900;">&#123;</span>
                    <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #009900;">&#123;</span><span style="color: #000088;">$prefix</span><span style="color: #009900;">&#125;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> stdClass<span style="color: #339933;">;</span>
                <span style="color: #009900;">&#125;</span>
            <span style="color: #009900;">&#125;</span>
            <span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #009900;">&#123;</span><span style="color: #000088;">$prefix</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">-&gt;</span><span style="color: #009900;">&#123;</span><span style="color: #000088;">$postfix</span><span style="color: #009900;">&#125;</span> <span style="color: #339933;">=</span> <span style="color: #000088;">$v</span><span style="color: #339933;">;</span>
            <span style="color: #990000;">unset</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$this</span><span style="color: #339933;">-&gt;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #666666; font-style: italic;">//echo('&lt;pre&gt;');</span>
    <span style="color: #666666; font-style: italic;">//exit(print_r($this));</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Well, that looks a little more complicated. Basically, it checks each property name for an underscore, and if there is one, it splits the name at the first underscore, turns the prefix into a nested object and moves the value onto it. A mapping tells it which prefixes go with which classes; any prefix that isn&#8217;t in the mapping gets a plain stdClass. For example:</p>
<pre>
$this->post_id becomes $this->post->id
$this->post_name becomes $this->post->name
$this->user_id becomes $this->user->id
$this->content just stays the same (no underscore)
</pre>
<p>So how to use that constructor? Something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">&lt;?php</span>
<span style="color: #000088;">$params</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
  <span style="color: #0000ff;">'klass_map'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span>
    <span style="color: #0000ff;">'post'</span> <span style="color: #339933;">=&gt;</span> <span style="color: #0000ff;">'Post'</span><span style="color: #339933;">,</span> <span style="color: #666666; font-style: italic;">// post_ prefix maps to Post class</span>
   <span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$comments</span> <span style="color: #339933;">=</span> <span style="color: #990000;">array</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #000088;">$rs</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_query</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$sql</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span> <span style="color: #339933;">=</span> <span style="color: #990000;">mysql_fetch_assoc</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$rs</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #000088;">$comments</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Comment<span style="color: #009900;">&#40;</span><span style="color: #000088;">$row</span><span style="color: #339933;">,</span> <span style="color: #000088;">$params</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #000000; font-weight: bold;">?&gt;</span></pre></div></div>

<p>The key observation is that $this->post is not a generic stdClass, but an instance of Post that has been created with only the properties we know we&#8217;re gonna need.</p>
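<p>To see that without a database, here is a minimal sketch that feeds a hard-coded row through a condensed copy of the generic constructor. The Model base class, the permalink() method, the URL format and the sample row are all made up for illustration; Post is assumed to be a plain class that does not extend the base class, so it can be created without arguments.</p>

```php
<?php
// Hypothetical base class holding a condensed copy of the generic constructor.
// The attribute below only matters on PHP 8.2+; older versions read it as a comment.
#[\AllowDynamicProperties]
class Model {
    function __construct($row, $params = NULL) {
        $klass_map = ($params && array_key_exists('klass_map', $params))
            ? $params['klass_map'] : array();
        foreach ($row as $k => $v) {
            $split = strpos($k, '_');
            if ($split === FALSE) {
                $this->$k = $v; // no underscore: plain property
                continue;
            }
            $prefix  = substr($k, 0, $split);
            $postfix = substr($k, $split + 1);
            if (!isset($this->{$prefix})) {
                // mapped prefixes get their class, everything else a stdClass
                $klass = array_key_exists($prefix, $klass_map) ? $klass_map[$prefix] : 'stdClass';
                $this->{$prefix} = new $klass;
            }
            $this->{$prefix}->{$postfix} = $v;
        }
    }
}

// Assumption: Post is a plain class with a (hypothetical) permalink() method.
class Post {
    public $id;
    public $name;
    function permalink() {
        return '/posts/' . $this->id . '/' . $this->name;
    }
}

class Comment extends Model {}

// Stand-in for one row returned by mysql_fetch_assoc()
$row = array(
    'id'        => 7,
    'content'   => 'Nice post!',
    'post_id'   => 42,
    'post_name' => 'hello-world',
    'user_id'   => 3,
);
$params  = array('klass_map' => array('post' => 'Post'));
$comment = new Comment($row, $params);

echo get_class($comment->post), "\n";   // Post
echo $comment->post->permalink(), "\n"; // /posts/42/hello-world
echo get_class($comment->user), "\n";   // stdClass (no mapping for user_)
```

<p>Because the nested object is a real Post, its methods are available even though only two of its properties were filled in.</p>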
<p>There are some obvious downsides here:</p>
<p>First, using magic constructors can make things unnecessarily complicated and may cause conflicts with libraries that do their own magic. Adding/removing (unsetting) properties seems particularly hazardous.</p>
<p>Second, you have to write your SQL carefully so you get the column names and mappings you need. In particular, any column with an underscore in its name gets split, so a column like &#8220;modified_on&#8221; ends up as $this-&gt;modified-&gt;on rather than $this-&gt;modified_on. It should be easy to tweak the generic constructor to be a bit more robust.</p>
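<p>One possible tweak for the underscore problem, sketched here as a standalone function rather than a constructor: only nest a column when its prefix is explicitly listed in the mapping, and leave everything else flat. The nest_row() name and the sample row are made up for illustration, and this is just one way to do it.</p>

```php
<?php
// Hypothetical helper: nest a column only when its prefix is listed in $klass_map.
function nest_row($row, $klass_map) {
    $obj = new stdClass;
    foreach ($row as $k => $v) {
        $split  = strpos($k, '_');
        $prefix = ($split === FALSE) ? NULL : substr($k, 0, $split);
        if ($prefix === NULL || !array_key_exists($prefix, $klass_map)) {
            $obj->$k = $v; // no underscore or unknown prefix: keep the column flat
            continue;
        }
        if (!isset($obj->{$prefix})) {
            $klass = $klass_map[$prefix];
            $obj->{$prefix} = new $klass;
        }
        $obj->{$prefix}->{substr($k, $split + 1)} = $v;
    }
    return $obj;
}

// Assumption: a bare-bones Post class, just enough for the example.
class Post {
    public $id;
    public $name;
}

$row = array('content' => 'hi', 'modified_on' => '2011-07-20', 'post_id' => 42);
$c   = nest_row($row, array('post' => 'Post'));

echo $c->modified_on, "\n"; // 2011-07-20 (left alone)
echo $c->post->id, "\n";    // 42 (nested into a Post)
```

<p>The trade-off is that unmapped prefixes such as user_id now stay flat instead of becoming a stdClass, so every nested object has to be listed in the mapping.</p>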
<p>Also, this really only handles the case of these nested 1:1 mappings. I think you could extend the idea, which is fairly useful by itself, but I would bet it gets complicated quickly as you head towards <strong>real</strong> ORM territory.</p>
<p>Despite the shortcomings, I&#8217;m finding this to be a convenient way to construct objects on-the-fly at the early prototyping stages of a project when I&#8217;m constantly renaming things and moving code around.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/half-baked-objects-and-10-orm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Worse Is Better</title>
		<link>http://craiget.com/2011/07/worse-is-better/</link>
		<comments>http://craiget.com/2011/07/worse-is-better/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 12:43:02 +0000</pubDate>
		<dc:creator>craiget</dc:creator>
				<category><![CDATA[Random Links]]></category>

		<guid isPermaLink="false">http://craiget.com/?p=114</guid>
		<description><![CDATA[Some interesting essays from computer history: Worse Is Better. The original essay considers the success of C against the arguably superior Lisps, which failed to gain widespread popularity. It seems hard to predict whether a particular product will succeed using &#8230; <a href="http://craiget.com/2011/07/worse-is-better/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Some interesting essays from computer history: <a href="http://www.dreamsongs.com/WorseIsBetter.html">Worse Is Better</a>. The original essay considers the success of C against the arguably superior Lisps, which failed to gain widespread popularity. It seems hard to <strong>predict</strong> whether a particular product will succeed using Worse Is Better &#8211; lots of times, worse just sucks &#8211; but it&#8217;s useful in retrospect to see why a product wins.</p>
]]></content:encoded>
			<wfw:commentRss>http://craiget.com/2011/07/worse-is-better/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

