Anatomy of a SPARQL Query

(2009)

..in which I try to describe a few aspects of the SPARQL query language in plain(er) English.

SPARQL is the query language for the semantic web and is a W3C recommendation (that is, practically a standard). To issue a query, you need an endpoint. All my examples use the DBpedia endpoint. Open that page in a new window. You should be able to copy-paste any query on this page into DBpedia and play around with the results.
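If you would rather script it than copy-paste, here is a rough Python sketch of how a query can be packaged up for an endpoint. It assumes the endpoint accepts the query as a plain GET parameter named query, which is how the SPARQL protocol describes it. The endpoint address below is a made-up placeholder and build_request is just a name I picked for illustration. Nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address: substitute the one you actually use.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """Return a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for JSON results; an endpoint may offer other formats too.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = "SELECT ?prop WHERE { <http://dbpedia.org/resource/Alaska> ?prop ?obj }"
request = build_request(ENDPOINT, query)
print(request.full_url)

# To really send it, you would call urllib.request.urlopen(request).read()
```

The query text gets URL-encoded on the way out, which is why the printed address looks so noisy.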

The basic thing you can do with SPARQL is SELECT. Let's start by getting a list of US States:

SELECT ?state WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . }

Now, it is important to note that the choice of variable names is arbitrary. You could have chosen ?s or even ?turnip just as easily. This query will give you the exact same results:

SELECT ?turnip WHERE { ?turnip skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . }

Okay, well that's pretty sweet. So let's learn something interesting about the nifty-fifty - how about state capitals?

SELECT ?state ?capital WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:capital ?capital }

So that works. But WTF are skos and dbpedia2? They are namespace prefixes. The short story is that to be useful in a SPARQL query, everything needs a unique identifier. URIs (which usually look just like web addresses) provide just that. If two things have the same URI, they are exactly the same thing. Unfortunately, URIs are often quite long. To save typing and space on the screen, skos, dbpedia2 and rdf are all abbreviations for the long part that many identifiers share: the namespace. When you see skos, it actually means http://www.w3.org/2004/02/skos/core# and when you see skos:subject it means http://www.w3.org/2004/02/skos/core#subject.
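The abbreviation trick is nothing more than string substitution. Here is a minimal Python sketch of it, using only the skos address given above. The function name expand and the PREFIXES table are my own inventions for illustration, not part of any SPARQL library.

```python
# Prefix table: short name -> the long address it stands for.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn something like 'skos:subject' into its full address."""
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


print(expand("skos:subject"))
# → http://www.w3.org/2004/02/skos/core#subject
```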

Namespaces - well that's all very interesting, but how do you know that the property is called dbpedia2:capital instead of, say, dbpedia2:stateCapital? Happily, there is a query that gives a list of all the properties for a particular subject. In this case, the subject is Alaska:

SELECT ?prop WHERE { <http://dbpedia.org/resource/Alaska> ?prop ?obj }

Of course, you don't have to provide the subject. You could provide the predicate or the object instead. Now, on with the show. Let's look at the State Flower:

SELECT ?state ?flower WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:flower ?flower }

Uh oh - there are a few states missing! Awesome as it is, DBpedia doesn't know everything. In this case, some states are missing the dbpedia2:flower property, and a state only shows up in the results if every line of the query matches. Fortunately, there's the OPTIONAL keyword:

SELECT ?state ?flower WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . OPTIONAL { ?state dbpedia2:flower ?flower } }

Okay, that's better. We don't have 50 flowers, but at least we still have all the states. So which state has the biggest population? The ORDER BY clause can help:

SELECT ?state ?pop WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:poprank ?pop } ORDER BY ?pop

Go ahead and try it. It works, but not exactly, right? The problem is that DBpedia's dbpedia2:poprank property is not always an integer value. What does that mean? It means that sometimes a “1” isn't a “1”, sometimes it is a “1”^^dbpedia:units/Rank, and values of different types don't sort together as plain numbers. So what can you do about it? In this case, you can just cast it to an integer. This query will give you the states in order of population:

SELECT ?state ?pop WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:poprank ?pop } ORDER BY xsd:int(?pop)

So that's useful and there are a number of conversion functions to do similar things. Now here's something a little different - this query gets the latitude and longitude of the highest point in every state. With a little bit of scripting, you could plot these on a map:

PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> SELECT ?state ?mtn ?lat ?long WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:highestpoint ?mtn . ?mtn geo:lat ?lat . ?mtn geo:long ?long }

Here it was necessary to declare the geo namespace using the PREFIX keyword. The DBpedia SPARQL endpoint already declares a bunch of namespaces for your convenience, but geo is not among them so we need to add it in. It is worth noting that the name geo is arbitrary - you could have called it fred. That said, namespaces in widespread use have developed conventional prefixes, and you should stick to those to prevent confusion.
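If you build queries from a script, the PREFIX lines are easy to generate. The sketch below is one possible way to do it in Python. The helper name with_prefixes is hypothetical, and the two tiny queries are cut-down versions of the one above. It also shows the point about fred: the two queries differ only in the label, so they mean the same thing.

```python
def with_prefixes(query, prefixes):
    """Put one PREFIX line per namespace in front of the query text."""
    header = "\n".join(f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items())
    return header + "\n" + query


GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Same namespace, two different labels.
as_geo = with_prefixes("SELECT ?mtn ?lat WHERE { ?mtn geo:lat ?lat }", {"geo": GEO})
as_fred = with_prefixes("SELECT ?mtn ?lat WHERE { ?mtn fred:lat ?lat }", {"fred": GEO})

print(as_geo)
print(as_fred)
```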

SPARQL also supports a limited set of comparison operations using the FILTER keyword. The query below lists states that joined after January 1st, 1850:

SELECT ?state ?date WHERE { ?state skos:subject <http://dbpedia.org/resource/Category:States_of_the_United_States> . ?state dbpedia2:admittancedate ?date FILTER ( ?date > "1850-01-01T00:00:00Z"^^xsd:dateTime ) }

The recommendation lists a number of comparison operators (see http://www.w3.org/TR/rdf-sparql-query/#SparqlOps), though in practice they may not all be available in a particular implementation.
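If an endpoint turns out not to support a comparison you need, one fallback is to fetch the rows unfiltered and do the comparison in your own script. Here is a rough Python sketch of the same date test. The rows are typed in by hand for illustration and are not output from DBpedia, and parse_xsd_datetime is a helper name I made up.

```python
from datetime import datetime


def parse_xsd_datetime(text):
    """Parse a value like '1850-01-01T00:00:00Z' into a timezone-aware datetime."""
    # Older Python versions reject a trailing 'Z', so spell out the UTC offset.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


CUTOFF = parse_xsd_datetime("1850-01-01T00:00:00Z")

# Hand-typed sample rows (state, admittance date), not real query output.
rows = [
    ("Texas", "1845-12-29T00:00:00Z"),
    ("California", "1850-09-09T00:00:00Z"),
    ("Alaska", "1959-01-03T00:00:00Z"),
]

# The same test the FILTER performs: keep rows dated after the cutoff.
later = [state for state, date in rows if parse_xsd_datetime(date) > CUTOFF]
print(later)
# → ['California', 'Alaska']
```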

I'll be back with Part 2 soon. As always, I hope this will be useful, so comments/feedback are always welcome - especially from people who can correct any mistakes printed here.