
OpenStreetMap, or “OSM” to geospatial folk, is a crowd-sourced online map that has made tremendous achievements in its role as the Wikipedia of geospatial data. (The Wikipedia page for OpenStreetMap is really worth a skim to learn more about its impressive history.) OSM offers a free alternative to commercial mapping systems out there—and you better believe that the commercial mapping systems are reading that great free data into their own databases.

OSM provides a SPARQL endpoint and a nice page of example queries. With their endpoint, the following query lists the names and addresses of all the museums in New York City (or, in RDF terms, everything with an osmt:addr:city value of “New York” and an osmt:tourism value of “museum”):

SELECT ?name ?housenumber ?street 
WHERE {
   ?museum osmt:addr:city "New York";
      osmt:tourism "museum";
      osmm:loc ?loc ;
      osmt:name ?name ;
      osmt:addr:housenumber ?housenumber ;
      osmt:addr:street ?street .
      # The following tells it to only get museums south of the Javits Center
      # FILTER(geof:latitude(?loc) < 40.758289)
}

You can try it here. As I write this, it returns 32 results, and if you uncomment the filter condition to only get museums south of that latitude, it returns 17. That filter condition is just a taste of actually using geospatial data; the osmm:loc value has a datatype of http://www.opengis.net/ont/geosparql#wktLiteral and takes a form like Point(-73.9900266 40.7187837). As you can see, the filter uses the geof:latitude() function to pull the latitude value out of the Point value.
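Pairing that function with a longitude check turns the filter into a rough bounding box. This is only a sketch: I'm assuming that the endpoint also offers a geof:longitude() function to match geof:latitude(), and the coordinate values below are placeholders rather than carefully chosen boundaries.

      # Inside the WHERE clause above, in place of the single FILTER line:
      # keep only locations in a rough box around lower Manhattan.
      FILTER(geof:latitude(?loc) < 40.758289)
      FILTER(geof:latitude(?loc) > 40.70)
      FILTER(geof:longitude(?loc) > -74.02)   # assumes geof:longitude() exists
      FILTER(geof:longitude(?loc) < -73.97)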

This is a very basic level of geospatial data use. A proper geospatial query for something like all the museums within a mile of the Museum of Modern Art is more complicated because of the effect of the earth’s curvature. Although OSM stores each entity’s latitude and longitude values, its query engine doesn’t support such queries. (The How OSM Data is Stored documentation is good background if you want to explore their SPARQL endpoint further.)

The ability to execute real geospatial queries typically comes as an add-in to a database. For example, if you already use Oracle for your relational data, you pay extra for Oracle Spatial. If you’re using the open source PostgreSQL relational database, you get the open source PostGIS add-in. Even little SQLite has SpatiaLite. (If you’re storing massive amounts of data using Apache Accumulo on a Hadoop platform, the add-in would be the open source GeoMesa suite developed at my employer CCRi. Being around this project has taught me a lot about the issues of geospatial processing.)

The LinkedGeoData.org project from the University of Leipzig’s Agile Knowledge Engineering and Semantic Web (AKSW) research group “uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles”. It includes a SPARQL endpoint, but I could find no documentation or examples of geospatial extensions to SPARQL. The endpoint is currently up and running, but the “About/News” page shows no activity on the project since May of last year. (A query for resources with an rdfs:label of “Grand Central Station” returned the URIs http://linkedgeodata.org/triplify/node291087340 and http://linkedgeodata.org/triplify/way189853520, both of which returned HTTP 500 errors.)

A standardized extension for SPARQL called GeoSPARQL specifies how to query spatial information, letting you do things like specify criteria in terms of miles or kilometers; a SPARQL engine that supports this standard will do the necessary trigonometry to give you the right answers. GeoSPARQL is sponsored by the Open Geospatial Consortium, which is also responsible for other popular geospatial standards such as the Web Feature Service and Web Map Service standards for REST API access to geospatial data. I have used both often at work. Looking at their standards page, I only just now learned that they are also the standards body behind KML. Their GeoSPARQL Functions page documents the extension functions. (I have co-workers who understand what the mathematical concept of a “convex hull” is; I have tried with little success.)
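To make the “museums within a mile of the Museum of Modern Art” idea from above concrete, here is a sketch of what such a query could look like with GeoSPARQL’s geof:distance() function. The geo:hasGeometry/geo:asWKT modeling and the uom:metre unit URI come from the GeoSPARQL standard, but the ex:moma URI and rdfs:label triples are placeholders of mine, and I’m expressing one mile as roughly 1609 meters.

PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/>

SELECT ?name
WHERE {
   # Placeholder modeling: each museum is a feature whose geometry carries
   # a WKT literal, and ex:moma is a made-up URI for the Museum of Modern Art.
   ?museum geo:hasGeometry/geo:asWKT ?museumWKT ;
           rdfs:label ?name .
   ex:moma geo:hasGeometry/geo:asWKT ?momaWKT .
   # geof:distance() takes two geometries and a unit URI; one mile is about 1609 meters.
   FILTER(geof:distance(?museumWKT, ?momaWKT, uom:metre) < 1609)
}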

The geosparql.org website has some preloaded data where you can try GeoSPARQL queries. I wanted to explore the possibilities of using a geospatial SPARQL extension, ideally GeoSPARQL, with data that I could control. Because I just love converting triples from one namespace to another so that I can use new tools and standards with them, I hoped to get some OSM triples and convert them to the right namespaces to enable geospatial queries on them using a local triplestore. I decided that a simpler first step would be to pull down some triples from geosparql.org and load those into the local triplestore, because I already knew that those would work with standard GeoSPARQL queries.
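Here is the kind of conversion I had in mind, sketched as a SPARQL CONSTRUCT query: it re-expresses each osmm:loc value using GeoSPARQL’s geo:hasGeometry/geo:asWKT pattern so that a GeoSPARQL-aware engine could index it. The ex: namespace and the way I mint geometry URIs are inventions of mine for illustration, not anything required by either vocabulary.

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX ex:  <http://example.org/geom/>
# A PREFIX declaration for osmm: is also needed; I've left it out here because
# the OSM endpoint pre-declares it, so the query above never had to spell it out.

CONSTRUCT {
   ?feature a geo:Feature ;
            geo:hasGeometry ?geom .
   ?geom a geo:Geometry ;
         geo:asWKT ?loc .
}
WHERE {
   # The osmm:loc values already have a datatype of geo:wktLiteral, so they
   # can be reused directly as geo:asWKT objects.
   ?feature osmm:loc ?loc .
   # Mint a distinct geometry URI for each feature; this CONCAT/IRI scheme is
   # just one arbitrary way to do it.
   BIND(IRI(CONCAT(STR(ex:), ENCODE_FOR_URI(STR(?feature)))) AS ?geom)
}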

The two downloadable triplestores that I could find that claimed geospatial support were Blazegraph and Parliament. (Blazegraph’s 2010 slides “Geospatial RDF Data” (pdf) provide a good introduction to issues of geospatial indexing.) I got Blazegraph’s sample query to run against its sample data as described on its Querying Geospatial Data page, but I had no luck when I tried to modify it to work with data that I had loaded myself. The geo:predicate triple in the sample query seems to be necessary, but I wasn’t querying for both location and time like their example does, and although I tried different objects for a triple using this property, I couldn’t get it to work and gave up. (Since the acqui-hire of most if not all of the Blazegraph staff by Amazon’s Neptune team, Blazegraph doesn’t seem to be under active development anyway.)

Parliament comes from Raytheon subsidiary BBN, a company with a long history in important computer technology. This triplestore promised not just geospatial support but support for the GeoSPARQL standard. I got Parliament up and running locally and found a localhost page about indexes that showed that the data I was using did not have a geospatial index, and I saw no way to create one. Their five-year-old User Guide (PDF) has a “Configuring Indexes” section consisting of the four words “Yet to be written”. I gave up on Parliament after some LinkedIn searches showed that the main people attached to the project are no longer at BBN.

In the middle of all this research I learned some great news: Apache Jena has had some geospatial support, but it required the use of Lucene or Solr, a custom querying vocabulary, and a lot of manual index configuration; the Jena developers are now ramping up work on direct support for the GeoSPARQL standard. That’s why this blog entry has a subtitle of “Part 1”: I look forward to trying out GeoSPARQL in a locally running copy of Jena’s Fuseki server and then writing Part 2. (I’m going to be patient as I wait for it to be included in the binary release of Fuseki—or, to put it another way, I’m too lazy to set up the environment to build it from the current source.) And once a SPARQL 1.2 Recommendation gets closer and I update my book Learning SPARQL, it would be a good idea to cover GeoSPARQL there, so I’ll be happy to see support for SPARQL’s standardized geospatial extension in the triplestore that is already used in many of the book’s examples.