GeoSPARQL queries on OSM Data in GraphDB

Or, Querying geospatial data with SPARQL Part 2

Over a year ago, in Querying geospatial data with SPARQL: Part 1, I described my dream of pulling geospatial data down from Open Street Map, loading it into a local triplestore, and then querying it with queries that conformed to the GeoSPARQL standard. At the time, I tried several triplestores and data sources and never quite got there. When I tried it recently with Ontotext’s free version of GraphDB, it all turned out to be quite easy.

For some background, read that blog entry up through the paragraph beginning “The website has some preloaded data…” The rest of the entry describes my only somewhat successful attempts to do geospatial queries with Blazegraph and Parliament and how I looked forward to Apache Jena’s growing GeoSPARQL support. (A few years earlier I wrote a bit about GeoSPARQL in Visualizing DBpedia geographic data with some help from SPARQL.)


The GraphDB page that I link to above includes a chart that shows that the free version does plenty, and most importantly, doesn’t expire or limit the amount of data you load. Once I downloaded it, installed it, started it up, and had it running at http://localhost:7200, its web-based interface had a tutorial to “(1) Create a repository (2) Load a sample dataset (3) Run a SPARQL query” so I went through those steps. When you use GraphDB’s form to create a new repository, you’ll see that the “Rulesets” field has a default value of “RDFS-Plus (Optimized)” and offers 10 other choices, including several OWL choices and an “Upload custom ruleset” option. The form also includes a “Supports SHACL validation” checkbox and other options, so these were all great to see.

Before trying GraphDB with geospatial data I wanted to test out its support for inferencing and for RDF* and SPARQL*. I had a nice short example ready to go at my blog entry RDF* and SPARQL*: Reification can be pretty cool after the paragraph beginning “Blazegraph lets you do inferencing, so I couldn’t resist mixing that with RDF* and SPARQL*.” Treating two triples as resources themselves (thanks, RDF*!), the sample data in that example makes one triple an instance of d:Class2 and the other an instance of d:Class3, and then it makes both of those classes subclasses of d:Class1 without creating any instances of d:Class1. The query that follows this sample data doesn’t just ask for the instances of d:Class1, which GraphDB’s RDFS-Plus support will find in its subclasses; it asks for the subject, predicate, and object of each of these instances. (Thanks, SPARQL*!) It all worked fine in GraphDB.

Using GeoSPARQL with GraphDB

In my “Part 1” blog entry I described how a database manager’s ability to deal properly with geospatial data usually requires an add-on. GraphDB does use what they call a plugin for this, but there’s no need to download and plug it in yourself; it’s already in GraphDB and you turn it on by simply adding a triple to the repository setting geoSparql:enabled to True for some resource as described in their GeoSPARQL documentation. I got all of that page’s GeoSPARQL examples to work easily enough after loading the data that it pointed to.
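If you would rather script that step than paste the triple into the Workbench, something like the following minimal sketch could work. It only builds the HTTP request and does not send it. The plugin namespace URI and the RDF4J-style /statements update endpoint are my assumptions here, and the repository name is just the one used in the curl call later in this entry, so check all of these against the GraphDB GeoSPARQL documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed namespace for GraphDB's GeoSPARQL plugin; verify against the docs.
GEOSPARQL_PLUGIN_NS = "http://www.ontotext.com/plugins/geosparql#"

# SPARQL update that adds the single "enabled" triple to the repository.
ENABLE_UPDATE = f"""
PREFIX geoSparql: <{GEOSPARQL_PLUGIN_NS}>
INSERT DATA {{ _:s geoSparql:enabled "true" . }}
"""


def build_enable_request(base_url: str, repository: str) -> Request:
    """Build (but do not send) a POST carrying the update as a form field."""
    body = urlencode({"update": ENABLE_UPDATE}).encode("utf-8")
    return Request(
        url=f"{base_url}/repositories/{repository}/statements",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_enable_request("http://localhost:7200", "OSMManhattanData")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would actually send it to a running GraphDB instance.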

In Part 1 I also wrote “Because I just love converting triples from one namespace to another so that I can use new tools and standards with them, I hoped to get some OSM triples and convert them to the right namespaces to enable geospatial queries on them using a local triplestore.” Having gotten the GeoSPARQL examples mentioned above to work in GraphDB I had a model to use when converting the OSM triples, and then I got a nice surprise: I didn’t have to convert them!

I pulled all the triples about museums in “New York” from the Open Street Map SPARQL endpoint with the following simple query:

PREFIX osmt: <>

CONSTRUCT { ?museum ?p ?o }
WHERE {
  ?museum osmt:addr:city "New York";
          osmt:tourism "museum";
          ?p ?o .
}

(Despite requesting “New York” museums, the results all seemed to be in Manhattan. An osmt:addr:city value of “Brooklyn” got other museums.)

After storing that query in the file manhattanMuseums.rq, the following curl command (split at the \ for display here) retrieved the triples and stored them in the file manhattanMuseums.ttl:

curl --data-urlencode "query@manhattanMuseums.rq" \
  -H "Accept: text/turtle" > manhattanMuseums.ttl

(On October 25th when I first published this I thought that their SPARQL endpoint was down, but it turned out that my re-testing of the curl call was failing because of my own dumb typo.)

Here are two triples that it retrieved about one museum that I highly recommend:

osmnode:368061660 osmm:loc "Point(-73.9900266 40.7187837)"^^geo:wktLiteral ;
	<> "Lower East Side Tenement Museum" .

Why no need to convert the data?

Here is the cool part that meant that I didn’t have to convert any triples before loading manhattanMuseums.ttl into GraphDB and issuing standard GeoSPARQL queries on it: while SPARQL has a perfectly decent selection of data types, you can define your own, and section 8.5.1 of the GeoSPARQL specification defines the geo:wktLiteral datatype for specifying geospatial coordinates. As you can see in the Tenement Museum example above, the OSM triples use that type, so I was all set.
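To make the shape of that datatype concrete, here is a minimal sketch that picks apart the lexical form of a point literal like the one in the Tenement Museum triple. The function and class names are made up for illustration. It handles only simple two-dimensional points with no leading coordinate reference system URI, in which case GeoSPARQL's default CRS puts longitude first and latitude second.

```python
import re
from typing import NamedTuple


class LonLat(NamedTuple):
    lon: float
    lat: float


# Matches only "Point(<lon> <lat>)" with plain decimal numbers; other WKT
# geometries and literals with a leading CRS URI are out of scope here.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(lexical_form: str) -> LonLat:
    """Return the (lon, lat) pair from a simple WKT point string."""
    match = _POINT.match(lexical_form)
    if match is None:
        raise ValueError(f"not a simple WKT point: {lexical_form!r}")
    return LonLat(float(match.group(1)), float(match.group(2)))


tenement = parse_wkt_point("Point(-73.9900266 40.7187837)")
print(tenement.lon, tenement.lat)
```

The longitude-first order is easy to trip over if you are used to reading coordinates as "latitude, longitude."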

In Part 1 I also wrote “A proper geospatial query for something like all the museums within a mile of the Museum of Modern Art is more complicated because of the effect of the earth’s curvature.” It’s not so complicated with proper GeoSPARQL support because I can call the geof:distance function, which is not supported by Open Street Map’s SPARQL endpoint but is supported by GraphDB as part of its GeoSPARQL support. I loaded manhattanMuseums.ttl into GraphDB and ran the following query:

PREFIX geof: <>
PREFIX uom:  <>
PREFIX osmt: <>
PREFIX osmm: <>

SELECT ?museumName ?metersFromMoma
WHERE {
   ?moma   osmt:official_name "The Museum of Modern Art" ;
           osmm:loc ?momaLoc .
   ?museum osmt:tourism "museum" ;
           osmt:name ?museumName ;
           osmm:loc ?museumLoc .
   # Find the distance from each museum to MoMA and save it
   BIND(round(geof:distance(?museumLoc, ?momaLoc, uom:metre))
        AS ?metersFromMoma)
   FILTER(?metersFromMoma < 1610)  # Only those less than a mile away.
   FILTER(?museum != ?moma)        # Don't bother showing MoMA itself.
}
ORDER BY ?metersFromMoma

(I tried pulling address data as well, but not all museums had that, especially the ones that were close to MoMA.) With that query pasted into the file museumsNearMoma.rq, the following pulled a TSV version of the results from my locally running copy of GraphDB…

curl --header "Accept: text/tab-separated-values" --data-urlencode \
  "query@museumsNearMoma.rq" http://localhost:7200/repositories/OSMManhattanData

so that I could paste them here:

?museumName	?metersFromMoma
Paley Center for Media	88
Museum of Arts and Design	766
International Center of Photography	827
National Geographic Encounter - Ocean Odyssey	925
American Folk Art Museum	1350
Frick Collection	1399
Asia Society	1450
Mount Vernon Hotel Museum	1503
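To get a feel for the curvature issue that geof:distance takes off your hands, here is a minimal sketch of a great-circle (haversine) distance on a spherical earth. This is not a claim about how GraphDB computes its distances, which may use a different earth model. The function name and radius constant are my own choices for illustration.

```python
from math import asin, cos, radians, sin, sqrt

# Mean earth radius in meters; treating the earth as a sphere is an approximation.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# One degree of latitude versus one degree of longitude at the Tenement
# Museum's latitude: the same difference in degrees covers different ground.
one_degree_north = haversine_m(0.0, 0.0, 0.0, 1.0)
one_degree_east_nyc = haversine_m(-73.9900266, 40.7187837, -72.9900266, 40.7187837)
print(round(one_degree_north), round(one_degree_east_nyc))
```

A degree of longitude at New York's latitude covers noticeably less ground than a degree of latitude, which is why simply subtracting coordinates does not give a usable distance.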

GeoSPARQL has a lot more for GIS geeks than the geof:distance function, so check out the spec for that. Also, after I wrote the first draft of this blog entry, I found out on Twitter about a new document from the Open Geospatial Consortium, the standards group responsible for GeoSPARQL: OGC Benefits of Representing Spatial Data Using Semantic and Graph Technologies. It lists nice use cases that show the benefits of semantic technologies, describes the use cases addressed by GeoSPARQL, and proposes some extensions to that specification.

There is also an excellent Linked Data/Knowledge Graph angle to my example above, especially for GLAM researchers: because the OSM data includes triples like this additional one about the Tenement Museum,

osmnode:368061660 <> wd:Q901533 .

you can connect up the geospatial data in OSM with triples from Wikidata to aggregate even more cool data about the entities in OSM. And, you can do it all in a local, free triplestore!