# Using SPARQL to combine Wikidata and OSM triples

Last month in “GeoSPARQL queries on OSM Data in GraphDB” I showed how to use SPARQL to retrieve triples about Manhattan museums from OpenStreetMap’s SPARQL endpoint. Then, after loading the triples into Ontotext’s free GraphDB triplestore, I showed how GraphDB’s support for the GeoSPARQL standard let me query for all the museums within a mile of the Museum of Modern Art. The OSM data doesn’t include pictures of the museums, but I mentioned that it does include the museums’ Wikidata URIs, so today we’ll see how to use those URIs to retrieve the images from Wikidata and connect them to the data retrieved from OSM. The result of this process includes the images you see here, each linking to the pictured museum’s website.

Before I get to that I wanted to show a nice query that Ontotext founder Atanas Kiryakov showed me after I published that last blog entry. I had used curl to send a SPARQL CONSTRUCT query to OSM’s endpoint and save the triples in a local Turtle file. Once I had that file I loaded it into GraphDB and ran the query about museums near MoMA there. Atanas’s query uses the SPARQL SERVICE keyword to do the retrieval from within GraphDB so that all the steps that I did can happen with one query:

    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
    PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>
    PREFIX osmm: <https://www.openstreetmap.org/meta/>

    SELECT ?museum ?museumName ?metersFromMoma WHERE {
      SERVICE <https://sophox.org/sparql> {
        ?moma   osmt:official_name "The Museum of Modern Art" ;
                osmm:loc ?momaLoc .
        ?museum osmt:tourism "museum" ;
                osmt:name ?museumName ;
                osmm:loc ?museumLoc .
      }
      BIND(round(geof:distance(?museumLoc,?momaLoc, uom:metre)) AS ?metersFromMoma)
      FILTER(?metersFromMoma < 1610)  # Only those less than a mile away.
      FILTER(?museum != ?moma)        # Don't bother showing MoMA itself.
    } ORDER BY ?metersFromMoma


His query uses no features that are specific to GraphDB, so it should work with any SPARQL engine that supports the GeoSPARQL standard, which in this case means supporting the geof:distance() function call. GraphDB was the first triplestore I found that had this support.
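To make the distance filter a little more concrete, here is a rough Python sketch of the same idea: compute a distance in metres between two points, round it, and keep the points under a one-mile limit. It uses the haversine great-circle formula, which is my assumption for illustration; a GeoSPARQL engine's geof:distance() may calculate distances differently. The function names and the sample coordinates are hypothetical, not real museum locations.

```python
import math

EARTH_RADIUS_M = 6371008.8   # mean Earth radius in metres
METRES_PER_MILE = 1609.344   # which is why the query uses 1610 for "a mile"


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_miles(points, origin_name, miles=1.0):
    """Return (name, rounded metres) pairs closer than the limit, nearest first.

    points maps a name to a (lat, lon) tuple; the origin itself is skipped,
    like the FILTER(?museum != ?moma) line in the query.
    """
    limit = miles * METRES_PER_MILE
    lat0, lon0 = points[origin_name]
    hits = []
    for name, (lat, lon) in points.items():
        if name == origin_name:
            continue
        metres = round(haversine_m(lat0, lon0, lat, lon))
        if metres < limit:
            hits.append((name, metres))
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical points, chosen only to exercise the code.
sample = {"origin": (0.0, 0.0), "near": (0.0, 0.01), "far": (1.0, 0.0)}
print(within_miles(sample, "origin"))
```

Passing miles=2.0 corresponds to the 3220-metre limit used later in this post.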

To get pictures of the retrieved museums, I created a variation on Atanas’s query that retrieved triples about the Manhattan museums and inserted them into the active local repository:

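    PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>

    INSERT { ?museum ?p ?o } WHERE
    {
      SERVICE <https://sophox.org/sparql> {
        # As written, this pattern matches anything tagged tourism=museum;
        # no restriction to Manhattan is shown here.
        ?museum osmt:tourism "museum";
                ?p ?o .
      }
    }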


The following query then showed me that each museum’s osmt:wikidata value in that locally stored data was a Wikidata identifier such as https://www.wikidata.org/wiki/Q636942 for the International Center of Photography:

    PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>
    SELECT * WHERE {
      ?museum osmt:tourism "museum";
              osmt:wikidata ?wikidataID .
    }
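One detail worth checking in results like these: Wikidata's query service identifies items with entity URIs in the http://www.wikidata.org/entity/ namespace, while the human-readable page for the same item lives under https://www.wikidata.org/wiki/. If the stored values ever turn out to be page URLs or bare Q-numbers (an assumption on my part; look at your own data first), a small normalizing step along these lines could help. The function name to_entity_uri is hypothetical.

```python
import re

WD_ENTITY_NS = "http://www.wikidata.org/entity/"

# A Q-number at the very end of the value, either alone or after "/" or ":".
_QID_AT_END = re.compile(r"(?:^|[/:])(Q[1-9][0-9]*)$")


def to_entity_uri(value):
    """Map a page URL, entity URI, wd:-style name, or bare Q-number to an entity URI."""
    cleaned = value.strip().strip("<>")
    match = _QID_AT_END.search(cleaned)
    if match is None:
        raise ValueError(f"no Wikidata Q-number found in {value!r}")
    return WD_ENTITY_NS + match.group(1)


print(to_entity_uri("https://www.wikidata.org/wiki/Q636942"))
```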


If you look at the Wikidata page for the ICP you’ll see that it includes a picture of it, and if you click on the “image” property name there you’ll see that this is property P18 in Wikidata. So, my next query took each museum’s Wikidata ID value, used the SERVICE keyword to send it off to Wikidata’s endpoint to retrieve the image URL, and stored the resulting triple locally:

    PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>

    INSERT { ?museum wdt:P18 ?imageURL } WHERE {
      ?museum osmt:tourism "museum";
              osmt:wikidata ?wikidataID .
      SERVICE <https://query.wikidata.org/sparql> {
        ?wikidataID wdt:P18 ?imageURL .
      }
    }


(As always, I first ran the query above with the CONSTRUCT keyword instead of INSERT just to make sure that I was properly asking for what I was trying to get.)
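If you keep your update requests in files or strings, that dry-run habit can be scripted. Here is a minimal Python sketch, assuming the simple INSERT { ... } WHERE { ... } shape used in this post; it does not try to handle INSERT DATA, DELETE/INSERT combinations, or WITH and USING clauses. The function name as_dry_run is hypothetical.

```python
import re

# INSERT followed directly by its template's opening brace.
# "INSERT DATA {" is deliberately not matched.
_INSERT_TEMPLATE = re.compile(r"\bINSERT(\s*\{)", re.IGNORECASE)


def as_dry_run(update_query):
    """Return the CONSTRUCT version of a simple INSERT ... WHERE request."""
    construct_query, count = _INSERT_TEMPLATE.subn(
        r"CONSTRUCT\1", update_query, count=1
    )
    if count == 0:
        raise ValueError("expected a query of the form INSERT { ... } WHERE { ... }")
    return construct_query


insert_query = """PREFIX wdt: <http://www.wikidata.org/prop/direct/>
INSERT { ?museum wdt:P18 ?imageURL } WHERE { ?museum ?p ?o }"""
print(as_dry_run(insert_query))
```

The CONSTRUCT version returns the triples that the INSERT would have stored, so you can inspect them before changing any data.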

The OSM data that I pulled included website URLs for most of the museums, so I queried the data I had aggregated from the two endpoints to list the websites and image URLs for museums within a mile of MoMA (actually, within 2 miles to give me a nicer choice of pictures to include here):

    PREFIX osmt: <https://wiki.openstreetmap.org/wiki/Key:>
    PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
    PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX osmm: <https://www.openstreetmap.org/meta/>

    SELECT ?website ?imageURL WHERE {
      ?moma   osmt:official_name "The Museum of Modern Art" ;
              osmm:loc ?momaLoc .
      ?museum wdt:P18 ?imageURL ;
              osmt:website ?website ;
              osmm:loc ?museumLoc .
      BIND(round(geof:distance(?museumLoc,?momaLoc, uom:metre)) AS ?metersFromMoma)
      FILTER(?metersFromMoma < 3220)  # Only those less than 2 miles away.
      FILTER(?museum != ?moma)        # Don't bother showing MoMA itself.
    }


When displaying query results, GraphDB adds a handy “Download as” button, so I saved a tab-separated value version of that query’s results and used the ancient Linux utility sed to wrap the values in a bit of HTML:

    sed -E "s/(.+)\t<(.+)>/\<a href='\1'> \
    <img width='200' src='\2'\/><\/a>/" query-result.tsv > temp.html


I could then copy the bits of HTML from the resulting file to the text file I’m typing now so that the images you see can be links to the home pages.
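For anyone who would rather not squint at sed escapes, the same wrapping step could be sketched in Python. This is only an illustration under a few assumptions: each data row has exactly two tab-separated columns (website, then image URL), the header row holds variable names starting with a question mark, and values may be wrapped in angle brackets or double quotes. The function name tsv_to_html is hypothetical.

```python
import html


def tsv_to_html(tsv_text, width=200):
    """Wrap each website/image row of a TSV result in a linked <img> snippet."""
    snippets = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 2:
            continue  # not a two-column row; ignore it
        website, image = (field.strip() for field in fields)
        if website.startswith("?"):
            continue  # header row of variable names
        # Drop the <...> or "..." wrappers used for IRIs and literals.
        website = website.strip('"<>')
        image = image.strip('"<>')
        if not website or not image:
            continue
        snippets.append(
            '<a href="{}"><img width="{}" src="{}"/></a>'.format(
                html.escape(website, quote=True),
                width,
                html.escape(image, quote=True),
            )
        )
    return "\n".join(snippets)


# Hypothetical sample row, not taken from the real query results.
sample = "?website\t?imageURL\nhttps://example.org/\t<http://example.com/a.jpg>\n"
print(tsv_to_html(sample))
```

Escaping the attribute values with html.escape keeps URLs that contain ampersands or quotes from breaking the generated markup.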

If you’re reading this more than a few months after November of 2020 and the URLs of any of those images have changed, they’ll show up as broken links. With any application that uses data from remote sources, we have to consider various factors when making the decision whether to dynamically grab certain data when necessary or grab it once and store it locally for future use. Isn’t it nice how SPARQL and the widely-implemented open source and commercial RDF tools out there give us so many options when we make these decisions?

Have you ever pulled data from two different endpoints to answer a question that neither endpoint could answer by itself? Let me know at @bobdc.