Instead of writing SPARQL queries for Wikipedia--query for them!

Queries as data to help you get at more data.

Let’s say, hypothetically, that you want to execute a SPARQL query that lists all of Wikimedia’s portraits with fruit. Wikimedia does have a category for this, so what would be the quickest way to come up with the query?

If you click the Wikidata item link on this category’s page, you’ll see all the data about it that you can retrieve with a SPARQL query to the Wikidata endpoint, as I’ve described in my last few blog entries. The cool thing about this particular resource is that one of its properties is called “Wikidata SPARQL query equivalent”, and its value is the query that will retrieve a list of the portraits with fruit. In other words, Wikidata has a triple that looks like this:

  • subject: wd:Q29789760 (the Wikidata category “portraits with fruit”)

  • predicate: wdt:P3921 (“Wikidata SPARQL query equivalent”)

  • object: SELECT DISTINCT ?item WHERE { ?item wdt:P31/wdt:P279* wd:Q838948 . ?item wdt:P136/wdt:P31?/wdt:P279* wd:Q134307 . ?item wdt:P180/wdt:P31?/wdt:P279* wd:Q3314483 . }

Paste that object value into the Wikidata query service, and you can run it to get a list of the portraits.
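If you would rather fetch that object value with a script than copy it from the item page, something like the following Python sketch could work. It assumes the public endpoint at https://query.wikidata.org/sparql, which predefines the wd: and wdt: prefixes and returns JSON when asked with format=json. The function names and the User-Agent string are my own inventions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_lookup_query(qid):
    """Build a query asking for the stored SPARQL query (P3921) of one item."""
    return "SELECT ?query WHERE { wd:%s wdt:P3921 ?query . }" % qid


def build_request_url(sparql):
    """Encode a SPARQL query as a GET URL that requests JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return ENDPOINT + "?" + params


def run_query(sparql):
    """Send a query to the endpoint and return its list of result bindings."""
    request = urllib.request.Request(
        build_request_url(sparql),
        # Hypothetical User-Agent string; replace it with one identifying you.
        headers={"User-Agent": "stored-query-sketch/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def print_stored_query_results(qid):
    """Fetch an item's stored query, print it, then run it and print each row."""
    rows = run_query(build_lookup_query(qid))
    if not rows:
        print("No stored query found for", qid)
        return
    stored = rows[0]["query"]["value"]
    print(stored)
    for row in run_query(stored):
        print({name: cell["value"] for name, cell in row.items()})
```

Assuming the endpoint is reachable, calling print_stored_query_results("Q29789760") should print the stored query followed by one row per matching item. Nothing runs until you call it, so the sketch is safe to paste into a larger script.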

That may seem like a lot of trouble to get this list, but that’s not really the point. This query gives you a head start in developing more sophisticated queries on the topic.

When I wondered how many Wikidata resources used this predicate, I found that the ones using it were easier to understand if they also had an rdfs:label value. So, I entered this query to count the subjects that had both:

SELECT (count(*) as ?count) WHERE {
  ?s wdt:P3921 ?o ;
     rdfs:label ?label .
}

Two weeks ago there were 316, but as I write this there are almost a hundred more, so the number is growing at a good pace.
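To look at the stored queries themselves rather than just count them, you could list each subject with its label and query. The Python sketch below only builds that query text and flattens a SPARQL JSON result into tuples; sending it to the endpoint is left out. The English-only FILTER, the LIMIT, the function name, and the sample result are all my assumptions, and the sample is made up to show the layout rather than copied from real output. Note also that count(*) counts rows, so an item with labels in several languages may be counted once per label; COUNT(DISTINCT ?s) would count each subject once.

```python
import json

# Variation on the counting query: list the labelled subjects instead.
# The English-only filter and the LIMIT are assumptions added for readability.
LIST_QUERY = """
SELECT ?s ?label ?o WHERE {
  ?s wdt:P3921 ?o ;
     rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
LIMIT 20
"""


def flatten_results(results_json):
    """Turn a SPARQL JSON results document into (subject, label, query) tuples."""
    data = json.loads(results_json)
    rows = []
    for binding in data["results"]["bindings"]:
        rows.append((
            binding["s"]["value"],
            binding["label"]["value"],
            binding["o"]["value"],
        ))
    return rows


# Made-up sample in the standard SPARQL JSON results layout, not real output.
SAMPLE = """
{"head": {"vars": ["s", "label", "o"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://www.wikidata.org/entity/Q29789760"},
    "label": {"type": "literal", "xml:lang": "en", "value": "example label"},
    "o": {"type": "literal", "value": "SELECT ?item WHERE { ?item ?p ?v }"}}
 ]}}
"""

for subject, label, query in flatten_results(SAMPLE):
    print(label, "->", subject)
    print("   ", query)
```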

The idea of a SPARQL query as an object in an RDF triple is not new. It’s part of the Shapes Constraint Language (SHACL), as demonstrated by one of its test cases. SHACL is a W3C specification that lets you specify constraints on data; for example, you can use it to validate that certain properties are required for instances of a particular class and that others are optional. (This is a lot more difficult using OWL.) I’ll be looking at SHACL more closely in the coming months; meanwhile, I’ll be keeping an eye on the SPARQL queries being added to Wikidata where we can retrieve them with our own SPARQL queries.