RDF* and SPARQL*

Reification can be pretty cool.


After I posted Reification is a red herring (and you don’t need property graphs to assign data to individual relationships) last month, I had an amusingly difficult time explaining to my wife how that would generate so much Twitter activity. This month I wanted to make it clear that I’m not opposed to reification in and of itself, and I wanted to describe the fun I’ve been having playing with Olaf Hartig and Bryan Thompson’s RDF* and SPARQL* extensions to these standards to make reification more elegant.

In that post, I said that in many years of using RDF I’ve never needed to use reification because, for most use cases where it was a candidate solution, I was better off using RDFS to declare classes and properties that reflected the use case domain instead of going right to the standard reification syntax (awkward in any standardized serialization) that let me create triples about triples. My soapbox ranting in that post focused on the common argument that the property graph approach of systems like Tinkerpop and Neo4j is better than RDF because achieving similar goals in RDF would require reification; as I showed, it doesn’t.

But reification can still be very useful, especially in the world of metadata. (I am slightly jealous of the metadata librarians of the world for having the word “metadata” in their job title, and it sounds even cooler in Canada: Bibliothécaire aux métadonnées.) If metadata is data about data, and more and more of the Information Science world is taking advantage of linked data technologies, then triples about triples are bound to be useful in their use of information for provenance, curation, and all kinds of scholarship about datasets.

The conclusion of my blog post mentioned how, just as I was finishing it up, I discovered Olaf Hartig and Bryan Thompson’s 2014 paper Foundations of an Alternative Approach to Reification in RDF and Blazegraph’s implementation of it. I decided to play with this a bit in Blazegraph in order to get a hands-on appreciation of what was possible, and I like it. (Olaf recently mentioned on Twitter that these capabilities are being added into Apache Jena as well, so this isn’t just a Blazegraph thing.)

As I described in Trying out Blazegraph two years ago, it’s pretty simple to download the Blazegraph jar, start it up, load RDF data, and query it. For my RDF* experiments, I started up Blazegraph and created a Blazegraph namespace with a mode of rdr and then did my first few experiments there.

I started with the examples in Olaf’s slides RDF* and SPARQL*: An Alternative Approach to Statement-Level Metadata in RDF. To make the slides visually cleaner, he left out full URIs and prefixes, so I added some to properly see the querying in action. I loaded his slide 15 data into my new Blazegraph namespace, specifying a format of Turtle-RDR. The double angle brackets that you see here are the RDF* extension that lets us create triples that are themselves resources that we can use as subjects and objects of other triples:

@prefix d: <http://www.learningsparql.com/ns/data/> .
<<d:Kubrick d:influencedBy d:Welles>> d:significance 0.8 ;
      d:source <https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick> .

This data tells us that the triple about Kubrick being influenced by Welles has a significance of 0.8 and a source at an article on nofilmschool.com.
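
For comparison, here is a sketch of how the same information might be written with standard RDF reification, which describes the statement using rdf:Statement, rdf:subject, rdf:predicate, and rdf:object. This version is not from Olaf's slides, and the use of a blank node to stand for the statement is just one possible choice for illustration:

```turtle
@prefix d:   <http://www.learningsparql.com/ns/data/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# A blank node stands in for the statement (an illustrative choice).
# The first four triples identify the statement being described;
# the last two attach the metadata to it.
[] a rdf:Statement ;
   rdf:subject    d:Kubrick ;
   rdf:predicate  d:influencedBy ;
   rdf:object     d:Welles ;
   d:significance 0.8 ;
   d:source <https://nofilmschool.com/2013/08/films-directors-that-influenced-stanley-kubrick> .
```

Note that in standard reification, describing a statement this way does not by itself assert the d:Kubrick d:influencedBy d:Welles triple.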

I then executed the following query, based on Olaf’s from slide 16, with no problem:

PREFIX d: <http://www.learningsparql.com/ns/data/> 
SELECT ?x WHERE {
  <<?x d:influencedBy d:Welles>> d:significance ?sig .
  FILTER (?sig > 0.7)
}

In this case, the use of the double angle brackets is the SPARQL* extension that lets us do the same thing that this syntax does in RDF*. This query asks for whoever was named as being influenced by Welles in statements that have a significance greater than 0.7. The query worked just fine in Blazegraph.

SPARQL* also lets you query for the components of triples that are being treated as independent resources. From Olaf’s slide 17, this next query asks for whoever was influenced by Welles and the significance and source of any returned statements, and it worked fine with the data above:

PREFIX d: <http://www.learningsparql.com/ns/data/> 
SELECT ?x ?sig ?src WHERE {
  <<?x d:influencedBy d:Welles>> d:significance ?sig ;
  d:source ?src .
}

His slide 18 query returns the same result as that one, but takes the syntax a bit further by binding the triple pattern about someone being influenced by Welles to a variable and then querying for that:

PREFIX d: <http://www.learningsparql.com/ns/data/> 
SELECT ?x ?sig ?src WHERE {
  BIND(<<?x d:influencedBy d:Welles>> AS ?t)
  ?t  d:significance ?sig ;
      d:source ?src .
}

Moving on to more easy experiments, I found that all the examples on the Blazegraph page Reification Done Right worked exactly as shown there. That page also provides some nice background for ways to use RDF* and SPARQL* in Blazegraph.

Blazegraph lets you do inferencing, so I couldn’t resist mixing that with RDF* and SPARQL*. I had to create a new Blazegraph namespace that not only had a Mode of rdr but also had the “Inference” box checked upon creation, and then I loaded this data:

@prefix d:    <http://www.learningsparql.com/ns/data/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


<<d:s1 d:p1 d:o1>> a d:Class2 .
<<d:s2 d:p2 d:o2>> a d:Class3 .


d:Class2 rdfs:subClassOf d:Class1 . 
d:Class3 rdfs:subClassOf d:Class1 . 

It creates two triples that are themselves resources, with one being an instance of Class2 and the other being an instance of Class3. Two final triples tell us that both of those classes are subclasses of Class1. The following query asked for triples that are instances of Class1, despite the data having no explicit triples about Class1 instances, and Blazegraph did the inferencing and found both of them:

PREFIX d: <http://www.learningsparql.com/ns/data/> 
SELECT ?x ?y ?z WHERE {
   <<?x ?y ?z>> a d:Class1 . 
}
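
Spelled out, the RDFS subclass rule (an instance of a subclass is also an instance of its superclass) entails the following two triples from the data above, which is why the query matches both embedded triples. This is only a sketch of the entailed statements in Turtle-RDR syntax; how Blazegraph computes or stores them internally is not something this experiment shows:

```turtle
@prefix d: <http://www.learningsparql.com/ns/data/> .

# Entailed because d:Class2 and d:Class3 are each rdfs:subClassOf d:Class1.
<<d:s1 d:p1 d:o1>> a d:Class1 .
<<d:s2 d:p2 d:o2>> a d:Class1 .
```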

After doing this inferencing, I was thinking that OWL metadata and inferencing about such triples should open up a lot of new possibilities, but I realized that none of those possibilities are necessarily new: they’ll just be easier to implement than they would have been using the old method of reification that used four triples to represent one. Still, being easier to implement counts for plenty, and I think that metadata librarians and other people doing work to build value around existing triples now have a reasonable syntax and some nice tools to explore this.