My command line OWL processor

With most of the credit going to Ivan Herman.


I recently asked on Twitter about the availability of command line OWL processors. I got some leads, but most would have required a little coding or integration work on my part. I decided that a small project that I did with the OWL-RL Python library a few years ago gave me a head start on just creating my own OWL command line processor in Python. It was pretty easy.

My goal was something that would read RDF files, do inferencing, and output any triples created by the inferencing. The heavy lifting is done by the OWL-RL library, which builds on the classic RDFLib Python library. The OWL-RL library was originally written by Ivan Herman and is now maintained by Ashley Sommer and Nicholas Car. (As you would guess from its name, this library implements the rule-based OWL profile known as OWL RL.) My script is short and simple enough that instead of putting it on GitHub I’ve just pasted it below.
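
Both of those libraries live on PyPI, so if you want to run the script yourself, something like this should pull in its dependencies (assuming a Python 3 environment with pip available):

pip install rdflib owlrl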

Testing it

In my recent blog posting "You probably don’t need OWL", I wrote about an inferencing use case:

For example, in Trying Out Blazegraph (which only supports bits of OWL), I showed a dataset that had triples about various chairs and desks being located in various rooms, as well as triples about which rooms were in which buildings, but nothing about which furniture was in which buildings (or for that matter, what counted as furniture). I then used the RDFS rdfs:subClassOf property to declare that dm:Chair and dm:Desk were subclasses of dm:Furniture, and I also declared that my dm:locatedIn property was an owl:TransitiveProperty. With these additional modeling triples, a SPARQL query to an OWL processor that understood rdfs:subClassOf and owl:TransitiveProperty could then list which furniture was in which building. This little bit of OWL actually added some semantics to the model as well, because it tells us—and OWL processors—a little about the “meaning” of dm:locatedIn.
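
As a rough sketch of the triples involved (my reconstruction, not the actual data from that post; the chair, room, and building identifiers are borrowed from the output and discussion below):

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dm:   <http://learningsparql.com/ns/demo#> .
@prefix d:    <http://learningsparql.com/ns/data#> .

dm:Chair rdfs:subClassOf dm:Furniture .
dm:Desk  rdfs:subClassOf dm:Furniture .
dm:locatedIn a owl:TransitiveProperty .

d:chair15 a dm:Chair ;
    dm:locatedIn d:room101 .
d:room101 dm:locatedIn d:building100 .

From triples like these, an OWL RL reasoner can conclude both that d:chair15 is an instance of dm:Furniture and that it is located in d:building100.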

To try this example with my new command line processor, I didn’t even need to use SPARQL. I just stored the “Trying Out Blazegraph” sample data in a file called chairsAndTables.ttl and fed it to my script like this:

owl-rl-inferencing.py chairsAndTables.ttl

Here are the first three triples of the output:

<http://learningsparql.com/ns/data#chair15> a ns2:Furniture, ns1:Thing ;
    ns2:locatedIn <http://learningsparql.com/ns/data#building100> .

It inferred that chair 15 is an instance of the Furniture class (and of the Thing class) and that it’s in building 100. It also output triples about what buildings all the other chairs and tables were in, so I counted this as a successful test.

For another test, I was especially happy to see the script do the inferencing I expected from one particular example in my book Learning SPARQL. Example dataset ex424.ttl lists the name, instrument played, and birth state of six musicians without saying that any is a member of any class. Here are two examples:

d:m2 rdfs:label "Charlie Christian" ;
     dm:plays d:Guitar ;
     dm:stateOfBirth d:TX .

d:m4 rdfs:label "Kim Gordon" ;
     dm:plays d:Bass ;
     dm:stateOfBirth d:NY .

It also includes the following restriction class definitions, which specify conditions that qualify an instance as a member of the classes Guitarist, Texan, and TexasGuitarPlayer:

dm:Guitarist
   owl:equivalentClass
           [ rdf:type owl:Restriction ;
             owl:hasValue d:Guitar ;
             owl:onProperty dm:plays
           ] .

dm:Texan
   owl:equivalentClass
           [ rdf:type owl:Restriction ;
             owl:hasValue d:TX ;
             owl:onProperty dm:stateOfBirth
           ] .

dm:TexasGuitarPlayer
   owl:equivalentClass
        [ rdf:type owl:Class ;
          owl:intersectionOf (dm:Texan dm:Guitarist)
        ] .

To test my script’s ability to read different serializations, I split up ex424.ttl into ex424a.ttl, ex424b.nt, and ex424c.rdf before feeding them to the script like this:

owl-rl-inferencing.py ex424a.ttl ex424b.nt ex424c.rdf 

The output included the following triples, so we know that it inferred that Charlie Christian was an instance of all three classes:

<http://learningsparql.com/ns/data#m2> a
        <http://learningsparql.com/ns/demo#Guitarist>,
        <http://learningsparql.com/ns/demo#Texan>,
        <http://learningsparql.com/ns/demo#TexasGuitarPlayer> .

It did not infer that resource m4, New York bassist Kim Gordon, was a member of any of these classes. It did infer that Texas piano player Red Garland was a Texan but not a Guitarist or a TexasGuitarPlayer, and that native Californian Bonnie Raitt was a Guitarist but not a member of the other two classes.

Combining this with other tools

The inferred triples may need some management after they’re materialized. If chair 15 gets moved from room 101 in building 100 to room 201 in building 200, we don’t want that inferred triple about it being in building 100 hanging around anymore. Named graphs can help here, as I described in Living in a materialized world: Managing inferenced triples with named graphs. That post shows how RDFLib lets you pipeline a series of queries and updates, combining simple and complex operations into sophisticated applications. The ability to do OWL inferencing can contribute a lot to these pipelines.
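
As a rough sketch of that approach using RDFLib’s Dataset class (the graph names here are my own illustrative choices, not anything from that post):

import rdflib
import owlrl

# Hypothetical names for the two graphs; use whatever fits your data.
ASSERTED = rdflib.URIRef("http://example.org/graphs/asserted")
INFERRED = rdflib.URIRef("http://example.org/graphs/inferred")

ds = rdflib.Dataset()
asserted = ds.graph(ASSERTED)
inferred = ds.graph(INFERRED)

# Load the hand-authored triples into their own named graph.
asserted.parse("chairsAndTables.ttl", format="turtle")

# Expand a scratch copy so that the asserted graph stays untouched.
scratch = rdflib.Graph()
for triple in asserted:
    scratch.add(triple)
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(scratch)

# Store only the newly inferred triples in the second named graph.
for triple in scratch - asserted:
    inferred.add(triple)

# If chair 15 moves, update the asserted graph, then clear the stale
# inferences with SPARQL and re-run the expansion steps above.
ds.update("CLEAR GRAPH <http://example.org/graphs/inferred>")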

Without taking advantage of RDFLib’s pipelining ability at the Python code level, you can do some pipelining right from your operating system command line, sending the output of my owl-rl-inferencing.py script to an Apache Jena tool such as riot.
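
For example, something like this should hand riot the inferred triples for conversion to N-Triples (an untested sketch: riot reads standard input when no filename is given, and the --syntax flag tells it to expect Turtle, since that’s what my script outputs):

owl-rl-inferencing.py chairsAndTables.ttl | riot --syntax=turtle --output=ntriples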

Either way, I hope the script is useful to someone. Let me know!

The code

#!/usr/bin/env python3

# owl-rl-inferencing.py: read RDF files provided as command line
# arguments, do OWL RL inferencing, and output any new triples
# resulting from that.

import sys
import rdflib
import owlrl

if len(sys.argv) < 2:  # print directions
    print("Read RDF files, perform inferencing, and output the new triples.")
    print("Enter one or more .ttl, .nt, and .rdf filenames as arguments.")
    sys.exit()

inputGraph = rdflib.Graph()
graphToExpand = rdflib.Graph()

# Read the files. arg 0 is the script name, so don't parse that as RDF.
for filename in sys.argv[1:]:
    if filename.endswith(".ttl"):
        inputGraph.parse(filename, format="turtle")
    elif filename.endswith(".nt"):
        inputGraph.parse(filename, format="nt")
    elif filename.endswith(".rdf"):
        inputGraph.parse(filename, format="xml")
    else:
        print("# Filename " + filename + " doesn't end with .ttl, .nt, or .rdf.")

# Copy the input graph so that we can diff to identify new triples later.
for s, p, o in inputGraph:
    graphToExpand.add((s, p, o))

# Do the inferencing. See
# https://owl-rl.readthedocs.io/en/latest/stubs/owlrl.DeductiveClosure.html#owlrl.DeductiveClosure
# for other owlrl.* choices.
owlrl.DeductiveClosure(owlrl.OWLRL_Semantics).expand(graphToExpand)

newTriples = graphToExpand - inputGraph  # How cool is that? 

# Output Turtle comments reporting on graph sizes
print(f"# inputGraph: {len(inputGraph)} triples")
print(f"# graphToExpand: {len(graphToExpand)} triples")
print(f"# newTriples: {len(newTriples)} triples")

# Output the new triples. With rdflib 5, serialize() returns bytes (hence
# the decode() to drop the "b''" in the output); rdflib 6 and later
# return a string directly.
result = newTriples.serialize(format="turtle")
print(result.decode() if isinstance(result, bytes) else result)

Comments? Reply to my tweet announcing this blog entry.