You probably don't need OWL

And if you do, there's a simple way to prove it.

During the course of my recent blog posts What is RDF?, What is RDFS?, What else can I do with RDFS?, and Taxonomy management with SKOS, some readers wondered if I would do a “What is OWL?” followup. I recommended to one inquirer that he read pages 39-41 and 263-269 of Learning SPARQL; I think those pages provide a pretty good introduction to OWL’s history and how to do some of the set-based logic that was an important part of its original intent.

A recent blog entry by Irene Polikoff, a founder of my former employer TopQuadrant, has also inspired a lot of conversation about when people should or shouldn’t use OWL. Her entry’s title is pretty categorical: Why I Don’t Use OWL Anymore. I find bits of OWL more useful than she does, but less useful than many people do. I’ll get to some examples below.

Data modeling? Use RDFS

At its simplest level, data modeling is the identification and enumeration of the pieces of information that you want to keep track of and the relationships between them. A standards-based, machine-readable version of this enumeration is very valuable to application development. As I wrote in What is RDFS? and What else can I do with RDFS?, RDFS can do that pretty well. It does an especially good job for schema.org, one of the great success stories of RDF-based technology, as I described in the first of those two pieces. You can go beyond RDFS to add information about your data’s structures and potential relationships in even more detail, but as we’ll see, machine-readable descriptions of this information won’t do you much good unless you have tools that will read these descriptions and use them to contribute value to your applications.

Defining constraints on that data model? Use SHACL

OWL can go beyond RDFS to describe additional details about your classes and properties, but it can only rarely describe what counts as a valid instance of a class and what doesn’t. This has been a fundamental need of data processing for as long as people have been using data on computers: developers who write applications that use data don’t want to write lots of code to make sure that the data they read is what they’re really expecting. They want to assume that the processes that created that data already did this validation. SQL’s CREATE TABLE statements let you specify data types of and dependencies between table columns, not to mention which are required and which are optional; DTDs and later forms of schema do the same for XML.

RDF never really had this until the W3C standard SHACL, as I described in Validating RDF data with SHACL. Irene’s followup to her blog entry mentioned above is titled Why I Use SHACL For Defining Ontology Models, and it explains many of the advantages that SHACL brings. (She does write “I no longer used RDFS/OWL (besides declaring classes and subclasses)”, so she hasn’t completely replaced her usage of RDFS.)
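To make the idea of “what counts as a valid instance” concrete, here is a minimal Python sketch of the kind of check that a SHACL shape declares and a SHACL processor carries out. It is not SHACL and it uses no RDF library; the EMPLOYEE_SHAPE dictionary, the property names, and the validate function are all hypothetical names made up for this illustration.

```python
# Hypothetical "shape": each property maps to (required?, expected Python type).
# This only imitates the spirit of a SHACL shape; it is not SHACL syntax.
EMPLOYEE_SHAPE = {
    "familyName": (True, str),
    "hireDate": (True, str),
    "salary": (False, int),
}


def validate(record, shape):
    """Return a list of violation messages; an empty list means the record is valid."""
    violations = []
    for prop, (required, expected_type) in shape.items():
        if prop not in record:
            if required:
                violations.append(f"missing required property: {prop}")
            continue
        if not isinstance(record[prop], expected_type):
            violations.append(f"{prop} should be {expected_type.__name__}")
    return violations


# Made-up sample records.
good = {"familyName": "Smith", "hireDate": "2020-03-01", "salary": 50000}
bad = {"familyName": "Jones", "salary": "a lot"}

print(validate(good, EMPLOYEE_SHAPE))
print(validate(bad, EMPLOYEE_SHAPE))
```

The point of a standard like SHACL is that the shape lives in the data model as declarative data instead of being buried in each application's code the way validate is here.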

Controlled vocabulary? Use SKOS

Last month in Taxonomy management with SKOS I described how taxonomies and thesauri are controlled vocabularies that typically let you store metadata about the vocabulary terms, including their relationships to each other. You could picture a taxonomy or thesaurus as a potentially large collection of terms arranged in a tree in which lower levels of the tree describe subsets of the higher levels. If we want to represent this all in RDF, should we do it as OWL classes? I say: no. This is not a nail for that hammer.

First of all, the lower levels of a taxonomy tree do not represent subsets of the higher levels. The tree’s nodes represent terms, not sets of things, and lower levels of the tree show more specific terms: for example, “collie” and “bulldog” as more specific versions of “dog” and “dog” as a more specific version of “mammal”. Heather Hedden, author of the leading introduction to taxonomy development, summed it up nicely in her blog post Differing Definitions of Ontologies: “ontology structures are meant to model data, not to organize taxonomy concepts that could be either generic (common nouns) [or] named entities (proper nouns)”.

In a taxonomy, “Person broader than Employee” means that a book or other form of media about employees is also a work about persons. In an ontology, “Employee is a subclass of Person” lets you distinguish between properties that apply to all persons (family name, given name) and properties that apply to employees but not to all persons (hire date, salary).
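The two readings of the same Person/Employee pair can be sketched in a few lines of Python. This is a deliberately simplified picture with made-up data: the BROADER, SUPERCLASS, and DECLARED_PROPERTIES dictionaries and both functions are hypothetical, and attaching properties to classes this way is closer to object-oriented modeling than to how RDFS actually treats properties.

```python
# Taxonomy view: each term points to its broader term (hypothetical data).
BROADER = {
    "Employee": "Person",
    "collie": "dog",
    "bulldog": "dog",
    "dog": "mammal",
}


def is_about(tagged_term, query_term):
    """True if a work tagged with tagged_term also counts as a work about query_term."""
    term = tagged_term
    while term is not None:
        if term == query_term:
            return True
        term = BROADER.get(term)  # walk up to the broader term, if any
    return False


# Ontology view: each class names its superclass and the properties it introduces.
SUPERCLASS = {"Employee": "Person"}
DECLARED_PROPERTIES = {
    "Person": ["familyName", "givenName"],
    "Employee": ["hireDate", "salary"],
}


def properties_of(cls):
    """Collect the properties declared on cls and on all of its superclasses."""
    props = []
    while cls is not None:
        props = DECLARED_PROPERTIES.get(cls, []) + props
        cls = SUPERCLASS.get(cls)
    return props


print(is_about("Employee", "Person"))   # a search for Person finds works about employees
print(properties_of("Person"))
print(properties_of("Employee"))
```

The first function answers a retrieval question about terms; the second answers a modeling question about what data an instance can carry. The hierarchy looks the same in both cases, but it is doing two different jobs.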

SKOS is itself an OWL ontology that defines a data model for storing controlled vocabularies and their metadata. It has commercial and open source support among popular vocabulary management tools. (Pinterest developed their own ontology for taxonomy management, but it draws on SKOS.) SKOS is a W3C standard that is specialized for this particular job. SKOS vocabularies and OWL ontologies can use each other as input; a straightforward SPARQL query can often create one from the other, but keep their different purposes in mind. The traction that SKOS-based tools have achieved over the years is a powerful argument to use this standard for vocabulary management.

But if you really need OWL…

If you really need OWL, prove it! Do something with your data and an OWL processor that would have been noticeably more difficult without that processor. This will demonstrate what value OWL brings to your data.

For example, in Trying Out Blazegraph (which only supports bits of OWL), I showed a dataset that had triples about various chairs and desks being located in various rooms, as well as triples about which rooms were in which buildings, but nothing about which furniture was in which buildings (or for that matter, what counted as furniture). I then used the RDFS rdfs:subClassOf property to declare that dm:Chair and dm:Desk were subclasses of dm:Furniture, and I also declared that my dm:locatedIn property was an owl:TransitiveProperty. With these additional modeling triples, a SPARQL query to an OWL processor that understood rdfs:subClassOf and owl:TransitiveProperty could then list which furniture was in which building. This little bit of OWL actually added some semantics to the model as well, because it tells us—and OWL processors—a little about the “meaning” of dm:locatedIn.
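Here is a rough Python sketch of what the processor contributes in that example. It is not how Blazegraph or any OWL processor is implemented; it just applies the two relevant rules (transitivity of dm:locatedIn and class membership following rdfs:subClassOf) to a handful of triples until nothing new turns up. The instance names such as d:chair15 and the dm:Room and dm:Building classes are assumptions made up for this illustration; only dm:Chair, dm:Desk, dm:Furniture, and dm:locatedIn come from the description above.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"
LOCATED_IN = "dm:locatedIn"  # declared to be an owl:TransitiveProperty in the model

# Hypothetical sample data: nothing here says which furniture is in which building.
TRIPLES = {
    ("dm:Chair", SUBCLASS_OF, "dm:Furniture"),
    ("dm:Desk", SUBCLASS_OF, "dm:Furniture"),
    ("d:chair15", TYPE, "dm:Chair"),
    ("d:desk22", TYPE, "dm:Desk"),
    ("d:room101", TYPE, "dm:Room"),
    ("d:room202", TYPE, "dm:Room"),
    ("d:building1", TYPE, "dm:Building"),
    ("d:building2", TYPE, "dm:Building"),
    ("d:chair15", LOCATED_IN, "d:room101"),
    ("d:desk22", LOCATED_IN, "d:room202"),
    ("d:room101", LOCATED_IN, "d:building1"),
    ("d:room202", LOCATED_IN, "d:building2"),
}


def infer(triples):
    """Apply the two rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2:
                    continue
                # Rule 1: dm:locatedIn is transitive.
                if p == LOCATED_IN and p2 == LOCATED_IN:
                    new.add((s, LOCATED_IN, o2))
                # Rule 2: an instance of a class is an instance of its superclasses.
                if p == TYPE and p2 == SUBCLASS_OF:
                    new.add((s, TYPE, o2))
        if new <= facts:
            return facts
        facts |= new


def furniture_by_building(facts):
    """List (item, building) pairs for items typed as dm:Furniture."""
    return sorted(
        (s, o)
        for (s, p, o) in facts
        if p == LOCATED_IN
        and (s, TYPE, "dm:Furniture") in facts
        and (o, TYPE, "dm:Building") in facts
    )


print(furniture_by_building(TRIPLES))         # raw data only
print(furniture_by_building(infer(TRIPLES)))  # with the inferred triples
```

Running the same query against the raw triples and against the inferred ones is the “prove it” test in miniature: the first finds nothing, and the second finds each piece of furniture in its building.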

That was pretty easy. I think it’s a good general rule that if you want to demonstrate the value of a certain technology, show something that you can do with it that would have been a lot more trouble, if not impossible, without it. A query about data that is relevant to many different businesses, such as employee or facility data, is a great way to do this. (I always thought that Protégé’s famed pizza ontology was a little too cutesy of a demonstration domain—of course everyone likes pizza, but why not use a domain where there is an actual chance that people would use an ontology to manage the relevant data?)

The most visible pushback that I saw to Irene’s blog posts about not using OWL was Why We Use OWL Every Day At Triply from the Amsterdam-based company. Their explanations of OWL’s value focused on its role as human-readable documentation of modeling intentions, which is certainly valuable, but they did not point to any usage of OWL as machine-readable modeling instructions when I asked.

I am not done playing with OWL, and I still dream of making the following pin and wearing it to a conference where at least some of the attendees will get the joke:

It's an owl:Thing

Comments? Reply to my tweet announcing this blog entry.