What is RDFS?
And how much can a simple schema do for you?
RDFS, or RDF Schema, is a W3C standard specialized vocabulary for describing RDF vocabularies and data models. Before I discuss it further, though, I’d like to explain why the use of standardized, specialized vocabularies (whether RDFS itself or a vocabulary that someone uses RDFS to describe) can be useful beyond the advantages of sharing a vocabulary with others for easier interoperability.
Last month, in What is RDF?, my example dataset included triples whose predicates came from the W3C standard vCard business card ontology. It also included triples from a namespace that I had created myself with my own domain name. Certain kinds of RDF applications go through data and, when they find predicates that use a specialized vocabulary designed for such applications, they execute special tasks designed around that vocabulary. For example, GeoSPARQL applications that find predicates from the http://www.opengis.net/def/function/geosparql/ namespace can perform geospatial math that answers questions such as “what museums are within a mile of New York’s Museum of Modern Art?”, as I described in GeoSPARQL queries on OSM Data in GraphDB.
The use of RDF does not require any schemas. However, the commercial and open source tools that can understand the RDFS vocabulary (by which I mean the RDFS vocabulary itself, not necessarily the ones you define with it) make it easier for applications to build user interfaces around RDF-based applications, to integrate data from disparate datasets, and more. Before we get there, though, let’s look at an example of an RDF schema and some data that uses it.
The following RDFS schema uses the Turtle syntax to describe a few classes and properties.
# Employee schema version 1
# Pound sign lets you add comments to Turtle.

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix emp:   <http://www.snee.com/schema/employees/> .

emp:Person rdf:type rdfs:Class .
emp:Employee a rdfs:Class .

vcard:given-name a rdf:Property .
vcard:family-name a rdf:Property .
emp:hireDate a rdf:Property .
emp:reportsTo a rdf:Property .
The first thing to note is that the schema is itself triples, using Turtle RDF to describe a few RDF structures. This means that you can use SPARQL and other RDF tools to work with the schema itself and with collections of schemas.
The second thing to note is how simple a schema can be—in this case, just six triples saying “Here are some classes and properties to potentially use”.
The rdf:type predicate means “is an instance of the following class”, so the first triple above says that emp:Person is itself a class. (Below we’ll see how to create instances of emp:Person.) This schema’s next triple says that emp:Employee is also a class. Instead of the rdf:type predicate, that line uses the shortcut “a”. This means the same thing, but with a syntax that brings the triple closer to the English expression “emp:Employee is a class”.
The remaining four triples in that example list some available properties. I copied two from the vCard vocabulary and made up two myself.
Using the schema
The following instance data uses the classes and properties declared above:
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix emp:   <http://www.snee.com/schema/employees/> .
@prefix ex:    <http://www.snee.com/example/> .

ex:id1 a emp:Person ;
   vcard:given-name "Francis" ;
   vcard:family-name "Jones" .

ex:id2 a emp:Employee ;
   vcard:given-name "Heidi" ;
   vcard:family-name "Smith" ;
   emp:hireDate "2015-01-13" .

ex:id3 a emp:Employee ;
   vcard:given-name "Jane" ;
   vcard:family-name "Berger" ;
   emp:reportsTo ex:id2 .
These triples use another bit of Turtle syntax that I didn’t cover last month: a semicolon means “the next triple has the same subject as the last one”. For example, the first three triples after the prefix declarations in this sample data say that resource ex:id1 is an instance of the class emp:Person, has a given name of Francis, and has a family name of Jones.
The schema above doesn’t say much, but it’s already at least as useful as a list of the columns in a relational table. Someone who has this schema and is working with this data knows what property names to use if they want to query the data, add to it, or delete from it. They also know what the potential classes are and can query for instances of those classes. All of these abilities are a big help if multiple people are going to create interoperable data and applications.
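To make that concrete, here is a minimal Python sketch that doesn't depend on any RDF library. It holds the schema and the sample data as plain (subject, predicate, object) tuples, which is my own simplification for this example, and the helper name instances_of is likewise made up. A real application would more likely load the Turtle with an RDF library and use SPARQL. The point it illustrates is that one lookup works on both the schema and the data, because both are just triples.

```python
# Triples as (subject, predicate, object) tuples of prefixed names.
# This is a stand-in for a real RDF graph, assumed for illustration only.
SCHEMA = [
    ("emp:Person", "rdf:type", "rdfs:Class"),
    ("emp:Employee", "rdf:type", "rdfs:Class"),
    ("vcard:given-name", "rdf:type", "rdf:Property"),
    ("vcard:family-name", "rdf:type", "rdf:Property"),
    ("emp:hireDate", "rdf:type", "rdf:Property"),
    ("emp:reportsTo", "rdf:type", "rdf:Property"),
]

DATA = [
    ("ex:id1", "rdf:type", "emp:Person"),
    ("ex:id1", "vcard:given-name", "Francis"),
    ("ex:id1", "vcard:family-name", "Jones"),
    ("ex:id2", "rdf:type", "emp:Employee"),
    ("ex:id2", "vcard:given-name", "Heidi"),
    ("ex:id2", "vcard:family-name", "Smith"),
    ("ex:id2", "emp:hireDate", "2015-01-13"),
    ("ex:id3", "rdf:type", "emp:Employee"),
    ("ex:id3", "vcard:given-name", "Jane"),
    ("ex:id3", "vcard:family-name", "Berger"),
    ("ex:id3", "emp:reportsTo", "ex:id2"),
]


def instances_of(triples, cls):
    """Return, in sorted order, every subject declared with rdf:type cls."""
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == cls)


# Ask the schema what it offers, then ask the data who the employees are.
classes = instances_of(SCHEMA, "rdfs:Class")
properties = instances_of(SCHEMA, "rdf:Property")
employees = instances_of(DATA, "emp:Employee")
```

Note that this only matches explicit rdf:type triples; it does no inferencing of any kind.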
Adding to the schema
The next version of the same schema goes a little further by providing more information about the classes and properties:
# Employee schema version 2

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix emp:   <http://www.snee.com/schema/employees/> .

emp:Person rdf:type rdfs:Class ;
   rdfs:label "person" .

emp:Employee a rdfs:Class ;
   rdfs:label "employee" ;
   rdfs:comment "A full-time, non-contractor employee." .

vcard:given-name rdf:type rdf:Property ;
   rdfs:label "given name" .

vcard:family-name rdf:type rdf:Property ;
   rdfs:label "family name" ;
   rdfs:label "apellido"@es .

emp:hireDate a rdf:Property ;
   rdfs:label "hire date" ;
   rdfs:comment "The first day an employee was on the payroll." .

emp:reportsTo a rdf:Property ;
   rdfs:label "reports to" .
This version includes rdfs:comment and rdfs:label properties. The former function as documentation for the things they’re describing. They should provide clarity as to exactly what the described resource means, like the rdfs:comment value for the emp:Employee resource: “A full-time, non-contractor employee.”
The rdfs:label property provides a human-readable name for the resource being described. This is especially helpful for reports and applications that use this data. For example, if your application will display a form where people can edit data about employees, it would be difficult for these end users to read the form if it labeled its fields with actual property names such as “vcard:given-name” and “emp:hireDate”. On the other hand, you shouldn’t hard-code more readable form field names like “hire date” and “family name” in your application code, either.
For some real model-driven development, you want to set things up so that as your model (as encoded by the schema) evolves, the application automatically adapts to this evolution wherever possible. Providing display names as part of the model helps move your application toward this goal. An application that uses the revised version of my sample schema can use rdfs:label values such as “family name” and “given name” to provide much more readable form field labels.
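A rough Python sketch of that idea follows. As a simplification of my own, it keeps only the English rdfs:label triples from the schema as plain tuples and ignores language tags. The function names field_label and form_fields and the property emp:salary are hypothetical, invented here to show the fallback when the schema has no label for a property.

```python
# English rdfs:label triples from schema version 2, as plain tuples.
# Language-tagged labels are left out of this sketch.
SCHEMA_LABELS = [
    ("vcard:given-name", "rdfs:label", "given name"),
    ("vcard:family-name", "rdfs:label", "family name"),
    ("emp:hireDate", "rdfs:label", "hire date"),
    ("emp:reportsTo", "rdfs:label", "reports to"),
]


def field_label(schema, prop):
    """Return the schema's rdfs:label for prop, or prop itself if none exists."""
    for s, p, o in schema:
        if s == prop and p == "rdfs:label":
            return o
    return prop


def form_fields(schema, record):
    """Pair each value in a record with the display label for its property."""
    return [(field_label(schema, prop), value) for prop, value in record]


# The property/value pairs for ex:id2 from the sample data.
heidi = [
    ("vcard:given-name", "Heidi"),
    ("vcard:family-name", "Smith"),
    ("emp:hireDate", "2015-01-13"),
]
fields = form_fields(SCHEMA_LABELS, heidi)

# emp:salary is a hypothetical property with no label in the schema,
# so the raw property name is used as the fallback.
unlabeled = field_label(SCHEMA_LABELS, "emp:salary")
```

Because the display names come from the schema triples, adding a label to the schema changes what the form shows without any change to this code.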
RDF (and hence RDFS) also lets you add language tags to literal values. If you add multiple rdfs:label values to an RDF resource and you tag each of these values according to its language, then the model-driven development described above can extend to the generation of forms in different languages for different users. In the second version of my schema the resource vcard:family-name has labels in both English and Spanish. (A future version of the schema should have Spanish labels for the other classes and properties as well.) You can even include language codes for country-specific versions of terms so that a given form could be displayed in American English, British English, Castilian Spanish, Mexican Spanish, and more, all based on data in the schema.
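Here is one way an application might choose among such labels, as a minimal Python sketch. The (term, text, language) tuples are my own stand-in for language-tagged literals, with None meaning an untagged label. The function name pick_label and its fallback order (exact tag, then the primary language subtag, then the untagged label, then the term name) are assumptions for this example, not something RDFS prescribes.

```python
# (term, label text, language tag) tuples; None means the label is untagged,
# as the English labels are in schema version 2.
LABELS = [
    ("vcard:family-name", "family name", None),
    ("vcard:family-name", "apellido", "es"),
    ("vcard:given-name", "given name", None),
]


def pick_label(labels, term, preferred):
    """Pick the label for term that best matches the preferred language tag.

    Assumed fallback order: the exact tag (e.g. "es-MX"), then its primary
    subtag ("es"), then an untagged label, then the term name itself.
    """
    by_tag = {(lang or "").lower(): text for t, text, lang in labels if t == term}
    wanted = preferred.lower()
    for tag in (wanted, wanted.split("-")[0], ""):
        if tag in by_tag:
            return by_tag[tag]
    return term


mexican_spanish = pick_label(LABELS, "vcard:family-name", "es-MX")
english = pick_label(LABELS, "vcard:family-name", "en")
no_spanish_yet = pick_label(LABELS, "vcard:given-name", "es")
```

With the schema as it stands, a Spanish-language form would still show “given name” for vcard:given-name, which is why the remaining Spanish labels are worth adding.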
Remember that while I’m using rdfs:label and rdfs:comment values in an RDFS schema here, you can also use them in any RDF you like. For example:
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix emp:   <http://www.snee.com/schema/employees/> .
@prefix ex:    <http://www.snee.com/example/> .

ex:id3 a emp:Employee ;
   vcard:given-name "Jane" ;
   vcard:family-name "Berger" ;
   rdfs:label "Jane Berger" ;
   rdfs:comment """Jane has taken the sales department from being only her and an assistant
to the ten-person team we have today.""" .
(The rdfs:comment value here is shown as a long literal, which encloses the value in triple quotation marks so that it can include carriage returns.) Similarly, you can add language tags to any RDF literal values you want, not just those in RDFS schemas.
In my next blog entry I’ll describe some fancier modeling that you can do with RDFS and how it can help applications such as data integration and even a mobile application. I’ll also mention (as I have before) how, in the debate over schema-driven software development versus schemaless development, the use of partial schemas can give you the best of both worlds. (Last month I promised a few of those things for this blog entry, but for this entry I wanted to emphasize the value of RDFS’s most basic constructs.)
Meanwhile, take a look at the RDFS schema for schema.org. From the Vocabulary Definition Files section of the page Schema.org for Developers you can pick which variation you want, in which serialization; I would pick the Turtle serialization to see how the schema demonstrates what I’ve been describing here.
You should recognize a lot of the Turtle version of the schema.org schema, because it’s mostly declarations of classes and properties with rdfs:label values and descriptive rdfs:comment values. Schema.org provides an excellent role model for RDFS development, all without any OWL! Fifteen years ago I had a difficult time finding an example of RDFS being used without any OWL mixed in, and I think Schema.org has been a real inspiration since then.
From now on, when you see a given set of RDF terms being used, ask “where can I find a schema documenting it?” And, if you find a schema (or OWL ontology) describing a model, ask “where can I see sample data that follows this schema?” (Schema.org sample data tends to be in JSON-LD, but you can convert it to Turtle easily enough.)
Comments? Reply to my tweet announcing this blog entry.