Is SPIN the Schematron of RDF?

Represent business rules using an implemented standard, then flag violations in a machine-readable way.

Christian Fürber and Martin Hepp (the latter being the source of the increasingly popular GoodRelations ontology) have published a paper titled “Using SPARQL and SPIN for Data Quality Management on the Semantic Web” (pdf) for the 2010 Business Information Systems conference in Berlin. TopQuadrant’s Holger Knublauch designed SPIN, or the SPARQL Inferencing Notation, as a SPARQL-based way to express constraints and inferencing rules on sets of triples, and Fürber and Hepp have taken a careful, structured look at how to apply it to business data.
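To give a rough idea of how this works, here is a minimal sketch of a SPIN constraint attached to a class. The spin:constraint property, the ?this variable, and the spin:ConstraintViolation vocabulary are standard SPIN; the ex:Product class and ex:price property are made up for the example, and some SPIN engines prefer the query stored in parsed RDF form rather than the sp:text shortcut shown here.

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix sp:   <http://spinrdf.org/sp#> .
    @prefix spin: <http://spinrdf.org/spin#> .
    @prefix ex:   <http://example.com/ns#> .

    # Hypothetical business rule: every product must have a price.
    ex:Product
      a rdfs:Class ;
      spin:constraint [
        a sp:Construct ;
        sp:text """
          PREFIX spin: <http://spinrdf.org/spin#>
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          PREFIX ex:   <http://example.com/ns#>
          # ?this is bound to each instance of ex:Product being checked.
          CONSTRUCT {
            _:v a spin:ConstraintViolation ;
                spin:violationRoot ?this ;
                rdfs:label "Product is missing an ex:price value" .
          }
          WHERE {
            FILTER NOT EXISTS { ?this ex:price ?anyPrice }
          }
        """
      ] .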

I knew that “data quality” was a specific discipline within IT, but I hadn’t looked at it very closely. Their paper gives a nice overview of this area before presenting their own work. It also describes the value that a systematic approach to data quality can bring to semantic web applications, but I don’t think anyone needs any convincing there; it’s often the first issue people bring up when they hear about the very idea of Linked Data on the web.

Or, to put it more bluntly, many complain about the potentially low quality of public semantic web data, but Fürber and Hepp are doing something about it. SPIN has the potential to do for RDF data what Schematron has done for XML for years now: provide a technique, based entirely on an existing, well-implemented W3C standard (SPARQL), for describing business rules about data and then validating that data against those rules. (I see that William Vambenepe had some thoughts on the comparison early last year.)
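Run against an instance with no price, say a hypothetical ex:product42, a SPIN engine reports the problem as RDF rather than as a log message, along these lines (the exact properties reported vary by tool):

    [] a spin:ConstraintViolation ;
       spin:violationRoot ex:product42 ;
       rdfs:label "Product is missing an ex:price value" .

That is the RDF counterpart of a Schematron SVRL report: the violations are themselves data that you can query, count, or feed into a cleanup workflow.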

I’m looking forward to the future work that Fürber and Hepp describe in their paper, and to seeing how others apply their approach in their own applications.