Transforming data with inferencing and (partial!) schemas

An excellent compromise between schemas and "schemaless" development.

I originally planned to title this “Partial schemas!” but as I assembled the example I realized that in addition to demonstrating the value of partial, incrementally-built schemas, the steps shown below also show how inferencing with schemas can implement transformations that are very useful in data integration. In the right situations this can be even better than SPARQL, because instead of using code—whether procedural or declarative—the transformation is driven by the data model…

Note: I wrote this blog entry to accompany the IBM Data Magazine piece mentioned in the first paragraph, so for people following the link from there this goes into a little more detail on what RDF, triples, and SPARQL are than I normally would on this blog. I hope that readers already familiar with these standards will find the parts about doing the inferencing on a Hadoop cluster interesting.

Simple federated queries with RDF

A few more triples to identify some relationships, and you're all set.

Once, at an XML Summer School session, I was giving a talk about semantic web technology to a group that included several presenters from other sessions. This included Henry Thompson, who I’ve known since the SGML days. He was still a bit skeptical about RDF, and said that RDF was in the same situation as XML—that if he and I stored similar information using different vocabularies, we’d still have to convert his to use the same vocabulary as mine or vice versa before we could use our…

What's wrong with undeclared classes and properties?

It's not like the RDF spec requires them.

OK, it’s a rhetorical question. I know the answer: we can attach metadata to class and property declarations, so when we know that a given instance is a member of a particular class and has certain properties, if those are declared, we know more about the instance and can do more with it, not least of all aggregate it more easily with other data that uses the same or related classes and properties.

RDFS: The primary document

Shorter and more interesting than I remember.

About two years ago I wondered if RDF Schema had become merely a layer of OWL or if anyone used RDFS by itself without OWL. My theory was that because tools such as TopBraidComposer, Protege, and SWOOP that let you design RDFS vocabularies also let you assign OWL properties to your classes, people used those because they were there, and we ended up with few pure RDFS vocabularies.