Quick and dirty linked data content negotiation

Not even that dirty.

May 9, 2011

I’ve managed to fill a key gap in the world’s supply of Linked Open Data by publishing triples that connect Mad Magazine film parody titles to the DBpedia URIs of the actual films. For example:

<http://dbpedia.org/resource/Judge_Dredd_%28film%29>
      mad:FilmParody
              [ prism:CoverDate "1995-08-00" ;
                prism:issueIdentifier "338" ;
                dc:title "Judge Dreck"
              ] .


<http://dbpedia.org/resource/2001:_A_Space_Odyssey_%28film%29>
      mad:FilmParody
              [ prism:CoverDate "1969-03-00" ;
                prism:issueIdentifier "125" ;
                dc:title "201 Minutes of a Space Idiocy"
              ] .

(To prepare the data, I scraped a Wikipedia list, tested the URIs, then hand-corrected a few.) To really make this serious RESTful linked open data, I wanted to make it available as both RDF/XML and Turtle depending on the Accept value in the header of the HTTP request. All this took was a few lines in the .htaccess file (which I’ve been learning more about lately) in the directory storing the RDF/XML and Turtle versions of the data.

For example, either of the following two commands retrieves the Turtle version:

wget --header="Accept: text/turtle" http://www.rdfdata.org/dat/MadFilmParodies/
curl --header "Accept: text/turtle" -L http://www.rdfdata.org/dat/MadFilmParodies/

Substituting application/rdf+xml for text/turtle in either command gets you the RDF/XML version, and omitting the --header parameter altogether gets you an HTML version.
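The same kind of request can also be made from a script. Here is a minimal Python sketch using only the standard library; the helper names build_request and fetch are made up for this illustration, and it assumes the server still answers at the URL used in the commands above.

```python
import urllib.request

# The URL from the wget and curl commands above.
DATA_URL = "http://www.rdfdata.org/dat/MadFilmParodies/"


def build_request(url, media_type=None):
    """Build a GET request, optionally asking for a specific media type."""
    # Leaving media_type out mirrors omitting --header on the command line.
    headers = {"Accept": media_type} if media_type else {}
    return urllib.request.Request(url, headers=headers)


def fetch(url, media_type=None):
    """Send the request (urlopen follows redirects) and return the body as text."""
    with urllib.request.urlopen(build_request(url, media_type)) as response:
        return response.read().decode("utf-8")
```

Calling fetch(DATA_URL, "text/turtle") should then behave like the curl -L command, and passing "application/rdf+xml" instead should ask for the RDF/XML version.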

Here’s the complete .htaccess file:

RewriteEngine on


RewriteCond %{HTTP_ACCEPT} ^.*text/turtle.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [L]
# no luck:
#RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.ttl [R=303,L]


RewriteCond %{HTTP_ACCEPT} ^.*application/rdf\+xml.*
RewriteRule ^index.html$ http://www.rdfdata.org/dat/MadFilmParodies/MadFilmParodies.rdf [L]


RewriteRule ^index.html$ http://en.wikipedia.org/wiki/List_of_Mad's_movie_spoofs

The Apache web server where I have this hosted is configured to look for an index.html file in a directory if the requested URL doesn’t mention a specific filename, so the three rules here each modify that “request” to look for something else, depending on what the RewriteCond line finds in the HTTP_ACCEPT value. If it finds “text/turtle”, it sends the Turtle version of my data, and the L flag tells the Apache mod_rewrite module that is processing these instructions not to look at any more of them.

The next rule performs the corresponding HTTP_ACCEPT check and file delivery for an RDF/XML request, and the default behavior if neither condition matches is to deliver an HTML version of the data. (I took the lazy way out and just redirected to the appropriate Wikipedia page instead of creating a new HTML file.) As you can see from the two commented-out lines, I had the impression that adding R=303 in the brackets with the L would send an HTTP return code of 303 back to the requester, overriding the default code of 302, but I never got that to work. If anyone has any suggestions about how to fix this, or whether 303 is even the most appropriate return code, please let me know.
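The decision the three rules make can be restated as a short Python sketch. This is only an illustration of the logic, not how mod_rewrite is implemented; the function name choose_target and the two constants are invented for the example, while the target URLs are the ones in the .htaccess file.

```python
BASE = "http://www.rdfdata.org/dat/MadFilmParodies/"
HTML_FALLBACK = "http://en.wikipedia.org/wiki/List_of_Mad's_movie_spoofs"


def choose_target(accept_header):
    """Return the URL the rules would pick for a given Accept header value."""
    accept = accept_header or ""
    # First rule: any mention of text/turtle anywhere in the header wins.
    if "text/turtle" in accept:
        return BASE + "MadFilmParodies.ttl"
    # Second rule: otherwise, any mention of application/rdf+xml.
    if "application/rdf+xml" in accept:
        return BASE + "MadFilmParodies.rdf"
    # Third rule has no RewriteCond in front of it, so it is the default.
    return HTML_FALLBACK
```

One consequence worth noticing: a header such as "application/rdf+xml, text/turtle;q=0.1" still gets the Turtle file, because the conditions are plain pattern matches tried in order and take no account of q-values.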

From what I’ve read about how the syntax of these instructions works, I shouldn’t have needed the full URLs for the Turtle and RDF/XML versions of the Mad Film Parody data, because they were in the same directory as the .htaccess file, but that was the only way I could get this to work.

Now that I know how to do this, I can do it again for other resources pretty quickly. It took me about five minutes to do it for the little http://www.snee.com/ns/madMag/MadFilmParody ontology that the data points to. I consider this solution quick and a bit dirty because it requires the maintenance of two copies of the data, but the XML guy in me knows that it would be wrong to perform parallel edits on the two copies, and that I should instead pick one as a master, edit it when necessary, and generate the other from it. If I had to do this on a larger scale, I learned from Brian Sletten at last year’s semtech that I should look into NetKernel, but it was a good exercise to do it this way to learn what was really going on.

I’m going to try to get into the habit of doing this for data and ontologies that I create, so I’d appreciate any suggestions about tweaking details before any suboptimal aspects of this become habits.

2 Comments

By Ryan on May 9, 2011 3:40 PM

To help maintain a master copy of your RDF and transform into other formats through the command line, I’d recommend the rdfcat utility distributed with Jena: http://jena.sourceforge.net/javadoc/jena/rdfcat.html . Personally, I’d make Turtle my master format language due to readability and file size, and transform that into XML after editing. Something like this:

java jena.rdfcat MadFilmParodies.ttl -in TTL > parody.rdf

By Bob DuCharme on May 9, 2011 4:14 PM

Thanks Ryan! I’ve used jena.rdfcopy, but never noticed rdfcat before.
