Form-driven SPARQL queries without scripting

Just two lines in an .htaccess file.

In a podcast of a radio show I was listening to recently, the host asserted that 80s rapper Schoolly D had scored most of director Abel Ferrara’s films. I was curious about this, so I went to IMDB’s page for Ferrara, clicked on the first film title, scrolled down, clicked “Full cast and crew”, checked the music credit, returned to Ferrara’s main page, and repeated the last few steps… until I realized that one SPARQL query could create a single list of Ferrara’s films with the film score credit next to each one.

The following query, when entered on DBpedia’s snorql form, shows that Mr. D is credited with two films, and that Joe Delia is credited with many more:

SELECT ?title ?scorer WHERE 
{   
  ?director rdfs:label "Abel Ferrara"@en .    
  ?film <http://dbpedia.org/ontology/director> ?director .    
  ?film rdfs:label ?title .   
  FILTER ( lang(?title) = "en" )   
  ?film <http://dbpedia.org/property/music> ?scorer .  
} 
ORDER BY ?scorer

(Further research showed that Delia brought in D to contribute to many of the films for which he is credited. Also, I could have done this with the Linked Movie Database SPARQL endpoint, as I’ve written about before, but I’ve been exploring DBpedia’s film data more lately.)

A great way to spread the benefits of SPARQL and semantic web data while keeping the syntax parts under the covers is to create a web form for users to fill out and to insert the entered values into a SPARQL query. I thought that a form where you enter a director’s name and then see who scored his or her films would be a nice example of this. In the IBM developerWorks article Build Wikipedia query forms with semantic technology, I described and linked to two such forms; the first listed all the actors who appeared in movies by the two directors whose names you entered in the form (for example, everyone who appeared in films by both Woody Allen and Martin Scorsese), and the other searched album and artist names for strings of text and displayed basic information about the albums it found.

Both of those forms passed the entered values to Python scripts that plugged the values into SPARQL queries before sending these queries off to the appropriate SPARQL endpoints. Recently, though, while reading Tom Heath and Christian Bizer’s book Linked Data: Evolving the Web into a Global Data Space, I had a better idea. I’ve used .htaccess files to redirect an Apache HTTP server from one requested URL to another (for example, when I’ve moved a file but don’t want to break links that point to it), but I didn’t know about the regular expression support in the Apache mod_rewrite module that carries out the .htaccess instructions. It turns out that, because of this feature, I don’t even need a script to execute a SPARQL query with values from a web form.
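For comparison, the script-based approach might look roughly like the following Python sketch. It is not the code from the developerWorks article; the names ENDPOINT, QUERY_TEMPLATE, and build_query_url are made up for illustration, and it only builds the request URL rather than sending it.

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint, as used elsewhere in this post.
ENDPOINT = "http://dbpedia.org/sparql"

# The film score query from above, with a placeholder for the director's name.
# DIRECTOR_NAME is a hypothetical marker chosen for this sketch.
QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?title ?scorer WHERE {
  ?director rdfs:label "DIRECTOR_NAME"@en .
  ?film <http://dbpedia.org/ontology/director> ?director .
  ?film rdfs:label ?title .
  FILTER ( lang(?title) = "en" )
  ?film <http://dbpedia.org/property/music> ?scorer .
}
ORDER BY ?scorer"""


def build_query_url(director_name):
    """Plug the name into the query and return a URL that would run it.

    No escaping of quotes inside the name is attempted here.
    """
    query = QUERY_TEMPLATE.replace("DIRECTOR_NAME", director_name)
    # urlencode percent-encodes the query and turns spaces into "+".
    return ENDPOINT + "?" + urlencode({"query": query})


print(build_query_url("Abel Ferrara"))
```

A real script would go on to fetch that URL and format the results, which is exactly the work that the .htaccess approach described next hands over to Apache and the browser.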

A form that I put at http://snee.com/sparqlforms/directors/filmscores.html has a single field where you enter a director’s name. When you click the “go” button, the form’s action is http://www.snee.com/sparqlforms/directors/composers, so if you enter “John Ford” the form does an HTTP GET with the URL http://www.snee.com/sparqlforms/directors/composers?director=John+Ford.
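The URL that the browser builds from the form can be reproduced with a few lines of Python. This is only an illustration of standard form encoding for GET requests; form_get_url is a hypothetical helper name, and the field name director comes from the URL shown above.

```python
from urllib.parse import urlencode

# The form's action, as given above.
FORM_ACTION = "http://www.snee.com/sparqlforms/directors/composers"


def form_get_url(director):
    """Return the URL a browser would request when the form is submitted."""
    # Form encoding replaces the space in the name with "+".
    return FORM_ACTION + "?" + urlencode({"director": director})


print(form_get_url("John Ford"))
```

The part after the question mark, director=John+Ford, is the query string that the .htaccess file has to pick apart.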

The .htaccess file in the same directory has the following three lines (everything from “RewriteRule” to the end is one line, split up for easier viewing here):

RewriteEngine on
RewriteCond %{QUERY_STRING} ^director=(.*)$
RewriteRule ^composers.*$ http://dbpedia.org/sparql?query=
PREFIX+rdfs:+<http://www.w3.org/2000/01/rdf-schema#>+
SELECT+?title+?scorer+WHERE+{+?director+rdfs:label+"%1"@en+.+
?film+<http://dbpedia.org/ontology/director>+?director+.+
?film+rdfs:label+?title+.+FILTER+(+lang(?title)+=+"en"+)+
?film+<http://dbpedia.org/property/music>+?scorer+.++}+ORDER+BY+?scorer

Most of the third “line” is just an escaped version of the SPARQL query about who scored Abel Ferrara films. I won’t go into details about the syntax of the rest of the three lines because this tutorial explains the basics better than I could and this bit of Apache documentation is pretty comprehensive.

To summarize, RewriteRule takes two expressions as arguments: what to look for and what to replace it with when redirecting your browser or other client. The regular expression in the first argument can use parentheses, and the second expression can refer to the matched groups with variable references like $1 and $2. HTTP GET parameters like “?director=John+Ford” are a special case, though, because RewriteRule regular expressions won’t find them, which is why I have the RewriteCond line above. That line matches the value of the director parameter, and the RewriteRule references it with %1 (as distinguished from $1, which would reference something matched in the RewriteRule itself). This inserts the value into the escaped version of the SPARQL query where I had “Abel Ferrara” in my original query. The query is part of a URL that executes the query on DBpedia’s endpoint, so the user who clicks “go” on the form will see the list of film titles and music credits. Try the form yourself, and make sure to use a director’s official name (for example, “Marty Scorsese” won’t get you anything).
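To make the %1 substitution concrete, here is a rough Python imitation of what the RewriteCond and RewriteRule pair does to a request. It is a sketch under simplifying assumptions, not a description of how Apache is implemented: the function name rewrite is made up, and it ignores any additional character escaping that mod_rewrite may apply to the result.

```python
import re

# RewriteCond %{QUERY_STRING} ^director=(.*)$
COND_PATTERN = re.compile(r"^director=(.*)$")

# RewriteRule ^composers.*$ ...
RULE_PATTERN = re.compile(r"^composers.*$")

# The substitution string from the .htaccess file, joined into one line.
SUBSTITUTION = (
    "http://dbpedia.org/sparql?query="
    "PREFIX+rdfs:+<http://www.w3.org/2000/01/rdf-schema#>+"
    'SELECT+?title+?scorer+WHERE+{+?director+rdfs:label+"%1"@en+.+'
    "?film+<http://dbpedia.org/ontology/director>+?director+.+"
    '?film+rdfs:label+?title+.+FILTER+(+lang(?title)+=+"en"+)+'
    "?film+<http://dbpedia.org/property/music>+?scorer+.++}+ORDER+BY+?scorer"
)


def rewrite(path, query_string):
    """Return the redirect target, or None if the rule does not apply."""
    cond = COND_PATTERN.match(query_string)
    if cond is None or RULE_PATTERN.match(path) is None:
        return None
    # %1 refers to the first parenthesized group of the RewriteCond pattern.
    return SUBSTITUTION.replace("%1", cond.group(1))


print(rewrite("composers", "director=John+Ford"))
```

Note that the captured value is still in its encoded form (John+Ford) when it is dropped into the query, which fits in with the rest of the string because the query there is written in the same escaped style.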

This kind of URL rewriting is an important technique in Linked Data publishing, where you want to assign sensible, cool URIs to resources but may have some less cool details in how you actually serve up the resource data. For a larger, more complex application, it’s nice to know that I would only need to add two more lines to the .htaccess file for each new form/query combination in my application. This can be a very valuable tool for semantic web application development. (I couldn’t get it to work with a local copy of the Apache HTTP server or with the Url Rewrite Filter designed to allow the same thing with Tomcat, though, so I may have to go back to the Python CGI scripts for local applications.)


5 Comments

By Matthew on April 21, 2011 6:04 AM

Hi Bob,

The first time I heard a Schoolly D song in a Ferrara movie (King of New York) I flipped :)

I’ve been going through your post from 2007 on querying DBpedia and see that the chalkboard query no longer works in snorql. Could you tell me why this is?

Thanks for an informative blog!

By Bob DuCharme on April 21, 2011 8:36 AM

Matthew,

DBpedia has rearranged some of their vocabulary. I just fixed the queries and description in that post so that the query now works properly.

By Vasiliy Faronov on April 21, 2011 12:15 PM

And the user can’t pass a double quote into that string, right?

By Bob DuCharme on April 21, 2011 12:30 PM

Vasiliy,

I didn’t try that, but it makes sense.

For more fine-grained control over things like that, a script probably would be better, but a regex guru might be able to work it right into the .htaccess code.

By amit on April 21, 2011 11:35 PM

Nice read. Please try my webapp for querying the semantic data: http://WWW.s3space.com