Custom HTML form front end, SPARQL endpoint back end

Your website's users sending SPARQL queries, even if they haven't heard of SPARQL.

Negroni and SPARQL logo

In a recent Twitter exchange, Dr Joanne Paul asked “Does/can this exist? A website where I enter a title (eg. ’earl of pembroke’) and a year (eg. 1553) and it spits out who held that title in that year (in this case, William Herbert).” Michelle Watson replied “I bet you could probably write SPARQL query to Wikipedia that would come close to doing that. Not sure how you’d embed that into a webpage though.” I replied to that: “Have an HTML form that hands the entered values to a CGI script (Perl or Python or whatever) that plugs the values into a SPARQL query, sends that off to Wikipedia, and formats the result as HTML” and then “See pages 285 - 291 of my book “Learning SPARQL” for an example that uses Python and IMDB. The Python script is at http://www.learningsparql.com/1steditionexamples/ex364-cgi.txt .”

I thought I’d done a simple example on my blog outside of the book, but I couldn’t find it, so I’m doing another one here because it’s so easy. Instead of a Python CGI (Common Gateway Interface) script calling linkedmdb.org like I did in the book, I wrote a Perl CGI script that calls Wikidata. (The Linked Movie Database SPARQL server that the book’s example called doesn’t seem to work anymore anyway.) In the Python example, the end user entered the names of two directors on a form and got a list of all the actors who had been in films by both directors. In my new one, the end user enters the name of a cocktail ingredient and clicks a button, and a dynamic web page then lists the cocktails that use that ingredient, with links to each cocktail’s Wikipedia page. Either way, the key is that the person entering the query criteria is simply filling out a form and doesn’t need to know anything about the technology on the back end.

Before creating such a query, I had to ask: does Wikidata have the data I need to determine which drinks have which ingredients? Wikipedia infoboxes are usually the quickest way to assess whether the data you need is available in a structured form. If you look at the Wikipedia page for a Negroni, the infobox lists the ingredients in a fairly structured way, which usually means that the data is available in Wikidata with enough structure to query it. The infobox also shows that a Negroni is an IBA (International Bartenders Association) Official Cocktail, or in data modeling terms, it’s an instance of a class that we can query for. (The narrative text of the page also has an excellent origin story about how Count Camillo Negroni inspired the drink’s creation in 1919 and how Orson Welles had something clever to say about the drink 28 years later.)

The basic steps for creating a web form that calls a SPARQL endpoint:

  1. Write a SPARQL query that requests a specific example of the thing you want from the endpoint. My query asked for cocktails where “bitters” was an ingredient.

  2. Create a web page with an HTML form where the end user can enter the value or values that will customize the query.

  3. Add the SPARQL query to a CGI script that takes the values passed from the web form, plugs them into the appropriate places in the query, sends the query off to the endpoint, and then displays the result as HTML.

The results of steps 1 and 3 end up in the same CGI file. The result of step 2 is so small and simple (526 bytes, even with a dash of CSS) that you should take a quick look at my SPARQL cocktail query HTML form and its source before I describe the CGI file. As you’ll see, when the user clicks the form’s “search” button, the form passes the entered value to the script in a variable named q.
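For readers who can’t view the form’s source, here is a rough sketch of what such a form might look like, written as a tiny Perl script that prints the markup so that it stays in the same language as the rest of this post. The same markup could just as well live in a static HTML file. The field name q and the action path come from the sample call in the CGI script’s comments; the function name form_html, the label text, and the rest of the markup are my assumptions for illustration, not a copy of the actual form.

```perl
#!/usr/bin/perl
# Hypothetical sketch: emits a minimal search form for the cocktail script.
use strict;
use warnings;

# Build the form markup. The field name "q" and the action path follow the
# sample call in the CGI script; everything else here is an assumption.
sub form_html {
    my ($action) = @_;
    return <<"HTML";
<!DOCTYPE html>
<html><head><title>SPARQL Cocktails</title></head>
<body>
<form action="$action" method="get">
  <label for="q">Ingredient:</label>
  <input type="text" id="q" name="q">
  <input type="submit" value="search">
</form>
</body></html>
HTML
}

print "Content-type: text/html\n\n";
print form_html('/cgi/sparqlcocktail.cgi');
```

With method="get", entering scotch and clicking the button requests the script with ?q=scotch appended to its URL, which matches the sample call.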

Here is the Perl CGI script:

#!/usr/bin/perl

# sample call: http://www.bobdc.com/cgi/sparqlcocktail.cgi?q=scotch

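# sparql.pm comes from https://github.com/swh/Perl-SPARQL-client-library
# and is assumed to be in the same directory as this script. Perl 5.26 and
# later no longer search the current directory by default, so add the
# script's own directory to the module search path.
use FindBin;
use lib $FindBin::Bin;
require sparql;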

use strict;
use CGI;

# Usage of Perl-SPARQL-client-library based on test.pl included with it
my $params = CGI->new;
my $searchTerm = $params->param('q');

my $sparql = sparql->new();
my $endpoint = 'https://query.wikidata.org/sparql';

# Prefixes used in query don't need declarations
# because the endpoint has all of these predeclared. 
my $query = '
SELECT ?cocktailName ?wikipediaURL ?ingredientName WHERE {
  BIND ("SEARCHTERM" AS ?searchTerm )
  # ?cocktail instance of IBA official cocktail, 
  ?cocktail wdt:P31 wd:Q2536409 ;  
          # material used ?ingredient,
          wdt:P186 ?ingredient ;    
          rdfs:label ?cocktailName . 
  ?ingredient rdfs:label ?ingredientName . 
  FILTER (lang(?ingredientName) = "en")
  FILTER (lang(?cocktailName) = "en")
  # substring query so that "lime" finds "lime juice", "lime wedge", etc.
  FILTER(contains(lcase(?ingredientName),lcase(?searchTerm)))
  ?wikipediaURL schema:about ?cocktail . 
  FILTER(contains(str(?wikipediaURL),"/en.wikipedia.org"))
}
ORDER BY ?cocktailName 
';

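# Insert the search term into the query, escaping backslashes and double
# quotes so that the value cannot break out of the SPARQL string literal
$searchTerm = '' unless defined $searchTerm;
(my $safeTerm = $searchTerm) =~ s/(["\\])/\\$1/g;
$safeTerm =~ s/[\r\n]+/ /g;
$query =~ s/SEARCHTERM/$safeTerm/;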

# Perform the query
my $queryResult = $sparql->query($endpoint,$query);

# Output the result as HTML
print "Content-type: text/html\n\n";
print "<html><head><title>SPARQL Cocktails Results</title>\n";
print "<style type='text/css'> * { font-family: arial,helvetica}</style>\n";
print "</head><body>\n";

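# Escape the user-supplied term before echoing it into the HTML
my $displayTerm = $params->escapeHTML($searchTerm);

if (scalar(@{$queryResult}) == 0) {
    print "No drinks found with $displayTerm as an ingredient.\n";
}
else {
    print "<h2>Cocktails with $displayTerm as an ingredient</h2>\n";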
    for my $row (@{$queryResult}) {
	my $wikipediaURL = $row->{'wikipediaURL'};
	my $cocktailName = $row->{'cocktailName'};
	my $ingredientName = $row->{'ingredientName'};

	# Remove delimiters and language tags. 
	$wikipediaURL =~ s/<(.+)>/$1/;
	$cocktailName =~ s/\"(.+)\"\@en/$1/;
	$ingredientName =~ s/\"(.+)\"\@en/$1/;

	print "<p><a href='$wikipediaURL'>$cocktailName</a>:";
	print " $ingredientName</p>\n";
    }
}
print "</body></html>\n";

The SPARQL query is stored in the Perl variable $query, and the script takes the q value passed from the form and replaces the string “SEARCHTERM” in the SPARQL query with that value.
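Because the form value ends up inside a double-quoted SPARQL string, it’s worth escaping any quotes and backslashes in it before the substitution. Here is a minimal sketch of that step pulled out into two standalone helpers so that it can be tried on its own. The names escape_sparql_literal and build_query are hypothetical; they are not part of the script above or of Steve’s library, and the one-line query template is just a stand-in for the real one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape a user-supplied value for use inside a double-quoted SPARQL
# string literal. (Hypothetical helper, not from any library.)
sub escape_sparql_literal {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/(["\\])/\\$1/g;   # backslash-escape \ and "
    $value =~ s/[\r\n]+/ /g;      # a "..." literal cannot span lines
    return $value;
}

# Replace the SEARCHTERM placeholder in a query template with the
# escaped value and return the finished query.
sub build_query {
    my ($template, $term) = @_;
    my $safe = escape_sparql_literal($term);
    (my $query = $template) =~ s/SEARCHTERM/$safe/;
    return $query;
}

# Stand-in template; the real script uses the full cocktail query.
my $template = 'SELECT * WHERE { BIND ("SEARCHTERM" AS ?searchTerm) }';

print build_query($template, 'lime'), "\n";
# prints: SELECT * WHERE { BIND ("lime" AS ?searchTerm) }

print build_query($template, 'lime" ) } #'), "\n";
# prints: SELECT * WHERE { BIND ("lime\" ) } #" AS ?searchTerm) }
```

In the second call, the escaped quote keeps the whole entered value inside the string literal instead of letting it close the literal early and change the query.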

The workings of the query are described by comments within it instead of here. The script uses sparql.pm from the Perl-SPARQL-client-library that Steve Harris (a.k.a. @theno23) put on GitHub six years ago. It’s nice that when Steve’s library passes the query to the endpoint, the comments cause no problems. I have seen libraries that pass SPARQL queries to endpoints without the line breaks, so that everything after the first embedded comment gets treated as part of that comment and the query fails to parse.
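If you ever do get stuck with a library that flattens the query onto one line, one possible workaround is to strip the comments yourself first. The sketch below assumes that every comment sits on a line of its own, as the ones in my query do, so it never has to decide whether a # inside an IRI or a string literal is a comment. The function name flatten_query is made up for this example, and Steve’s library doesn’t need any of this.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop lines that contain only a SPARQL comment, then join what is left
# into a single line. Assumption: comments always occupy a whole line,
# so a '#' inside an IRI or a string literal is never touched.
sub flatten_query {
    my ($query) = @_;
    # Keep lines that are neither blank nor comment-only
    my @kept = grep { !/^\s*#/ && /\S/ } split /\r?\n/, $query;
    s/^\s+|\s+$//g for @kept;    # trim each remaining line
    return join ' ', @kept;
}

my $query = '
SELECT ?cocktail WHERE {
  # ?cocktail instance of IBA official cocktail
  ?cocktail wdt:P31 wd:Q2536409 .
}
';

print flatten_query($query), "\n";
# prints: SELECT ?cocktail WHERE { ?cocktail wdt:P31 wd:Q2536409 . }
```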

CGI scripts have been around since the 1990s and played an important role in the web’s evolution from static web pages to something more interactive and dynamic. They still work, as you can see, and they make it easy to automate the use of SPARQL endpoints for people who’ve never heard of SPARQL or RDF. The layers of UI technology developed since then, typically as JavaScript libraries, can of course be incorporated here so that a modern, responsive interface can also take advantage of back-end SPARQL endpoints such as Wikidata.

If you write an HTML form and CGI script that send a SPARQL query to a SPARQL endpoint such as Wikidata, let me know. I’d love to see it!