One-click replacement of an IMDb page with the corresponding Wikipedia page

With some Python, JavaScript, and of course, SPARQL.

I recently tweeted “I find that @imdb is so crowded with ads that’s it’s easier to use Wikipedia to look up movies and actors and directors and their careers. And then there’s that Wikidata SPARQL endpoint!” Instead of just cursing the darkness, I decided to light a little SPARQL-Python-JavaScript candle, and it was remarkably easy.

Drag this bookmarklet link to your browser’s bookmarks bar: imdb2wp. Then, when you’re looking at the IMDb page of a person, movie, or television show, the link should take you right to the Wikipedia page for that entity.

The key to it all is the impressive number of non-Wikidata identifiers that Wikidata has been adding. If you look at the IMDb page of, for example, the movie Medium Cool, in its URL of https://www.imdb.com/title/tt0064652/ you’ll see the movie’s IMDb identifier tt0064652. If you look at the movie’s Wikidata page, you’ll see that IMDb ID stored there. You won’t see the URL of its English Wikipedia page, but that’s easy enough to look up with the IMDb ID in the following SPARQL query:

SELECT ?wppage WHERE {
   ?subject wdt:P345 'tt0064652' .
   ?wppage schema:about ?subject .
   FILTER(contains(str(?wppage),'//en.wikipedia'))
}

Try it yourself. (Of course, a different filter condition can tell the query to find the corresponding Wikipedia page in a language other than English.)
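To make that parenthetical concrete, here is a minimal Python sketch that builds the same query for any Wikipedia language edition. The function name build_query and its lang parameter are my own illustrative choices, not part of the script shown later, and the sketch assumes that the language code is simply the subdomain of the Wikipedia URL, as in the '//en.wikipedia' filter above.

```python
def build_query(imdb_id, lang="en"):
    """Return SPARQL query text for one IMDb ID and one Wikipedia language.

    build_query and lang are hypothetical names used for illustration.
    """
    return (
        "SELECT ?wppage WHERE {\n"
        "  ?subject wdt:P345 '" + imdb_id + "' .\n"
        "  ?wppage schema:about ?subject .\n"
        # Assumption: the language code is the subdomain of the page URL.
        "  FILTER(contains(str(?wppage),'//" + lang + ".wikipedia'))\n"
        "}"
    )


# Ask for the French Wikipedia page of Medium Cool instead of the English one.
print(build_query("tt0064652", lang="fr"))
```

Only the filter string changes between languages; the two triple patterns stay the same.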

How does the click on the browser’s bookmark bar execute the SPARQL query with the appropriate IMDb ID? Last August in Custom HTML form front end, SPARQL endpoint back end I wrote about an application in which the end user enters the name of a cocktail ingredient, clicks the search button, and then (after a SPARQL query asks Wikidata for drinks that have that ingredient) that user sees a web page displaying those drinks with links to their Wikipedia pages. This new script, in Python this time, is also a CGI script. It accepts a parameter, plugs that parameter into a SPARQL query, sends the query off to the Wikidata endpoint, and then uses the result to give users what they want:

#!/usr/bin/env python
# imdb2wp.cgi: go to Wikipedia page for a movie or
# person based on their IMDb ID value. Sample call:

# http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=nm0000598

import sys
# Following needed for hosted version to find SPARQLWrapper library
sys.path.append('/home/bobdc/lib/python/')
from SPARQLWrapper import SPARQLWrapper, JSON
import cgi

form = cgi.FieldStorage() 
imdbID = form.getvalue('imdbID')

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
# SPARQL query of Wikidata asks for the Wikipedia 
# page of whatever has this IMDB ID.

queryString = """
SELECT ?wppage WHERE {
  ?subject wdt:P345 'IMDB-ID' .
  ?wppage schema:about ?subject .
  FILTER(contains(str(?wppage),'//en.wikipedia'))
}
"""

queryString = queryString.replace("IMDB-ID",imdbID)
sparql.setQuery(queryString)
sparql.setReturnFormat(JSON)

try:
  results = sparql.query().convert()
  requestGood = True
except Exception as e:
  # Keep the error text so it can be shown to the user
  results = str(e)
  requestGood = False

# CGI header; the blank line that follows it ends the headers
print("Content-type: text/html\n")

if not requestGood:
  print("<h1>Problem communicating with the server</h1>")
  print("<p>" + results + "</p>")
elif len(results["results"]["bindings"]) == 0:
  print("<p>No results found.</p>")
else:
  for result in results["results"]["bindings"]:
    wppage = result["wppage"]["value"]
  # Redirect only when a page was found, so wppage is always defined here
  print("<meta http-equiv=\"Refresh\" content=\"0;url=" + wppage + "\">")

Note how short the script is even with its comments, white space, and error handling. As its header comment tells us, the script is called as a web service. If you replace the ID in the sample call shown there with the Medium Cool ID of tt0064652, you’ll get the URL http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=tt0064652, which, as you can see by clicking it, calls the script that sends you to that Wikipedia page. The script stores the passed value in an imdbID variable and then inserts it into a query that looks just like the one hard-coded for “Medium Cool” above. Then, the script sends the query off to the Wikidata SPARQL endpoint.
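Because the passed value goes straight into the query text, it may be worth checking it first. The following is a sketch of one way to do that, not something the script above does. The names IMDB_ID_PATTERN and safe_query are hypothetical, and the pattern assumes that only IDs beginning with "tt" (titles) or "nm" (people) followed by digits are of interest; IMDb uses other prefixes that this would reject.

```python
import re

# Assumed pattern: "tt" (titles) or "nm" (people) followed by digits.
# Other IMDb ID prefixes exist; extend the pattern if you need them.
IMDB_ID_PATTERN = re.compile(r"(tt|nm)\d+")

QUERY_TEMPLATE = """
SELECT ?wppage WHERE {
  ?subject wdt:P345 'IMDB-ID' .
  ?wppage schema:about ?subject .
  FILTER(contains(str(?wppage),'//en.wikipedia'))
}
"""


def safe_query(imdb_id):
    """Return the query for imdb_id, or None if the value looks wrong.

    safe_query is a hypothetical helper, not part of the original script.
    """
    # A missing parameter arrives as None; fullmatch rejects anything
    # with extra characters, such as quotes that would alter the query.
    if imdb_id is None or not IMDB_ID_PATTERN.fullmatch(imdb_id):
        return None
    return QUERY_TEMPLATE.replace("IMDB-ID", imdb_id)


print(safe_query("tt0064652"))
print(safe_query("tt0064652' . ?s ?p ?o"))  # rejected, so this prints None
```

A script using this would print an error page when safe_query returns None instead of sending anything to the endpoint.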

At a similar point in the Perl script that lists which cocktails have the entered ingredients, the script displays some HTML showing the results. The imdb2wp script does not render a page with results but instead sends back a meta refresh page. (I only recently learned that that was the actual name for these, and it is an excellent name.) This just sends the user to the Wikipedia page found by the SPARQL query.
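For anyone else new to meta refresh pages, here is a small sketch of building one in Python. The refresh_page function is a hypothetical helper of mine, and the Wikipedia URL is only a sample value; the escaping step is an extra precaution that the script above does not take.

```python
import html


def refresh_page(url):
    """Return an HTML page that sends the browser to url right away.

    refresh_page is a hypothetical helper used for illustration.
    """
    # Escape the URL so a quote or ampersand cannot break the attribute.
    safe_url = html.escape(url, quote=True)
    return (
        "<html><head>"
        # The 0 is the delay in seconds before the browser follows the URL.
        '<meta http-equiv="Refresh" content="0;url=' + safe_url + '">'
        "</head><body></body></html>"
    )


print(refresh_page("https://en.wikipedia.org/wiki/Medium_Cool"))
```

A delay of zero means that the user never really sees the intermediate page.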

How does the single click call the CGI script? The bookmarklet’s URL is actually a bit of JavaScript that pulls the IMDb ID from the displayed page’s URL, appends it to http://learningsparql.com/cgi/imdb2wp.cgi?imdbID=, and sends the browser off to the result. So to review:

  1. When viewing an IMDb page, you click the bookmarklet.
  2. JavaScript in the bookmarklet calls the CGI script with the IMDb ID.
  3. The CGI script plugs the IMDb ID into a SPARQL query and uses that query to ask Wikidata for the entity’s Wikipedia URL.
  4. The CGI script redirects you to that URL.
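The bookmarklet itself is JavaScript, but the logic of steps 1 and 2 can be sketched in Python to match the rest of the code here. This is an approximation of the idea and not the bookmarklet's actual code: the function name cgi_call_for is hypothetical, and the regular expression assumes that the ID is a URL path segment made of "tt" or "nm" followed by digits.

```python
import re

CGI_URL = "http://learningsparql.com/cgi/imdb2wp.cgi?imdbID="


def cgi_call_for(imdb_url):
    """Return the CGI script URL for an IMDb page URL, or None if no ID.

    cgi_call_for is a hypothetical name; the real bookmarklet is JavaScript.
    """
    # Assumption: the ID is a path segment such as /tt0064652/ or /nm0000598/.
    match = re.search(r"/((?:tt|nm)\d+)(?:/|\?|$)", imdb_url)
    if match is None:
        return None
    # Append the ID to the CGI script's URL, as the bookmarklet does.
    return CGI_URL + match.group(1)


print(cgi_call_for("https://www.imdb.com/title/tt0064652/"))
print(cgi_call_for("https://www.imdb.com/name/nm0000598/"))
```

Steps 3 and 4 then happen on the server, in the CGI script shown earlier.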

SPARQL is just a query language—a syntax for describing what to do with a certain kind of data. The real value is in the data that we can query with SPARQL, and Wikidata is becoming more and more valuable. I found it surprisingly easy to use some otherwise old-fashioned (and standardized!) technologies to go from complaining about IMDb to actually doing something about the annoyance.

This is just a taste of the many possibilities we’ll see from Wikidata’s storage of so many standard identifiers for real-world entities. Whatever domain you work in or want to work in, take a look at what kind of identifiers and other data Wikidata stores about that domain’s entities and you may very well be inspired to do something no one else has done in that domain by using SPARQL and scripting to mix and match that data with other data in that domain. Let me know if you do!