The HTML interface to your SPARQL endpoint is not your SPARQL endpoint

Remember what the 'P' in 'SPARQL' stands for.

If you have interesting data, we want to use it in application development!

Something that happens to me now and then: I’ll hear that an organization with a lot of interesting data (science, music, whatever) makes the data available on a SPARQL endpoint. I send my browser to the URL listed as the SPARQL endpoint and I see a web form. I enter a simple query on the web form to retrieve a few random triples, click the form’s button, and the results of my query appear. Then I enter fancier queries to explore the endpoint’s data.

Then, if there is a clear indication of an endpoint URL that is different from the form’s URL, I append ?query= and a URL-escaped version of a simple query to it so that I can send the query to the endpoint with curl. If I see no clear indication of an endpoint URL that is different from the form’s URL, I’ll look around the website a bit for it, and if I still have no luck I’ll try using the form’s URL and several variations on it. (Below are some hints on these variations.)

Sometimes I just can’t find a working endpoint URL. There are sites out there advertising a SPARQL endpoint where the only way to send a query to the endpoint is via the HTML form interface. I won’t name specific sites here, but it’s definitely a pattern I’ve noticed.

“SPARQL” stands for “SPARQL Protocol and RDF Query Language”. The SPARQL 1.1 Protocol specification tells us “This document specifies the SPARQL Protocol; it describes a means for conveying SPARQL queries and updates to a SPARQL processing service and returning the results via HTTP to the entity that requested them.” It also tells us that a SPARQL Protocol service is “[a]n HTTP server that services HTTP requests and sends back HTTP responses for SPARQL Protocol operations. The URI at which a SPARQL Protocol service listens for requests is generally known as a SPARQL endpoint”.

An “endpoint” that doesn’t support this protocol is not a SPARQL endpoint. Curl provides many ways to send a query via HTTP and then process the results—my mention of it above links to something I wrote with several examples—and it’s a great way to test a proper endpoint.

It’s not about curl, though; curl is just a great way to explore a service’s HTTP support. Any modern programming language supports HTTP, which means that you should be able to write a program in any of these languages that sends a request to a SPARQL endpoint and then processes the result without needing any special SPARQL or RDF library. (Of course, there are many such libraries to make this processing even easier.) The curl utility just provides a convenient way to do quick and dirty tests of a SPARQL endpoint from the command line. The ability to do this from the command line, and from within a programming language that provides HTTP support, means that you can automate the execution of these queries and then mix and match the results with other processing to create cool applications. If the only way to issue SPARQL queries against your data is to enter a query on a web form and then click a button, then I can’t use your data in this kind of application development. If you have interesting data, we want to use it in application development!
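To make that concrete, here is a minimal sketch in Python that uses nothing but the standard library. The function names (build_query_url, parse_bindings, run_query) are my own, and the sketch assumes that the endpoint honors an Accept header asking for the SPARQL JSON results format, which not every endpoint will.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(endpoint, query):
    """Return the endpoint URL with the query URL-escaped in a 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_bindings(results_json):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=30):
    """Send the query with an HTTP GET and return the parsed rows.

    Assumes the endpoint can return application/sparql-results+json.
    """
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling run_query with an endpoint URL and a query such as “SELECT * WHERE { ?s ?p ?o } LIMIT 5” should hand back a list of dictionaries that the rest of a program can use like any other data. An HTML form interface offers no way to do this.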

Finding the endpoint

Ideally, the announcement of the endpoint tells you both the URL for the endpoint where you send HTTP requests and the URL for a web form front end to that endpoint. For example, DBpedia’s endpoint is at http://dbpedia.org/sparql and the web form interface is at http://dbpedia.org/snorql/, where it uses a UI tool called “snorql.” Note that the snorql form says “SPARQL Explorer for http://dbpedia.org/sparql” right at the top. That’s the kind of clarity about the relationship between the form and the endpoint that I want to see more of out there. The yago endpoint form also does this nicely.

Some places use the same URL for both the endpoint and the web interface to the endpoint, such as the European Bioinformatics Institute’s endpoint at https://www.ebi.ac.uk/rdf/services/sparql, the AGROVOC Thesaurus endpoint at http://agrovoc.uniroma2.it/sparql, and the JazzCats one at http://cdhr-linkeddata.anu.edu.au/jazzcats-sparql/sparql. Using the same URL doesn’t mean that the HTML interface to the SPARQL endpoint is the same as the endpoint itself; their HTTP servers have some step that checks whether a query parameter was passed with the URL. If one was, they run the query and return the results, and if not, they deliver the HTML page with the web form.
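That checking step could be as simple as the following sketch. It is a hypothetical illustration, not a description of how any of the sites above implement it; the function name and the “results” and “form” labels are made up.

```python
from urllib.parse import parse_qs, urlsplit


def choose_response(request_url):
    """Decide what a shared form/endpoint URL should return for a GET request."""
    params = parse_qs(urlsplit(request_url).query)
    if "query" in params:
        # A query was passed: act as the SPARQL endpoint and return a result set.
        return "results"
    # No query was passed: deliver the HTML page with the web form.
    return "form"
```

A browser visiting the bare URL gets the form, while curl or a program sending the same URL with a query parameter gets query results.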

You’ll notice how these endpoint URLs all end in /sparql. Not all SPARQL endpoints do, but it’s a nice convention. If a SPARQL endpoint web form is at http://www.example.com and I see no clear indication of an endpoint URL, I’ll try http://www.example.com/sparql as an endpoint by appending a query parameter with a URL-escaped version of a very simple query such as “SELECT * WHERE { ?s ?p ?o } LIMIT 5”. With curl, I can then test it with this:

curl "http://www.example.com/sparql?query=SELECT%20*%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D%20LIMIT%205"

If that doesn’t work (for example, if the curl request gets you nothing or the HTML of an error message page) and the URL begins with “http://”, try adding an “s” after the “p” to make it “https://”. Once you do get a SPARQL result set from an endpoint, it’s typically XML of the query results and you can start exploring ways to get other formats such as JSON or TSV. (Again, see my curling SPARQL post for a quick tour of some possibilities.)
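The guesswork described above can be written down as a small helper. This is only a sketch of my own habits, not a standard discovery procedure: candidate_endpoints and looks_like_result_set are hypothetical names, and the Content-Type check is a rough heuristic that only recognizes the XML and JSON result formats.

```python
from urllib.parse import urlsplit, urlunsplit

# Media types for SPARQL query results in XML and JSON.
RESULT_TYPES = (
    "application/sparql-results+xml",
    "application/sparql-results+json",
)


def candidate_endpoints(form_url):
    """List endpoint URLs worth trying, in order, for a given web form URL."""
    parts = urlsplit(form_url)
    path = parts.path.rstrip("/")
    paths = [path]
    if not path.endswith("/sparql"):
        paths.append(path + "/sparql")  # the common /sparql convention
    schemes = [parts.scheme]
    if parts.scheme == "http":
        schemes.append("https")  # add an "s" after the "p"
    return [
        urlunsplit((scheme, parts.netloc, p, "", ""))
        for scheme in schemes
        for p in paths
    ]


def looks_like_result_set(content_type):
    """Return True if a response's Content-Type header names a SPARQL result format."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in RESULT_TYPES
```

Each candidate can then be tested by sending it the simple query shown earlier and checking the Content-Type header of whatever comes back. A response of text/html usually means that you have found the form or an error page, not the endpoint.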

You can also email the people running the site and say “Hey! Great data! I enjoyed entering queries on your form! Does your site have a SPARQL endpoint that supports the SPARQL protocol so that I can get the data with curl and other HTTP tools instead of just using a browser to see rendered HTML of the results?” It’s one of the reasons that I’m writing this blog entry—so I can just point to this long-winded explanation of the difference instead of trying to do a short summary in another email to one of those sites.