After I wrote about Extracting RDF data models from Wikidata in my blog last month, Ettore Rizza suggested that I check out wdtaxonomy, which extracts taxonomies from Wikidata by retrieving the kinds of data that my blog entry’s sample queries retrieved, and it then displays the results as a tree. After playing with it, I’m tempted to tell everyone who read that blog entry to ignore the example queries I included, because you can learn a lot more from wdtaxonomy.
Some people complain when an RDF dataset lacks a documented data model. A great thing about RDF and SPARQL is that if you want to know what kind of modeling might have been done for a dataset, you just look, even if they’re using non-(W3C-)standard modeling structures. They’re still using triples, so you look at the triples.
I found all kinds of interesting things in the article “Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph”(pdf) by Stanislav Malyshev of the Wikimedia Foundation and four co-authors from the Technical University of Dresden. I wanted to highlight two particular things that I will find useful in the future and then I’ll list a few more.
Last month in Dividing and conquering SPARQL endpoint retrieval I described how you can avoid timeouts for certain kinds of SPARQL endpoint queries by first querying for the resources that you want to know about and then querying for more data about those resources a subset at a time using the VALUES keyword. (The example query retrieved data, including the latitude and longitude, about points within a specified city.) I built my demo with some shell scripts, some Perl scripts, and a bit of spit…
When I first tried SPARQL’s VALUES keyword (at which point it was pretty new to SPARQL, having only recently been added to SPARQL 1.1) I demoed it with a fairly artificial example. I later found that it solved one particular problem for me by letting me create a little lookup table. Recently, it gave me huge help in one of the most classic SPARQL development problems of all: how to retrieve so much data from an endpoint that the first attempts at that retrieval resulted in timeouts.
After I posted Reification is a red herring (and you don’t need property graphs to assign data to individual relationships) last month, I had an amusingly difficult time explaining to my wife how that would generate so much Twitter activity. This month I wanted to make it clear that I’m not opposed to reification in and of itself, and I wanted to describe the fun I’ve been having playing with Olaf Hartig and Bryan Thompson’s RDF* and and SPARQL* extensions to these…