Exploring JSON-LD

And of course, querying it with SPARQL.

I paid little attention to JSON-LD until recently. I just thought of it as another RDF serialization format that, because it’s valid JSON, had more appeal to people normally uninterested in RDF. Dan Brickley’s December tweet that “JSON-LD is much more widely used than Turtle” inspired me to look a little harder at the JSON-LD ecosystem, and I found a lot of great things. To summarize: the amount of JSON-LD data out there is exploding, and we can query it with SPARQL, so…

curling SPARQL

A quick reference.

I’ve been using the curl utility to retrieve data from SPARQL endpoints for years, but I still have trouble remembering some of the important syntax, so I jotted down a quick reference for myself and I thought I’d share it. I also added some background.
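As a rough illustration of what such a request involves (not the quick reference itself), here is a minimal Python sketch that builds, without sending, the kind of HTTP GET request that curl would issue to a SPARQL endpoint. The endpoint URL, the sample query, and the helper name build_sparql_request are assumptions chosen for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query, accept="application/sparql-results+json"):
    """Build (but do not send) a SPARQL protocol GET request.

    Hypothetical helper: the query is URL-encoded into the `query`
    parameter and the desired result format goes in the Accept header.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": accept})


# Assumed endpoint and query, for illustration only.
ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 3"

request = build_sparql_request(ENDPOINT, QUERY)
print(request.get_method())
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually send it; the sketch stops short of that so that it runs without network access.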

Querying machine learning distributional semantics with SPARQL

Bringing together my two favorite kinds of semantics.

When I wrote Semantic web semantics vs. vector embedding machine learning semantics, I described how distributional semantics, whose machine learning implementations are very popular in modern natural language processing, are quite different from the kind of semantics that RDF people usually talk about. I recently learned of a fascinating project that brings RDF technology and distributional semantics together, letting our SPARQL query logic take advantage of entity similarity as rated…

Playing with wdtaxonomy

Those queries from my last blog entry? Never mind!

After I wrote about Extracting RDF data models from Wikidata in my blog last month, Ettore Rizza suggested that I check out wdtaxonomy, which extracts taxonomies from Wikidata by retrieving the kinds of data that my blog entry’s sample queries retrieved, and it then displays the results as a tree. After playing with it, I’m tempted to tell everyone who read that blog entry to ignore the example queries I included, because you can learn a lot more from wdtaxonomy.

SPARQL full-text Wikipedia searching and Wikidata subclass inferencing

Wikipedia querying techniques inspired by a recent paper.

I found all kinds of interesting things in the article “Getting the Most out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph” (pdf) by Stanislav Malyshev of the Wikimedia Foundation and four co-authors from the Technical University of Dresden. I want to highlight two things that I expect to find useful in the future, and then I’ll list a few more.

Pipelining SPARQL queries in memory with the rdflib Python library

Using retrieved data to make more queries.

Last month in Dividing and conquering SPARQL endpoint retrieval I described how you can avoid timeouts for certain kinds of SPARQL endpoint queries by first querying for the resources that you want to know about and then querying for more data about those resources a subset at a time using the VALUES keyword. (The example query retrieved data, including the latitude and longitude, about points within a specified city.) I built my demo with some shell scripts, some Perl scripts, and a bit of spit…
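The batching idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the demo from the blog entry: the example.org URIs, the generic ?s ?p ?o pattern, and the helper names chunked and build_values_queries are all hypothetical, and the sketch only builds the query strings rather than sending them to an endpoint.

```python
def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_values_queries(uris, batch_size):
    """Build one SPARQL query per batch of resource URIs.

    Each query uses a VALUES clause to restrict ?s to the URIs in
    that batch, so no single query asks about every resource at once.
    """
    queries = []
    for batch in chunked(uris, batch_size):
        values = " ".join(f"<{uri}>" for uri in batch)
        queries.append(
            "SELECT ?s ?p ?o WHERE {\n"
            f"  VALUES ?s {{ {values} }}\n"
            "  ?s ?p ?o .\n"
            "}"
        )
    return queries


# Hypothetical resource URIs standing in for the first query's results.
uris = [f"http://example.org/place/{n}" for n in range(1, 6)]
queries = build_values_queries(uris, batch_size=2)

print(len(queries))
print(queries[0])
```

A smaller batch size means more round trips but less work per query, which is the trade-off that helps avoid endpoint timeouts.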

When I first tried SPARQL’s VALUES keyword (when it was still pretty new, having only recently been added to SPARQL 1.1) I demoed it with a fairly artificial example. I later found that it solved one particular problem for me by letting me create a little lookup table. Recently, it helped me a great deal with one of the most classic SPARQL development problems of all: how to retrieve so much data from an endpoint that the first attempts at that retrieval resulted in timeouts.