Exploring JSON-LD

And of course, querying it with SPARQL.


I paid little attention to JSON-LD until recently. I just thought of it as another RDF serialization format that, because it’s valid JSON, had more appeal to people normally uninterested in RDF. Dan Brickley’s December tweet that “JSON-LD is much more widely used than Turtle” inspired me to look a little harder at the JSON-LD ecosystem, and I found a lot of great things. To summarize: the amount of JSON-LD data out there is exploding, and we can query it with SPARQL, so it offers many new possibilities for RDF-based applications.

JSON-LD structure

The primer on the json-ld.org site is a good way to get a quick introduction to the syntax. The W3C’s RDF AND JSON-LD UseCases document has a Differences with RDF section that provides a nice summary for people coming to JSON-LD from the RDF world.

To get to know the JSON-LD syntax, I created a Turtle file with examples of some trickier RDF features and then converted it to JSON-LD to see what it looked like. My Turtle:

    @prefix ab:   <http://learningsparql.com/ns/sample#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix v:    <http://www.w3.org/2006/vcard/> .
    @prefix dc:   <http://purl.org/dc/elements/1.1/> .
    
    # Sample comment: I wish I could get Hugo to do syntax highlighting of Turtle!
    
    ab:i432 ab:firstName     "Richard" ;
            ab:lastName      "Mutt" ;
            ab:startYear     2013 ;
            ab:officer       true ;
            ab:reportsTo     ab:i193 ;
            ab:linkedIn      <https://www.linkedin.com/in/rmutt> ;
            ab:address       _:b1 .
    
    _:b1    ab:city          "Springfield" ;
            ab:streetAddress "32 Main St." .
    
    ab:i193 ab:firstName     "Joan" ;
            ab:lastName      "Jones" ;
            v:title "Director"@en ; 
            v:title "Directeur"@fr .
            
    <urn:isbn:123456789X> dc:creator ab:i193 ;
            dc:title "Chicken Soup for the JSON-LD Soul" . 
    
    ab:firstName rdfs:label  "first name" .

It has information of several datatypes about employee Richard Mutt, with an object property to identify his boss and a blank node to hold together the details of his address. Triples about his boss list her job title in both English and French; they also show her as the author of a book whose name is specified with a “title” property in a different namespace from the property identifying her job title.

The Jena command line utilities that I had installed at the time could read JSON-LD, as we’ll see, but could not write it. (2021 update: they can now, for example with riot --output=jsonld.) So I used the easyrdf.org website to convert the Turtle sample above to JSON-LD. I’m tempted to include a screen shot of the result: it was a dense mass without a single carriage return, showing how the JSON-LD home page assertion that JSON-LD is “easy for humans to read and write” should be qualified with “if you add carriage returns and indenting in all the right places”. Of course, just about any programming or markup language is easy for humans to read and write if you add white space in all the right places, so this does not make JSON-LD special. (I do find it amusing when a set of software developers generalize from themselves to their entire species.)

The jq utility nicely converted the easyrdf output into something easier for humans to read. Here is the result:

    [
      {
        "@id": "_:b0",
        "http://learningsparql.com/ns/sample#city": [
          {
            "@value": "Springfield"
          }
        ],
        "http://learningsparql.com/ns/sample#streetAddress": [
          {
            "@value": "32 Main St."
          }
        ]
      },
      {
        "@id": "http://learningsparql.com/ns/sample#firstName",
        "http://www.w3.org/2000/01/rdf-schema#label": [
          {
            "@value": "first name"
          }
        ]
      },
      {
        "@id": "http://learningsparql.com/ns/sample#i193",
        "http://learningsparql.com/ns/sample#firstName": [
          {
            "@value": "Joan"
          }
        ],
        "http://learningsparql.com/ns/sample#lastName": [
          {
            "@value": "Jones"
          }
        ],
        "http://www.w3.org/2006/vcard/title": [
          {
            "@value": "Director",
            "@language": "en"
          },
          {
            "@value": "Directeur",
            "@language": "fr"
          }
        ]
      },
      {
        "@id": "http://learningsparql.com/ns/sample#i432",
        "http://learningsparql.com/ns/sample#firstName": [
          {
            "@value": "Richard"
          }
        ],
        "http://learningsparql.com/ns/sample#lastName": [
          {
            "@value": "Mutt"
          }
        ],
        "http://learningsparql.com/ns/sample#startYear": [
          {
            "@value": 2013
          }
        ],
        "http://learningsparql.com/ns/sample#officer": [
          {
            "@value": true
          }
        ],
        "http://learningsparql.com/ns/sample#reportsTo": [
          {
            "@id": "http://learningsparql.com/ns/sample#i193"
          }
        ],
        "http://learningsparql.com/ns/sample#linkedIn": [
          {
            "@id": "https://www.linkedin.com/in/rmutt"
          }
        ],
        "http://learningsparql.com/ns/sample#address": [
          {
            "@id": "_:b0"
          }
        ]
      },
      {
        "@id": "https://www.linkedin.com/in/rmutt"
      },
      {
        "@id": "urn:isbn:123456789X",
        "http://purl.org/dc/elements/1.1/creator": [
          {
            "@id": "http://learningsparql.com/ns/sample#i193"
          }
        ],
        "http://purl.org/dc/elements/1.1/title": [
          {
            "@value": "Chicken Soup for the JSON-LD Soul"
          }
        ]
      }
    ]
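For readers without jq, roughly the same re-indenting can be sketched with Python’s standard json module. This is only an illustration, not what was used to produce the listing above; the pretty function name and the one-node sample string are assumptions made for the example.

```python
import json


def pretty(dense: str) -> str:
    """Re-indent a JSON (or JSON-LD) string using two-space indentation."""
    return json.dumps(json.loads(dense), indent=2, ensure_ascii=False)


# A dense, single-line fragment standing in for a converter's raw output.
dense = '[{"@id":"_:b0","http://learningsparql.com/ns/sample#city":[{"@value":"Springfield"}]}]'

print(pretty(dense))
```

Because the text is parsed and then serialized again, only the white space changes; the data itself is untouched.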

Except for JSON’s inability to store comments, the converted version shows that JSON-LD managed to represent all the tricky RDF bits that I included in the input.
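To make the mapping between this JSON-LD shape and RDF triples concrete, here is a minimal Python sketch that walks a list of node objects and yields one tuple per triple. It assumes the flattened, expanded form shown above (every property value is a list of @id or @value objects), it ignores @type and @context, and the triples function name is made up for this example. A real application would use a proper JSON-LD processor instead.

```python
import json


def triples(nodes):
    """Yield (subject, predicate, object) tuples from flattened, expanded JSON-LD.

    Assumes a list of node objects whose property values are lists of
    {"@id": ...} or {"@value": ...} objects. Keywords such as @type are
    skipped, so this sketch does not produce rdf:type triples.
    """
    for node in nodes:
        subject = node["@id"]
        for predicate, values in node.items():
            if predicate.startswith("@"):  # skip JSON-LD keywords
                continue
            for value in values:
                if "@id" in value:  # IRI or blank node reference
                    yield subject, predicate, value["@id"]
                else:  # literal, paired with its language tag (or None)
                    yield subject, predicate, (value["@value"], value.get("@language"))


# Two nodes in the same shape as the listing above.
doc = json.loads("""
[
  {"@id": "http://learningsparql.com/ns/sample#i193",
   "http://www.w3.org/2006/vcard/title": [
     {"@value": "Director", "@language": "en"},
     {"@value": "Directeur", "@language": "fr"}]},
  {"@id": "urn:isbn:123456789X",
   "http://purl.org/dc/elements/1.1/creator": [
     {"@id": "http://learningsparql.com/ns/sample#i193"}]}
]
""")

for triple in triples(doc):
    print(triple)
```

Each @id at the top of a node becomes a subject, each non-keyword key becomes a predicate, and each entry in the value list becomes one object, which is why a property with two language-tagged values turns into two triples.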

With the Jena arq command line tool I successfully executed the following SPARQL query against the JSON-LD data above:

    CONSTRUCT { ?s ?p ?o } WHERE
    { ?s ?p ?o }

The query simply asks for all the triples. My arq command line asked for output in the default format of Turtle, and it worked fine.

There are two bits of big news here for RDF people evaluating JSON-LD:

  • I round-tripped some fairly complex RDF in and out of JSON-LD with no loss of anything but the comment.

  • I performed a SPARQL query on JSON-LD. This demonstrates that the exploding amount of JSON-LD out there is available for use in RDF applications.

SPARQL queries of public JSON-LD

Next I queried some non-demo real-world data. The overstock.com website has rich JSON-LD data about all of their products and even includes some nice JSON-LD in their search results pages. After searching the site for “headphones” and pulling the JSON-LD from the first page of search results, I wrote a script to pull the JSON-LD for the 60 or so products listed there.

The aggregated data has 8,808 triples with 27 different predicates. If you do a View Source on the web page of a typical entry from the headphones list (search for “ld+json”) you’ll see that its JSON-LD provides more than just a product name and images—it includes a full paragraph of description, pricing, reviews, availability, and more.
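Pulling the embedded JSON-LD out of a page can be sketched with nothing but Python’s standard library. The following is not the script used for this post; it is a minimal illustration that assumes the data sits in script elements whose type attribute is application/ld+json, and both the JsonLdExtractor class name and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.documents.append(json.loads("".join(self.buffer)))
            self.in_jsonld = False


# Hypothetical page; a real product page would carry far more properties.
page = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Product",
 "name": "Example headphones",
 "offers": {"@type": "Offer", "price": "19.99"}}
</script>
</head><body></body></html>"""

extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.documents[0]["name"])
```

Fetching the pages and merging the extracted documents into one dataset are left out here; those steps depend on the site and on the RDF tooling in use.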

The following query of that data requests the price and name (but not description) of any headphones under $30 that include “Bluetooth” in their description:

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX s:   <http://schema.org/>

    SELECT ?price ?name WHERE {
       ?i a s:Product ;
          s:name ?name ;
          s:offers ?offer ;
          s:description ?description .

       ?offer s:price ?price .
       FILTER(contains(?description,"Bluetooth"))
       FILTER(xsd:decimal(?price) < 30)
    }

Here is the result:

--------------------------------------------------------------------------------------------------------------------------------------------
| price   | name                                                                                                                           |
============================================================================================================================================
| "16.49" | "BL1 Mini Bluetooth Monaural Headphone Stereo Wireless Stealth Business Wireless Bluetooth 4.1 Headphones"                     |
| "11.24" | "Mini Wireless Bluetooth 4.0 Stereo In-Ear Headset (Black)"                                                                    |
| "13.49" | "Mpow EM 13 Mini Wireless Earbud, Bluetooth V4.1 Invisible Earphone"                                                           |
| "21.99" | "X18 Wireless Bluetooth Earbuds Headphones Stereo Sound Built-in 6.0 Noise Cancelling Mic"                                     |
| "20.99" | "Mpow Bluetooth Headphones V4.1 Wireless Sport Headphones Noise Cancelling In-ear Stereo Earbuds 8-hour Playing Time with Mic" |
--------------------------------------------------------------------------------------------------------------------------------------------

Of course, all of the properties use the schema.org vocabulary, but RDFS gives us ways to map this data to other, more specialized vocabularies. I’ll show some of that next time; above, casting the price from a string to a decimal value is one taste of how SPARQL can turn the data into something more useful.
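The reason for the cast is easier to see outside of SPARQL. Here is a small Python sketch of the same two filters, using string prices like the ones in the result table. The three records, their descriptions, and the cheap_bluetooth function name are invented for illustration and are not taken from the overstock.com data.

```python
from decimal import Decimal

# Hypothetical records; names and descriptions are made up for this sketch.
products = [
    {"name": "Headset A", "price": "11.24", "description": "Bluetooth 4.0 in-ear headset"},
    {"name": "Headset B", "price": "45.00", "description": "Bluetooth over-ear headphones"},
    {"name": "Headset C", "price": "9.99", "description": "Wired earbuds"},
]


def cheap_bluetooth(items, limit="30"):
    """Return (price, name) pairs for items under the limit that mention Bluetooth."""
    return [
        (item["price"], item["name"])
        for item in items
        if "Bluetooth" in item["description"]
        and Decimal(item["price"]) < Decimal(limit)  # compare as numbers, not strings
    ]


print(cheap_bluetooth(products))
```

Without the conversion, Python would compare the strings character by character, so "9.99" < "30" is false even though 9.99 is the smaller number. Casting to a decimal type before comparing avoids that class of mistake.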

A bright future

It’s been a pleasant surprise to see how many different sites include JSON-LD these days. The Hugo website generation framework that I wrote about migrating to last month adds JSON-LD metadata by default, so my new blog website had JSON-LD before I even knew it did. I’ve also been surprised by how popular JSON-LD is with the search engine optimization crowd—a Google search for JSON-LD SEO gets over 200,000 hits, and many don’t even mention RDF. They just see it as a way to add metadata that Google’s crawlers are more likely to notice.

While I’m currently only interested in JSON-LD as a growing source of data that I can query with SPARQL, there are some interesting things happening with the syntax and structure of JSON-LD itself. Gregg Kellogg’s JSON-LD 1.1 Update gives a nice overview of the additions to JSON-LD that are being considered. I certainly plan to play with it more.