Converting JSON to RDF

Any JSON at all.


When I was at TopQuadrant, I learned that their SPARQLMotion scripting language had a module that could convert JSON to RDF. This had nothing to do with JSON-LD—it worked with any JSON at all, using blank nodes to indicate the grouping of data within arbitrary structures.

Because this tool is only available to paying TopQuadrant customers (or those in the first 30 days of the trial version of TopBraid Composer Maestro Edition), I’ve kept my eye out for a free tool that would do this, and I was happy to see AtomGraph’s JSON2RDF on GitHub. I had to build the binary myself, but this was easy enough after a quick install of the Maven build tool. As the JSON2RDF GitHub README file tells us, mvn clean install is all you need to build a jar file. A Docker image is also available.
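
If you want to build it yourself, the steps look roughly like this. This is just a sketch that assumes the repository location and the usual Maven target/ output directory, so check their README for the authoritative instructions:

# assumes the repository lives at the AtomGraph/JSON2RDF GitHub location
git clone https://github.com/AtomGraph/JSON2RDF.git
cd JSON2RDF
mvn clean install
# the runnable jar should land under the standard Maven target/ directory
ls target/json2rdf-*-jar-with-dependencies.jar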

I could then run it on a myinput.json input file to create a myoutput.ttl output file with this command line:

java -jar json2rdf-1.0.0-SNAPSHOT-jar-with-dependencies.jar http://example.com/test# < myinput.json > myoutput.ttl

As you’ll see in the sample output below, the converter uses the URL provided in the command line as the base URI for the properties in the output.

To test it I ran that command line using the following handmade JSON as input:

{
    "color": "red",
    "amount": 3,
    "arrayTest": ["north","south","east","west",3,"escaped \/string"],
    "boolTest": true,
    "nullTest": null,
    "addressBookEntry": {
        "first": "Richard",
        "last": "Mutt",
        "address": {
            "street": "1 Main St",
            "city": "Springfield" ,
            "zip": "10045"
        }
    }
}

Here is the output that AtomGraph’s JSON2RDF created:

_:B6bba <http://example.com/test#color> "red" .
_:B6bba <http://example.com/test#amount> "3"^^<http://www.w3.org/2001/XMLSchema#int> .
_:B6bba <http://example.com/test#arrayTest> "north" .
_:B6bba <http://example.com/test#arrayTest> "south" .
_:B6bba <http://example.com/test#arrayTest> "east" .
_:B6bba <http://example.com/test#arrayTest> "west" .
_:B6bba <http://example.com/test#arrayTest> "3"^^<http://www.w3.org/2001/XMLSchema#int> .
_:B6bba <http://example.com/test#arrayTest> "escaped /string" .
_:B6bba <http://example.com/test#boolTest> "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
_:B6bba <http://example.com/test#addressBookEntry> _:Bcd68 .

_:Bcd68 <http://example.com/test#first> "Richard" .
_:Bcd68 <http://example.com/test#last> "Mutt" .
_:Bcd68 <http://example.com/test#address> _:B9a02 .

_:B9a02 <http://example.com/test#street> "1 Main St" .
_:B9a02 <http://example.com/test#city> "Springfield" .
_:B9a02 <http://example.com/test#zip> "10045" .

(To make it easier to read on this page I replaced the original blank node identifiers created by JSON2RDF with shorter versions and added two blank lines.) You can see that the converter handled the data types, the escaped string, and the nested structures just fine; the null value simply produced no triple. This output also provides a nice lesson: although the simplicity of the RDF data model means that any data collection is a flat list of triples, you can still represent more complex data structures with very little trouble.
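
That flat list is still easy to navigate. For example, a SPARQL 1.1 property path can walk down through the nested blank nodes; run against the triples above, this query pulls the city value out of the nested address structure and returns "Springfield":

PREFIX e: <http://example.com/test#>

SELECT ?city WHERE {
  ?entry e:addressBookEntry/e:address/e:city ?city .
}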

That was a hand-curated example. To test it on something from the wild, I grabbed the following from the JSON and BSON page of mongodb.com:

{
  "_id": 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  "awards" : [
    {
      "award" : "W.W. McDowell Award",
      "year" : 1967,
      "by" : "IEEE Computer Society"
    }, {
      "award" : "Draper Prize",
      "year" : 1993,
      "by" : "National Academy of Engineering"
    }
  ]
}

JSON2RDF turned it into this (again, with blank node identifiers replaced with shorter versions for easier reading):

_:Bcd72 <http://example.com/test#_id> "1"^^<http://www.w3.org/2001/XMLSchema#int> .
_:Bcd72 <http://example.com/test#name> _:Be87 .
_:Be87 <http://example.com/test#first> "John" .
_:Be87 <http://example.com/test#last> "Backus" .
_:Bcd72 <http://example.com/test#contribs> "Fortran" .
_:Bcd72 <http://example.com/test#contribs> "ALGOL" .
_:Bcd72 <http://example.com/test#contribs> "Backus-Naur Form" .
_:Bcd72 <http://example.com/test#contribs> "FP" .
_:Bcd72 <http://example.com/test#awards> _:Bbc13 .
_:Bbc13 <http://example.com/test#award> "W.W. McDowell Award" .
_:Bbc13 <http://example.com/test#year> "1967"^^<http://www.w3.org/2001/XMLSchema#int> .
_:Bbc13 <http://example.com/test#by> "IEEE Computer Society" .
_:Bcd72 <http://example.com/test#awards> _:Ba9d .
_:Ba9d <http://example.com/test#award> "Draper Prize" .
_:Ba9d <http://example.com/test#year> "1993"^^<http://www.w3.org/2001/XMLSchema#int> .
_:Ba9d <http://example.com/test#by> "National Academy of Engineering" .

I ran this SPARQL query against those triples to find awards from after 1990,

PREFIX e: <http://example.com/test#>

SELECT ?awardName ?year WHERE {
  ?award e:year ?year ;
         e:award ?awardName .
  FILTER (?year > 1990)
}

and got this result:

-------------------------------------------------------------------
| awardName      | year                                           |
===================================================================
| "Draper Prize" | "1993"^^<http://www.w3.org/2001/XMLSchema#int> |
-------------------------------------------------------------------

This is still a rather artificial example. Before converting that JSON about John Backus I could have just queried it directly with a tiny bit of JavaScript or an even tinier jq expression. The real payoff of easy conversion of JSON to RDF is the ease with which you can then integrate that data with other datasets. With the vast amount of JSON data out there, this means that there is even more data to take advantage of in RDF-based applications.

For example, imagine that you have two different MongoDB JSON datasets designed independently by two different developers. Merging them into a single JSON dataset, so that you can treat the combination as a whole that is greater than the sum of its parts, is going to be a lot of ETL work. With the data in RDF, you only need a CONSTRUCT query for each dataset to rename some properties. (A few class, subclass, and subproperty declarations might be handy for a little data modeling, but these are optional.) Then, you just append one set of transformed triples to the other and you’ve got a single dataset.
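
As a sketch of what one of those CONSTRUCT queries might look like (the a: and shared: namespaces and the property names here are made up for the example), renaming one dataset’s name properties to a shared vocabulary could be as simple as this:

# a: and shared: are hypothetical namespaces for this sketch
PREFIX a:      <http://example.com/datasetA#>
PREFIX shared: <http://example.com/shared#>

CONSTRUCT {
  ?person shared:givenName  ?first ;
          shared:familyName ?last .
}
WHERE {
  ?person a:first ?first ;
          a:last  ?last .
}

A similar query over the second dataset’s namespace produces triples that use the same shared properties, and appending the two sets of query results gives you the merged dataset.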

Two more notes about AtomGraph’s JSON2RDF:

  • Make sure to read through all of the README information on AtomGraph’s GitHub page.

  • As with SPARQLMotion’s ConvertJSONToRDF module, AtomGraph’s utility is part of a collection of tools that they make available to pipeline together for application development. Unlike SPARQLMotion, it’s open source and can be run from the command line, so in the old-fashioned Unix sense of the word “pipeline” it can also be connected to tools from other developers, such as the aforementioned jq; a quick sketch of such a pipeline follows below.
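
For example, a pipeline along these lines would work, with jq trimming the JSON before JSON2RDF converts it (the backus.json filename and the jq filter are placeholders for this sketch; only the jar invocation comes from the example above):

# backus.json and the jq filter are placeholders; adjust for your own data
jq '{name, awards}' backus.json | java -jar json2rdf-1.0.0-SNAPSHOT-jar-with-dependencies.jar http://example.com/test# > backus.ttl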