Web3 and Web 3.0 at OriginTrail

An interview with CTO and co-founder Branimir Rakić

OriginTrail is doing one of the most interesting combinations of blockchain technology and RDF that I have seen. In November I spoke with CTO and co-founder Branimir Rakić.


Tell me about OriginTrail.

OriginTrail is both an ecosystem and a technology stack. Its mission is to grow an open, permissionless system for discovering, verifying, and querying valuable assets, be they physical or digital. It merges the benefits of two technologies: blockchains and semantic technology (each of which has been called “Web3” or “Web 3.0” at different times), forming a Decentralized Knowledge Graph, or DKG for short.

With this “merge” of technologies OriginTrail enables innovative applications that transition from “managing data” to managing assets, with associated tools such as data marketplaces, knowledge tokens, and user-tailored search. It operates on a network of hundreds of nodes run by individuals and companies around the world (including the British Standards Institution, US retailers, and Swiss Railways), based on open source tech and standards such as W3C RDF/SPARQL and emerging Decentralized Identifiers and Verifiable Credentials. OriginTrail DKG can be seen as “middleware”, connecting different (often legacy) systems in a novel “Semantic Web3” network.

How would an interested user get started using this?

One of the best ways to start is to explore the official documentation. With a pending update of the network to version 6 (in December), we’re also about to release an updated version of the documentation with example tutorials, so that would be a great starting point.

Naturally, knowing about graphs and SPARQL would also be a good start.

You can also develop graph-native Web3 applications, interfacing with assets on the DKG using the OriginTrail SDK. There are currently two SDKs available: one on the Oracle Cloud Marketplace and another on DigitalOcean.
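
As a rough sketch, connecting to the DKG from JavaScript could look like the following, assuming the dkg.js library mentioned later in this conversation. The endpoint, port, constructor options, and the node.info() call are assumptions based on the library’s documentation and may differ between versions, so treat this as an outline rather than the exact API.

```javascript
// Minimal sketch: connect to a DKG node from JavaScript with dkg.js.
// The endpoint and port are placeholders for a node you run or have access to,
// and the constructor options / node.info() call may differ between dkg.js versions.
const DKG = require('dkg.js');

const dkg = new DKG({
  endpoint: 'https://your-node.example.com', // placeholder node address
  port: 8900,                                // assumed default ot-node API port
});

async function main() {
  const info = await dkg.node.info(); // basic connectivity check (method name per dkg.js docs)
  console.log(info);
}

main().catch(console.error);
```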

I guess what I mean is, what would a brand-new user set about creating as a first step in using OriginTrail?

Broadly speaking, a user can view the OriginTrail DKG as a global decentralized graph “database” to which one can publish knowledge assets and from which one can query them. Both of these can be done using the DKG libraries (such as dkg.js) or public web interfaces (OriginTrail’s Project “Magnify”, currently in private beta).

Writing (or “publishing”) entails, as a first step, preparing the information to be published (generating triples) and then publishing it to the DKG as knowledge assets.
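
For illustration, a minimal publishing sketch could look like this. The JSON-LD content is made up, and the asset.create call and its epochsNum option are assumptions based on the dkg.js documentation, so check them against the version you install.

```javascript
// Sketch: prepare a small JSON-LD document (the "triple generation" step) and
// publish it to the DKG as a knowledge asset. The content is invented for the
// example; the asset.create signature and epochsNum option are assumptions
// based on the dkg.js documentation and may differ between versions.
const DKG = require('dkg.js');
const dkg = new DKG({ endpoint: 'https://your-node.example.com', port: 8900 });

const content = {
  '@context': 'https://schema.org',
  '@id': 'https://example.com/products/whiskey-batch-42', // illustrative identifier
  '@type': 'Product',
  name: 'Single Malt, batch 42',
  countryOfOrigin: 'GB',
};

async function publish() {
  // epochsNum controls how long the network keeps the asset (paid for in TRAC).
  const result = await dkg.asset.create({ public: content }, { epochsNum: 2 });
  console.log(result.UAL); // the Universal Asset Locator of the new asset
}

publish().catch(console.error);
```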

To query the DKG, one can explore the existing knowledge assets (for example, via the Project Magnify interface) and run SPARQL queries against them.
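
A query sketch, again with the caveat that the graph.query method and its second “query type” argument are assumptions based on the dkg.js documentation:

```javascript
// Sketch: run a SPARQL SELECT query against the DKG through a node via dkg.js.
// The graph.query method and its 'SELECT' query-type argument are assumptions
// based on the dkg.js documentation and may differ between versions.
const DKG = require('dkg.js');
const dkg = new DKG({ endpoint: 'https://your-node.example.com', port: 8900 });

const query = `
  PREFIX schema: <https://schema.org/>
  SELECT ?product ?name
  WHERE {
    ?product a schema:Product ;
             schema:name ?name .
  }
  LIMIT 10
`;

dkg.graph.query(query, 'SELECT')
  .then((result) => console.log(result))
  .catch(console.error);
```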

Apart from being a user, since OriginTrail is a permissionless decentralized system, one can also become a “system operator” by running an OriginTrail Network node and hosting the DKG state. For hosting the state, nodes collect publishing fees in the form of TRAC tokens.

I found this description on Reddit; would you consider it accurate?

“OriginTrail allows anyone to store knowledge assets on its decentralized network of nodes by paying a fee. Those assets can then be queried, verified and made valuable because of the relationships that can be represented in the knowledge graph and also because of the interoperable nature of the platform.”

That is a pretty good description.

Is RDF a typical format for publishing this data?

Starting with the latest version 6, yes. This is about to reach production (that is, release on the OriginTrail DKG main network) in December.

Is there SPARQL access to the published data?

Yes. There are two ways to query the data with SPARQL. One is through a SPARQL service that acts as a gateway into the DKG (Project Magnify provides one). The other is to run your own gateway by running an OriginTrail node.
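
Assuming the gateway speaks the standard SPARQL 1.1 Protocol (the usual case for a SPARQL service), you can also query it with plain HTTP; the endpoint URL below is a placeholder for whichever gateway you use.

```javascript
// Query a SPARQL gateway over the standard SPARQL 1.1 Protocol (plain HTTP).
// The endpoint URL is a placeholder for a Project Magnify service or the
// gateway exposed by your own node.
const endpoint = 'https://your-gateway.example.com/sparql'; // placeholder URL

const query = `
  SELECT (COUNT(*) AS ?tripleCount)
  WHERE { ?s ?p ?o }
`;

async function run() {
  const response = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      Accept: 'application/sparql-results+json',
    },
    body: new URLSearchParams({ query }),
  });
  const results = await response.json();
  console.log(results.results.bindings);
}

run().catch(console.error);
```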

On top of SPARQL access, one can verify the integrity of each triple in the graph using the issuer’s public key (associated with their blockchain account) and Merkle proofs.
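
To give a feel for the Merkle proof side, here is a generic verification sketch. The triple canonicalization, hash function, and proof layout that OriginTrail actually uses are not spelled out in this conversation, so SHA-256 and a simple left/right proof format are assumptions purely for illustration.

```javascript
// Generic Merkle proof verification, not OriginTrail's exact scheme: the triple
// canonicalization, hash function, and proof layout are assumptions chosen only
// to illustrate how a triple's inclusion in a published root can be checked.
const { createHash } = require('crypto');

const sha256 = (data) => createHash('sha256').update(data).digest('hex');

// proof: array of { hash, position } sibling entries, ordered from leaf to root.
function verifyMerkleProof(leafData, proof, expectedRoot) {
  let current = sha256(leafData); // hash of the (canonicalized) triple
  for (const { hash, position } of proof) {
    current = position === 'left'
      ? sha256(hash + current)  // sibling sits to the left
      : sha256(current + hash); // sibling sits to the right
  }
  return current === expectedRoot;
}

// Trivial usage: a one-triple "tree" whose root is just the leaf hash.
const triple = '<ex:batch42> <schema:name> "Single Malt, batch 42" .';
console.log(verifyMerkleProof(triple, [], sha256(triple))); // true
```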

There is some way to plug your own triplestore into a node, right?

Yes, absolutely. The node connects to a triple store and is decoupled from it. It currently supports Apache Jena (Fuseki), Blazegraph and GraphDB, with plans to extend direct support for others. Essentially, you can consider the node as a “modem” for your triple store that connects it with other nodes and uses blockchains for verification and transactions.

Nodes come in two flavors: full nodes and light nodes. Light nodes do not have a triple store of their own and do not participate in running the system, but they can perform operations on it such as publishing and querying.

If I’m going to publish data on one of these nodes and sell access to it, what are the potential mechanisms for my customers to pay for this data?

The payment mechanisms come in several flavors: paying with TRAC tokens, or paying with “Knowledge Tokens” (kTokens), which you can create on your own.

This enables you (as Bob) to create, say, 1,000 Bob tokens, which you can sell via the blockchain as “pay as you go” access tokens for your data. That opens up interesting possibilities, such as applying market mechanisms for price discovery on your data.

To briefly elaborate (a code sketch of the full flow follows the list):

  • The data you are selling would be private (kept by you, in a triple store of your choice, connected to a DKG node of your choice).

  • Metadata about it would be published on the DKG, to make it discoverable.

  • Depending on how you want to implement payments, you would opt for one of the options above.

  • When a buyer discovers your data, they would initiate the purchase via OriginTrail smart contracts by locking the right amount of tokens (in escrow fashion).

  • Your node would verify that the transaction has been initiated (tokens are in escrow) and package the data for the buyer to consume and verify.

  • Data is swapped for tokens. A “Proof of Misbehavior” system ensures that the tokens are only spent if the original data has actually been transmitted.
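
To make that sequence concrete, here is a purely illustrative walk-through in code. None of these functions exist in an OriginTrail library; they are stubs standing in for the smart contracts and nodes involved.

```javascript
// Illustrative only: every function here is a stub that mirrors one step of the
// escrow-based exchange described above. Nothing in this sketch is the actual
// OriginTrail smart contract or node API.
const discoverOffer = async (datasetId) =>
  ({ datasetId, price: 100 });                          // 1. metadata found on the DKG
const lockTokensInEscrow = async (buyer, price) =>
  ({ buyer, locked: price });                           // 2. tokens locked via smart contract
const packageData = async (datasetId, escrow) =>
  ({ datasetId, proofs: 'verification-proofs-here' });  // 3. seller node packages private data
const deliverAndSettle = async (pkg, escrow) =>
  ({ delivered: true, tokensReleased: escrow.locked }); // 4. data swapped for tokens

async function purchase(buyer, datasetId) {
  const offer = await discoverOffer(datasetId);
  const escrow = await lockTokensInEscrow(buyer, offer.price);
  const pkg = await packageData(datasetId, escrow);
  return deliverAndSettle(pkg, escrow); // in the real flow, a "Proof of Misbehavior" check gates settlement
}

purchase('0xBuyerAccount', 'example-dataset').then(console.log);
```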

OriginTrail’s website mentions the use of the W3C Decentralized Identifiers (DIDs) specification. What does this provide to your technology?

Decentralized identifiers are the key piece of technology enabling the blockchain side of things, and the core component of UALs (Universal Asset Locators, the Web3 counterpart of URLs, where the resources are assets). Through UALs, DIDs provide a standard for provisioning ownable identifiers without the need for a central authority and without a dependency on any specific technology. OriginTrail is designed to be blockchain agnostic and, via this standard, can reference any object on any decentralized network (including the DKG itself).

With DIDs, one can identify and interact with data issuers, verify the integrity of data, and fully control one’s own identifiers.
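
As a purely illustrative example of how the DID pieces could fit into a UAL, the following sketch assumes a DID-style syntax with chain, contract, and token components; the exact UAL format is defined by the OriginTrail documentation, not here.

```javascript
// Illustrative only: this UAL string and its structure are assumptions meant to
// show the idea of a DID-style identifier that points at an asset on a specific
// blockchain; consult the OriginTrail documentation for the authoritative format.
const exampleUal = 'did:dkg:somechain:1234/0xAssetContractAddress/42';

function parseUal(ual) {
  const [didPart, contract, tokenId] = ual.split('/');
  const [scheme, method, blockchain, chainId] = didPart.split(':');
  return { scheme, method, blockchain, chainId, contract, tokenId };
}

console.log(parseUal(exampleUal));
// { scheme: 'did', method: 'dkg', blockchain: 'somechain', chainId: '1234',
//   contract: '0xAssetContractAddress', tokenId: '42' }
```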

It sounds like this is helping to tie the blockchain technology and the W3C standards-based technologies together.

Indeed it does, and it’s one of the recommendations getting the most traction, together with W3C Verifiable Credentials.

That is great to hear. There are a lot of W3C Recommendations that are nice in theory but not being applied anywhere.

What kind of OriginTrail customers are using it for what kinds of applications?

OriginTrail has been used quite a bit by enterprises. The Swiss Railway company uses it to track rail parts and maintenance events. Several food and beverage producers (whiskey, poultry, beef, etc.) use it to show ingredient provenance information to their consumers. The British Standards Institution (BSI) issues verifiable certificates for their trainings on the DKG, and US retailers such as Walmart, Target, and The Home Depot use it to exchange factory audit reports among themselves in a privacy-preserving fashion. The World Federation of Hemophilia NGO uses it to track donated vaccines and medicine.

Most of these applications built on top of OriginTrail aggregate information from different sources (for example, rail companies, food supply chain companies, and factories) and perform various graph traversal queries to obtain product histories and discover associated events. Many of them also use OriginTrail together with GS1 EPCIS and CBV data models; GS1 is to the supply chain world what W3C is to the Web.

Supply chain applications seem to be a theme there. Are any of them using RDF?

Most of the applications mentioned are either already fully RDF-based or being migrated to RDF. Specifically, the ones using GS1 standards benefit from RDF because it greatly extends the descriptive capabilities of those standards. The EPCIS 2.0 standard, which came out recently and which we helped co-create through the GS1 working group, makes this easy, as it was designed with RDF compatibility in mind. RDF and SPARQL are an important component of making these implementations easily extendable.
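
To sketch the kind of graph-traversal query such applications run, here is a SPARQL SELECT over EPCIS-style event data, sent via the standard SPARQL protocol. The epcis: namespace IRI, property names, and EPC identifier are simplified placeholders rather than the exact EPCIS 2.0 RDF vocabulary.

```javascript
// Sketch of a product-history query over EPCIS-style event data. The namespace
// IRI, property names, and EPC value are simplified placeholders, not the exact
// EPCIS 2.0 RDF vocabulary; adjust them to the ontology you actually load.
const endpoint = 'https://your-node.example.com/sparql'; // placeholder gateway URL

const query = `
  PREFIX epcis: <https://ref.gs1.org/epcis/>
  SELECT ?event ?bizStep ?eventTime
  WHERE {
    ?event a epcis:ObjectEvent ;
           epcis:epcList <urn:epc:id:sgtin:0614141.107346.2017> ;
           epcis:bizStep ?bizStep ;
           epcis:eventTime ?eventTime .
  }
  ORDER BY ?eventTime
`;

fetch(endpoint, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/x-www-form-urlencoded',
    Accept: 'application/sparql-results+json',
  },
  body: new URLSearchParams({ query }),
})
  .then((response) => response.json())
  .then((results) => console.log(results.results.bindings))
  .catch(console.error);
```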

Is there anything else you’d like to add?

Just to reiterate that we are about to launch the latest OriginTrail version (V6) in a couple of weeks’ time, and we are excited to showcase to a wider audience the new capabilities unlocked by incorporating RDF/SPARQL into the tech stack. The great thing about OriginTrail is that it has a vibrant community of technologists and enthusiasts who help create content in and around the DKG. It’s a truly global community with lots of resources, so I encourage everyone who is interested in finding out more to join our Discord and check out the community-created resources that can be found on our Linktree site.


Comments? Reply to my tweet (or even better, my Mastodon message) announcing this blog entry.