Some questions about RDF named graphs

Trying to connect the data structure to real-world use.

Most triplestores support named graphs, and from a high level I can see how they’d be useful, but as I think about using named graphs to address specific application needs, some questions come to mind, so I thought I’d throw them out there.

  1. If graph membership is implemented by using the fourth part of a quad to name the graph that the triple belongs to, then a triple can only belong directly to one graph, right?

  2. I say “belong directly” because I’m thinking that a graph can belong to another graph. If so, how would this be indicated? Is there some specific predicate to indicate that graph x belongs to graph y?

  3. If we’re going to use named graphs to track provenance, then it would make sense to assign each batch of data added to my triplestore to its own graph. Let’s say that after a while I have thousands of graphs, and I want to write a SPARQL query whose scope is 432 of those graphs. Do I need 432 “FROM NAMED” clauses in my query? (Let’s assume that I plan to query those same 432 multiple times.)

I can think of more questions, but I want to wait and see what I can learn about the issues above, and then I can ask better follow-up questions.

6 Comments

By Eric Schoonover on March 1, 2009 1:02 PM

In the repository I am helping to build we have the concept of a graph alias that helps with the overload of named or default graph references in your SPARQL query. It is especially useful if you are going to be executing multiple queries against the same set of graphs. You can assign a single URI that acts as an alias to the 432 graphs you really intend to query and then you can have a single FROM or FROM NAMED clause that points to the graph alias and the SPARQL endpoint will automatically expand the query based on the contents of the graph alias.
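
To make the alias idea concrete, here is a minimal Python sketch of how an endpoint might expand an alias into individual FROM NAMED clauses before running the query. The registry, the `expand_from_named` helper, and all of the example.org URIs are hypothetical and for illustration only; they are not the API of the repository described above.

```python
# Hypothetical alias registry: one alias URI stands for several graph URIs.
GRAPH_ALIASES = {
    "http://example.org/alias/batch-set-1": [
        "http://example.org/graph/batch-001",
        "http://example.org/graph/batch-002",
        "http://example.org/graph/batch-003",
    ],
}


def expand_from_named(graph_uris, aliases=GRAPH_ALIASES):
    """Return FROM NAMED clauses, replacing each alias with its member graphs."""
    clauses = []
    for uri in graph_uris:
        # A URI that is not a registered alias is passed through unchanged.
        for member in aliases.get(uri, [uri]):
            clauses.append(f"FROM NAMED <{member}>")
    return "\n".join(clauses)


expanded = expand_from_named(["http://example.org/alias/batch-set-1"])
print(expanded)
```

With 432 member graphs in the registry, the query author would still write a single clause and the expansion would happen on the server side.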

By glenn mcdonald on March 1, 2009 1:58 PM

I think this idea of named-graphs being a “physical” (i.e., exclusive, containing) partitioning of the triple-space not only doesn’t make sense, but its failure to make sense is in hilariously exact hierarchical contradiction to the very graph-structured premise of RDF. The relationship between a triple and anything else demands all the same structure and flexibility as any other kind of relationship. The fourth column in a quad-store should not be graph-name, it should be triple-id. Once a triple has an ID, you can then express anything you want *about* that triple, whether it’s confidence or provenance or batch or saltiness or whatever.

By Jeni Tennison on March 1, 2009 2:48 PM

I’m no expert, but I agree with Glenn Mcdonald, that the fourth column should really be triple-id (as in a unique URI for each triple). Then again, I think named graphs are flexible enough to be used in this way anyway: they just can encapsulate more than one triple if that’s useful.

As far as the questions go: my understanding is that a given triple (as in a unique subject/property/object combination) can belong to multiple graphs. Each graph it belongs to provides a separate ‘row’ in the quad store.
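
A rough Python sketch of that "one row per graph" picture, modelling the quad store as a set of four-part tuples. The `Quad` type, the `graphs_containing` helper, and the `ex:` names are made up for illustration; real stores index this quite differently.

```python
from collections import namedtuple

# One "row" of a quad store: a triple plus the name of the graph it sits in.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])

store = {
    # The same subject/property/object combination appears in two graphs,
    # so it occupies two distinct rows.
    Quad("ex:Bob", "ex:knows", "ex:Alice", "ex:batch1"),
    Quad("ex:Bob", "ex:knows", "ex:Alice", "ex:batch2"),
    Quad("ex:Alice", "ex:age", "30", "ex:batch2"),
}


def graphs_containing(store, s, p, o):
    """List the names of every graph that holds the given triple."""
    return sorted(
        q.graph for q in store if (q.subject, q.predicate, q.obj) == (s, p, o)
    )


bob_graphs = graphs_containing(store, "ex:Bob", "ex:knows", "ex:Alice")
print(bob_graphs)
```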

Named Graphs / Semantic Web Activity points to a vocabulary for describing the relationships between graphs (subgraphs, equivalent graphs and so on) at http://www.w3.org/2004/03/trix/rdfg-1/.

I agree with Eric about making your 432 graphs subgraphs of a larger graph which you then query. I guess how you do that depends on the triplestore you’re using. The SPARQL specification has an example of named and default graphs which might be useful as a starting point.

By Chris Booth on March 1, 2009 3:08 PM

I’m no expert, especially about your first two questions, but for your third question it seems to me that you could use a variable for the named graph and then FILTER the results. That might not reduce your 432 individual requests to one, but it might help quite considerably.
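
One way to sketch that suggestion: bind the graph name to a variable with GRAPH ?g and restrict it in a FILTER. The Python helper below only builds the query text; `build_filtered_query` and the example.org URIs are hypothetical, and it assumes the endpoint's dataset already exposes the wanted graphs as named graphs.

```python
def build_filtered_query(graph_uris):
    """Build a SPARQL query that matches triples only in the listed graphs."""
    if not graph_uris:
        raise ValueError("at least one graph URI is required")
    # Chain plain equality tests with ||, one per wanted graph.
    conditions = " || ".join(f"?g = <{uri}>" for uri in graph_uris)
    return (
        "SELECT ?g ?s ?p ?o\n"
        "WHERE {\n"
        "  GRAPH ?g { ?s ?p ?o }\n"
        f"  FILTER ({conditions})\n"
        "}"
    )


query = build_filtered_query([
    "http://example.org/graph/batch-001",
    "http://example.org/graph/batch-002",
])
print(query)
```

The query is still long when 432 graphs are listed, but the list lives in one FILTER expression that a program can generate, and whether the store evaluates it efficiently is something to check per product.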

By Damian on March 1, 2009 3:24 PM

Oh boy, good questions. Let’s try these ropey definitions first:

Graph: a set of triples.
Named graph: a name, graph pair.
Dataset: a default graph, and zero or more named graphs.
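
Those three definitions can be written down as plain data types. The following is a minimal Python sketch, not how any particular store represents them; the `Dataset` class, its `union_graph` helper, and the `ex:` names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    # Graph: a set of triples, here (subject, predicate, object) string tuples.
    default_graph: frozenset = frozenset()
    # Named graphs: a mapping from a name to a graph.
    named_graphs: dict = field(default_factory=dict)

    def union_graph(self):
        """Merge every graph into one set; repeated triples collapse to one."""
        merged = set(self.default_graph)
        for graph in self.named_graphs.values():
            merged |= graph
        return frozenset(merged)


shared = ("ex:Bob", "ex:knows", "ex:Alice")
dataset = Dataset(
    named_graphs={
        "ex:g1": frozenset({shared}),
        "ex:g2": frozenset({shared, ("ex:Alice", "ex:age", "30")}),
    }
)
merged = dataset.union_graph()
print(len(merged))
```

Because each graph is a set, the triple that appears under both names shows up only once in the merged view.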

1) No, a triple can be in more than one graph: , . However some stores let you ignore the graphs in certain situations, which require caution to maintain the set-ness of the resulting pseudo-graph. I believe some stores use this as the default graph in SPARQL, which is neither precluded nor suggested by the spec.

2) I don’t understand how a graph can belong to another graph. It might be mentioned (e.g. one graph contains a statement ‘:Bob eg:made ‘)? You may have functional dependencies between graphs ( made from via CONSTRUCT), but that’s up to your application to track. Named graphs are just graphs with names, nothing more.

3) An exciting part of SPARQL :-) In SPARQL you query a dataset, but what determines the dataset? It might be the protocol parameters, it might be the query (your FROM and FROM NAMED), and it might simply be the endpoint that determines it. So don’t expect the endpoint to even pay attention to your FROM NAMED clauses.

The best I can suggest is to talk to your store vendor, although you may find that FILTERing graphs in or out will do the trick.

Hope this comment helps a little.

By Lee Feigenbaum on March 2, 2009 12:21 AM

Bob,

For the most part, I agree with everything Damian says. That said, since Open Anzo is based on a named graph model, I wanted to give some specific answers based on our experience.

Since my comments were a bit lengthy, I stuck them on my blog:

http://www.thefigtrees.net/lee/blog/2009/03/named_graphs_in_open_anzo.html