Wed, Dec 15, 2010

My Feedback on SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs

I sent the following feedback yesterday to the W3C SPARQL Working Group on their proposal for a RESTful approach to managing graphs.

I reviewed the document at http://www.w3.org/TR/2010/WD-sparql11-http-rdf-update-20101014/ and enclose my initial comments below. Note that I stopped my review after section 4.2.

Note: in my comments I use the words "represent" and "representation" only in the sense defined by RFC 2616.

Section 2

Graph Store is defined to be mutable. I don't see why it needs that requirement. The read-only aspects of this document could apply to a non-mutable Graph Store.

Section 4.1

I don't at all understand the need for the distinction in this document between a graph and RDF knowledge. I find the supplied explanation particularly confusing:

"we are not directly identifying an RDF graph but rather the RDF knowledge that is represented by an RDF document, which serializes that graph"

I have seen serialization and representation used interchangeably in many REST discussions, but I have never seen them used as distinct operations, so I don't really know what to make of it.

If my understanding of the terminology is correct, then I think the relationships are that RDF Knowledge is the result of interpreting an RDF graph, which may be represented by an RDF document. In this case the identified resource that is emitting representations is the graph itself. The RDF Knowledge is not explicitly named here, but could be somehow.

The immediately following sentence "Intuitively, the interpetations that satisfy [RDF-MT] the RDF graph serialized by the RDF document can be thought of as this RDF knowledge" implies that the Graph IRI identifies multiple things, i.e. multiple interpretations. It's axiomatic on the web that a URI (IRI) identifies only one resource so I see this as a conflict.

I assume the introduction of the term "RDF Knowledge" is motivated by an attempt to unify the concept of distinct document-like resources that you encounter on the web and an aggregation of the data in those documents as you might find in a database. I think this document would benefit from the removal of that term entirely and the addition of a section describing how a Graph Store might aggregate and interpret the graphs to form one or more datasets that may be accessed with zero or more SPARQL or other services. Which form of entailment the Graph Store uses is out of scope of the document, but it will certainly affect the behaviour of the SPARQL services it provides.

Section 4.2

The diagram implies that the encoded URI (e.g. http://www.example.org/other/graph) and the indirect URI http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph identify the same RDF Knowledge. Does this imply the following triple?

<http://example.com/rdf-graphs/employees?graph=http%3A//www.example.org/other/graph>
    <http://www.w3.org/2002/07/owl#sameAs>
    <http://www.example.org/other/graph> .

I think the whole notion of indirect identification is problematic. What the document is saying, in essence, is that if you have the URI of a graph you need to discover some other URI by an unspecified mechanism with which to manipulate it. If you discover multiple such URIs would you be justified in assuming that they all manipulate the same underlying graph?
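
To make the indirection concrete, here is a minimal Python sketch of how a client might build an indirect URI once it has somehow discovered a service URI. The function name indirect_uri is my own invention for illustration, and the service URIs are the example ones used in this post; nothing here is taken from the draft beyond the "?graph=" convention.

```python
from urllib.parse import quote


def indirect_uri(service_uri: str, graph_iri: str) -> str:
    """Embed a graph IRI in the query component of a service URI.

    Hypothetical helper: quote() leaves "/" unescaped by default, which
    matches the "http%3A//" form seen in the example URIs.
    """
    return f"{service_uri}?graph={quote(graph_iri)}"


graph = "http://www.example.org/other/graph"

# The same graph IRI yields a different indirect URI for every service
# the client happens to discover.
print(indirect_uri("http://example.com/rdf-graphs/employees", graph))
print(indirect_uri("http://foo.com/graphs", graph))
```

Each discovered service produces its own indirect URI for the same graph IRI, which is exactly the situation the question above is asking about.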

I am not convinced that it is intuitive that the following identify different graphs that have the same URI:

http://foo.com/graphs?graph=http%3A//www.example.org/other/graph
http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph

Furthermore, should a conformant server that supports multiple independent collections of graphs (e.g. Talis Platform) be required to enforce that graph URIs identify the same knowledge across all the collections? In other words, are the following required to manipulate the same "RDF Knowledge":

http://api.talis.com/dataset1/graphs?http://ex1.com/g?graph=http%3A//www.example.org/other/graph
http://api.talis.com/dataset2/graphs?http://ex1.com/g?graph=http%3A//www.example.org/other/graph

The following sentence implies that this is the case: "Any server that implements this protocol and receives a request URI in this form SHOULD invoke the indicated operation on the RDF knowledge identified by the URI embedded in the query component where the URI is the result of percent-decoding the value associated with the graph key."
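
The decoding step in that sentence can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name embedded_graph_iri is hypothetical, and I use the two foo.com and bar.org request URIs from earlier in this section as input.

```python
from urllib.parse import parse_qs, urlsplit


def embedded_graph_iri(request_uri: str) -> str:
    """Return the percent-decoded value of the "graph" query key.

    Hypothetical helper; parse_qs() performs the percent-decoding.
    """
    query = urlsplit(request_uri).query
    values = parse_qs(query).get("graph", [])
    if len(values) != 1:
        raise ValueError("expected exactly one 'graph' parameter")
    return values[0]


foo = "http://foo.com/graphs?graph=http%3A//www.example.org/other/graph"
bar = "http://bar.org/rdf-data?graph=http%3A//www.example.org/other/graph"

# Two unrelated servers end up decoding the very same graph IRI.
print(embedded_graph_iri(foo))
print(embedded_graph_iri(foo) == embedded_graph_iri(bar))
```

Both request URIs decode to the same embedded IRI, so a literal reading of the sentence has both servers operating on the same RDF knowledge.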

At this point I stopped my review. The two areas I explored are complicated excessively by the introduction of the RDF Knowledge concept into what I feel should be a very simple and straightforward document. I believe the removal of that concept and the introduction of a non-normative section describing the expected behaviour of Graph Stores would be the best route forward.

It is also unclear what this document has to say about a central concept of SPARQL: the dataset. I see in the change summary that the term Graph Store was introduced to replace Dataset but I don't know the background to that decision.

I would prefer to recast this whole document in the following way:

1. Introduce a Graph Store as a service that manages a collection of datasets and a collection of graphs. Many Graph Stores will have a single dataset; multi-tenant ones will have many.

2. Describe operations on a Graph Store: GET to obtain a document describing the graph store including a link to the collection of datasets, a link to the collection of graphs, links to provided services and links to other configuration information

3. Describe operations on the Collection of Datasets: GET to obtain the list of datasets, POST to append a new one

4. Describe operations on a Dataset: GET to obtain a list of graphs included in the dataset, POST to include an existing graph in the dataset, PUT to replace the definition, DELETE to remove a dataset

5. Describe operations on the collection of graphs: GET to obtain the list of graphs, POST to append a new one

6. Describe operations on a Graph: GET to obtain a representation of the graph, POST to append new data, PUT to replace data, DELETE to remove the graph

7. Describe how graph stores may interpret graphs in particular ways, treating datasets as more than a collection of individual graphs, i.e. RDF Knowledge
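
The operations in items 2 to 6 can be summarised as a small table of resource kinds and methods. The Python sketch below is only my illustration of that outline: the resource kind names and the two helper functions are hypothetical, and item 7 is left out because it concerns interpretation rather than HTTP operations.

```python
from typing import Dict, List

# Hypothetical resource kinds mapped to the HTTP methods proposed for each
# in the list above, with a short note on what the method would do.
OPERATIONS: Dict[str, Dict[str, str]] = {
    "graph-store": {
        "GET": "describe the store; link to datasets, graphs, services, config",
    },
    "dataset-collection": {
        "GET": "list the datasets",
        "POST": "append a new dataset",
    },
    "dataset": {
        "GET": "list the graphs included in the dataset",
        "POST": "include an existing graph in the dataset",
        "PUT": "replace the dataset definition",
        "DELETE": "remove the dataset",
    },
    "graph-collection": {
        "GET": "list the graphs",
        "POST": "append a new graph",
    },
    "graph": {
        "GET": "return a representation of the graph",
        "POST": "append new data to the graph",
        "PUT": "replace the graph's data",
        "DELETE": "remove the graph",
    },
}


def allowed_methods(resource_kind: str) -> List[str]:
    """Return the methods a resource kind supports, sorted alphabetically."""
    return sorted(OPERATIONS[resource_kind])


def is_supported(resource_kind: str, method: str) -> bool:
    """Check whether the outline proposes this method for this resource kind."""
    return method in OPERATIONS[resource_kind]


print(allowed_methods("graph"))
print(is_supported("graph-collection", "DELETE"))
```

A real server would answer an unsupported method with 405 Method Not Allowed and an Allow header listing the supported ones; the sketch only models the lookup.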

My message is archived in the public-rdf-dawg-comments archives.

I think this comment from Leigh Dodds on Twitter is also pertinent:

@iand +1 on sparql update feedback. My comments were going to be sinilar. Would be instructive for WG to look at atom protocol, IMHO

Permalink: http://blog.iandavis.com/2010/12/my-feedback-on-sparql-1-1-uniform-http-protocol-for-managing-rdf-graphs/
