
A Guide to Publishing Linked Data Without Redirects


7 November 2010 by Ian Davis

This is a follow-up to my post earlier this week, which resulted in a lot of very positive discussion on this blog, the LOD mailing list and Twitter.

Note that this is provisional guidance only, based on mailing list discussion. It does not currently have endorsement from the W3C or other standards bodies. If a serious logical flaw is discovered then this guide will be withdrawn.

Background

The choice of URIs for identifying real-world things dictates how you can publish Linked Data about those things. If you choose to identify your things with URIs comprising a base URI plus a fragment, such as http://example.com/things#toucan (known as hash URIs), then you can simply publish a description document at the base URI containing RDF about the things you are identifying. If you choose URIs without a fragment, such as http://example.com/toucan (slash URIs), then you must publish your description document at another, distinct URI. Up until now, to connect your thing to its description, you had to configure your webserver to issue a 303 redirect to the description document whenever someone requested your thing’s URI. This guide describes a way to avoid the redirect while still retaining the separation between the thing and its description.

How To

When your webserver receives a GET request to your thing’s URI you may respond with a 200 response code and include the content of the description document in the response provided that you:

  1. include the URI of the description document in a content-location header, and
  2. ensure the body of the response is the same as the body obtained by performing a GET on the description document’s URI, and
  3. include a triple in the body of the response whose subject is the URI of your thing, predicate is http://www.w3.org/2007/05/powder-s#describedby and object is the URI of your description document
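
The three conditions above can be sketched as a small response-building function. This is only an illustration under stated assumptions, not a reference implementation: the names `THINGS`, `describe` and `respond` are hypothetical, the lookup table stands in for whatever storage a real server uses, and wiring the function into an actual web server is left out.

```python
# Hypothetical lookup table: thing URI -> (description document URI, type URI).
THINGS = {
    "http://example.com/toucan": (
        "http://example.com/descriptions/toucan.rdf",
        "http://dbpedia.org/resource/Toucan",
    ),
}


def describe(thing_uri, description_uri, type_uri):
    """Build the Turtle body, including the wdrs:describedby triple (condition 3)."""
    return (
        "@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .\n"
        f"<{thing_uri}>\n"
        f"  a <{type_uri}> ;\n"
        f"  wdrs:describedby <{description_uri}> .\n"
    )


def respond(request_uri):
    """Return (status, headers, body) for a GET on request_uri."""
    # GET on the thing's URI: 200 plus a Content-Location header (condition 1).
    if request_uri in THINGS:
        description_uri, type_uri = THINGS[request_uri]
        headers = {
            "Content-Location": description_uri,
            "Content-Type": "text/turtle",
        }
        return 200, headers, describe(request_uri, description_uri, type_uri)

    # GET on the description document's URI: the same body, built by the
    # same function, so the two bodies cannot drift apart (condition 2).
    for thing_uri, (description_uri, type_uri) in THINGS.items():
        if request_uri == description_uri:
            headers = {"Content-Type": "text/turtle"}
            return 200, headers, describe(thing_uri, description_uri, type_uri)

    return 404, {"Content-Type": "text/plain"}, "Not found\n"
```

Generating both bodies from one function is a design choice made here to satisfy the second condition by construction; a server that stores the description document as a file could equally read the same file for both URIs.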

Example

Suppose someone is publishing a description of a toucan. They assign the toucan a URI of http://example.com/toucan and its description a URI of http://example.com/descriptions/toucan.rdf. When a GET is performed on http://example.com/toucan the following response is sent by the server.

HTTP/1.1 200 OK 
Date: Mon, 08 Nov 2010 01:00:53 GMT 
Content-Location: http://example.com/descriptions/toucan.rdf 
Content-Type: text/turtle

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
<http://example.com/toucan>
  a <http://dbpedia.org/resource/Toucan> ;
  wdrs:describedby <http://example.com/descriptions/toucan.rdf> .

Note that the content-location header points to the description document and the body contains a triple linking the thing with its description.

When a GET is performed on the description document at http://example.com/descriptions/toucan.rdf, the following response will be sent:

HTTP/1.1 200 OK 
Date: Mon, 08 Nov 2010 01:00:53 GMT 
Content-Type: text/turtle

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
<http://example.com/toucan>
  a <http://dbpedia.org/resource/Toucan> ;
  wdrs:describedby <http://example.com/descriptions/toucan.rdf> .
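
A client that wants to confirm a pair of responses like these follows the guide could check the three conditions roughly as below. This is a sketch, assuming the caller has already fetched both bodies; `follows_guide` is a hypothetical name, and the third check is a plain text match rather than real Turtle parsing, so a production client would use an RDF library instead.

```python
DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"


def follows_guide(thing_uri, headers, body, description_body):
    """Check a 200 response from a thing's URI against the three conditions.

    description_body is the body the caller obtained by a separate GET on
    the URI named in the Content-Location header.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    description_uri = lowered.get("content-location")

    # Condition 1: a Content-Location header names a distinct description URI.
    if not description_uri or description_uri == thing_uri:
        return False

    # Condition 2: both URIs return the same body.
    if body != description_body:
        return False

    # Condition 3 (textual approximation): the body mentions the thing and
    # links to the description with describedby, written either as the
    # wdrs: prefixed name or as the full property URI.
    link_forms = (
        f"wdrs:describedby <{description_uri}>",
        f"<{DESCRIBEDBY}> <{description_uri}>",
    )
    return f"<{thing_uri}>" in body and any(form in body for form in link_forms)
```

Applied to the two toucan responses shown above, all three checks pass; removing the Content-Location header or serving a different body from the description URI makes the function return False.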

The Theory

The primary concern and raison d’être for the current 303 approach is that a web server responding with a status code of 200 is indicating that the entity in the response is a representation of the requested resource. The W3C TAG’s decision on what resources can be identified by HTTP URIs (i.e. httpRange-14) forbids this for anything but information resources.

Mike Kelly found the key piece of spec text that legitimises my proposed approach and it hinges on the use of the content-location header. The latest draft revision of the HTTP/1.1 specification seeks to clarify current usage of HTTP. Section 6.1 deals with identifying the resource associated with a representation. Here is the relevant text:

In the common case, an HTTP response is a representation of the target resource (see Section 4.3 of [Part1]). However, this is not always the case. To determine the URI of the resource a response is associated with, the following rules are used (with the first applicable one being selected):

  1. If the response status code is 200 or 203 and the request method was GET, the response payload is a representation of the target resource.

  2. If the response status code is 204, 206, or 304 and the request method was GET or HEAD, the response payload is a partial representation of the target (see Section 2.8 of [Part6]).

  3. If the response has a Content-Location header field, and that URI is the same as the effective request URI, the response payload is a representation of the target resource.

  4. If the response has a Content-Location header field, and that URI is not the same as the effective request URI, then the response asserts that its payload is a representation of the resource identified by the Content-Location URI. However, such an assertion cannot be trusted unless it can be verified by other means (not defined by HTTP).

  5. Otherwise, the response is a representation of an anonymous (i.e., unidentified) resource.

Step 4 is the crucial one. It’s saying that if the server sends a status code of 200 with a content-location header then the entity in the response is not a representation of the requested resource but of the resource identified by the supplied header.

This is exactly what is needed. It means we can use any HTTP URI to identify real world objects and serve Linked Data from that URI provided we also include a Content-Location header pointing to a URI for the Linked Data document that we served up.

A caveat though is that it is not entirely clear that this interpretation of content-location is endorsed for GET requests. The section above states that the first matching rule applies and Section 6.7 of HTTPbis says:

If Content-Location is included in a response message and its value
differs from the effective request URI, then the origin server is
informing recipients that this representation has its own, presumably
more specific, identifier. For a GET or HEAD request, this is an
indication that the effective request URI identifies a resource that
is subject to content negotiation and the representation selected for
this response can also be found at the identified URI. For other
methods, such a Content-Location indicates that this representation
contains a report on the action’s status and the same report is
available (for future access with GET) at the given URI. For
example, a purchase transaction made via the POST method might
include a receipt document as the payload of the 200 response; the
Content-Location value provides an identifier for retrieving a copy
of that same receipt in the future.
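
The first-applicable-rule procedure quoted from Section 6.1 can be written out as a small function, which makes the caveat concrete. This is a sketch of the quoted draft text only: `associated_resource` is a hypothetical name, and URIs are compared as plain strings, which ignores relative references and normalisation.

```python
def associated_resource(status, method, request_uri, content_location=None):
    """Return (rule_number, uri) using the five rules quoted above.

    The rules are tried in order and the first applicable one wins.
    uri is None when the representation is of an anonymous resource.
    """
    # Rule 1: 200 or 203 in response to GET.
    if status in (200, 203) and method == "GET":
        return 1, request_uri
    # Rule 2: 204, 206 or 304 in response to GET or HEAD (partial representation).
    if status in (204, 206, 304) and method in ("GET", "HEAD"):
        return 2, request_uri
    # Rule 3: Content-Location equal to the effective request URI.
    if content_location is not None and content_location == request_uri:
        return 3, request_uri
    # Rule 4: Content-Location different from the effective request URI.
    if content_location is not None:
        return 4, content_location
    # Rule 5: anonymous (unidentified) resource.
    return 5, None
```

For the toucan example, `associated_resource(200, "GET", "http://example.com/toucan", "http://example.com/descriptions/toucan.rdf")` returns rule 1 with the toucan's own URI, because rule 1 is selected before rule 4 is ever reached. The same call with `"POST"` as the method returns rule 4 with the Content-Location URI. That ordering is the source of the uncertainty about whether this interpretation is endorsed for GET requests.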

Frequently Asked Questions

  • Doesn’t this approach confuse the thing with its description? No. The thing and the description have two different URIs and these are made explicit in two ways. Responses from the thing’s URI contain a content-location header that points to the URI of the description document. Also, the description document itself must contain a wdrs:describedby triple that links the thing to its description.
  • Does this mean I shouldn’t use 303 redirects? No. This is an additional technique for serving Linked Data and has its own advantages and disadvantages. For example you may still prefer to use a 303 redirect if the description document is located on another host.
  • Doesn’t this break existing Linked Data applications? Possibly. An informal poll on the LOD list gained responses from some application implementors, all of whom stated that it would not break their applications. Further testing is required.
  • The wdrs:describedby property has a range of wdrs:Document which is specific to POWDER. Isn’t this a problem? Currently yes, but this has been recognised as an oversight and steps are being taken to correct this and widen the range of this property to any description document. This has been added to the errata for POWDER:

11 thoughts on “A Guide to Publishing Linked Data Without Redirects”

  1. kaiec says:

    For me this reads as if it were only applicable for POST requests. With GET, the first rule should apply…

  2. Ian Davis says:

    kaiec, can you be more specific about what part you are referring to?

  3. Ian Davis says:

    kaiec, I think I see what you mean. The HTTPbis work talks in terms of using content-location to offer up a permanent location for the response of a POST.

  4. kaiec says:

    Exactly, as it reads, the first rule should be selected, so a compatible application will regard every 200 response of a GET request as a representation of the target (i.e. request) URI. So if your approach (which would be nice to have) should work, I assume that these rules first have to be changed. Or you would break them, which for sure is not desirable. Or am I missing a point?

  5. sallamar says:

    This use of Content-Location is bending the protocol a bit. This header makes most sense for either negotiated responses or in response to POST or PUT when the representation in the body has a URI that is different from the request URI. For GETs, the net outcome of Content-Location is worse than 3xx.

  6. mikekelly85 says:

    subbu, how is this distinct from any other type of negotiated response?

  7. prototypo says:

    Doesn’t the use of the Content-Location header leave you in the same position in relation to those without control over their Apache settings as the 303 usage?

  8. prototypo says:

    My (rather long) response is at http://bit.ly/cMkCh3

  9. ZAZI says:

    Can you extend your example so that it returns documents* of different content-types (e.g. N3, RDF/XML, RDFa) that include, at least, all the same description of this resource (the information resource**, a part of the Semantic Graph that is included in a document*)? From my point of view an information resource** is that piece of information that should be delivered, at least, when I dereference a resource URI.

    *) concrete documents

    **) information resource here not in terms of the AWWW definition; the subject (not in terms of the subject part of a triple) of an information resource is the resource (URI)

    (a resource URI identifies/denotes a resource; an information resource is part of a representation that will be delivered by dereferencing the resource URI; an information resource can be part of different representations that will be delivered by dereferencing the resource URI; a resource can have many information resources and hence many resource URIs)

  10. ZAZI says:

    I wrote a note as an attempt to clarify a bit the terms Resource, Information Resource and Document and their relations (from my point of view). Please have a look at: http://infoserviceonto.wordpress.com/2010/11/25/on-resources-information-reso

  11. sallamar says:

    @mikekelly85 With conneg, clients usually expect the information content of the variants to be equivalent, and this usage does not fit that expectation. Going by RFC 3986, http://example.com/toucan and http://example.com/descriptions/toucan.rdf are distinct resources. To me, these are just “related” resources.
