Sun, Nov 7, 2010

A Guide to Publishing Linked Data Without Redirects

This is a follow-up to my post earlier this week, which resulted in a lot of very positive discussion on this blog, the LOD mailing list and Twitter.

Note that this is provisional guidance only, based on mailing list discussion. It does not currently have endorsement from the W3C or other standards bodies. If a serious logical flaw is discovered then this guide will be withdrawn.

Background

The choice of URIs for identifying real-world things dictates how you can publish Linked Data about those things. If you choose to identify your things with URIs comprising a base URI plus a fragment, such as http://example.com/things#toucan (known as hash URIs), then you can simply publish a description document at the base URI containing RDF about the things you are identifying. If you choose URIs without a fragment, such as http://example.com/toucan (slash URIs), then you must publish your description document at another, distinct URI. Up until now, to connect your thing to its description, you had to configure your webserver to issue a 303 redirect to the description document whenever someone requested your thing’s URI. This guide describes a way to avoid the redirect while still retaining the separation between the thing and its description.

How To

When your webserver receives a GET request to your thing’s URI you may respond with a 200 response code and include the content of the description document in the response provided that you:

include the URI of the description document in a content-location header, and
ensure the body of the response is the same as the body obtained by performing a GET on the description document’s URI, and
include a triple in the body of the response whose subject is the URI of your thing, predicate is http://www.w3.org/2007/05/powder-s#describedby and object is the URI of your description document
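
As a rough illustration of these three conditions, here is a minimal Python sketch of the decision a server would make for each GET. The function name respond, the constants and the in-memory body are hypothetical and exist only for this example; the URIs are the toucan ones used in the example in this guide. It is not tied to any particular web server.

```python
# Hypothetical URIs and body, borrowed from the toucan example in this guide.
THING_URI = "http://example.com/toucan"
DESCRIPTION_URI = "http://example.com/descriptions/toucan.rdf"

# Condition 3: the body carries a wdrs:describedby triple linking the
# thing to its description document.
DESCRIPTION_BODY = (
    "@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .\n"
    "<http://example.com/toucan>\n"
    "a <http://dbpedia.org/resource/Toucan> ;\n"
    "wdrs:describedby <http://example.com/descriptions/toucan.rdf> .\n"
)


def respond(request_uri):
    """Return (status, headers, body) for a GET on request_uri."""
    headers = {"Content-Type": "text/turtle"}
    if request_uri == THING_URI:
        # Condition 1: name the description document in Content-Location.
        headers["Content-Location"] = DESCRIPTION_URI
        # Condition 2: send exactly the body served at the description URI.
        return 200, headers, DESCRIPTION_BODY
    if request_uri == DESCRIPTION_URI:
        # The description document itself needs no Content-Location.
        return 200, headers, DESCRIPTION_BODY
    return 404, {"Content-Type": "text/plain"}, "Not found\n"
```

Serving both URIs from the same in-memory string is one simple way to guarantee that the two bodies stay identical.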

Example

Suppose someone is publishing a description of a toucan. They assign the toucan a URI of http://example.com/toucan and its description a URI of http://example.com/descriptions/toucan.rdf. When a GET is performed on http://example.com/toucan the following response is sent by the server.

HTTP/1.1 200 OK
Date: Mon, 08 Nov 2010 01:00:53 GMT
Content-Location: http://example.com/descriptions/toucan.rdf
Content-Type: text/turtle

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
<http://example.com/toucan>
a <http://dbpedia.org/resource/Toucan> ;
wdrs:describedby <http://example.com/descriptions/toucan.rdf> .

Note that the content-location header points to the description document and the body contains a triple linking the thing with its description.

When a GET is performed on the description document at http://example.com/descriptions/toucan.rdf, the following response will be sent:

HTTP/1.1 200 OK
Date: Mon, 08 Nov 2010 01:00:53 GMT
Content-Type: text/turtle

@prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .
<http://example.com/toucan>
a <http://dbpedia.org/resource/Toucan> ;
wdrs:describedby <http://example.com/descriptions/toucan.rdf> .
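
A client, or a publisher testing their own setup, could check the two responses above against the three conditions. The following Python sketch is only an illustration: the function names are hypothetical, the wdrs prefix is assumed to be bound as in the example, and the triple check is a naive text match rather than a real Turtle parse, so a serious implementation would use an RDF library instead.

```python
import re

WDRS_DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"


def mentions_describedby(body, thing_uri, description_uri):
    """Naive text check for a describedby link; not a real Turtle parser."""
    # Accept either the prefixed name or the full predicate IRI.
    predicate = r"(?:wdrs:describedby|<" + re.escape(WDRS_DESCRIBEDBY) + r">)"
    pattern = predicate + r"\s+<" + re.escape(description_uri) + r">"
    return f"<{thing_uri}>" in body and re.search(pattern, body) is not None


def check_responses(thing_uri, description_uri,
                    thing_headers, thing_body, description_body):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # HTTP header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in thing_headers.items()}
    if headers.get("content-location") != description_uri:
        problems.append("Content-Location does not name the description document")
    if thing_body != description_body:
        problems.append("body differs from the description document's body")
    if not mentions_describedby(thing_body, thing_uri, description_uri):
        problems.append("no wdrs:describedby triple found in the body")
    return problems
```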

The Theory

The primary concern and raison d'être for the current 303 approach is that a web server responding with a status code of 200 is indicating that the entity in the response is a representation of the requested resource. The W3C TAG’s decision on what resources can be identified by HTTP URIs (i.e. httpRange-14) forbids this for anything but information resources.

Mike Kelly found the key piece of spec text that legitimises my proposed approach and it hinges on the use of the content-location header. The latest draft revision of the HTTP/1.1 specification seeks to clarify current usage of HTTP. Section 6.1 deals with identifying the resource associated with a representation. Here is the relevant text:

In the common case, an HTTP response is a representation of the target resource (see Section 4.3 of [Part1]). However, this is not always the case. To determine the URI of the resource a response is associated with, the following rules are used (with the first applicable one being selected):

1. If the response status code is 200 or 203 and the request method was GET, the response payload is a representation of the target resource.

2. If the response status code is 204, 206, or 304 and the request method was GET or HEAD, the response payload is a partial representation of the target (see Section 2.8 of [Part6]).

3. If the response has a Content-Location header field, and that URI is the same as the effective request URI, the response payload is a representation of the target resource.

4. If the response has a Content-Location header field, and that URI is not the same as the effective request URI, then the response asserts that its payload is a representation of the resource identified by the Content-Location URI. However, such an assertion cannot be trusted unless it can be verified by other means (not defined by HTTP).

5. Otherwise, the response is a representation of an anonymous (i.e., unidentified) resource.

Step 4 is the crucial one. It’s saying that if the server sends a status code of 200 with a content-location header then the entity in the response is not a representation of the requested resource but of the resource identified by the supplied header.

This is exactly what is needed. It means we can use any HTTP URI to identify real-world objects and serve Linked Data from that URI, provided we also include a Content-Location header pointing to a URI for the Linked Data document that we served up.

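A caveat, though, is that it is not entirely clear that this interpretation of content-location is endorsed for GET requests. The section above states that the first matching rule applies, which means a 200 response to a GET would already be covered by the first rule before the fourth is considered, and Section 6.7 of HTTPbis says: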

If Content-Location is included in a response message and its value differs from the effective request URI, then the origin server is informing recipients that this representation has its own, presumably more specific, identifier. For a GET or HEAD request, this is an indication that the effective request URI identifies a resource that is subject to content negotiation and the representation selected for this response can also be found at the identified URI. For other methods, such a Content-Location indicates that this representation contains a report on the action’s status and the same report is available (for future access with GET) at the given URI. For example, a purchase transaction made via the POST method might include a receipt document as the payload of the 200 response; the Content-Location value provides an identifier for retrieving a copy of that same receipt in the future.
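
To make the caveat concrete, here is a rough Python sketch of the Section 6.1 rules read literally as a first-match procedure. The function name and return shape are hypothetical, and this is only one reading of the quoted draft text, not a statement of how the specification must be interpreted.

```python
def associated_resource(method, status, request_uri, content_location=None):
    """Return (rule_number, uri) for the first quoted rule that applies.

    uri is the resource the payload is taken to represent, or None for
    an anonymous resource. Rule 4 is only an assertion by the server.
    """
    if status in (200, 203) and method == "GET":
        return 1, request_uri
    if status in (204, 206, 304) and method in ("GET", "HEAD"):
        return 2, request_uri  # partial representation of the target
    if content_location is not None and content_location == request_uri:
        return 3, request_uri
    if content_location is not None:
        return 4, content_location  # asserted, needs outside verification
    return 5, None
```

On this literal reading, a 200 response to a GET on http://example.com/toucan is matched by the first rule even when Content-Location names the description document, while the same headers on a POST response would fall through to the fourth rule. That ordering is the source of the uncertainty described above.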

Frequently Asked Questions

Doesn’t this approach confuse the thing with its description? No. The thing and the description have two different URIs and these are made explicit in two ways. Responses from the thing’s URI contain a content-location header that points to the URI of the description document. Also, the description document itself must contain a wdrs:describedby triple that links the thing to its description.
Does this mean I shouldn’t use 303 redirects? No. This is an additional technique for serving Linked Data and has its own advantages and disadvantages. For example you may still prefer to use a 303 redirect if the description document is located on another host.
Doesn’t this break existing Linked Data applications? Possibly. An informal poll on the LOD list gained responses from some application implementors, all of whom stated that it would not break their applications. Further testing is required.
The wdrs:describedby property has a range of wdrs:Document which is specific to POWDER. Isn’t this a problem? Currently yes, but this has been recognised as an oversight and steps are being taken to correct this and widen the range of this property to any description document. This has been added to the errata for POWDER:

Permalink: http://blog.iandavis.com/2010/11/a-guide-to-publishing-linked-data-without-redirects/
