Is 303 Really Necessary?

4 November 2010 by Ian Davis

A few months back I threw out a question on Twitter: what breaks on the web if we use status code 200 instead of 303 for our Linked Data? I saw a resurgence of this on Twitter today, which prompted me to finally write up my thoughts in a medium with more than 140-character soundbites.

For those new to this debate, the current practice in the Linked Data community is to divide the world into two classes: things that are definitely “Information Resources” and things that might be information resources or might be something else entirely, like a planet or a toucan. I have written on the subject of information resources before (such as what are information resources good for and 303 Asymmetry). Simplistically, information resources can be considered to be electronic documents such as web pages, spreadsheets, images and the like. The only way to tell for sure whether a URI denotes an information resource is to dereference it. If you get a status code of 200 then the URI denotes an information resource. No meaning is ascribed to any other status code, and in those cases the URI might denote an information resource or it might not.

Why, you might ask, is all this emphasis placed on information resources? The answer is that the overwhelming use of the web is to serve up electronic documents, predominantly HTML. The Linked Data people want to use the web’s infrastructure to store information about other things (planets and toucans) and use HTTP URIs to denote those things. Because toucans aren’t electronic documents it has been assumed that we need to distinguish the toucan itself from the document containing data about the toucan. One of the central dictums of Linked Data is that URIs can only denote one thing at a time: that means the URI for the toucan needs to be different from the URI for the document about the toucan. We connect the two together in two ways:

  1. when someone issues an HTTP GET to the toucan’s URI the server responds with a 303 status code redirecting the user to the document about the toucan
  2. when someone issues an HTTP GET to the document’s URI the server responds with a 200 status code and an RDF document containing triples that refer to the toucan’s URI
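
The two steps above can be sketched as a small in-memory model. This is only an illustration, not a real server: the paths /toucan and /doc, the ex:owner triple and the function names are assumptions made for the example, and no network requests are issued.

```python
# Hypothetical in-memory model of the 303 pattern (no real network I/O).
# The paths, the ex: prefix and the triple are illustrative assumptions.
DOC_TRIPLES = [
    ("http://example.org/toucan", "ex:owner", "http://example.org/anna"),
]


def respond(path):
    """Return (status, headers, triples) for a GET on `path`."""
    if path == "/toucan":
        # Step 1: the toucan is not a document, so redirect to its description.
        return 303, {"Location": "/doc"}, []
    if path == "/doc":
        # Step 2: the description document is served directly.
        return 200, {"Content-Type": "text/turtle"}, DOC_TRIPLES
    return 404, {}, []


def fetch(path, max_hops=5):
    """Follow 303 redirects; return (final_path, status, triples, round_trips)."""
    for round_trips in range(1, max_hops + 1):
        status, headers, triples = respond(path)
        if status != 303:
            return path, status, triples, round_trips
        path = headers["Location"]
    raise RuntimeError("too many redirects")


final_path, status, triples, round_trips = fetch("/toucan")
print(final_path, status, round_trips)  # prints: /doc 200 2
```

Note that the client asked for /toucan but ends up holding /doc, and needed two round trips to get there.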

That is the current state of affairs for situations where people want to use HTTP URIs to denote real world things. (There is another approach which uses URIs with a fragment, e.g. http://example.com/doc#foo, which avoids this 303 redirect, but it has its own problems as I point out here and here.)

There are several disadvantages to the 303 redirect approach:

  1. it requires an extra round-trip to the server for every request
  2. only one description can be linked from the toucan’s URI
  3. the user enters one URI into their browser and ends up at a different one, causing confusion when they want to reuse the URI of the toucan. Often they use the document URI by mistake.
  4. it’s non-trivial to configure a web server to issue the correct redirect and only to do so for the things that are not information resources.
  5. the server operator has to decide which resources are information resources and which are not without any precise guidance on how to distinguish the two (the official definition speaks of things whose “essential characteristics can be conveyed in a message”). I enumerate some examples here but it’s easy to get to the absurd.
  6. it cannot be implemented using a static web server setup, i.e. one that serves static RDF documents
  7. it mixes layers of responsibility – there is information a user cannot know without making a network request and inspecting the metadata about the response to that request. When the web server ceases to exist then that information is lost.
  8. the 303 response can really only be used with things that aren’t information resources. You can’t serve up an information resource (such as a spreadsheet) and 303 redirect to metadata about the spreadsheet at the same time.
  9. having to explain the reasoning behind using 303 redirects to mainstream web developers simply reinforces the perception that the semantic web is baroque and irrelevant to their needs.

The one clear advantage it has is:

  1. It’s easy to distinguish the toucan from the description of the toucan

Given that there are more disadvantages than advantages, the natural assumption has to be that the single advantage vastly outweighs the cost of the disadvantages to server operators and consumers of Linked Data.

I am far from convinced that it does.

Firstly, I have never needed to distinguish these things and secondly, if I ever did, then RDF itself makes that trivial by its self-describing nature. The document retrieved can contain triples that assert the nature of the thing denoted by the requested URI.
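
As a rough sketch of what “self-describing” could mean in practice, the retrieved triples can state the type of each URI, and a consumer can read those statements instead of inspecting status codes. The class ex:Bird is hypothetical, foaf:Document is used here only as an example of a document class, and the function names are my own for illustration.

```python
# Sketch: using type triples in the retrieved description to tell a thing
# from a document. ex:Bird is a hypothetical class; foaf:Document stands in
# as an example of a class whose members are documents.
TRIPLES = [
    ("http://example.org/toucan", "rdf:type", "ex:Bird"),
    ("http://example.org/doc", "rdf:type", "foaf:Document"),
]

DOCUMENT_CLASSES = {"foaf:Document"}


def types_of(uri, triples):
    """Collect every class asserted for `uri` via rdf:type."""
    return {o for s, p, o in triples if s == uri and p == "rdf:type"}


def is_described_as_document(uri, triples):
    """True if the data itself says `uri` denotes a document."""
    return bool(types_of(uri, triples) & DOCUMENT_CLASSES)


print(is_described_as_document("http://example.org/toucan", TRIPLES))  # prints: False
print(is_described_as_document("http://example.org/doc", TRIPLES))     # prints: True
```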

So, back to my original question: what exactly would break on the web if we dropped the requirement to issue a 303 redirect when the user requested the URI of our toucan? What if we simply responded with a 200 status code and the description document?

It’s pretty clear what we would gain: all of those disadvantages above would be eliminated.

At first glance it looks like we would be left with the problem of distinguishing the toucan from its description. However, the description document can still retain its own URI. We link the toucan to its document by using a new property ex:isDescribedBy. This property has exactly the same semantics as the 303 redirect except it is active at the data layer and not the network layer. That means that we still keep the advantage of distinguishing the toucan from its document.

As an example here’s how one could declare the owner of the toucan and the owner of the description document to be different individuals. Under the current state of affairs it’s simple because the toucan and the document have different URIs and no RDF is ever emitted from the toucan’s URI:

GET /toucan responds with a 303 to /doc

GET /doc responds with 200 and a representation containing some RDF which includes the triples <http://example.org/toucan> ex:owner <http://example.org/anna> and <http://example.org/doc> ex:owner <http://example.org/fred>

Under my new scheme:

GET /toucan responds with 200 and a representation containing some RDF which includes the triples <http://example.org/toucan> ex:owner <http://example.org/anna> and <http://example.org/toucan> ex:isDescribedBy <http://example.org/doc>

GET /doc responds with 200 and a representation containing some RDF which includes the triple <http://example.org/doc> ex:owner <http://example.org/fred>
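
The exchange under the new scheme can be modelled in a few lines. This is a sketch under stated assumptions: responses are held in a dictionary rather than served over HTTP, triples are plain tuples, and the helper names get and value are invented for the example. The URIs and properties are the ones used above.

```python
# Hypothetical model of the proposed scheme: every URI answers 200 and the
# link from the toucan to its description lives in the data, not in a redirect.
EX = "http://example.org"

RESPONSES = {
    EX + "/toucan": (200, [
        (EX + "/toucan", "ex:owner", EX + "/anna"),
        (EX + "/toucan", "ex:isDescribedBy", EX + "/doc"),
    ]),
    EX + "/doc": (200, [
        (EX + "/doc", "ex:owner", EX + "/fred"),
    ]),
}


def get(uri):
    """Return (status, triples) for a GET on `uri`; unknown URIs give 404."""
    return RESPONSES.get(uri, (404, []))


def value(uri, prop):
    """Dereference `uri` and return the first object of `prop` stated about `uri` itself."""
    status, triples = get(uri)
    for s, p, o in triples:
        if s == uri and p == prop:
            return o
    return None


toucan = EX + "/toucan"
doc = value(toucan, "ex:isDescribedBy")

print(value(toucan, "ex:owner"))  # prints: http://example.org/anna
print(value(doc, "ex:owner"))     # prints: http://example.org/fred
```

The toucan and its description keep distinct owners because they keep distinct URIs; only the redirect has gone.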

There would be no requirement for the toucan’s response to include the ex:isDescribedBy property. If the owner of the server has no additional information about the description document then there is no point in linking to it.

It’s important to note that the data in the response to the GET on /toucan should be taken at face value. Any triples referencing the /toucan URI refer to the thing denoted by that URI, not to the representation retrieved from it. (As an aside this is consistent with current HTTP semantics which does not name individual representations).

As far as I can see this approach doesn’t break the web; it just provides a bunch of clear advantages. It’s simpler, more efficient, more extensible and has clearer semantics than the current 303 approach, and it removes the onus on server operators to decide what is or isn’t an information resource.

If there really are no disadvantages and no breakage in the web, we really ought to evangelise to get this approach accepted as standard practice. That includes doing the following:

  1. defining a stable URI for ex:isDescribedBy (the POWDER property describedby seems close but makes assumptions about the type of description pointed to)
  2. lobbying the W3C TAG to deprecate their finding on httpRange-14
  3. updating the how to publish linked data tutorial
  4. updating Tabulator and other linked data browsers to understand the new semantics
  5. converting existing linked datasets to use 200 instead of 303
  6. perhaps lobbying the new RDF working group to write this approach up as a note or recommendation

But before all that, perhaps there really are areas of the web architecture that break under this approach. If you spot any, let me know in the comments.

Update Nov 5

I posted a link to this blog on the public-lod@w3.org mailing list which generated lots of discussion: http://markmail.org/thread/mkoc5kxll6bbjbxk

To aid that discussion I’ve also created a small demo of the idea. Here is the URI of a toucan:

http://iandavis.com/2010/303/toucan

Here is the URI of a description of that toucan:

http://iandavis.com/2010/303/toucan.rdf

As you can see, both these resources have distinct URIs. I created a new property http://vocab.org/desc/schema/description to link the toucan to its description. The schema for that property is here:

http://vocab.org/desc/schema

(BTW I looked at the POWDER describedBy property and it’s clearly designed to point to one particular type of description, not a general RDF one. I also looked at http://ontologydesignpatterns.org/ont/web/irw.owl and didn’t see anything suitable.)

Here is the URI Burner view of the toucan resource and of its description document:

I’d like to use this demo to focus on the main thrust of my question: does this break the web, and if so, how?

29 thoughts on “Is 303 Really Necessary?”

  1. cgutteridge says:

    If I understand correctly; this decouples URIs and URLs. The URI for a document is not the URL you got it from.

  2. danmux says:

    I only recently discovered that (at least some) browsers strip the # and anything after it from the URL before sending the request. Whilst browsers may end up not being the main client, they still are at the moment.

  3. Ian Davis says:

    Not at all, Chris. Nothing changes for documents (aka information resources); this is a change only for URIs that denote non-documents, i.e. real world things. (As an aside I don’t think things have separate URLs and URIs. After all, URLs are a subset of URIs so every URL is also a URI.)

  4. Ian Davis says:

    Dan, it’s not normal for browsers to send the fragment to the server. See http://en.wikipedia.org/wiki/Fragment_identifier#Processing

  5. benosteen says:

    Adding a ‘Content-Location’ header is another complementary way in which you can state the document URI for a 200 on the toucan URI. Not an easy thing to add to a static web server, granted, but easier and less taxing to do for a service.

    “The Content-Location entity-header field MAY be used to supply the resource location for the entity enclosed in the message when that entity is accessible from a location separate from the requested resource’s URI.”

  6. ZAZI says:

    Good post! The only thing I like to add is: Linked Data community please stop dividing the world into two classes! http://example.org/fish != http://example.org/fish.html – however, I can serve http://example.org/fish.html, when some requests http://example.org/fish with a normal web browser.

  7. Alan Dix says:

    re: “One of the central dictums of Linked Data is that URIs can only denote one thing at a time”.

    In human language we constantly deal with referent ambiguity and only disambiguate when essential: e.g. “Alan Dix” the name, the person, the father, the author. The desire for uniqueness of reference appears to be intellectual hubris. (As does the desire to try to name everything in the world with URIs!)

    For machine readability we may want less ambiguity than for human dialogues, but it will never be ‘perfectly’ grounded, partly because to be ‘perfect’ would probably preclude any extensibility or generality. Once one accepts this, the more pragmatic arguments, as in this post, are what is important. The question is not which of 200 or 303 is ‘right’, but instead which works better… and indeed in the study of linguistics, pragmatics is regarded as a higher level than semantics anyway.

  8. Chris Wallace says:

    Ian, thanks for breaking your silence and articulating the concerns of many of us so well. The point about how a mere web server is supposed to make the distinction between things and descriptions is particularly strong.

  9. danmux says:

    So is the fragment and the fact that “its processing is exclusively client-side” just an intermediate annoyance (if at all) would you expect future and semantic UA’s to send the fragment or as per W3C Media Fragments Working Group suggest, would the fragment be encoded into the headers?

  10. Ian Davis says:

    Ben, yes I agree about content-location. I had intended to mention it, but failed to for some reason.

  11. Dave Reynolds says:

    Good summary of the disadvantages. Don’t agree with the advantage though, replace “easy” with “possible, sort of”. All the 303 says is this “might not” be an information resource, it doesn’t tell you it definitely isn’t an information resource. So it doesn’t even solve the original problem, it just hints at it.

    I find # URIs more useful than you give them credit for, they may be a hack but don’t cause anything like the same problems. In the cases where you can in fact put a sensible, bounded RDF document at the base URI then they are not even a hack.

    I agree that simply using different URIs when you need to distinguish just works and is preferable to 303. Evangelising the approach makes sense. It may be possible to simply use foaf:isPrimaryTopicOf for this (which we already do in some places) rather than mint a new predicate. Agreed that that indicates “a” document rather than “the” document but not sure that is a fundamental problem with using it.

    The main place I disagree is #2 on your action list. Re-opening this debate with TAG will just restart one of the world’s most depressing perma-threads. I think the sight of that going on would reinforce the impression that semantic web geeks aren’t in the real world. Make it a de facto practice, show that nothing serious breaks, then get it adopted formally.

    Dave

    P.S. Having to create an account in order to post a comment is a bit of a turn off!

  12. cgutteridge says:

    This gave me lots of ideas, so I’ve responded in our own blog: http://blogs.ecs.soton.ac.uk/webteam/2010/11/04/nobody-needs-a-303/

  13. mikekelly85 says:

    Looks like ben beat me to the punch on Content-Location! :)

    Seems like a lot of trouble was/is caused by an interpretation of “URIs can only denote one thing at a time” coloured by a misunderstanding of the relationships between URI, resource, and representation.

    Even ignoring the ‘limitations of static content’ argument (which is a good argument on its own), I think a decent case can be made against the supposed need to distinguish between different types of resources in this way, and mechanisms outlined to alleviate any problems – Content-Location being a good case in point.

    Anyway, great article Ian. Thanks! :)

  14. olyerickson says:

    Thanks for this posting;

  15. billroberts says:

    Ian – this makes a huge amount of sense. Shall we just start doing it?

    Having gone round another IR vs NIR brain bender the other day (a skos:Concept should be an NIR but the ConceptScheme is an IR…etc) I think this would help everyone and really help take up of linked data. Making it much easier to serve linked data from a regular static web server should greatly lower the barrier to people doing that.

    Bill

  16. Ian Davis says:

    Dave, in the current state of affairs when you get a 200 response you know for sure that the thing you have is an information resource. The 303 tells you nothing.

    I’m not really concerned with # URIs in this piece. I’ve made my case against them before without a lot of impact. Maybe they don’t break anything that we need to care about, but there is an inherent fragility with them, not least that when you want to move your primary resource all the secondary ones have to move with it.

    What I’m mainly trying to do with this post is move on from the conversation about IRs which were an artifact of the compromise of httpRange-14.

    Sorry about the account creation step – I didn’t realise that was needed. I’ll see if there’s a way to make it optional.

  17. Ian Davis says:

    Bill, I'm pleased that so many people are ready to move ahead on this. I've given it a lot of thought over the years and I think most of the arguments against doing it are covered off now. But I'd still like to hear any contrary views and see if there's something crucial that I've missed. Perhaps posting it while everyone is travelling to ISWC wasn't the best idea :)

  18. billroberts says:

    “while everyone is travelling to ISWC” – probably the best time to stage a 303 coup :-)

  19. yang_squared says:

    ok, but what happens when you dereference the subject URI of ex:isDescribedBy? Don’t u need 303 again? #linkeddata

  20. Ian Davis says:

    No, the subject of ex:isDescribedBy is the thing itself (i.e. the toucan). You get the same representation back that you just read.

  21. noahm says:

    Let’s say URIT is the URI for a Tucan, a real colorful, noisy bird. We give up on 303 and go with 200, so when I do a GET what comes back is a document, perhaps in RDF but perhaps not, describing the Tucan.

    Now, let’s say I want to make a statement about the description document. For example, that the description document is 1 year old. Is the RDF statement <urit> <#age> 1 about the Tucan or the description document? If we stick with 303, inconvenient as it’s known to be for some of the reasons stated, then we have GET(URIT) redirects to URIforTucanDescription. Now I can say: <urit> <#age> 1 is unambiguously a statement about the Tucan; <urifortucandescription> <#age> 1 is a statement about the description document.

    Puns usually cause trouble in computer systems, particularly when ambiguity is possible (as above), and where humans aren’t in the loop to disambiguate.

  22. Ian Davis says:

    I describe almost exactly this problem in my post and supply an answer: you give the description document a different URI and link it with a new property ex:isDescribedBy. You use the document's URI to make statements about the document and the toucan's URI to make statements about the toucan.

  23. Ian Davis says:

    I sent a link to this blog post to the Linked Open Data mailing list which has provoked a parallel discussion: http://markmail.org/thread/mkoc5kxll6bbjbxk

  24. masinter says:

    http://lists.w3.org/Archives/Public/www-tag/2010Nov/att-0025/duri.txt (next version of http://tools.ietf.org/html/draft-masinter-dated-uri)

    Note that there is no necessity to use any particular protocol (HTTP) or any particular MIME type; I think that’s an advantage.

  25. Ian Davis says:

    Larry, thanks for the pointer to duri. I’m following those discussions (and tdb too). However, at this point in the deployment of the semweb I’m really looking to make more use of existing protocols and deployed infrastructure and it seems to me that those proposals are some years off being widely deployed.

  26. prototypo says:

    My (rather long) response is at http://bit.ly/cMkCh3

  27. […] is a follow up to my post earlier this week which resulted in lot of very positive discussion on this blog, the LOD mailing list and […]

  28. […] Interest Group Note, December 3, 2008. See http://www.w3.org/TR/cooluris/. [10] Ian Davis, 2010. Is 303 Really Necessary? Blog post, November 2010, accessed 20 January 2012. (See […]
