Back to Basics with Linked Data and HTTP

6 December 2010 by Ian Davis

In the Semantic Web, it is not the Semantic which is novel, it is the Web

That quote, attributed to Chris Welty of IBM, is the one that best captures my outlook on the Semantic Web and Linked Data. The Web has connected people to information at an unprecedented rate and scale, and comparisons to the impact of Caxton’s press are justified, however trite they are. For the majority of people using the Web it’s a rich place full of stories, pictures, shops and encyclopedias, but we Web technologists see that all of those marvelous things are enabled by the use of URIs, HTTP and various machine-readable data formats.

HTTP itself is pretty simple: it tells you how you can use a specific type of URI to look up some information. It doesn’t try to tell you what that information means, but it provides plenty of clues about the provenance, format and timeliness of that information. HTTP just provides the transfer mechanism for messages between a client and a server, and it’s very good at its job. It’s very natural to assume that the information received from a URI request is in some way related to the thing identified by that URI. Once you have that mechanism, it’s obvious that if you want to assign an identifier to something and publish information relating to it to the largest number of people then you’re going to pick an HTTP URI.
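To make those “clues” concrete, here is a minimal sketch in Python that groups standard HTTP response headers by the kind of hint they give. The function name collect_clues and the three-way grouping are my own assumptions for illustration; the headers themselves are ordinary HTTP ones.

```python
from typing import Mapping

# Standard response headers, grouped by the kind of clue they offer.
# The grouping itself is an illustrative assumption, not part of HTTP.
CLUE_HEADERS = {
    "provenance": ("server", "via"),
    "format": ("content-type", "content-language", "content-encoding"),
    "timeliness": ("date", "last-modified", "expires", "cache-control", "etag"),
}


def collect_clues(headers: Mapping[str, str]) -> dict[str, dict[str, str]]:
    """Group the response headers that are present by the clue they give."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        kind: {name: lowered[name] for name in names if name in lowered}
        for kind, names in CLUE_HEADERS.items()
    }


if __name__ == "__main__":
    # Hypothetical headers, as a server might send them for some URI.
    example = {
        "Content-Type": "text/turtle",
        "Last-Modified": "Mon, 06 Dec 2010 10:00:00 GMT",
        "Server": "ExampleServer",
    }
    for kind, found in collect_clues(example).items():
        print(kind, found)
```

Note that none of these headers says what the information means. They only describe the message that carries it.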

The HTTP specification uses the word “representation” to denote the relationship between the information returned for a URI and the thing that URI identifies, as in: “the information you received is a representation of the thing identified by that URI”. The spec doesn’t define what representation means any further than that.

The nature of that relationship and the meaning of “representation” have been the subject of a huge amount of debate in the Semantic Web community, spread over half a decade. That debate has produced a number of new terms such as “information resource” and some conventions such as “303 redirects” to resolve perceived problems. After all that debate we are left with something that isn’t technically broken but has not been universally hailed as essential to the fabric of the Web. The majority of the wider Web community are content with publishing information using URIs, simply ignoring these strange conventions and distinctions that the Semantic Web community find so important.

I’ve always been of the opinion that this debate could have been avoided by keeping the responsibilities of each component of the Web separate and clean. The separation should be: identify things using URIs, transfer information using HTTP, encode meaning in the data formats used in the transfer. Instead, we have special interpretations of certain parts of URIs and special interpretations of certain HTTP status codes to confer special meaning on the information being transferred.
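For readers who haven’t met those special interpretations, the sketch below shows roughly what a client following them would infer. It reflects my reading of the W3C TAG’s httpRange-14 resolution and the hash-URI convention; the function name and the wording of the returned strings are assumptions made for illustration, not taken from any specification.

```python
from urllib.parse import urldefrag


def conventional_reading(uri: str, status: int) -> str:
    """What the URI and status-code conventions let a client infer."""
    # A fragment is resolved by the client and never sent to the server,
    # so the status code applies to the URI without the fragment.
    _, fragment = urldefrag(uri)
    if fragment:
        return "hash URI: may identify anything"
    if 200 <= status < 300:
        return "information resource"
    if status == 303:
        return "any kind of resource; follow the redirect for more information"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical URIs used only to exercise each branch.
    print(conventional_reading("http://example.org/doc", 200))
    print(conventional_reading("http://example.org/id/ian", 303))
    print(conventional_reading("http://example.org/people#ian", 200))
```

The point of the sketch is that the nature of the resource is being read off the transfer protocol and the shape of the identifier, not off the data that was transferred.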

I think it’s time to stop blurring those responsibilities so I’m going back to basics.

  1. Plain old resources: I don’t find the distinction between information resources and non-information resources to be a useful one when compared with the complexity of deciding which is which, so I’m going to stop using that terminology. From now on everything that has a URI is just a plain old resource, just like it says in the HTTP spec.
  2. The meaning is in the message, not the protocol: I don’t think it’s useful to overload HTTP with notions of descriptions, documents and content. I think that classification is best conveyed in the body of the messages used by HTTP, not in the protocol itself. This means I won’t assume I can get a description of a resource or a copy of it by using its URI. Instead, when I use GET on a URI I am simply looking up the information that the owner of that URI desires to send. Now, it might be the case that some other information I already have says that when I look up information from a URI I should treat it as a description of the identified resource. That’s entirely fine and a good separation of concerns. I may also discover some other evidence that the received information is not a description after all but something else. That’s cool too, because any system I am using that can tell me these things should also be able to help me determine inconsistencies and discrepancies in the information I am collecting.
  3. Use the protocol to manage and route information: If I want to allow other people to change the information that can be looked up using any of my URIs then I’ll enable support for the POST or PUT methods. I can even allow the DELETE method to prevent any future information being transmitted to users. Because I’m leaving the interpretation of the information to the message body, I don’t need to worry whether someone is updating the content or the description of the resource. I also have the full range of redirects and other HTTP machinery available to help people find the information I have that’s related to my resource.
  4. Place trust in the information returned by a URI: the information received when a user accesses a URI has a special position because only the owner of that URI gets to control what is sent. That’s useful if I want to look up what the owner thinks is important in relation to their URI and I should place more trust in that information than I should with information from other sources.
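The four points above can be sketched as a tiny in-memory handler. This is only an illustration under stated assumptions: the handle function, the Store type and the example.org URI are hypothetical, only GET, PUT and DELETE are covered, and nothing here is a real server. The handler never asks whether a resource is a document or a thing; it stores and returns whatever message the URI owner supplies.

```python
# In-memory store: URI path -> (media type, body). Purely illustrative.
Store = dict[str, tuple[str, bytes]]


def handle(store: Store, method: str, path: str,
           media_type: str = "", body: bytes = b"") -> tuple[int, str, bytes]:
    """Return (status, media type, body) for a request on a plain resource."""
    if method == "GET":
        if path not in store:
            return 404, "", b""
        stored_type, stored_body = store[path]
        return 200, stored_type, stored_body
    if method == "PUT":
        # Replace whatever information is looked up at this URI.
        created = path not in store
        store[path] = (media_type, body)
        return (201 if created else 204), "", b""
    if method == "DELETE":
        # Stop any information being sent for this URI in future.
        if store.pop(path, None) is None:
            return 404, "", b""
        return 204, "", b""
    return 405, "", b""


if __name__ == "__main__":
    # The message body, not the status code, says what the URI identifies.
    turtle = (
        b"@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
        b"<http://example.org/ian> a foaf:Person ; foaf:name \"Ian Davis\" .\n"
    )
    store: Store = {}
    print(handle(store, "PUT", "/ian", "text/turtle", turtle)[0])
    status, media_type, body = handle(store, "GET", "/ian")
    print(status, media_type)
    print(body.decode())
```

A client that wants to know whether /ian is a person or a page reads the Turtle it got back. The 200 status only tells it that the lookup succeeded.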

That’s it really. No messing around with special status codes and redirects based on hard-to-pin-down concepts. No special types of URI that differ in meaning depending on what software you use. Just standard HTTP. When someone enters a URI in their browser or application, they get useful related information back. Moreover, the URI in their browser’s address bar is one they can use to refer to that resource in any context. They can bookmark it, send it in an email, use it in a SPARQL query or even write some of their own RDF with it. I like that kind of simplicity.

7 thoughts on “Back to Basics with Linked Data and HTTP”

  1. danbri says:

    As you hopefully know, I’m generally sympathetic to your outlook on matters http-range-14-y…

    However,
    “That’s useful if I want to lookup what the owner thinks is important in relation to their URI and I should place more trust in that information than I should with information from other sources.”

    Could you expand on this observation?

    Scenario:
    Assume I assign the URI http://danbri.org/tmp/allyouneedtoknowaboutiand to you, and place there let’s say an RDFa description of you, … with primaryTopic and so on making it quite clear we’re talking about Ian Davis, “technical architect; CTO of Talis; co-author of RSS 1.0; creator of FOAF icons; Semantic Web hacker” etc.

    Who should place more trust in what here? The trust is that the triples found there can be taken to reflect my view, rather than that they reflect any wider reality, right?

    The ‘should place more trust’ aspect needs some health warnings! Can you sketch that perhaps?

    • iand says:

      Hi Dan. What I mean is that, given a choice of information relating to a URI, I should generally assign more weight to the information received when requesting the URI itself. I’m not forced to trust it more, but as a rule of thumb it’s useful to do so. There may be even better ways to determine what information to trust (e.g. my friend’s opinion) but that’s outside of pure HTTP. Hope that clarifies.

  2. danbri says:

    I think I understand your position, although it seems close sometimes to slipping from ‘information about a URI’ to ‘information about the thing the URI names’.

    If I create a URI identifying you, requesting it helps us find out what the owner of the URI thinks, nothing more. It’s authoritative in the sense that you get the URI’s view of the URI’s topic. That doesn’t really help us figure out which of possibly many URIs to put more trust in, so I find language like “…should place more trust in that information” a little strong.

    A weaker formulation starts to sound obvious: “to find out what information some URI provides about the thing it identifies, … go and look at what it says”. The point is that you can at least be sure (unless you’re on free public wifi :) that you’re getting the [controller of the] URI’s point of view on the matter.

  3. iand says:

    OK, makes sense. I’m not trying to overwork the idea of trust here and it’s probably the wrong term for what I’m trying to get across.

  4. danbri says:

    I guess we agree to agree then.

    Probably a confusing factor here is that in the ‘#-free-HTTP-URIs’-only-name-documents school of HTTP thought, what you get back from a GET is not just the URI owner’s view of the world, but an authoritative serialization of the named thing itself. And that stronger notion of authority has leaked out a bit into dialog with other perspectives on HTTP, and is being applied to situations where we’re dealing with annotational descriptions rather than authoritative representations-as-in-serializations.
