Nov 21 2007

Reformulating the Web Architecture

Published by Ian Davis at 4:53 pm under Uncategorized

So, accepting that URIs with fragments are generally a broken piece of architecture for the Semantic Web and that information resources are not adding any real substance, here’s how I see the Web Architecture being reformulated for use with the Semantic Web:

  1. A hashless URI should be allowed to denote any resource whatsoever. Documents, books, people, galaxies and unicorns. There is no ambiguity here, the URI denotes a single thing. More than one URI can denote the same thing, so I can have a URI that denotes the city of London, and Danny can have a different URI that also denotes London.
  2. A representation of a resource can be obtained by issuing an HTTP GET on a URI. The representation is a sequence of bits that somehow stands in for the resource the URI denotes. Content negotiation can be used to select an appropriate format for the representation, without changing the actual resource being denoted. Perhaps my URI denoting London can respond with an HTML document containing essential facts and figures about the city, a JPEG aerial photograph, an SVG streetmap or a sound recording of the sounds encountered while in the city itself. None of these things are London, but they all can stand in for it in some limited fashion. I could retrieve them all to obtain a better sense of London itself, but I cannot actually obtain London using HTTP.
  3. URIs containing hashes are constrained in what they may denote and have an inherent ambiguity due to their reliance on the particular representation obtained. Their denotations vary depending on the URI plus a set of HTTP headers used during the request.
  4. There is no such thing as an “Information Resource”. All resources are made equal. However, for many resources, the only representation available happens to be identical to the resource itself. Still, you cannot obtain the actual resource using HTTP, but you can get a copy in the form of a representation. The majority of HTML documents on the web behave in this manner: a single representation that is a copy of the resource itself.
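
To make points 2 and 3 a little more concrete, here is a minimal Python sketch of a single URI answering with different representations depending on the Accept header. It is only an illustration under stated assumptions: the URI http://example.org/london, the REPRESENTATIONS table and the parse_accept and get helpers are all hypothetical, nothing touches the network, and the Accept handling is deliberately simplified (no partial wildcards such as image/*, no special treatment of q=0).

```python
from urllib.parse import urldefrag

# Hypothetical URI denoting the city of London (not a real endpoint).
LONDON = "http://example.org/london"

# Representations a hypothetical server could offer for that one resource,
# keyed by media type. None of them *is* London; each stands in for it.
REPRESENTATIONS = {
    LONDON: {
        "text/html": b"<html><body>Facts and figures about London</body></html>",
        "image/jpeg": b"<aerial photograph bytes>",
        "image/svg+xml": b"<svg><!-- street map --></svg>",
        "audio/ogg": b"<street sound recording bytes>",
    }
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                quality = float(param[2:])
        # Sort by descending quality, then by order of appearance.
        ranked.append((-quality, position, parts[0]))
    return [media_type for _, _, media_type in sorted(ranked)]


def get(uri, accept="*/*"):
    """Simulate an HTTP GET: return (media_type, body), or None if nothing fits."""
    # A fragment is never sent to the server; the client strips it first
    # and interprets it against whichever representation comes back.
    resource_uri, _fragment = urldefrag(uri)
    available = REPRESENTATIONS.get(resource_uri, {})
    for wanted in parse_accept(accept):
        if wanted == "*/*" and available:
            media_type = next(iter(available))
            return media_type, available[media_type]
        if wanted in available:
            return wanted, available[wanted]
    return None


if __name__ == "__main__":
    # Same URI, same resource, different stand-ins for it.
    for accept in ("text/html", "image/svg+xml", "audio/ogg"):
        media_type, body = get(LONDON, accept)
        print(accept, "->", media_type, len(body), "bytes")
```

Every call uses the same URI and therefore denotes the same resource; only the bits that stand in for it change. A request for a hash URI built on the same base reaches the server identically to the hashless one, which is why what the fragment picks out depends on the representation that happens to be returned.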

These aren’t huge changes and they’re backwards compatible with the existing web. On the other hand they greatly reduce the reliance on fragment identifiers and they encourage people to use real unambiguous URIs to refer to things other than documents, weaving the Semantic Web right into today’s Web.

For background, you might like to read my earlier posts on this subject:

9 Responses to “Reformulating the Web Architecture”

  1. Ryan Shaw on 21 Nov 2007 at 5:58 pm

    This has been an excellent series of posts and I agree with your conclusions. The arguments over what URIs can and can’t denote have always seemed to me to be somewhat of a red herring, generated by researchers proceeding from the assumption that the SW needs to serve the needs of algorithms first, people second. Your approach is far more pragmatic and reflects an understanding that the SW must be usable by people first if it is to be used at all. If we have to make our algorithms a little smarter or a little less elegant to cope, so be it.

  2. Kingsley Idehen on 21 Nov 2007 at 6:56 pm

    Ian,

    I agree :-)

    I’ve paraphrased a very important point (often lost) that you made above:

    *All resources are made equal. However, for many resources, the only representation available happens to be identical to the resource itself. The majority of HTML documents on the web behave in this manner: a single representation that is a copy of the resource itself. On the other hand, there are other Things that are abstract in nature (real-life things: cars, people, places, music, etc.) where you cannot obtain the actual resource using HTTP, and as a result you can get an association in the form of a representation via a resource that describes these things (e.g. an RDF file or the response from a SPARQL DESCRIBE or CONSTRUCT).*

    Related items from our Linked Data Deployment collateral:

    1. http://virtuoso.openlinksw.com/Whitepapers/html/VirtLinkedDataDeployment.html – White Paper (HTML)
    2. http://virtuoso.openlinksw.com/presentations/Virtuoso_Deploying_Linked_Data/Virtuoso_Deploying_Linked_Data.html – Slidy Presentation

    My Personal URI: http://kidehen.idehen.net/dataspace/person/kidehen#this

    Kingsley

  3. Kingsley Idehen on 21 Nov 2007 at 7:38 pm

    Ian,

    This note by Dan Connolly also sheds great light on this matter:
    http://www.w3.org/2006/04/irw65/urisym

  4. Lee Feigenbaum on 21 Nov 2007 at 8:00 pm

    Ian,

    I’ve never pretended to be much into Web architecture, so forgive what’s probably a stupid question. If the same URI denotes both a person and a representation of the person, how does one disambiguate between statements about the person (e.g. rdfs:label) and statements about the Web page?

    Lee

  5. iand on 21 Nov 2007 at 9:08 pm

    Lee, in your example the URI can only denote the person. The representation doesn’t have a URI. I think this is a major source of confusion in the web architecture. You could give the representation a URI, but AFAIK there is no standard vocabulary to relate that to the underlying resource.

  6. Dave on 22 Nov 2007 at 1:23 am

    To Lee’s comment, couldn’t the properties of the asserted statements provide the context necessary to differentiate the person from the representation?

    “1975-01-12″.
    “2007-11-20″.

    At first glance the overloading might seem disagreeable, but I’m thinking almost any concept could be considered overloaded if compared to a similar concept in a sufficiently “reductionist” ontology. Overloading, a very relative term, is unavoidable, and so, must be embraced and worked with.

    Ian, what do you think?

  7. Richard Cyganiak on 22 Nov 2007 at 1:28 am

    Ian, let’s assume that a GET on http://example.org/~bob returns an HTML page saying “Welcome to Bob’s homepage”. It doesn’t include any RDF or other semantic information, it’s just plain old HTML. Let’s further assume that we cannot process natural language. With your proposal, what do we know about the thing identified by the URI? Do we know if it identifies a document or a person?

  8. dulanov on 22 Nov 2007 at 5:00 am

    “Lee, in your example the URI can only denote the person. The representation doesn’t have a URI.”

    I absolutely agree with you! We can’t separate resources into information and non-information resources. And I believe that future pragmatic tools won’t distinguish between them either.

    “Ian, let’s assume that a GET on http://example.org/~bob returns an HTML page saying “Welcome to Bob’s homepage”. It doesn’t include any RDF or other semantic information, it’s just plain old HTML.”

    That’s also true, because http://example.org/~bob represents the home page resource for Bob. And, for example, the http://example.org/persons/~bob resource represents Bob as a person. And we can get various additional representations for those resources.

    Richard, I think that the OpenLinkedData initiative is going in the wrong direction with the http-range-14 problem. Every week on the LinkedOpenData email list you explain this ‘self-invented feature’ to new people. I agree with iand absolutely: we have only resources, which may have many URIs to identify them. After that we simply get whichever representations of them suit us. And we can use metadata in the format of the representation to describe it.

  9. dulanov on 22 Nov 2007 at 5:20 am

    Sorry, let me try to reformulate my previous answer. I think we should distinguish resources in another way, not by the information/non-information criterion. All representations of a resource should be generated from some central representation, for example RDF. If we have an additional picture or other resources for it, we need to use additional URIs too and set the relations in RDF explicitly. And if we try to get an SVG representation of London, we will receive not a real picture of London, which is another resource with additional statements, but an SVG diagram generated from the RDF data about London. After that we can go on to the additional depictions. Yes, we can redirect a user to that picture directly, but it’s our own decision how to implement it.