Thu, Jul 7, 2011

In Search of Ambiguity

This is inspired by Jeni’s recent blog post “What do URIs mean anyway?”, where she writes:

The imperfection of the real world as it applies to linked data is that URIs will be used in ambiguous ways. We might not like it; we might write best practice documents that encourage people to have separate URIs for web-thing and non-web-thing, develop tools that help people detect when they’ve used the wrong URI, and so on. But it will still happen, and in my opinion we need to work out how to cope.

I think there is less ambiguity than Jeni states.

A lot of the perception of ambiguity in these arguments comes from in-built preconceptions about the nature of documents on the web. It’s easy to forget that when you think you’re accessing a webpage you’re not really getting the actual document that is on the web server but just a kind of snapshot of it at a point in time. In HTTP we call those snapshots “representations”. The important point is that the URI always identifies the resource and never the representation. You use the representations to learn about the resource you are interacting with.

To illustrate my points about ambiguity I worked through quite a few examples of HTTP interactions to try to expose where the supposed ambiguity would lie. The examples follow, but it is important to note that I am not using the extra information that the current resolution of httpRange-14 provides (namely that a 200 response says the resource is an “information resource”). I focus on license information in the examples because this is often cited as problematic where URIs are used ambiguously.

To be clear on the terminology, each request and response is a message and the response messages contain headers and a body which is the representation of the resource.
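As a rough illustration of that terminology, here is a minimal Python sketch that splits a raw response message into its status line, headers and body. The name split_response is my own invention for illustration, and the parsing is deliberately naive (no folded headers, no repeated header names), so treat it as a sketch rather than a real HTTP parser.

```python
def split_response(raw):
    """Split a raw HTTP response message into (status_line, headers, body).

    The body is what this post calls the representation.
    """
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw_message = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "Hello World!"
)
status_line, headers, representation = split_response(raw_message)
```

Note that nothing in the message itself is the resource: the sketch only ever hands back the message parts.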

Example 1

Request:

GET /example1
Host: example.com

Response:

HTTP/1.1 200 OK 
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT 
Content-Length: 12 
Content-Type: text/plain 

Hello World!

What do we know?

There is a resource identified by http://example.com/example1
The resource has a plain text representation which was last modified on 10 June 2010

There's no ambiguity here between the resource and the representation, although admittedly there is very little information here at all.

What don’t we know? Quite a lot of things! Here are a few:

The type of the resource, if any.
Whether any other representations exist.
Whether the representations change over time.
Whether the representation is licensed under the same terms as the resource.
The creator of the resource and/or representation.
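The two things we do know in example 1 come straight from the request URI and the response headers. A small Python sketch of that derivation follows; the function name known_facts and the dictionary of headers are assumptions made for illustration, not part of any specification.

```python
from email.utils import parsedate_to_datetime


def known_facts(request_uri, headers):
    """List the statements we can make from the request URI and headers alone."""
    facts = [f"There is a resource identified by {request_uri}"]
    # Drop any media type parameters such as charset.
    media_type = headers.get("Content-Type", "").split(";")[0].strip()
    last_modified = headers.get("Last-Modified")
    if media_type and last_modified:
        modified_on = parsedate_to_datetime(last_modified).date().isoformat()
        facts.append(
            f"The resource has a {media_type} representation "
            f"which was last modified on {modified_on}"
        )
    return facts


facts = known_facts(
    "http://example.com/example1",
    {
        "Content-Type": "text/plain",
        "Last-Modified": "Wed, 10 Jun 2010 13:05:56 GMT",
    },
)
```

Both statements have the resource as their subject. The representation only appears as something the resource has.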

Example 2

Request:

GET /example2
Host: example.com

Response:

HTTP/1.1 200 OK 
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT 
Content-Length: 12 
Content-Type: text/html 


<html>
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>

What do we know? Nothing much different from the first example really, except that this resource has an html representation.

Example 3

Request:

GET /example3 
Host: example.com

Response:

HTTP/1.1 200 OK 
Date: Mon, 6 Jul 2011 14:12:53 GMT 
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT 
Content-Length: 166 
Content-Type: text/html 


<html> 
 <head> 
 <title>Hello World!</title> 
 <link rel="license" href="http://example.com/license"> 
 </head> 
 <body> 
 <h1>Hello World!</h1> 
 </body> 
</html>

This example has some metadata embedded in the representation which we can extract and use to increase what we know:

There is a resource identified by http://example.com/example3
The resource has an html representation which was last modified on 10 June 2010
The resource has a license relationship with http://example.com/license
The license relationship is identified by http://www.w3.org/1999/xhtml/vocab#license

We don't know the type of the resource but we may now be able to use the license relationship to infer one. Apart from the last modified date and content type we still don't know anything more about the representation that we were sent.

One important thing we don’t know is whether the representation is licensed in the same way as the resource. This could be improved if the definition of the license property suggested some inferences that could apply to the representations.
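To make the extraction step in example 3 concrete, here is a minimal Python sketch using the standard library's html.parser. It assumes, as I do in this post, that a bare rel value on a link element maps into the http://www.w3.org/1999/xhtml/vocab# namespace; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

XHV = "http://www.w3.org/1999/xhtml/vocab#"  # assumed namespace for bare rel values


class LinkRelParser(HTMLParser):
    """Collect (subject, predicate, object) statements from <link> elements."""

    def __init__(self, resource_uri):
        super().__init__()
        self.resource_uri = resource_uri
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        if not attr.get("rel") or not attr.get("href"):
            return
        target = urljoin(self.resource_uri, attr["href"])
        for rel in attr["rel"].split():
            # The subject is always the resource, never the representation.
            self.statements.append((self.resource_uri, XHV + rel.lower(), target))


def link_statements(resource_uri, html):
    parser = LinkRelParser(resource_uri)
    parser.feed(html)
    return parser.statements


example3 = """<html>
 <head>
 <title>Hello World!</title>
 <link rel="license" href="http://example.com/license">
 </head>
 <body>
 <h1>Hello World!</h1>
 </body>
</html>"""
statements = link_statements("http://example.com/example3", example3)
```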

Example 4

Request:

GET /example4 
Host: example.com

Response:

HTTP/1.1 200 OK 
Date: Mon, 6 Jul 2011 14:12:53 GMT 
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT 
Content-Length: 107 
Content-Type: text/html 
Link: <http://example.com/license>; rel="license" 


<html>
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <h1>Hello World!</h1>
 </body>
</html>

We’ve moved the license metadata out of the body of the message into the headers. What do we now know?

There is a resource identified by http://example.com/example4
The resource has an html representation which was last modified on 10 June 2010
The resource has a license relationship with http://example.com/license
The license relationship is identified by http://www.w3.org/2005/Atom#license

This is very similar to example 3, except for one small niggle: the link header is specified in RFC 5988 and it defines the license relationship to be that defined by RFC 4946 which is for the Atom XML format. That aside, we have basically the same information.
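Pulling the same relationship out of the Link header can be sketched in a few lines of Python. This is a simplified reading of the header syntax (it only looks at the rel parameter and does not cope with every quoting corner case), and parse_link_header is a name made up for this example.

```python
import re
from urllib.parse import urljoin

# One link value: <target> followed by zero or more ;name=value parameters.
LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^\s;,]+)\s*)*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value, context_uri):
    """Return (context, relation type, target) tuples from a Link header value."""
    links = []
    for target, params in LINK_RE.findall(value):
        rels = []
        for name, quoted, bare in PARAM_RE.findall(params):
            if name.lower() == "rel":
                # rel may carry several space-separated relation types.
                rels = (quoted or bare).split()
        for rel in rels:
            links.append((context_uri, rel, urljoin(context_uri, target)))
    return links


links = parse_link_header(
    '<http://example.com/license>; rel="license"',
    "http://example.com/example4",
)
```

The sketch leaves the relation type as the bare token and does not expand it to a URI, since which URI identifies the relationship is exactly the niggle described above.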

Example 5

Request:

GET /example5
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 94
Content-Type: text/turtle


@prefix xh: <http://www.w3.org/1999/xhtml/vocab#> .
<> xh:license <http://example.com/license> .

We don’t know anything except:

There is a resource identified by http://example.com/example5
The resource has a turtle representation which was last modified on 10 June 2010
The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license

Apart from the type of the representation, this is essentially the same as example 3.

Example 6

Request:

GET /example6
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 250
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#">
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <div about="">
 <h1>Hello World!</h1>
 <a href="http://example.com/license" rel="xh:license">License</a>
 </div>
 </body>
</html>

This, again, is the same as example 3 but we have used some RDFa to express the license relationship.

So we’ve seen that examples 3 through to 6 are conveying basically the same information in different ways and they all leave us with roughly the same gaps in our knowledge.

Example 7

Request:

GET /example7
Host: example.com

Response:

HTTP/1.1 200 OK 
Date: Mon, 6 Jul 2011 14:12:53 GMT 
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT 
Content-Length: 210 
Content-Type: text/html 


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <head>
 <title>Hello World!</title>
 </head> 
 <body> 
 <div about="" typeof="foaf:Document"> 
 <h1>Hello World!</h1>
 <a href="http://example.com/license" rel="xh:license">License</a>
 </div>
 </body>
</html>

This example extends example 6 to add some type information. We now know that the resource has an rdf:type of http://xmlns.com/foaf/0.1/Document. We still don’t know anything more about the representation but there is no ambiguity here: the type property only applies to the resource.

There are good reasons to infer that the representation we received is a document, perhaps even a foaf:Document, because it’s a stream of bytes that we can process with a computer. But since that would be trivially true for every possible representation, it doesn’t add much useful information.

Example 8

Request:

GET /example8
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <div about="" typeof="foaf:Person">
 <h1>Hello World!</h1>
 <a href="http://example.com/license" rel="xh:license">License</a>
 </div>
 </body>
</html>

Now the examples are getting more interesting. Now we know the following:

There is a resource identified by http://example.com/example8
The resource has an html representation which was last modified on 10 June 2010
The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person

We don't know if this information is inconsistent because we don't know if things of type foaf:Person can have an xh:license property. However, we can be sure there is no ambiguity regarding what the metadata applies to: it applies to the resource. The URI is only referring to one thing.

Example 9

Request:

GET /example9
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <head>
 <title>Hello World!</title>
 <link rel="license" href="http://example.com/license"> 
 </head>
 <body>
 <div about="" typeof="foaf:Person">
 <h1>Hello World!</h1>
 </div>
 </body>
</html>

Here’s a better example of where inadvertent ambiguity could arise. Perhaps the HTML author was attempting to say that the HTML representation of a person had a particular license. That may be what they thought they were saying, but they are actually saying exactly the same as example 8:

There is a resource identified by http://example.com/example9
The resource has an html representation which was last modified on 10 June 2010
The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person

The point here is that the data is not ambiguous and the URI is not ambiguous. There is only one interpretation; it just happens to be different from the one the HTML author thought they were making.

Referring to Representations

So how can we support what the author really intended? They wanted to say that the resource had a particular set of properties but the html document they sent containing that information was licensed in a particular way.

To do this they would need some way to refer to the representation, which is a problem because representations generally aren’t assigned identifiers in the web architecture. HTTP defines the Content-Location header for this purpose but the problem is that the representation is transitory. Content-Location is actually a property of the message, not of the resource. In other words it says “for this message here’s an identifier for the representation, but the next message might have a different one”. Worse still, the message doesn’t have an identifier either.

Example 10

Request:

GET /example10
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
Content-Location: /example10a


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <div about="" typeof="foaf:Person">
 <h1>Hello World!</h1>
 </div>
<div about="/example10a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
 </body>
</html>

So here’s what we know from this:

There is a resource identified by http://example.com/example10
The resource has an html representation which was last modified on 10 June 2010
The html representation is identified by http://example.com/example10a
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
The representation has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license

That seems pretty clean and unambiguous. However it requires the author to do extra things: configure their web server to send content-location headers and refer to that extra URI in their content.
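Here is a minimal Python sketch of how a client might sort the statements in example 10 into those about the resource and those about the representation. It assumes the RDFa has already been reduced to (about, predicate, object) tuples; classify_subjects and that tuple shape are my own illustrative choices.

```python
from urllib.parse import urljoin


def classify_subjects(request_uri, content_location, statements):
    """Group statements by whether their subject is the resource or the representation.

    content_location is the Content-Location header value, or None if absent.
    """
    representation_uri = (
        urljoin(request_uri, content_location) if content_location else None
    )
    grouped = {"resource": [], "representation": [], "other": []}
    for about, predicate, obj in statements:
        # about="" resolves to the request URI, which identifies the resource.
        subject = urljoin(request_uri, about)
        if subject == request_uri:
            bucket = "resource"
        elif subject == representation_uri:
            bucket = "representation"
        else:
            bucket = "other"
        grouped[bucket].append((subject, predicate, obj))
    return grouped


example10 = [
    ("", "rdf:type", "foaf:Person"),
    ("/example10a", "xh:license", "http://example.com/license"),
]
grouped = classify_subjects("http://example.com/example10", "/example10a", example10)
```

Without the Content-Location header the second statement falls into the "other" bucket: we would know something is licensed, but not that it is the representation we were sent.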

Example 11

Request:

GET /example11
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <div about="" typeof="foaf:Person">
 <h1>Hello World!</h1>

<div rel="wdrs:describedby">
<div about="/example11a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</div>

</div>
</body>
</html>

So here’s what we know from this:

There is a resource identified by http://example.com/example11
The resource has an html representation which was last modified on 10 June 2010
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
There is another resource identified by http://example.com/example11a
The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example11a
The http://example.com/example11a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license

Lots more explicit information, but we still don't know how the representation is licensed. We know there is another resource which is a description of the first resource but there's nothing here that connects it with the representation just received.

Example 12

Request:

GET /example12
Host: example.com

Response:

HTTP/1.1 303 See Other
Date: Mon, 6 Jul 2011 14:12:53 GMT
Location: /example12a

Request 2:

GET /example12a
Host: example.com

Response 2:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
 <head>
 <title>Hello World!</title>
 </head>
 <body>
 <div about="/example12" typeof="foaf:Person">
 <h1>Hello World!</h1>
 
<div rel="wdrs:describedby">
<div about="/example12a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</div>
</div>
 </body>
</html>

Now we use a 303 redirect to separate the two resources. Here’s what we know from this:

There is a resource identified by http://example.com/example12
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
There is another resource identified by http://example.com/example12a
The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example12a
The http://example.com/example12a resource has an html representation which was last modified on 10 June 2010
The http://example.com/example12a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license

This is quite similar to example 11, with all the same important information. But actually we have learned nothing new. All we know is that the resource with URI http://example.com/example12a is a description and has a particular license. We know nothing about the representation that was actually sent to us for that resource. The information we have about the description resource and its representation is identical to example 6, with the same missing pieces of information. We have no ambiguity, but we have no additional knowledge either.
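The bookkeeping in example 12 can be sketched in Python without touching the network. The function below walks a recorded list of exchanges and reports which URI the final representation belongs to; resolve_exchanges and the (request URI, status, Location) tuple shape are assumptions for illustration only.

```python
from urllib.parse import urljoin


def resolve_exchanges(exchanges):
    """Walk recorded (request_uri, status, location) exchanges in order.

    Returns the URI originally requested and the URI of the resource whose
    representation was finally received.
    """
    requested = exchanges[0][0]
    expected = requested
    for request_uri, status, location in exchanges:
        if request_uri != expected:
            raise ValueError(f"unexpected request for {request_uri}")
        if status == 200:
            return {"requested": requested, "represented": request_uri}
        if status == 303 and location:
            # A 303 points at a different resource; resolve it against the request URI.
            expected = urljoin(request_uri, location)
        else:
            raise ValueError(f"unhandled status {status}")
    raise ValueError("no 200 response in the exchange list")


outcome = resolve_exchanges(
    [
        ("http://example.com/example12", 303, "/example12a"),
        ("http://example.com/example12a", 200, None),
    ]
)
```

The representation we end up holding belongs to http://example.com/example12a, and we still have no identifier for the representation itself.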

A Solution

It seems that we can't easily refer to the actual representations that are sent to our users which means it's difficult to make statements about them. The content-location header seems to do the trick but it still requires the author to understand the gory details of HTTP.

I think there is a solution to this problem though. It’s similar in spirit to what Jeni called One-Step-Removed Properties. However, my idea is to define Representation Properties (I mentioned these on the TAG list last week). These are properties that are defined to infer meaning for the representations of a resource.

Currently a triple <s> xh:license <o> has the meaning “s is licensed using o”. We could redefine xh:license to be a representation property which would change its meaning to “s has representations that are licensed using o”.

You might ask, why not define it as “s is licensed and has representations that are licensed using o”? My answer is that I don’t really think this adds anything. It’s impossible to get the actual resource, even if it’s just an html document on a web server’s file system. All interaction with the resource is via its representations and those are the things the user needs to know the license for.
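A client-side reading of representation properties might look something like the following Python sketch. The set REPRESENTATION_PROPERTIES stands in for whatever a published schema would declare; no such schema exists yet, so both it and the function name are hypothetical.

```python
XH_LICENSE = "http://www.w3.org/1999/xhtml/vocab#license"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical: properties a schema would declare as representation properties.
REPRESENTATION_PROPERTIES = {XH_LICENSE}


def representation_statements(triples, resource_uri):
    """Return (property, object) pairs that apply to representations of the resource.

    The triples still have the resource as their subject. The inference about
    representations comes only from the definition of the property.
    """
    return [
        (predicate, obj)
        for subject, predicate, obj in triples
        if subject == resource_uri and predicate in REPRESENTATION_PROPERTIES
    ]


triples = [
    ("http://example.com/example9", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.com/example9", XH_LICENSE, "http://example.com/license"),
]
about_representations = representation_statements(
    triples, "http://example.com/example9"
)
```

The rdf:type statement is untouched by this: it remains a statement about the resource only.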

Let’s go back to our potentially ambiguous example 9:

Example 9rev

Request:

GET /example9
Host: example.com

Response:

HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html


<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/">
 <head>
 <title>Hello World!</title>
 <link rel="license" href="http://example.com/license"> 
 </head>
 <body>
 <div about="" typeof="foaf:Person">
 <h1>Hello World!</h1>
 </div>
 </body>
</html>

We have exactly the same request and response. But we have some extra information:

There is a resource identified by http://example.com/example9
The resource has an html representation which was last modified on 10 June 2010
The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license which means that the html representation is licensed using http://example.com/license

That seems to solve the ambiguity problem pretty neatly. It's idiomatic html, plus it adheres to the fundamental rule that URIs always refer to the identified resource. It also meets my back-to-basics criterion of trusting what the author says about their resources.

It also has the advantage of being a simple change to make: it just means the W3C need to publish an RDF schema that includes this definition of xh:license (and probably the other XHTML link types too). Dublin Core would be another obvious candidate for this kind of redefinition.

Permalink: http://blog.iandavis.com/2011/07/in-search-of-ambiguity/
