In Search of Ambiguity
This is inspired by Jeni’s recent blog post What do URIs mean anyway? where she writes:
The imperfection of the real world as it applies to linked data is that URIs will be used in ambiguous ways. We might not like it; we might write best practice documents that encourage people to have separate URIs for web-thing and non-web-thing, develop tools that help people detect when they’ve used the wrong URI, and so on. But it will still happen, and in my opinion we need to work out how to cope.
I think there is less ambiguity than Jeni states.
A lot of the perception of ambiguity in these arguments comes from in-built preconceptions about the nature of documents on the web. It’s easy to forget that when you think you’re accessing a webpage you’re not really getting the actual document that is on the web server but just a kind of snapshot of it at a point in time. In HTTP we call those snapshots “representations”. The important point is that the URI always identifies the resource and never the representation. You use the representations to learn about the resource you are interacting with.
To illustrate my points about ambiguity I worked through quite a few examples of HTTP interactions to try and expose where the supposed ambiguity would lie. The examples follow, but it is important to note that I am not using the extra information that the current resolution on httpRange-14 provides (namely that a 200 response says the resource is an “information resource”). I focus on license information in the examples because this is often cited as problematic where URIs are used ambiguously.
To be clear on the terminology, each request and response is a message and the response messages contain headers and a body which is the representation of the resource.
Example 1
Request:
GET /example1
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 12
Content-Type: text/plain
Hello World!
What do we know?
- There is a resource identified by http://example.com/example1
- The resource has a plain text representation which was last modified on 10 June 2010
What don’t we know? Quite a lot of things! Here are a few:
- The type of the resource, if any.
- Whether any other representations exist.
- Whether the representations change over time.
- Whether the representation is licensed under the same terms as the resource.
- The creator of the resource and/or representation.
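The split between message, headers and representation can be made concrete with a small sketch. It is only an illustration: the function names `split_message` and `known_facts` are hypothetical, and the parsing is deliberately naive (it assumes CRLF line endings and one header per line).

```python
# Response from Example 1, written out with explicit CRLF line endings.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 6 Jul 2011 14:12:53 GMT\r\n"
    "Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT\r\n"
    "Content-Length: 12\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "Hello World!"
)


def split_message(raw):
    """Split a response message into status line, headers and representation."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def known_facts(request_uri, raw):
    """List only what the message itself lets us say about the resource."""
    _, headers, _ = split_message(raw)
    return [
        f"There is a resource identified by {request_uri}",
        f"The resource has a {headers['content-type']} representation "
        f"last modified on {headers['last-modified']}",
    ]


for fact in known_facts("http://example.com/example1", RAW_RESPONSE):
    print(fact)
```

Note that nothing in the output mentions the type, creator or license of the resource: the message simply does not say.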
Example 2
Request:
GET /example2
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 12
Content-Type: text/html
<html>
<head>
<title>Hello World!</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
What do we know? Nothing much different from the first example really, except that this resource has an html representation.
Example 3
Request:
GET /example3
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 166
Content-Type: text/html
<html>
<head>
<title>Hello World!</title>
<link rel="license" href="http://example.com/license">
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
This example has some metadata embedded in the representation which we can extract and use to increase what we know:
- There is a resource identified by http://example.com/example3
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a license relationship with http://example.com/license
- The license relationship is identified by http://www.w3.org/1999/xhtml/vocab#license
One important thing we don’t know is whether the representation is licensed in the same way as the resource. This could be improved if the definition of the license property suggested some inferences that could apply to the representations.
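As a rough sketch of the extraction step, the following uses Python's standard `html.parser` to turn the `link` element into a triple about the resource. The class name `LinkCollector` is hypothetical, and mapping a bare `rel` token onto the XHTML vocabulary namespace simply follows the reading given in the list above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"

# Body of the Example 3 response.
EXAMPLE_3 = """<html>
<head>
<title>Hello World!</title>
<link rel="license" href="http://example.com/license">
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>"""


class LinkCollector(HTMLParser):
    """Collect (subject, predicate, object) triples from <link> elements."""

    def __init__(self, request_uri):
        super().__init__()
        self.request_uri = request_uri
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if not rel or not href:
            return
        for token in rel.split():
            # The subject is the requested resource, not the representation.
            self.triples.append(
                (self.request_uri,
                 XHTML_VOCAB + token.lower(),
                 urljoin(self.request_uri, href))
            )


collector = LinkCollector("http://example.com/example3")
collector.feed(EXAMPLE_3)
print(collector.triples)
```

The subject of the triple is the request URI, which is exactly why the license of the representation itself remains unknown.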
Example 4
Request:
GET /example4
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 107
Content-Type: text/html
Link: <http://example.com/license>; rel="license"
<html>
<head>
<title>Hello World!</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
We’ve moved the license metadata out of the body of the message into the headers. What do we now know?
- There is a resource identified by http://example.com/example4
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a license relationship with http://example.com/license
- The license relationship is identified by http://www.w3.org/2005/Atom#license
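The same extraction can be sketched for the header form. This is a simplified illustration, not a full parser for the Link header grammar: it only handles the `<target>; rel="..."` shape used in Example 4. The function name `parse_link_header` is hypothetical, and the Atom namespace is used because that is the mapping given in the list above.

```python
import re
from urllib.parse import urljoin

# Namespace used in the article for relations carried in the Link header.
ATOM_NS = "http://www.w3.org/2005/Atom#"

# Matches <target>; rel="token token" (quotes optional). Other parameters
# such as type or title are ignored in this sketch.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(request_uri, value):
    """Turn a Link header value into triples about the requested resource."""
    triples = []
    for target, rels in LINK_RE.findall(value):
        for rel in rels.split():
            triples.append(
                (request_uri, ATOM_NS + rel, urljoin(request_uri, target))
            )
    return triples


print(parse_link_header(
    "http://example.com/example4",
    '<http://example.com/license>; rel="license"',
))
```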
Example 5
Request:
GET /example5
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 94
Content-Type: text/turtle
@prefix xh: <http://www.w3.org/1999/xhtml/vocab#> .
<> xh:license <http://example.com/license> .
We don’t know anything except:
- There is a resource identified by http://example.com/example5
- The resource has a turtle representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
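The reason the Turtle statement is about http://example.com/example5 is that the empty relative reference `<>` resolves against the URI the document was retrieved from. The sketch below shows that resolution for the two lines in Example 5. It is not a Turtle parser (real code would use an RDF library); the function names are hypothetical and it only understands `@prefix` lines and single-line triples.

```python
import re
from urllib.parse import urljoin

TURTLE = """@prefix xh: <http://www.w3.org/1999/xhtml/vocab#> .
<> xh:license <http://example.com/license> .
"""

PREFIX_RE = re.compile(r"@prefix\s+(\w*):\s*<([^>]*)>\s*\.")
TRIPLE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s*\.$")


def expand(term, base, prefixes):
    """Expand <relative-ref> against the base, or prefix:local via prefixes."""
    if term.startswith("<") and term.endswith(">"):
        return urljoin(base, term[1:-1])
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


def parse(text, base):
    prefixes, triples = {}, []
    for line in text.splitlines():
        line = line.strip()
        match = PREFIX_RE.match(line)
        if match:
            prefixes[match.group(1)] = match.group(2)
            continue
        match = TRIPLE_RE.match(line)
        if match:
            triples.append(
                tuple(expand(term, base, prefixes) for term in match.groups())
            )
    return triples


print(parse(TURTLE, "http://example.com/example5"))
```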
Example 6
Request:
GET /example6
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 250
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="">
<h1>Hello World!</h1>
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</body>
</html>
This, again, is the same as example 3 but we have used some RDFa to express the license relationship.
So we’ve seen that examples 3 through to 6 are conveying basically the same information in different ways and they all leave us with roughly the same gaps in our knowledge.
Example 7
Request:
GET /example7
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="" typeof="foaf:Document">
<h1>Hello World!</h1>
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</body>
</html>
This example extends example 6 to add some type information. We now know that the resource has an rdf:type of http://xmlns.com/foaf/0.1/Document. We still don’t know anything more about the representation, but there is no ambiguity here: the type property only applies to the resource.
There are good reasons to infer that the representation we received is a document, perhaps even a foaf:Document, because it’s a stream of bytes that we can process with a computer. But since that would be trivially true for every possible representation, it doesn’t add much useful information.
Example 8
Request:
GET /example8
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="" typeof="foaf:Person">
<h1>Hello World!</h1>
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</body>
</html>
Now the examples are getting more interesting. Now we know the following:
- There is a resource identified by http://example.com/example8
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
Example 9
Request:
GET /example9
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Hello World!</title>
<link rel="license" href="http://example.com/license">
</head>
<body>
<div about="" typeof="foaf:Person">
<h1>Hello World!</h1>
</div>
</body>
</html>
Here’s a better example of where inadvertent ambiguity could arise. Perhaps the HTML author was attempting to say that the HTML representation of a person had a particular license. That may be what they thought they were saying, but they are actually saying exactly the same as example 8:
- There is a resource identified by http://example.com/example9
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
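The claim that examples 8 and 9 say the same thing can be checked mechanically. The sketch below is a toy extractor, not a conforming RDFa processor: the class `TinyRDFa` is hypothetical, it only understands `about`, `typeof`, `rel` with `href`, and `xmlns:` prefixes, and it maps a bare `rel` token to the XHTML vocabulary as in example 3. The markup is condensed from the two examples, with the anchor in example 8 closed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"
VOID_TAGS = {"link", "meta", "br", "hr", "img", "input"}

HEAD = ('<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" '
        'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
        '<head><title>Hello World!</title>')
EXAMPLE_8 = HEAD + (
    '</head><body><div about="" typeof="foaf:Person"><h1>Hello World!</h1>'
    '<a href="http://example.com/license" rel="xh:license">License</a>'
    '</div></body></html>')
EXAMPLE_9 = HEAD + (
    '<link rel="license" href="http://example.com/license"></head><body>'
    '<div about="" typeof="foaf:Person"><h1>Hello World!</h1></div>'
    '</body></html>')


class TinyRDFa(HTMLParser):
    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.prefixes = {}
        self.subjects = [base_uri]  # stack of subjects currently in scope
        self.triples = []

    def curie(self, value):
        prefix, sep, local = value.partition(":")
        if not sep:  # bare token such as rel="license"
            return XHTML_VOCAB + value
        return self.prefixes[prefix] + local

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        for name, value in attr.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        subject = self.subjects[-1]
        if "about" in attr:  # about="" resolves to the base URI itself
            subject = urljoin(self.base_uri, attr["about"])
        if "typeof" in attr:
            self.triples.append((subject, RDF_TYPE, self.curie(attr["typeof"])))
        if "rel" in attr and "href" in attr:
            self.triples.append((subject, self.curie(attr["rel"]),
                                 urljoin(self.base_uri, attr["href"])))
        if tag not in VOID_TAGS:
            self.subjects.append(subject)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.subjects) > 1:
            self.subjects.pop()


def extract(base_uri, html):
    parser = TinyRDFa(base_uri)
    parser.feed(html)
    parser.close()
    return set(parser.triples)


t8 = extract("http://example.com/example8", EXAMPLE_8)
t9 = extract("http://example.com/example9", EXAMPLE_9)
# Ignoring the differing subject URIs, both documents state the same things.
print({(p, o) for _, p, o in t8} == {(p, o) for _, p, o in t9})
```

In both cases every triple has the requested resource as its subject; neither document says anything about its own html representation.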
Referring to Representations
So how can we support what the author really intended? They wanted to say that the resource had a particular set of properties, but that the html document they sent containing that information was licensed in a particular way.
To do this they would need some way to refer to the representation, which is a problem because representations generally aren’t assigned identifiers in the web architecture. HTTP defines the Content-Location header for this purpose, but the representation is transitory: Content-Location is actually a property of the message, not of the resource. In other words it says “for this message here’s an identifier for the representation, but the next message might have a different one”. Worse still, the message doesn’t have an identifier either.
Example 10
Request:
GET /example10
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
Content-Location: /example10a
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="" typeof="foaf:Person">
<h1>Hello World!</h1>
</div>
<div about="/example10a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</body>
</html>
So here’s what we know from this:
- There is a resource identified by http://example.com/example10
- The resource has an html representation which was last modified on 10 June 2010
- The html representation is identified by http://example.com/example10a
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- The representation has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
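A short sketch of how a client might derive the representation's identifier, and of why it is tied to the message rather than the resource. The helper `representation_uri` and the second message (with `/example10b`) are hypothetical illustrations, not part of Example 10.

```python
from urllib.parse import urljoin


def representation_uri(request_uri, headers):
    """Return the identifier this message gives its representation, if any."""
    location = headers.get("content-location")
    if location is None:
        return None
    # Content-Location may be relative; resolve it against the request URI.
    return urljoin(request_uri, location)


# Headers of the Example 10 response (lower-cased names).
message_1 = {"content-type": "text/html", "content-location": "/example10a"}
# Hypothetical later response for the same resource.
message_2 = {"content-type": "text/html", "content-location": "/example10b"}

resource = "http://example.com/example10"
print(representation_uri(resource, message_1))
print(representation_uri(resource, message_2))
```

The resource URI stays fixed while the representation identifier is free to change from one message to the next, so any statement made about /example10a only holds for representations delivered under that identifier.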
Example 11
Request:
GET /example11
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="" typeof="foaf:Person">
<h1>Hello World!</h1>
<div rel="wdrs:describedby">
<div about="/example11a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</div>
</div>
</body>
</html>
So here’s what we know from this:
- There is a resource identified by http://example.com/example11
- The resource has an html representation which was last modified on 10 June 2010
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- There is another resource identified by http://example.com/example11a
- The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example11a
- The http://example.com/example11a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
Example 12
Request:
GET /example12
Host: example.com
Response:
HTTP/1.1 303 See Other
Date: Mon, 6 Jul 2011 14:12:53 GMT
Location: /example12a
Request 2:
GET /example12a
Host: example.com
Response 2:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
<head>
<title>Hello World!</title>
</head>
<body>
<div about="/example12" typeof="foaf:Person">
<h1>Hello World!</h1>
<div rel="wdrs:describedby">
<div about="/example12a">
<a href="http://example.com/license" rel="xh:license">License</a>
</div>
</div>
</div>
</body>
</html>
Now we use a 303 redirect to separate the two resources. Here’s what we know from this:
- There is a resource identified by http://example.com/example12
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- There is another resource identified by http://example.com/example12a
- The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example12a
- The http://example.com/example12a resource has an html representation which was last modified on 10 June 2010
- The http://example.com/example12a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
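The two-step interaction can be sketched as follows. To keep the example self-contained it uses canned responses instead of a network call; the `RESPONSES` table, the placeholder body and the `dereference` function are all hypothetical.

```python
from urllib.parse import urljoin

# Canned responses standing in for the server in Example 12.
# The 200 body is a placeholder, not the full markup.
RESPONSES = {
    "http://example.com/example12": (303, {"location": "/example12a"}, ""),
    "http://example.com/example12a": (
        200, {"content-type": "text/html"}, "<html>...</html>"),
}


def fetch(uri):
    return RESPONSES[uri]


def dereference(uri, max_hops=5):
    """Follow 303 responses.

    Returns (requested URI, URI whose representation we received, body).
    """
    current = uri
    for _ in range(max_hops):
        status, headers, body = fetch(current)
        if status == 303:
            current = urljoin(current, headers["location"])
            continue
        return uri, current, body
    raise RuntimeError("too many redirects")


requested, represented, body = dereference("http://example.com/example12")
print(requested)
print(represented)
```

The representation we end up with belongs to /example12a; we never receive a representation of /example12 itself, only a description of it.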
A Solution
It seems that we can't easily refer to the actual representations that are sent to our users, which means it's difficult to make statements about them. The Content-Location header seems to do the trick but it still requires the author to understand the gory details of HTTP.
I think there is a solution to this problem though. It’s similar in spirit to what Jeni called One-Step-Removed Properties. However, my idea is to define Representation Properties (I mentioned these on the TAG list last week). These are properties whose definitions say something about the representations of a resource.
Currently a triple <s> xh:license <o> has the meaning “s is licensed using o”. We could redefine xh:license to be a representation property, which would change its meaning to “s has representations that are licensed using o”.
You might ask, why not define it as “s is licensed and has representations that are licensed using o”? My answer is that I don’t really think this adds anything. It’s impossible to get the actual resource, even if it’s just an html document on a web server’s file system. All interaction with the resource is via its representations and those are the things the user needs to know the license for.
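The proposed reading can be sketched as a tiny interpretation step. Everything here is an assumption made for illustration: the `REPRESENTATION_PROPERTIES` registry stands in for whatever a published schema would declare, and the `interpret` function and its output labels are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XH_LICENSE = "http://www.w3.org/1999/xhtml/vocab#license"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Hypothetical registry of properties declared to be representation properties.
REPRESENTATION_PROPERTIES = {XH_LICENSE}


def interpret(triple):
    """Say whether a triple describes the resource or its representations."""
    subject, predicate, obj = triple
    if predicate in REPRESENTATION_PROPERTIES:
        # "s has representations that are <predicate> o"
        return ("representations-of", subject, predicate, obj)
    # "s is <predicate> o"
    return ("resource", subject, predicate, obj)


triples = [
    ("http://example.com/example9", RDF_TYPE, FOAF_PERSON),
    ("http://example.com/example9", XH_LICENSE, "http://example.com/license"),
]
for triple in triples:
    print(interpret(triple))
```

The triples themselves are unchanged; only the definition of the property decides which of the two readings applies.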
Let’s go back to our potentially ambiguous example 9:
Example 9rev
Request:
GET /example9
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head>
<title>Hello World!</title>
<link rel="license" href="http://example.com/license">
</head>
<body>
<div about="" typeof="foaf:Person">
<h1>Hello World!</h1>
</div>
</body>
</html>
We have exactly the same request and response. But we have some extra information:
- There is a resource identified by http://example.com/example9
- The resource has an html representation which was last modified on 10 June 2010
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license which means that the html representation is licensed using http://example.com/license
It also has the advantage of being a simple change to make: it just means the W3C need to publish an RDF schema that includes this definition of xh:license (and probably the other XHTML link types too). Dublin Core would be another obvious candidate for this kind of redefinition.