In Search of Ambiguity
87 July 2011 by Ian Davis
This is inspired by Jeni’s recent blog post What do URIs mean anyway? where she writes:
The imperfection of the real world as it applies to linked data is that URIs will be used in ambiguous ways. We might not like it; we might write best practice documents that encourage people to have separate URIs for web-thing and non-web-thing, develop tools that help people detect when they’ve used the wrong URI, and so on. But it will still happen, and in my opinion we need to work out how to cope.
I think there is less ambiguity than Jeni states.
A lot of the perception of ambiguity in these arguments comes from in-built preconceptions about the nature of documents on the web. It’s easy to forget that when you think you’re accessing a webpage you’re not really getting the actual document that is on the web server but just a kind of snapshot of it at a point in time. In HTTP we call those snapshots “representations”. The important point is that the URI always identifies the resource and never the representation. You use the representations to learn about the resource you are interacting with.
To illustrate my points about ambiguity I worked through quite a few examples of HTTP interactions to try and expose where the supposed ambiguity would lie. The examples follow, but it is important to note that I am not using the extra information that the current resolution on httpRange-14 provides (namely that a 200 response says the resource is an “information resource”). I focus on license information in the examples because this is often cited as problematic where URIs are used ambiguously.
To be clear on the terminology, each request and response is a message and the response messages contain headers and a body which is the representation of the resource.
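As a rough illustration of that terminology, here is a minimal Python sketch that splits a raw response message into its status line, headers and body (the representation). The function name `split_http_response` and the sample message are assumptions made for this illustration, and the sketch ignores details such as folded headers and transfer encodings.

```python
def split_http_response(raw: str):
    """Split a raw HTTP response into (status line, headers dict, body)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical message, modelled on Example 1 of this post.
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "Hello World!"
)
status_line, headers, body = split_http_response(raw)
```

Here `body` holds the representation, while `headers` holds what the message says about it.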
Example 1
Request:
GET /example1
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 12
Content-Type: text/plain

Hello World!
What do we know?
- There is a resource identified by http://example.com/example1
- The resource has a plain text representation which was last modified on 10 June 2010
There’s no ambiguity here between the resource and the representation, although admittedly there is very little information here at all.
What don’t we know? Quite a lot of things! Here are a few:
- The type of the resource, if any.
- Whether any other representations exist.
- Whether the representations change over time.
- Whether the representation is licensed under the same terms as the resource.
- The creator of the resource and/or representation.
Example 2
Request:
GET /example2
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 12
Content-Type: text/html

<html>
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>
What do we know? Nothing much different from the first example really, except that this resource has an html representation.
Example 3
Request:
GET /example3
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 166
Content-Type: text/html

<html>
  <head>
    <title>Hello World!</title>
    <link rel="license" href="http://example.com/license">
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>
This example has some metadata embedded in the representation which we can extract and use to increase what we know:
- There is a resource identified by http://example.com/example3
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a license relationship with http://example.com/license
- The license relationship is identified by http://www.w3.org/1999/xhtml/vocab#license
We don’t know the type of the resource but we may now be able to use the license relationship to infer one. Apart from the last modified date and content type we still don’t know anything more about the representation that we were sent.
One important thing we don’t know is whether the representation is licensed in the same way as the resource. This could be improved if the definition of the license property suggested some inferences that could apply to the representations.
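The extraction step for example 3 can be sketched in Python with the standard library's `html.parser`. This is only an illustration: the class name `LinkRelParser` is hypothetical, and it assumes that each `rel` token maps onto the XHTML vocabulary namespace and that the subject of every statement is the request URI.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

XHTML_VOCAB = "http://www.w3.org/1999/xhtml/vocab#"


class LinkRelParser(HTMLParser):
    """Collect (subject, relationship, object) statements from <link> elements."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel, href = attr.get("rel"), attr.get("href")
        if rel and href:
            # rel may hold several space-separated link types.
            for token in rel.split():
                self.statements.append(
                    (self.base_uri,
                     XHTML_VOCAB + token.lower(),
                     urljoin(self.base_uri, href))
                )


html = (
    '<html><head><title>Hello World!</title>'
    '<link rel="license" href="http://example.com/license">'
    '</head><body><h1>Hello World!</h1></body></html>'
)
parser = LinkRelParser("http://example.com/example3")
parser.feed(html)
```

Note that the subject of the collected statement is the resource URI, not anything that names the representation.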
Example 4
Request:
GET /example4
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 107
Content-Type: text/html
Link: <http://example.com/license>; rel="license"

<html>
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <h1>Hello World!</h1>
  </body>
</html>
We’ve moved the license metadata out of the body of the message into the headers. What do we now know?
- There is a resource identified by http://example.com/example4
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a license relationship with http://example.com/license
- The license relationship is identified by http://www.w3.org/2005/Atom#license
This is very similar to example 3, except for one small niggle: the link header is specified in RFC 5988 and it defines the license relationship to be that defined by RFC 4946 which is for the Atom XML format. That aside, we have basically the same information.
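Reading the same relationship out of the header can be sketched as follows. The function `parse_link_header` is a hypothetical helper, not a complete RFC 5988 parser: it assumes every parameter value is double-quoted, as in example 4, and it reports the bare relation name rather than deciding which namespace it belongs to.

```python
import re
from urllib.parse import urljoin


def parse_link_header(value, base_uri):
    """Return (target URI, relation) pairs from a simple Link header value."""
    links = []
    # Each link is <target> followed by zero or more ;name="value" parameters.
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*\w+="[^"]*")*)', value):
        target = urljoin(base_uri, match.group(1))
        params = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        # rel may hold several space-separated relation types.
        for rel in params.get("rel", "").split():
            links.append((target, rel))
    return links


links = parse_link_header(
    '<http://example.com/license>; rel="license"',
    "http://example.com/example4",
)
```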
Example 5
Request:
GET /example5
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 94
Content-Type: text/turtle

@prefix xh: <http://www.w3.org/1999/xhtml/vocab#> .
<> xh:license <http://example.com/license> .
We don’t know anything except:
- There is a resource identified by http://example.com/example5
- The resource has a turtle representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
Apart from the type of the representation, this is essentially the same as example 3.
Example 6
Request:
GET /example6
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 250
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="">
      <h1>Hello World!</h1>
      <a href="http://example.com/license" rel="xh:license">License</a>
    </div>
  </body>
</html>
This, again, is the same as example 3 but we have used some RDFa to express the license relationship.
So we’ve seen that examples 3 through to 6 are conveying basically the same information in different ways and they all leave us with roughly the same gaps in our knowledge.
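One reason these forms agree is that the empty reference (`<>` in Turtle, `about=""` in RDFa) resolves to the base URI. The short Python sketch below shows that with `urllib.parse.urljoin`, assuming the base URI is simply the request URI and that nothing in the message or document overrides it. The helper name `resolve` is made up for the illustration.

```python
from urllib.parse import urljoin


def resolve(base_uri, reference):
    """Resolve a (possibly empty) relative reference against a base URI."""
    return urljoin(base_uri, reference)


# <> in the Turtle of example 5 and about="" in the RDFa of example 6
# both name the resource that was requested.
subject_turtle = resolve("http://example.com/example5", "")
subject_rdfa = resolve("http://example.com/example6", "")
```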
Example 7
Request:
GET /example7
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="" typeof="foaf:Document">
      <h1>Hello World!</h1>
      <a href="http://example.com/license" rel="xh:license">License</a>
    </div>
  </body>
</html>
This example extends example 6 to add some type information. We now know that the resource has an rdf:type of http://xmlns.com/foaf/0.1/Document. We still don’t know anything more about the representation but there is no ambiguity here: the type property only applies to the resource.
There are good reasons to infer that the representation we received is a document, perhaps even a foaf:Document because it’s a stream of bytes that we can process with a computer. But since that would be trivially true for every possible representation then it doesn’t add much useful information.
Example 8
Request:
GET /example8
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="" typeof="foaf:Person">
      <h1>Hello World!</h1>
      <a href="http://example.com/license" rel="xh:license">License</a>
    </div>
  </body>
</html>
Now the examples are getting more interesting. Now we know the following:
- There is a resource identified by http://example.com/example8
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
We don’t know if this information is inconsistent because we don’t know if things of type foaf:Person can have an xh:license property. However we can be sure there is no ambiguity regarding what the metadata applies to: it applies to the resource. The URI is only referring to one thing.
Example 9
Request:
GET /example9
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <title>Hello World!</title>
    <link rel="license" href="http://example.com/license">
  </head>
  <body>
    <div about="" typeof="foaf:Person">
      <h1>Hello World!</h1>
    </div>
  </body>
</html>
Here’s a better example of where inadvertent ambiguity could arise. Perhaps the HTML author was attempting to say that the HTML representation of a person had a particular license. They may have thought they were saying that, but they are actually saying exactly the same as example 8:
- There is a resource identified by http://example.com/example9
- The resource has an html representation which was last modified on 10 June 2010
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
The point here is that the data is not ambiguous and the URI is not ambiguous. There is only one interpretation; it just happens to be different from the one the HTML author thought they were making.
Referring to Representations
So how can we support what the author really intended? They wanted to say that the resource had a particular set of properties but the html document they sent containing that information was licensed in a particular way.
To do this they would need some way to refer to the representation, which is a problem because representations generally aren’t assigned identifiers in the web architecture. HTTP defines the Content-Location header for this purpose but the problem is that the representation is transitory. Content-Location is actually a property of the message, not of the resource. In other words it says “for this message here’s an identifier for the representation, but the next message might have a different one”. Worse still, the message doesn’t have an identifier either.
Example 10
Request:
GET /example10
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html
Content-Location: /example10a

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="" typeof="foaf:Person">
      <h1>Hello World!</h1>
    </div>
    <div about="/example10a">
      <a href="http://example.com/license" rel="xh:license">License</a>
    </div>
  </body>
</html>
So here’s what we know from this:
- There is a resource identified by http://example.com/example10
- The resource has an html representation which was last modified on 10 June 2010
- The html representation is identified by http://example.com/example10a
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- The representation has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
That seems pretty clean and unambiguous. However it requires the author to do extra things: configure their web server to send content-location headers and refer to that extra URI in their content.
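On the consuming side, working out the identifier of the representation might look like the sketch below. The helper `representation_uri` is hypothetical. It assumes the headers have already been parsed into a dictionary and that the Content-Location value is resolved against the request URI.

```python
from urllib.parse import urljoin


def representation_uri(request_uri, headers):
    """Return the URI given for this representation, or None if there is none."""
    # Header names are case-insensitive, so compare them in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("content-location")
    if location is None:
        return None
    return urljoin(request_uri, location)


with_header = representation_uri(
    "http://example.com/example10",
    {"Content-Type": "text/html", "Content-Location": "/example10a"},
)
without_header = representation_uri(
    "http://example.com/example9",
    {"Content-Type": "text/html"},
)
```

The second call returns `None`, which is the situation in examples 1 to 9: the representation has no identifier of its own.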
Example 11
Request:
GET /example11
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="" typeof="foaf:Person">
      <h1>Hello World!</h1>
      <div rel="wdrs:describedby">
        <div about="/example11a">
          <a href="http://example.com/license" rel="xh:license">License</a>
        </div>
      </div>
    </div>
  </body>
</html>
So here’s what we know from this:
- There is a resource identified by http://example.com/example11
- The resource has an html representation which was last modified on 10 June 2010
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- There is another resource identified by http://example.com/example11a
- The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example11a
- The http://example.com/example11a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
Example 12
Request:
GET /example12
Host: example.com
Response:
HTTP/1.1 303 See Other
Date: Mon, 6 Jul 2011 14:12:53 GMT
Location: /example12a
Request 2:
GET /example12a
Host: example.com
Response 2:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <div about="/example12" typeof="foaf:Person">
      <h1>Hello World!</h1>
      <div rel="wdrs:describedby">
        <div about="/example12a">
          <a href="http://example.com/license" rel="xh:license">License</a>
        </div>
      </div>
    </div>
  </body>
</html>
Now we use a 303 redirect to separate the two resources. Here’s what we know from this:
- There is a resource identified by http://example.com/example12
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- There is another resource identified by http://example.com/example12a
- The resource has a http://www.w3.org/2007/05/powder-s#describedby relationship with http://example.com/example12a
- The http://example.com/example12a resource has an html representation which was last modified on 10 June 2010
- The http://example.com/example12a resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license
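The two-step interaction in example 12 can be sketched from the client's point of view. To keep the sketch self-contained it uses an in-memory table, `RESPONSES`, in place of a real server. The table, the placeholder body and the function name `dereference` are all assumptions made for illustration.

```python
from urllib.parse import urljoin

# Stand-in for the server in example 12: URI -> (status, headers, body).
RESPONSES = {
    "http://example.com/example12": (303, {"Location": "/example12a"}, ""),
    "http://example.com/example12a": (
        200,
        {"Content-Type": "text/html"},
        "<html>placeholder body</html>",
    ),
}


def dereference(uri, max_hops=5):
    """Follow 303 responses and return (final URI, body, list of hops)."""
    hops = []
    for _ in range(max_hops):
        status, headers, body = RESPONSES[uri]
        hops.append((uri, status))
        if status == 303:
            # The Location value may be relative, so resolve it first.
            uri = urljoin(uri, headers["Location"])
            continue
        return uri, body, hops
    raise RuntimeError("too many redirects")


described_by, body, hops = dereference("http://example.com/example12")
```

The representation that finally arrives belongs to the second resource, /example12a, which is why the last-modified date and the license in the list above attach to it and not to /example12.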
A Solution
It seems that we can’t easily refer to the actual representations that are sent to our users which means it’s difficult to make statements about them. The content-location header seems to do the trick but it still requires the author to understand the gory details of HTTP.
I think there is a solution to this problem though. It’s similar in spirit to what Jeni called One-Step-Removed Properties. However, my idea is to define Representation Properties (I mentioned these on the TAG list last week). These are properties that are defined to infer meaning for the representations of a resource.
Currently a triple <s> xh:license <o> has the meaning “s is licensed using o”. We could redefine xh:license to be a representation property which would change its meaning to “s has representations that are licensed using o”.
You might ask, why not define it as “s is licensed and has representations that are licensed using o”? My answer is that I don’t really think this adds anything. It’s impossible to get the actual resource, even if it’s just an html document on a web server’s file system. All interaction with the resource is via its representations and those are the things the user needs to know the license for.
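To make the idea concrete, here is a small Python sketch of how a consumer might treat representation properties. It is only one possible reading of the proposal. The set `REPRESENTATION_PROPERTIES` and the function name are hypothetical, and statements are plain tuples rather than a real RDF library's triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XH_LICENSE = "http://www.w3.org/1999/xhtml/vocab#license"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Assumption: xh:license has been redefined as a representation property.
REPRESENTATION_PROPERTIES = {XH_LICENSE}


def apply_representation_properties(triples, representation_properties):
    """Separate statements about a resource from those about its representations."""
    about_resource, about_representations = [], []
    for s, p, o in triples:
        if p in representation_properties:
            # Read as: "s has representations for which p is o".
            about_representations.append((s, p, o))
        else:
            about_resource.append((s, p, o))
    return about_resource, about_representations


triples = [
    ("http://example.com/example9", RDF_TYPE, FOAF_PERSON),
    ("http://example.com/example9", XH_LICENSE, "http://example.com/license"),
]
about_resource, about_representations = apply_representation_properties(
    triples, REPRESENTATION_PROPERTIES
)
```

The subject of every statement is still the resource URI. Only the interpretation of the property changes, so no new identifier for the representation is needed.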
Let’s go back to our potentially ambiguous example 9:
Example 9rev
Request:
GET /example9
Host: example.com
Response:
HTTP/1.1 200 OK
Date: Mon, 6 Jul 2011 14:12:53 GMT
Last-Modified: Wed, 10 Jun 2010 13:05:56 GMT
Content-Length: 210
Content-Type: text/html

<html xmlns:xh="http://www.w3.org/1999/xhtml/vocab#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <head>
    <title>Hello World!</title>
    <link rel="license" href="http://example.com/license">
  </head>
  <body>
    <div about="" typeof="foaf:Person">
      <h1>Hello World!</h1>
    </div>
  </body>
</html>
We have exactly the same request and response. But we have some extra information:
- There is a resource identified by http://example.com/example9
- The resource has an html representation which was last modified on 10 June 2010
- The resource has an rdf:type of http://xmlns.com/foaf/0.1/Person
- The resource has a http://www.w3.org/1999/xhtml/vocab#license relationship with http://example.com/license which means that the html representation is licensed using http://example.com/license
That seems to solve the ambiguity problem pretty neatly. It’s idiomatic html, plus it adheres to the fundamental rule that URIs always refer to the identified resource. It also meets my back to basics criterion of trusting what the author says about their resources.
It also has the advantage of being a simple change to make: it just means the W3C needs to publish an RDF schema that includes this definition of xh:license (and probably the other XHTML link types too). Dublin Core would be another obvious candidate for this kind of redefinition.
I don’t see how this solves the problem of ambiguity of identity, which is the main issue Jeni is raising, which is the same problem of resolving to 200 or 303; expensive lookups.
All that you do here is embed the type in the representation (and there’s even an ambiguous rule that states that the first DIV after the BODY carries the type for the resource if you understand FOAF) but you still need to resolve and analyse the representation to get it. This is the *main* issue with all of this; what does the resource identify before you perform expensive operations to find out? *That’s* the ambiguity.
I don’t think there is any ambiguity at all. The URI denotes the resource, never the representation. I don’t understand what you mean when you say the resource identifies something, can you explain?
Well, the basic tenet here is about using URIs as identifiers. Just to recap; what does the following URI identify? http://shelter.nu/ If we resolve it, just my homepage, no metadata apart from what a human can figure out. Ok, so maybe I as the owner of the URI say this is an identifier you can use to talk about me, maybe I say so in a blog post or something. At this point, even if I specifically say it’s me, what URI do you use to talk about my website? Or my online presence? These are things I haven’t talked about. What we get is one ambiguity solved that creates another, and we don’t want resolving nor formal logic in sussing out identifiers (unless you’ve got limitless resources … :)
Ok, so let’s dig into a FOAF laden representation. A representation is not just a snapshot of the resource, it (the result that comes back from resolving it) is a stream of data that (ahem) represent the resource without being it, and it can frankly be anything, from the content-type you’d like or prefer or understand (from which you can grab metadata), to some known but not preferred (a GIF or PDF version), to something completely unknown (a Topic Maps representation, for example, which most people haven’t got a clue about). Even these two latter categories (non-preferred and not known) are still perfectly valid representations of a resource, but you cannot use them for the metadata you’re after. Not only are they valid, we must assume that more often than not we’ll get representations that don’t suit us or that we can’t understand.
So there’s ambiguity of identity and typification because you cannot truly rely on anything more than the actual URI you’ve got. That’s it, really, you can only trust that you’ve got a URI and that in your model you think it identifies something, not even that it is *the* URI that represents that thing you want to talk about. There’s a semantic distance between the owner of the URI and the people who are going to use it, especially in the SemWeb community (and the web in general, by extension) that hasn’t been solved yet.
I’m a Topic Mapper, so often we deal with this ambiguity not by just referencing URIs, but by having part of the model state what sort of identifier this URI will be used as in our models. Not sure why RDF and friends don’t do this, but perhaps introducing rdf:about:identifier-uri, rdf:about:indicator-uri or somesuch wouldn’t hurt, something unambiguous about what our own model knows about the identifier.
Anyway, this is turning into a ramble. Sorry about that. :)
You ask “what URI do you use to talk about my website?”. You haven’t provided one for me to use so I would need to mint my own on my own domain. I could create a URI http://example.com/shelter-nu-site to identify your website and write some data about that. If later on you published your own URI then we could add some owl:sameAs statements to link them together.
I’m mostly in agreement with the rest of your comment, except you say “not even that it is *the* URI that represent that thing you want to talk about.”. That suggests that you believe there should be only one URI that identifies a thing whereas I think there can be multiple URIs identifying the same thing (but each of them only identifies one thing).
It’s good to have the perspective of a Topic Mapper in the discussion. In past discussions around Topic Maps the notion of roles of identifiers had often come up but I’ve never been strongly motivated by the need for them.
Sorry for replying to an older post, the reply link doesn’t appear on your latest.
Yes, we could merge the two identifiers with owl:sameAs, however, that particular method is the tool of the devil, and is quickly turning into the nightmare people predicted years and years ago. Specifically saying two things are the same generally means nothing, but more importantly people get it wrong far more often than right. Once that bad sameness is in your system, the poison spreads and there is little escape! Beware! (Ahem, and do we really want to go down the fragmented rdf, rdfs, and owl ontology strategies? :)
No, multiple identifiers are the way to go, so I’m not saying canonical identifiers are the only way, however with every identifier you add the ambiguity of what the set of identifiers combined identify grows. 1 identifier can be controlled in your system far easier than any N+x set; the ambiguity grows with the complexity. This is not easy stuff.
Oh, and what Olaf said better than me; you’re using properties in representations that must be true for all representations.
Don’t get me wrong, in a pragmatic world we all resolve and scrape the resources, and often manually pop their type or identifying capabilities back into our system. We do do this ourselves. What we would *like* to have are automated systems that can go on about their lives.
I’ve suggested elsewhere (and often) that RDF be extended with a smaller ontology for identity management that would solve a lot of these problems, but people are (I suspect correctly) wary of more complexity, however I think the problem of identity management is far more important than collecting properties from triplets you have no idea what are. :)
Trying to think of cases where this wouldn’t work – no joy yet :)
This feels similar-ish to OWL punning in that an application can see how statements are made, and then interpret accordingly. So in the OWL world a URI can identify an owl:Class or owl:NamedIndividual, and OWL tools know how to treat a resource depending on the statements made in the ontology. Works for me…I think :)
Hey Ian,
Nice idea. However, it has the following problem: It can only be used for properties that have the same property value for all representations of the resource. For licensing information this might be feasible. For other types of information (e.g. provenance) it is not. For instance, I cannot use such a property to express (inside the representations) that representation a1 of resource A was created based on yesterday’s content of my DB and representation a2 of A was created based on today’s content of my DB.
Cheers,
Olaf
Yes that is a problem, but it’s inherent in the web architecture. There is no way to refer to representations inside the representations themselves. The place to do that would be in the message headers of the HTTP response.