Fragmentation

14 November 2007 by Ian Davis

I’m troubled by this well-written essay by Xiaoshu Wang, in particular this part:

This example showed that the identity of URI is never ambiguous. What is ambiguous is our mental assignment of the URI’s identity. Similarly, in the Mr. Hayes’ example, if we say that “http://www.ihmc.us/users/phayes/PatHayes” denotes a person and a representation of http://www.ihmc.us/users/phayes/PatHayes is a web page, no confusion would have been created. And in the image example, if we say that http://dfdf.inesc-id.pt/tr/doc/web-arch/img/fig2 denotes an idea and one of its representations is a picture, no ambiguity will arise either.

I’m troubled because I don’t think I can disagree with it. In fact I think it might be the only sane interpretation possible. The fact that it runs counter to the W3C’s Architecture of the World Wide Web and the whole of the Linked Data best practice is kind of a worry for me.

I’m also troubled by this statement from timbl on the subject of fragment identifiers:

There are three possible attitudes:

1) don’t mix HTML and RDF, HTML will always have anchors. I think
that this doesn’t meet the need.

2) Do mix RDF and HTML, allow one file to define both anchors and
arbitrary things. Don’t let the same fragid be used for both an
anchor and a thing.

3) Do mix them, and by the way, allow the same fragid to be used as
an ID for an anchor and an ID for a thing, with RDF clients and HTML
clients doing different things. I think that this path leads to
madness, as in a script for example, I may want to use a URI to refer
to one or the other unambiguously. It also makes it impossible for
HTML+RDF clients.

Actually I’m more than troubled; I’m really concerned by this. The fact that writing off option 3 blows eRDF and RDFa to smithereens is a non-issue, since we can rework things to fix that. What I’m really concerned about is the growing evidence that fragment identifiers on the web are a broken technology and are to be avoided if not deprecated.

The nub of the problem is that the meaning of a fragment identifier is determined by the type of the representation. For example, what is the meaning of http://www.w3.org/People/Berners-Lee/card#i? The answer depends on what representation you obtain when you issue an HTTP GET against it. You might get some HTML, in which case the URI represents a fragment of an HTML document, or you might get some RDF, in which case the URI could represent a person. Clearly HTML fragments and people are disjoint sets: no person is an HTML fragment and vice versa.
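To make the dependency concrete, here is a minimal Python sketch. It assumes a hypothetical lookup table, FRAGMENT_SEMANTICS, whose descriptions are illustrative paraphrases and not quotations from any media type registration. Only urldefrag comes from the standard library; everything else is invented for the example.

```python
from urllib.parse import urldefrag

# Hypothetical table: what a fragment denotes under each media type.
# The wording is illustrative, not taken from any specification.
FRAGMENT_SEMANTICS = {
    "text/html": "an anchor inside the HTML document",
    "application/rdf+xml": "whatever the RDF graph says it names, e.g. a person",
}

def interpret(uri, media_type):
    """Describe what `uri` denotes, given the media type that came back."""
    # The fragment is split off on the client; only `base` is dereferenced.
    base, fragment = urldefrag(uri)
    if not fragment:
        return f"{base} has no fragment to interpret"
    meaning = FRAGMENT_SEMANTICS.get(media_type, "undefined for this media type")
    return f"GET {base} as {media_type}: #{fragment} is {meaning}"

uri = "http://www.w3.org/People/Berners-Lee/card#i"
for media_type in ("text/html", "application/rdf+xml"):
    print(interpret(uri, media_type))
```

The same URI string goes in both times; only the media type of the response changes, and with it the thing the URI is taken to denote.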

That principle is enshrined and embodied in dozens of RFCs and recommendations that define the meanings of fragment identifiers for various formats such as SVG and XPointer. The problem isn’t even a new one, as this note written by timbl in 2001 points out:

Fragment IDs and Content negotiation – known bug. If content negotiation occurs across types which do NOT share a fragment ID specification, then rigidly there has been an error. In practice, HTML was the only type (in 1997) which allowed fragment IDs anyway, and other types ignore it. Also, as falling back from a pointer to a specific view to a pointer to the whole document has been considered effective fallback procedure, so no harm was done. Now (2001) it becomes more of a problem. (There have been proposals to add the requested fragment identifier to the HTTP request to fix this.)

This is echoed in the webarch document:

representation providers must not use content negotiation to serve representation formats that have inconsistent fragment identifier semantics. This situation also leads to URI collision (§2.2.1).

Serving RDF and HTML up from the same URI is forbidden. Content negotiation is blamed here, but in reality the problem is caused by the assumption that the fragment identifiers are somehow associated with the underlying resource rather than the representation.
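The webarch constraint can be pictured as a check a publisher might run before enabling content negotiation on a URI. The following Python sketch is only an illustration: the FRAGMENT_FAMILY table and its family names are assumptions made up for this example, not definitions from any specification.

```python
# Hypothetical grouping of media types by fragment semantics. Types in the
# same family are ASSUMED to interpret fragments compatibly; this table is
# illustrative only.
FRAGMENT_FAMILY = {
    "text/html": "html-anchor",
    "application/xhtml+xml": "html-anchor",
    "application/rdf+xml": "rdf-node",
    "text/turtle": "rdf-node",
}

def conneg_is_consistent(media_types):
    """True if every negotiable type falls in one fragment-semantics family."""
    # Unknown types count as their own family, so they never match another.
    families = {FRAGMENT_FAMILY.get(t, t) for t in media_types}
    return len(families) <= 1

print(conneg_is_consistent(["text/html", "application/xhtml+xml"]))
print(conneg_is_consistent(["text/html", "application/rdf+xml"]))
```

Under these assumptions the HTML plus RDF combination fails the check, which is exactly the case the constraint rules out.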

It’s interesting that another warning on the dangers of fragment identifiers comes to a conclusion spookily similar to Xiaoshu’s. This time it’s from Aaron Swartz, back in 2001 again:

Fragments in Web Architecture only makes sense when referring to a representation of a resource, not a resource itself. URI references worked in HTML where the set context was of surfing between web pages (representations), and human users could deal with some breakage. However, as we move into formats like RDF, clarity and precision become increasingly important and URI fragments just don’t work.

The whole httpRange-14 issue was supposed to resolve this (and I accepted it at the time) but I’ve come to the conclusion that it is fundamentally flawed. It encourages people to use URIs with fragments to represent resources, when the fragment changes meaning depending on the representation served from that URI. The URI represents the resource and the fragment identifies a portion of a representation obtained from that URI. Web pages are one type of representation and thinking of them in that way avoids all this nonsense about Information Resources, which I believe now is a patch required to cover up a crack in the architecture.

14 thoughts on “Fragmentation”

  1. Simon St.Laurent says:

    I’m not sure – I’ve in fact never been convinced – that this issue is patchable.

    The disconnect between old expectations of URLs and new expectations of URIs has never been taken particularly seriously. Changing the name wasn’t and isn’t enough to avoid these questions.

    (They’re well worth avoiding – there’s some seriously headache-inducing stuff here. I just don’t see how.)

  2. Peter Murray says:

    The nub of the problem is that the meaning of a fragment identifier is determined by the type of the representation. For example, what is the meaning of http://www.w3.org/People/Berners-Lee/card#i – the answer depends on what representation you obtain when you issue an HTTP GET against it. You might get some HTML in which case the URI represents a fragment of an HTML document, or you might get some RDF in which case the URI could represent a person.

    Is the nub of the nub of the problem that the client may not know what kind of representation it might get back when dereferencing a URI? If the client does know the type of representation, or if asking for a particular type of representation results in an error from the server, then it can assume to know what the fragment identifiers mean. Right?

  3. iand says:

    Peter, no, I think the most important point is that the URI can have two different meanings depending on how it is dereferenced (actually more than two). It doesn’t matter whether you can control which one you get.

  4. Internet Alchemy » Fragmentation Reprise says:

    [...] Fragmentation Reprise [...]

  5. Internet Alchemy » Is the Semantic Web Destined to be a Shadow? says:

    [...] decision on the types of resources that can be addressed with HTTP. As I pointed out in my recent Fragmentation post, there is strong pressure towards using URIs with fragment IDs to represent [...]

  6. Internet Alchemy » What are Information Resources Good For? says:

    [...] is probably obvious from my recent posts (e.g. Fragmentation and Is the Semantic Web Destined to be a Shadow?), I’m thinking about the TAG’s [...]

  7. Internet Alchemy » Reformulating the Web Architecture says:

    [...] Fragmentation [...]

  8. Henry Story says:

    I wonder if there is not another solution to the # problem. Perhaps we can argue that in html the #url does not refer to a section of html, but rather that when people are working with html, they are working syntactically, and so things just work differently. Consider the sentence

    “Henry was born in England”

    We can say “Henry” is the first word of the sentence, and so we can identify positions in the sentence with words. But “Henry” really does identify me, the physical 10 dimensional process writing this email.

    Now I would agree with you that information resources are perhaps not quite as seriously needed as one may think. But I don’t think this is such a big problem. Neither are hash urls a big problem. Hash urls do have a nice advantage in that it is easy to figure out if you already have the representation for a # url. Other urls that redirect to their definitions require a GET to discover this.

  9. Henry Story says:

    Just read the referred to article. I was hoping never to have to read about this issue again, but this is well written – even if the English could do with a little improvement (good English is important when difficult concepts such as this are being treated).

    The distinctions are really subtle. I suppose one could test it with some good example vocabularies that would allow one to speak of things, their representations and the relation between them, in such a way that all of this would work in a linked data way.

  10. Mark Nottingham says:

    I hadn’t heard people suggest that we deprecate fragment identifiers before, but it certainly is an option. Interestingly, people are also talking about deprecating — or at least cautioning against the use of — format conneg; see http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 (yes, that has a fragid in it ;)

  11. iand says:

    I wonder what URI we ought to use to refer to issue 81 on the RFC2616bis work? http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i81 is the URI of the HTML fragment describing it. According to the webarch I can’t use that URI to also denote the issue itself.

  12. iand says:

    By the way, I’m not suggesting that fragments should be deprecated totally. I just think that they’re unsuitable for use as abstract concept identifiers in RDF since their meaning changes depending on the representation. If you really do mean to link to a subsection of an HTML page or some such then they work really well.

  13. Internet Alchemy » It’s OK to use URIs with Fragments in RDF says:

    [...] been doing some more digging on my fragmentation and shadow web themes and came across something I hadn’t really seen before or, if I have, [...]

  14. Mia Tyler says:

    Hey!…Man i love reading your blog, interesting posts ! it was a great Sunday .
