
It’s OK to use URIs with Fragments in RDF


29 November 2007 by Ian Davis

I’ve been doing some more digging on my fragmentation and shadow web themes and came across something I hadn’t really seen before or, if I have, it has been completely wiped from my mind. The RDF Concepts document contains a whole section on fragment identifiers which is worth reproducing:

RDF uses an RDF URI Reference, which may include a fragment identifier, as a context free identifier for a resource. RFC 2396 [URI] states that the meaning of a fragment identifier depends on the MIME content-type of a document, i.e. is context dependent.

These apparently conflicting views are reconciled by considering that a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an RDF URI reference consisting of an absolute URI and a fragment identifier, the fragment identifier identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the absolute URI component. Thus:

  • we assume that the URI part (i.e. excluding fragment identifier) identifies a resource, which is presumed to have an RDF representation. So when eg:someurl#frag is used in an RDF document, eg:someurl is taken to designate some RDF document (even when no such document can be retrieved).
  • eg:someurl#frag means the thing that is indicated, according to the rules of the application/rdf+xml MIME content-type as a “fragment” or “view” of the RDF document at eg:someurl. If the document does not exist, or cannot be retrieved, or is available only in formats other than application/rdf+xml, then exactly what that view may be is somewhat undetermined, but that does not prevent use of RDF to say things about it.
  • the RDF treatment of a fragment identifier allows it to indicate a thing that is entirely external to the document, or even to the “shared information space” known as the Web. That is, it can be a more general idea, like some particular car or a mythical Unicorn.
  • in this way, an application/rdf+xml document acts as an intermediary between some Web retrievable documents (itself, at least, also any other Web retrievable URIs that it may use, possibly including schema URIs and references to other RDF documents), and some set of possibly abstract or non-Web entities that the RDF may describe.

This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional Web behavior. Note that nothing here requires that an RDF application be able to retrieve any representation of resources identified by the URIs in an RDF graph.
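The split the quoted section relies on, an absolute URI plus a fragment identifier, can be sketched in a few lines of Python. This is only an illustration and not something from RDF Concepts: it uses the standard library's urllib.parse.urldefrag, the helper name split_rdf_uri_reference is made up for this example, and the URI is the Web Architecture one discussed in this post.

```python
from urllib.parse import urldefrag


def split_rdf_uri_reference(uri_ref):
    """Split a URI reference into (absolute URI, fragment identifier).

    Hypothetical helper for illustration only. The first part is what
    RDF Concepts presumes to identify an RDF document; the second is
    the part whose meaning is under discussion.
    """
    absolute, fragment = urldefrag(uri_ref)
    return absolute, fragment


doc, frag = split_rdf_uri_reference(
    "http://www.w3.org/TR/webarch/#media-type-fragid"
)
print(doc)   # the part a client would actually request
print(frag)  # the part that is interpreted after retrieval
```

Only the first value ever goes over the wire in an HTTP request; the fragment stays on the client side, which is why its interpretation can depend on whatever comes back.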

I’ve been thinking about this for a couple of days and I’m still not entirely sure what to make of it. What it appears to be saying is that RDF ignores the Web Architecture principle that fragment identifiers are given meaning by the representation that is retrieved.

So this ensures that RDF is self-consistent. I can refer to anything I like using a fragment identifier in my URI and I’m guaranteed not to have my intended meaning upset by anything messy like a network operation. This alleviates one of my major concerns about using these kinds of URIs in RDF, but at what cost? If anything, this increases my concerns over the shadow web, since by circumventing the web architecture it sets RDF further away from today’s web of documents. For example, when I use “http://www.w3.org/TR/webarch/#media-type-fragid” as a URI in my RDF, it probably doesn’t refer to the thing you think it does. You, as a human (if you are), get to see a representation of that section of the document when you click on the link, but an RDF-aware agent must treat that URI as though RDF/XML had been retrieved. Unfortunately there isn’t any RDF there and the Web Architecture actually forbids you from serving up both HTML and RDF documents at the same URI.
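To show what the human-facing, HTML reading of that fragment amounts to, here is a rough Python sketch using the standard library's html.parser. The HTML string is a hypothetical stand-in, not the real markup of the Web Architecture document, the names AnchorFinder and html_fragment_target are invented for the example, and it only looks at id attributes, which is a simplification.

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Record the first element whose id attribute equals the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if self.found_tag is None and dict(attrs).get("id") == self.fragment:
            self.found_tag = tag


def html_fragment_target(html_text, fragment):
    """Return the tag name the fragment points at, or None if absent."""
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    return finder.found_tag


# Hypothetical markup, assumed purely for illustration.
HYPOTHETICAL_HTML = (
    '<html><body><h4 id="media-type-fragid">Section heading</h4>'
    "</body></html>"
)

print(html_fragment_target(HYPOTHETICAL_HTML, "media-type-fragid"))
```

Under the HTML reading the answer is always some element of the document. An RDF-aware agent following the section quoted above would not consult this markup at all, so the two readings need not agree.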

What does that mean? How are we supposed to interpret that? One interpretation is that it really doesn’t matter what you do outside of RDF. You can throw up all kinds of other representation formats and it won’t affect your RDF or anyone else’s. They might use the same identifiers, and occasionally, coincidentally, they may identify the same things, but in general RDF is partitioned into its own little world. RDF can only link to RDF.

How can RDF co-exist with other formats on the Web if it ignores their semantics? If you just want the Semantic Web to be built using RDF then you probably don’t care. But if, like me, you want to see an inclusive Semantic Web built from a mix of RDF, microformats, topic maps, RDDL and all the other ways to express semantics, then it’s a very very big problem. I don’t want two webs competing for attention, I want one strong one.

Hence the title of this post. It is OK to use URIs with fragments in RDF, but only if you don’t particularly care about relating to the existing web. If you do care then avoid fragments at all costs. Use standard URIs and stick 303 redirects on them if you need to. It’ll work and the whole web will be better for it.
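As a rough sketch of what that advice looks like in practice, the Python below models the server's decision as a plain function rather than a real HTTP server. Everything specific in it is an assumption made for illustration: the paths /id/ian and /doc/ian, the function name respond, and the deliberately naive Accept handling, which ignores q-values.

```python
# Hypothetical mapping from fragment-free "thing" URIs to the documents
# that describe them. Both paths are invented for this example.
THINGS = {"/id/ian": "/doc/ian"}


def respond(path, accept="text/html"):
    """Return (status, headers) for a GET on path with an Accept header."""
    if path in THINGS:
        # The URI names a thing, not a document: redirect with 303.
        return 303, {"Location": THINGS[path]}
    if path in THINGS.values():
        # The URI names a document: pick a representation by Accept.
        # Naive substring check, assumed good enough for a sketch.
        if "application/rdf+xml" in accept:
            media_type = "application/rdf+xml"
        else:
            media_type = "text/html"
        return 200, {"Content-Type": media_type, "Vary": "Accept"}
    return 404, {}


print(respond("/id/ian"))
print(respond("/doc/ian", "application/rdf+xml"))
print(respond("/doc/ian", "text/html"))
```

The cost is the one extra round trip for the redirect; the benefit is that neither representation has to agree with the other about what a fragment means, because no fragment is involved.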

14 thoughts on “It’s OK to use URIs with Fragments in RDF”

  1. Richard Cyganiak says:

    First, this section of RDF-Concepts is informative, hence representing anything said in that section with a boldfaced must is a bit odd.

    Second, you claim that AWWW forbids you from serving up HTML and RDF at the same URI. As I have said before, that’s plain nonsense, and you keep repeating it. AWWW only forbids this in the case that (1) you use fragment IDs in your HTML and RDF, and (2) those fragment IDs are not harmonized. Exactly the same restriction applies to all other formats. Does this mean AWWW forbids you from serving HTML and SVG at the same URI?

    Third, are you sure that your interpretation of this section is correct? It sounds as if you have some doubts. So do I. Nevertheless, you tell people to “avoid certain things at all costs”, based on your possibly flawed understanding. This is not how one builds credibility.

    Fourth, how about considering alternatives to content negotiation, instead of putting all the blame on fragment IDs? Problems only emerge from the interplay of content negotiation and fragment IDs. And fragment IDs are widely and successfully used throughout the Web, while content negotiation is not. Your whole argument rests on the unstated (and wrong, IMHO) assumption that we cannot possibly live without content negotiation.

    I would be very interested to learn why you consider content negotiation so important. Especially if it involves success stories from actual deployment. How about a blog post on that topic?

  2. iand says:

    Why do you think the statement “representation providers must not use content negotiation to serve representation formats that have inconsistent fragment identifier semantics” only applies when the fragment identifiers are actually being referenced? I think AWWW is pretty clear on that. The standard mode of thought is to abandon content negotiation so I’m exploring alternatives that keep the advantages that conneg gives. Why bother with the resource/representation model if every resource only ever has a single representation?

    By the way, I’m just blogging, thinking out loud. Feel free to pay no attention to it. If I have something that I think is a substantive error in the specifications then I’ll raise it in the appropriate places such as the TAG list.

  3. Danny says:

    Ok, what if an RDF doc was served at the WebArch URI which contained something like this:

        <http://www.w3.org/TR/webarch/#media-type-fragid> a html:a ;
            html:title "media-type-fragid" ;
            rdf:value "3.2.1. Representation types and fragment identifier semantics…" .

    An RDF/XML document is a serialisation of a graph, a HTML document is, well, a document. It’s only to be expected that fragids in each are apples and oranges. Ok, the bit of the spec you quote does seem like forcing a square peg into a round hole. But I couldn’t find anything around the HTML spec that provided convincing semantics for anchors that anything might conflict with. Have a look at them, I bet you’ll find yourself asking “What does that mean? How are we supposed to interpret that?” at least as often. I reckon we’re looking at fairly arbitrary apples and oranges.

    So I agree with Richard on this point, I can’t see any clear-cut reason in the relevant specs not to serve up HTML and RDF/XML docs at the same URI. It may be ill-advised to use the same fragids in RDF as in HTML. But then again in practice it might make sense (as the publisher of the representations) to assume a convention like the fragid part in a HTML doc is a text description of the resource in question. I don’t see anything in the HTML spec that would forbid this.

    Regarding conneg, I have to confess I like the idea a lot, essentially because it’s a concrete manifestation of the resource/representation world view, which is nice :-) Unfortunately, it doesn’t seem to be used enough that it’s an ingrained part of the infrastructure. (Probably because filesystems don’t generally allow you to create multiple files of exactly the same path/name but in different formats…and HTML has ruled the roost for so long). If there is any way of getting the benefits of conneg in a way that doesn’t mess with people’s POSIX/DOS preconceptions about files and resources, great!

    It’s great that you’re testing the water in these posts, although in this particular case I’d go ask HTML first about the thing floating in the pool. Whatever, has to be a good thing that you’ve made me want to re-read Chapter 5 again…

  4. Dan Connolly says:

    Have you considered the possibility that HTML and RDF *do* have consistent fragment identifier semantics? i.e. webarch doesn’t forbid you to use doc#frag with both HTML and RDF; webarch just says that if you do so, you’re claiming that both representations use #frag to mean the same thing.

  5. iand says:

    Dan, quite possibly. That’s certainly been my thinking in the design of embedded RDF where the id attribute is used to denote a new arbitrary resource. But Tim’s recent message made me look closer at this stuff again. HTML is quite clear on the meaning of the fragment, i.e. as a document section. If my RDF says the same then there’s clearly no issue, but if I want #me to represent me in RDF and also to be an anchor in HTML containing a prose description of me then that’s not strictly correct.

  6. iand says:

    Danny, the relevant section is in RFC 2854:

    For documents labeled as text/html, the fragment identifier designates the correspondingly named element; any element may be named with the “id” attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may be named with a “name” attribute. This is described in detail in [HTML40] section 12.

  7. Danny says:

    Ian, that’s what I was trying to (approximately) express in the RDF snippet, to cover “both representations use #frag to mean the same thing”.

  8. iand says:

    Danny, what if we wanted to serve a PDF version of AWWW there too? Then we’d have to align additionally with the semantics defined by RFC 3778:

    The handling of fragment identifiers [6] is currently defined in Adobe Technical Note 5428 [7]. This section summarizes that material.

    A fragment identifier consists of one or more PDF-open parameters in a single URL, separated by the ampersand (&) or pound (#) character. Each parameter implies an action to be performed and the value to be used for that action. Actions are processed and executed from left to right as they appear in the character string that makes up the fragment identifier.

    The PDF-open parameters allow the specification of a particular page or named destination to open. Named destinations are similar to the “anchors” used in HTML or the IDs used in XML. Once the target is specified, the view of the page in which it occurs can be specified, either by specifying the position of a viewing rectangle and its scale or size coordinates or by specifying a view relative to the viewing window in which the chosen page is to be presented.

  9. Danny says:

    True, it could get incredibly awkward, trying to reconcile fragids in apple, orange and kumquat flavours. But this is assuming you really want to use the same fragids across all the representations. It’s the publisher that makes the decisions on what constitutes a representation of a given resource in a given media type, I don’t see anything that compels the publisher to provide the exact same information in each representation. (Unless perhaps InformationResource is interpreted that way – but your arguments on those are quite compelling, the notion does seem increasingly rubbish each time I look…).

  10. iand says:

    Danny, once again you get to the core of my argument. Yes, fragments are interpreted differently according to the representation. But RDF assumes that they have a fixed meaning. So you can reason about them all you like but it could all fall apart when you take a look at what a URI actually denotes and find you get a different answer depending on how you look.

    Because of this people say you need to deprecate things like content negotiation, effectively separating RDF off into its own little closed network. I’d prefer not to do that. If we avoid using fragment identifiers to identify abstract concepts then the issue goes away. httpRange-14 does fix this, but because it adds an additional request into the loop (i.e. the 303 redirect) a lot of people try to avoid it by using fragments.

  11. Ed Summers says:

    Thanks for the really engaging series of posts Ian. I’m curious though, who wants to deprecate conneg?

  12. Danny says:

    “Because of this people say you need to deprecate things like content negotiation, effectively separating RDF off into its own little closed network. I’d prefer not to do that.” – me neither.

    “If we avoid using fragment identifiers to identify abstract concepts then the issue goes away.” – hmm, I guess so, though it does seem a sledgehammer, especially given the number of #terms already in use. I can easily picture httpRange-14 as a necessary evil (hassle), though on all these points I don’t have any better answers :-)

  13. iand says:

    Thanks Ed, I think Dare summarises some of the viewpoints on conneg quite nicely.

  14. […] also implied by the Linked Data Principles which encourage you to use HTTP URIs. Furthermore it is a good advice not to include fragments in your URIs if you care about coexistence of the Web and the Semantic Web. Yes, there is RDF data […]
