Crisis

13 September 2005 by Ian Davis

I experienced something of a system shock at the DC2005 conference today. I sat in on the Architecture Working Group meeting and as events unfolded I suddenly realised that, without a radical change, I could well be witnessing the beginning of the end of the RDF project.

We were discussing the progress of the Dublin Core RDF task force, and there were a number of items on the agenda. We didn’t get past the first item, though: it was so hairy and ugly that no-one could agree on the right approach. The essence of the problem is best illustrated by the dc:creator term. The current definition says “An entity primarily responsible for making the content of the resource.” The associated comment states “Typically, the name of a Creator should be used to indicate the entity”, and this is exactly the most common usage. Most people, most of the time, use a person’s name as the value of this term. That’s the natural mode if you write it in an HTML meta tag, and it’s the way tens or hundreds of thousands of records have been written over the past six years. In that model the value of dc:creator is simply a string literal holding the creator’s name.

Of course, us RDFers, with our penchant for precision and accuracy, take issue with the notion of using a string to denote an “entity”. Is it an entity or the name of an entity? Most of us prefer to add some structure to dc:creator, perhaps using a foaf:Person as the value. It lets us make more assertions about the creator entity. In that model the value of dc:creator is a node in its own right, and the name becomes just one of its properties.
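The two models can be sketched as plain sets of triples. This is a minimal Python illustration under stated assumptions, not anything taken from the Dublin Core or FOAF specifications: the document URI, the blank-node label `_:c`, the use of foaf:name for the name, and the `Literal` wrapper are all made up for the example.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a resource identifier."""
    value: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

DOC = "http://example.org/doc"  # hypothetical document URI

# Model 1: the value of dc:creator is the creator's name, as a string.
literal_model = {
    (DOC, DC_CREATOR, Literal("Ian Davis")),
}

# Model 2: the value of dc:creator is a node (here a blank node, "_:c")
# that is typed as a foaf:Person and carries the name as a property.
structured_model = {
    (DOC, DC_CREATOR, "_:c"),
    ("_:c", RDF_TYPE, FOAF_PERSON),
    ("_:c", FOAF_NAME, Literal("Ian Davis")),
}
```

Both graphs use the same property, dc:creator, yet a consumer has to inspect the kind of value before it knows which shape it is dealing with.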

The problem, if it isn’t immediately obvious, is that in RDF and RDFS it’s impossible to specify that a property can have a literal value but not a resource, or vice versa. When I ask “what is the email address of the creator of this resource?” what should the (non-OWL) query engine return when the value of creator is a literal? It isn’t a new issue, and is discussed in depth on the FOAF wiki.
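A rough way to see the query problem is to write the lookup by hand. The sketch below assumes triples are Python tuples, that literals are wrapped in a hypothetical `Literal` type, and that the email address hangs off the creator node via foaf:mbox; the document URI, blank-node label and mailbox are invented for illustration.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a resource identifier."""
    value: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_MBOX = "http://xmlns.com/foaf/0.1/mbox"
DOC = "http://example.org/doc"  # hypothetical document URI


def creator_mboxes(triples, doc):
    """Return the foaf:mbox values of the creators of `doc`.

    A literal creator is a dead end: there is no node to ask about.
    """
    found = []
    for s, p, o in triples:
        if s == doc and p == DC_CREATOR and not isinstance(o, Literal):
            found.extend(o2 for s2, p2, o2 in triples
                         if s2 == o and p2 == FOAF_MBOX)
    return found


structured_model = {
    (DOC, DC_CREATOR, "_:c"),
    ("_:c", FOAF_MBOX, "mailto:ian@example.org"),  # made-up address
}
literal_model = {
    (DOC, DC_CREATOR, Literal("Ian Davis")),
}

print(creator_mboxes(structured_model, DOC))
print(creator_mboxes(literal_model, DOC))
```

The second call has nothing to return, even though a human reader knows perfectly well who the creator is. Any answer would have to come from rules outside the data.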

There are several proposals for dealing with this. The one that seemed to get the most support was to recommend the latter approach and make the first illegal. That means making hundreds of thousands of documents invalid. A second approach was to endorse current practice and change the semantics of the dc:creator term to explicitly mean the name of the creator, and to invent a new term (e.g. creatingEntity) to represent the structured approach.

Danbri referred us to work he had done after the last DC meeting in 2004 on a SPARQL query to convert between the two forms. Discussion then moved on to special-case processing for particular properties, along the lines of “if you see a dc:creator property with a literal value then you should insert a blank node and hang the literal off of that”. Note that I’m paraphrasing; no-one actually said this, but it was the intent.
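The paraphrased rule is easy enough to write down, which is part of what makes it unsettling. The following is only a sketch of that kind of special-case processing in Python, not Danbri’s SPARQL query: the choice of foaf:name as the property the literal hangs off, the `_:creatorN` labels and the `Literal` wrapper are assumptions.

```python
from itertools import count
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a resource identifier."""
    value: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # assumed target property


def lift_literal_creators(triples):
    """Replace each literal dc:creator value with a fresh blank node
    and hang the literal off that node."""
    fresh = count(1)
    lifted = set()
    # Sort only so that blank-node labels are assigned deterministically.
    for s, p, o in sorted(triples, key=repr):
        if p == DC_CREATOR and isinstance(o, Literal):
            # Assumes no existing node already uses a "_:creatorN" label.
            node = f"_:creator{next(fresh)}"
            lifted.add((s, p, node))
            lifted.add((node, FOAF_NAME, o))
        else:
            lifted.add((s, p, o))
    return lifted


DOC = "http://example.org/doc"  # hypothetical document URI
before = {(DOC, DC_CREATOR, Literal("Ian Davis"))}
after = lift_literal_creators(before)
```

Note that the rule is keyed on one specific property. Every other property with the same ambiguity would need its own heuristic.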

That’s when my crisis struck. I was sitting at the world’s foremost metadata conference in a room full of people who cared deeply about the quality of metadata and we were discussing scraping data from descriptions! Scraping metadata from Dublin Core! I had to go check the dictionary entry for oxymoron just in case that sentence was there! If professional cataloguers are having these kinds of problems with RDF then we are fucked.

It says to me that the looseness of the model is introducing far too much complexity as evidenced by the difficulties being experienced by the Dublin Core community and the W3C HTML working group. A simpler RDF could take a lot of this pain away and hit a sweet spot of simplicity versus expressivity.

Where can complexity be removed?

The graph is fundamental and I think the base triple model – the simplest construct that could possibly make a graph – is right. But there are too many types of nodes: URIs, blanks, literals, literals with language, datatyped literals, XML literals. What if there were only two: literals and URIs? The rest are a distraction. URIs are cheap and to be honest I’ve never understood why referring to me with a URI like http://purl.org/NET/iand/foaf#ian is frowned on, especially in light of the httpRange-14 decision.

What if it were possible to define two types of properties: those that had literal values and those that had URI values? Wouldn’t that make mappings from HTML and Dublin Core simpler and easier to validate?
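As a thought experiment, here is roughly what validation could look like if every property declared which kind of value it takes. This is a hedged Python sketch: the `PROPERTY_KINDS` table is hypothetical and does not reflect any decision by DCMI or FOAF, and the `Literal` wrapper and document URI are invented for the example.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
FOAF_MAKER = "http://xmlns.com/foaf/0.1/maker"

# Hypothetical declarations, for illustration only.
PROPERTY_KINDS = {
    DC_CREATOR: "literal",
    FOAF_MAKER: "uri",
}


def violations(triples, kinds):
    """Return the triples whose value is of the wrong kind for the property.

    Properties with no declaration are left alone.
    """
    bad = []
    for s, p, o in triples:
        expected = kinds.get(p)
        actual = "literal" if isinstance(o, Literal) else "uri"
        if expected is not None and expected != actual:
            bad.append((s, p, o))
    return bad


DOC = "http://example.org/doc"  # hypothetical document URI
IAN = "http://purl.org/NET/iand/foaf#ian"
records = [
    (DOC, DC_CREATOR, Literal("Ian Davis")),  # fine: literal expected
    (DOC, DC_CREATOR, IAN),                   # flagged: URI where literal expected
    (DOC, FOAF_MAKER, IAN),                   # fine: URI expected
]
print(violations(records, PROPERTY_KINDS))
```

The check needs nothing beyond the property declaration and the value itself, which is the kind of simplicity a mapping from HTML meta tags could live with.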

What if we jilted the ugly sisters of rdf:Bag, rdf:Alt and rdf:Seq and took reification out back and shot it? How many tears would be shed?

What if we junked classes, domains and ranges? Would anyone notice? The key concept in RDF is the relationship, the property.

The result would be a subset of RDF, RDF-lite perhaps. All instances of RDF-lite would be valid RDF-full, but the converse couldn’t be true. SPARQL would still work and so, I suspect, would the OWL machinery despite the omission of classes. RDF diffs would be trivial without blank nodes, allowing efficient synchronisation of triple stores. Signing of triples would also be possible without requiring the hoops of canonicalisation to be jumped through.
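To illustrate why diffs become trivial: once every node is either a URI or a literal, a graph is just a set of triples whose members can be compared directly, so a diff is plain set difference. A minimal Python sketch follows; the URIs and the `Literal` wrapper are invented for the example, and it does not attempt the harder blank-node case.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI."""
    value: str


def diff(old, new):
    """Return (added, removed) triples between two blank-node-free graphs."""
    return new - old, old - new


def apply_diff(store, added, removed):
    """Bring a triple store up to date using a diff."""
    return (store - removed) | added


DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_DATE = "http://purl.org/dc/elements/1.1/date"
DOC = "http://example.org/doc"  # hypothetical document URI

old = {(DOC, DC_TITLE, Literal("Crisis"))}
new = {(DOC, DC_TITLE, Literal("Crisis")), (DOC, DC_DATE, Literal("2005-09-13"))}

added, removed = diff(old, new)
synced = apply_diff(old, added, removed)
```

With blank nodes the same comparison needs graph matching, because two nodes with different local labels may or may not stand for the same thing.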

Maybe it’s necessary to take a few steps back to find the true path to the summit.

30 thoughts on “Crisis”

  1. Jimmy Cerra says:

    The problem is that you and the DCers are trying to impose too much order on the model. So you can’t specify whether a dc:creator is a literal or a resource with a certain rdf:type. So? RDF has an open world, so (without RDFS or OWL) you-the-consumer can simply process what you understand and ignore the rest. Henceforth you-the-author can supply both a literal and a resource version. One advantage of RDF over XML, IMHO, is that it is easier to ignore what you don’t understand without destroying the model. If that is too much, then supply one and use something else (OWL, RDFS, RDF/XML+XSLT?) to supply the alternates for other people. You see, writing in RDF is just like writing in English (something I obviously am bad at despite it being my native language), so you have to balance being concise with being expressive or being precise. Except that I have to go and think about that. (Note: I don’t claim to be, nor claim to be better than, experts like you: I am just trying to promote discussion with this.)

  2. Rich says:

    “What if it were possible to define two types of properties: those that had literal values and those that had URI values?”

    You mean owl:DatatypeProperty and owl:ObjectProperty?

    The first example you give is why I don’t use dc:creator. Instead I use foaf:maker.

    Semantics of dc:creator: “the name of the creating entity, as a literal”.

    Semantics of foaf:maker: “the creating entity”.

    dc:creator should not be used to point to anything other than a literal; foaf:maker the opposite. I find the former rather useless; that’s DC’s fault, not RDF’s.

    Reification exists because otherwise I’d have to invent it, and so would others. (Much of my current work involves referring to statements, with or without placeholders: think of describing query patterns themselves in RDF.)

    This actually goes for much of what’s in RDF, blank nodes being another example. Sure, I could use gensyms to create intermediate nodes (after all, if I’m programmatically creating chunks of graph, I can’t come up with names), but the point of bnodes is to expressly describe anonymity. You’d find lots of namespaces with generated URIs to fill in the gap left by bnodes.

        [ foaf:mbox "test@example.com" ] = [ foaf:mbox "test@example.com" ]

    I don’t know its URI, and bnodes allow me to say that. Forcing URIs doesn’t solve the problem.

  3. iand says:

    The moment of my crisis was the realisation that we were talking about using heuristics to determine the triples to be generated from simple Dublin Core records, supposedly the most widespread metadata format in the world and one that has been closely aligned to RDF for many years.

    I know why logicians need reification and why blank nodes are useful. I had forgotten about owl:DatatypeProperty and owl:ObjectProperty but that’s almost moot because I don’t know of any large RDF databases that are using OWL reasoning (or even RDFS inferencing for that matter).

    My point is that perhaps those complexities are a big part of what is holding RDF back from widespread adoption. Most people just want some simple name/value metadata, a few want to cross-reference other bits of metadata. A tiny proportion of those need a logic expression language.

  4. iand says:

    Forgot to add that calling the subset RDF-Lite was a deliberate echo of OWL-Lite. Why shouldn’t RDF be layered in the same way, allowing many many more participants to get involved?

  5. Dominic Mitchell says:

    I’ve just started learning RDF. But so far, I’ve managed to ignore things to just about the same level as the proposed RDF-Lite here. The (possibly perceived) complexity of RDF is a definite hold up in its adoption.

    -Dom

  6. iand says:

    Thanks Dom, you’re confirming what I suspect from my own involvement with beginners. Anybody else think complexity is holding up adoption? Or anyone think that it isn’t?

  7. denny says:

    I couldn’t agree more with you, Ian! Please, this RDF-Lite needs to become reality. Things like bags or reification may well be separate modules, just like RDFS is. Even the OWL people ignored bags and seqs. Everyone says, come on, RDF is so damn easy – but only if you ignore half of it, actually.

  8. Bruce says:

    As an RDF newbie coming from vanilla XML, I think that RDF is indeed more complex than it needs to be (now). Alt and Bag seem like they can go, as can data typing. But *I* find value in the class structure (it doesn’t do me much good to specify something isPart of something else without specifying what we’re talking about), and being able to represent content only with string literals would limit RDF too much for my case.

  9. iand says:

    Bruce, specifying classes doesn’t add much, if anything in plain RDF or RDFS. Knowing that something is of type “Book” doesn’t allow you to infer anything else – you can’t expect it to have an isbn or an author or any other property. You need OWL for that.

    I’m not sure what you mean by only allowing string literals, whether you want numeric literals or whether you think I’m suggesting we only allow literal values full stop. I’m saying that most metadata applications need descriptions consisting of properties with string literal values plus a mechanism for linking descriptions together via properties. You’d be able to write (hope the formatting works):

        @prefix frbr: <http://purl.org/vocab/frbr/core#> .
        @prefix rbib: <http://purl.org/net/xbib#> .
        @prefix dc: <http://purl.org/dc/elements/1.1/> .

        <http://www.imdb.com/title/tt0245712/>
            dc:title "Amores Perros"@es ;
            dc:title "Love’s a Bitch"@en ;
            frbr:subject <#Global_Cities> ;
            frbr:subject <#Mexico_City> ;
            frbr:realization <#amores-perros> ;
            rbib:comment "Set in Mexico City."@en ;
            rbib:comment "Shows the complexities of global city life."@en .

        <#amores-perros>
            dc:creator "Alejandro González Iñárritu" ;
            frbr:embodiment <#amores-perros-dvd> .

        <#amores-perros-dvd>
            dc:date "2000" .

    I don’t think you lose much by not including classes.

  10. Bruce says:

    For my end-use, the classes matter. Simple example: citations for articles are formatted differently than chapters from books, despite being broadly equivalent structurally. Without some kind of metadata that makes those distinctions, my citations won’t get formatted correctly. It could be that metadata gets represented with simple properties (say something that indicates issuance), but then why not just use classes? It’s one thing that RDF adds on top of XML that seems like it can be useful (admittedly once tools have more reasoning support).

    And I want to know that the 2000 release of Amores Perros is a DVD version of a film.

    I think RDF has to find a sweet spot where it adds as much as possible beyond vanilla XML with as little additional pain as possible. I’m not sure where that sweet spot is, but the danger of simplifying too much is that the “but why not just XML?” argument might become even easier to make.

  11. iand says:

    Bruce, I’m suggesting a useful subset of RDF, not a replacement. I think it’s possible to layer RDF. Classes, plus the mechanisms to infer them such as domains and ranges, possibly belong to a higher layer (RDF-Full).

    I think I’m looking for something that is easily embeddable in current (X)HTML, current DC-XML etc. It shouldn’t be a goal to represent the whole RDF model in every XML format.

  12. Bruce says:

    Right. Makes sense :-)

  13. Rich says:

    I think my point is actually backed up by Bruce… at some point, people are going to start making assertions like “x is_a Film”, at which point they’ve just invented rdf:type.

    Part of RDF defines semantics, and part defines the core vocabulary in order to have a starting point: common terms that would otherwise be reinvented time and time again.

    I suspect that Bag and Alt were bad decisions (I’d rather see more thorough support for lists), but I expect that if you look at the average FOAF file you’ll see your “RDF lite” already in use.

  14. iand says:

    I don’t think it’s anything new. I’ve been using something similar for a while, see http://iandavis.com/2005/web-description/waif-20050727

    If RDF-Lite is anything it’s what I call in the above document a “sane profile” of RDF. Reinvention of rdf:type won’t occur because people will just use rdf:type itself and move into RDF-Full but I don’t expect the syntax to be available to even say “rdf:type” for many applications e.g. HTML meta tags.

  15. karl says:

    RDF Lite – Do you mean like XMP the format of Adobe?

    http://support.adobe.com/devsup/devsup.nsf/docs/51675.htm
    http://www.xml.com/lpt/a/2004/09/22/xmp.html

    I wonder if someone has already published how much of RDF was covered by XMP.

  16. iand says:

    Thanks for reminding me, Karl. XMP was another contributor to my crisis but I forgot to write about it in the original posting. If you add multiple dc:creator properties in XMP then Adobe kindly replaces them with a single dc:creator property with an rdf:Seq value, then puts each creator inside an rdf:li element. Yet another content model to deal with. All we need now is a tool that uses parseType="Collection" and we’ll have the set!

  17. Danny Ayers, Raw Blog : » Crisis, what crisis? says:

    [...] the information overload…2005-09-14Crisis, what crisis?In Crisis, Ian Davis demonstrates the collision between a simple theoretical model and its practical appli [...]

  18. Danny says:

    Just a thought on the way out of the house – how does the same scenario pan out if you want to represent the same dc:creator information in e.g. a relational DB, or for that matter in Java, Ruby, PHP?

    “A simpler * could take a lot of this pain away and hit a sweet spot of simplicity versus expressivity. Where can complexity be removed?”

  19. iand says:

    Danny, I don’t understand your comment, can you elaborate?

  20. Phil Dawes says:

    Hi Ian, Danny,

    Have you had a look at my tagtriples scheme? In retrospect it is perhaps poorly named, but the fundamental idea is that *any* symbol can be a resource identifier. There is no distinction between literals and resource identifiers, which makes the scheme very simple to author and understand.

    This sounds really imprecise, but precision is easily added by making graphs first class citizens – you can cluster a number of statements together to increase precision.

    I’ll try and blog about this later tonight and link here.

  21. Phil Dawes' Stuff says:

    Tagtriples + identity precision

    Ian Davies has been discussing the complexity of RDF and considering the possibility of an RDF Lite. Danny Ayers also picked it up here.

    Readers of this blog will already know that I struggled with teaching RDF’s complexity when attempting to promo…

  22. Phil Dawes says:

    Doh! Couldn’t get the trackbacking to work – here’s the link.

  23. Elias Torres » Favorite RDF QOTD says:

    [...] . « I joined Roller Favorite RDF QOTD Ian Davis: What if we jilted the ugly sisters of rdf:Bag, rdf:Alt and rdf:Alt and took reification out [...]

  24. Danny Ayers, Raw Blog : » Dialog says:

    [...] ys seemed rather smug about. univers immedia: Thinking about RDF and Topic Maps See also: Crisis (Ian Davis), Tagtriple + identity precision (Phil Dawes), Is RDF moving beyond the desperate hac [...]

  25. paolo says:

    I would like to know what you think about microformats (microformats.org) and their evolutionary, start-simple approach.

    Could starting from microformats be the “few steps back to find the true path to the summit” you mention?

    I’m curious about what you think about this.

  26. Danny Ayers, Raw Blog : » A bombshell and a ramble and a proposition says:

    [...] em” Henry’s SPARQL to ignite web 2.0. Regarding those posts, and iand’s Crisis, (and Julian Bond in Shelley’s comments) oddly enough based on the same starting points – [...]

  27. Chimezie says:

    Perhaps an appropriate, and concrete subset of the SW and open systems should be outlined. A closed system (which controls the production of identification – by URI – as well as content), which does *not* automatically apply entailment rules (not even the ‘simple’ entailment rules as defined in RDF-MT) could be such a subset. Then finally, simplify the model to consist only of the following parts:

        GraphContext (or collection of statements)
        Statement
        Identifiers (URIs)
        Literals

    voila, you have RDF-Lite (clearly distinguished from the SW)

  28. Danny Ayers, Raw Blog : » Wrong heel says:

    [...] e Obasanjo in Questioning RDF. Now James uses asterisks in his quote from Ian Davis’ Crisis post, but I think the strongest line of that deserves repeating verbatim: If professional cata [...]

  29. Danny Ayers, Raw Blog : » Great Expectations says:

    [...] :1 mapping between domain model and the corresponding RDF model (where that fails you have crisis!). But the mapping between RDF model and RDF/XML is 1:many, which can be perceived as a showstop [...]

  30. Messages not Models says:

    Free vs Safe in semweb

    Ian Davis is having an RDF breakdown. Seems Dublin Core can’t seem to get dc:creator quite right:
