
BNodes Out!

24 March 2007 by Ian Davis

Joost joins the list of real world applications that are eschewing blank nodes in their RDF. At Talis we’re doing the same – our RDF versioning protocol doesn’t support bnodes either and we actually replace them with URIs in many places. Over the past year I’ve spoken with several companies making practical, commercial use of RDF and none of them are using bnodes. In fact they are actively avoiding them. Two of these companies claim to have stores in the multi-GigaTriple range.

So far I haven’t found a use case for bnodes that can’t be catered for by URIs – can anyone prove me wrong?

14 thoughts on “BNodes Out!”

  1. Chris Bizer says:

    No, I can not prove you wrong, as I think you are right at the point here.

    bNodes are especially harmful in the context of Linked Data on the Web, where you want to navigate and crawl information.

    Think for example of all the unnecessarily complicated smushing stuff when it comes to FOAF on the Web.

    So, we also don’t use bNodes in our content publication projects like dbpedia, DBLP or the RDF Book Mashup. Our tools like D2R Server, the DISCO browser or the Semantic Web Client Library somehow support bNodes, but work much better without them.

    I also think the SIMILE people follow a similar approach.

    So just forget about bNodes, along with reification, and use Web-dereferenceable URIs for everything.

    Cheers

    Chris

  2. Danny says:

    I doubt there is a counter-case. IANAL, but giving bnodes names wouldn’t seem to make a big difference in terms of capabilities. Replacing them with non-http: URIs wouldn’t produce any benefit as far as I can see, and as Chris suggests it’s desirable to be able to dereference.

    But I’m really not convinced bnodes should be thrown away. For a start, there is a cost to minting URIs – having to make them cool (and presumably useful) for eternity.

    Presumably each part of an infrastructure could be optimised to a bnode-free model (e.g. have the server framework offer automatically templated RDF/XML representations of locally-minted URIs, which would grow as a list of sameAs/equivalence statements). But at this point in time I’d worry that this might be premature optimisation looking globally.

    The smushing stuff is a bit of a red herring – if you have a person identified on two separate systems and you want to know if they’re the same person, you still have to smush against an IFP property or use heuristics (unless you use their mailbox or homepage URI as their personal URI, which messes up the modelling). You then have to figure out how best to manage two (or more) URIs for one person. Ok, maybe the cost in this particular case could be reduced by encouraging the use of personal URIs as portable IDs. But what of things like “talis:iand’s favourite colour”?

    The idea of authoritative URIs is more manageable without arbitrary URI creation; whether that idea is likely to be useful in the long term is another matter.

    So although it seems a reasonable rule of thumb to avoid bnodes, in the general case I think it’s just moving the cost elsewhere. Getting rid of them would reduce the opportunities for dealing with the cost in a way that best fits the local system.

    Having said all that, my main objection is just a feeling that existentials have a nice small-scale symmetry with the open world model at large.

    Let’s say you encounter an insect of a kind you’ve never seen before, but which presumably is somewhere in the Universal Creature Guide. Is it better to give it a new *global* name, which will effectively be pushed into the Universal Creature Guide as (another) alias, or just to use a local placeholder with associations like “the bug I found in my salad from M & S in Shirley on Thursday”..?

    Anyhow, what of literals? If bnodes are bad, surely literals are worse in the way they hide content in a non-dereferenceable form?

  3. leo says:

    hey, I know a bnode with a uri: http://www.bnode.org/

    seriously: bnodes suck and in the semantic desktop projects (gnowsis, nepomuk.semanticdesktop.org, aperture.sf.net) we try to avoid them or forbid them completely.

    there are enough problems connected with bnodes, but still they are good to say “hey, this uri must not be clicked and can be smushed at free will”.

    I would suggest something like a new uri scheme or urn scheme using UUIDs as bnodes:

    urn:bnode:
    bnode:

    UUIDs are here: http://en.wikipedia.org/wiki/UUID
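As a rough illustration of the UUID idea in the comment above, here is a minimal Python sketch. The function name `mint_node_name` is hypothetical. The default `urn:uuid:` prefix is the registered UUID URN namespace, while `urn:bnode:` is only the commenter's proposal and is not, as far as this sketch assumes, a registered scheme.

```python
import uuid


def mint_node_name(scheme="urn:uuid:"):
    """Mint a fresh, globally unique name for a node that would otherwise be blank.

    `scheme` is an assumption of this sketch: "urn:uuid:" is a registered
    URN namespace, whereas "urn:bnode:" is just the prefix proposed in the
    comment thread.
    """
    # uuid4() is random, so no naming authority or coordination is needed.
    return f"{scheme}{uuid.uuid4()}"


if __name__ == "__main__":
    print(mint_node_name())              # a urn:uuid: name
    print(mint_node_name("urn:bnode:"))  # the hypothetical prefix from the thread
```

Names minted this way are unique but not dereferenceable, which is the trade-off several commenters point out.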

  4. iand says:

    Danny, minting new URIs instead of bnode IDs probably doesn’t cost anything at all if you use document fragments for the URIs. They’re even locally scoped.

    I did think of one thing that bnodes give you over using a URI: they are guaranteed to be safely handled when smushing documents, i.e. they can be used to prevent the accidental merging that could occur if the same URI were inadvertently used in two documents. But we smush documents containing a mix of URIs and bnodes all the time and this has never really presented itself as a problem, so I think it’s only a minor advantage.

    The other thing to consider is that in every RDBMS implementation of RDF I have seen, bnodes are converted to URIs internally so they can be stored in the subject column of a triple table.

  5. Richard Cyganiak says:

    Semantically, a bNode is the same as a throwaway URI. Both have their advantages. Throwaway URIs can be linked to, and you can simplify your data model by forbidding bNodes. On the other hand, URIs should be kept stable, and keeping URIs stable has a cost. Using bNodes avoids it in cases where you don’t want to incur this cost.

    I’m not entirely convinced that bNodes should be removed from RDF. But avoiding them for practical reasons is usually a good idea.

  6. iand says:

    Richard, I don’t think we should change RDF, but it might be interesting to define a subset that still has the expressivity that we need to get things done.

  7. iand says:

    I should have added …and makes our applications simpler to build and use

  8. Henry Story says:

    Removing bnodes won’t remove the problem that bnodes are meant to solve, namely smushing, as you will later be forced to smush uris instead of bnodes, and just have a huge number of uris instead. So the problem remains.

    Having said that, bnodes should be avoided if good uris can be found. And often with a little web site organisation uris can be created for people for example. It certainly helps a lot to have dereferenceable urls. Exchanging bnodes for URNs does not seem to give one much, apart from the trouble of having to mint urns.

    There is a good case for bnodes. Imagine you tag something with “bank”. You want to say something like

        </page.html> :tagging <http://tagger.com/bblfish/tag/10002> .
        <http://tagger.com/bblfish/tag/10002> :by <http://bblfish.net/people/card#me>;
            :tag [ a skos:Concept; skos:label "bank" ] .

    Here you say that you are tagging something with a concept, but you don’t yet know which concept it is.

    Perhaps later that tag can be nailed down as being a and then the blank node will be given a URI.

    This helps keep things indeterminate while they are.

  9. iand says:

    Henry, the problem isn’t the large number of resources, bnode or otherwise, but that by their very nature bnodes can only be handled via indirection.

    From my work at Talis, and from conversations with many people putting RDF to work, being able to diff and patch RDF graphs is very very important. You just can’t do that sensibly with the endless indirection that bnodes require. With named nodes (URIs or literals) it’s trivial.

    I also don’t think your example is any more difficult with a URI vs a bnode. True, it’s easier to write down in N3 using bnodes, but it’s almost the same markup for RDF/XML. Also, just because it’s written down as a blank node doesn’t mean it has to be parsed that way into a triple store – the parser could substitute generated URIs and nothing would break, no meaning would be lost.
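To make the diff-and-patch point concrete, here is a minimal Python sketch under a strong assumption: every node is named (a URI or a literal), so a graph can be treated as a plain set of triples. The function names and the example URIs are hypothetical and are not taken from the Talis versioning protocol.

```python
def diff(old, new):
    """Return (removed, added) between two graphs given as sets of triples."""
    return old - new, new - old


def patch(graph, removed, added):
    """Apply a diff to a graph and return the resulting set of triples."""
    return (graph - removed) | added


# Hypothetical example data: triples are (subject, predicate, object) strings.
ME = "http://example.org/people#me"
FOAF = "http://xmlns.com/foaf/0.1/"

old_graph = {
    (ME, FOAF + "name", '"Ian"'),
    (ME, FOAF + "mbox", "mailto:old@example.org"),
}
new_graph = {
    (ME, FOAF + "name", '"Ian"'),
    (ME, FOAF + "mbox", "mailto:new@example.org"),
}

removed, added = diff(old_graph, new_graph)

# Applying the diff to the old graph reproduces the new graph.
assert patch(old_graph, removed, added) == new_graph
```

With blank nodes this set arithmetic stops working, because two documents may use different labels for the same node and comparing them then needs a graph-matching step rather than simple equality.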

  10. Henry Story says:

    perhaps we need a bnode urn :-)

    urn:bnode:sflsdjflskdjflksdjf

    Great podcast with Nova Spivack btw.

  11. Tony Hammond says:

    Have to pipe up here and mention the YADS data model that I had earlier proposed and still maintain over here:

    http://nurture.nature.com/tony/yads/

    Blurb reads as such: “YADS implements a simple, safe and predictable recursive data model for describing resource collections. The aim is to assist in programming complex resource descriptions across multiple applications and to foster interoperability between them.”

    So, the YADS model makes extensive use of bNodes to manage hierarchies of “fat” resources – i.e. resource islands, a resource decorated with properties. The bNodes are only used as a mechanism for managing containment. There is certainly no intention to globally reference the bNode “resource”. I guess one could say that the actual resources managed by YADS (those accorded a URI) are qualified (with properties) *in context*. That is, in the context of the complete YADS description. The “fat” resource managed by the bNode does not have or need a permanent global identifier.

    Seems to me that bNodes perform a very useful function. Yes, I am aware of the “smushing” problem but I think this is a red herring. bNodes give us the possibility of creating local “clumpiness” within the general RDF graph. If everything is reduced to global resources then the RDF graph will remain flat and homogeneous and generally unspeakably uninteresting. IMHO. Like the primitive universe with no synthesis of elements and especially the heavier elements. Just a primordial soup.

    I think Danny is also “spot on” when he talks about the cost of minting URIs. As a publisher we are all too aware that URIs are expensive to maintain. This is why scholarly publishing in particular has invested considerable effort in developing the DOI (Digital Object Identifier – http://doi.org/) as a solution to maintaining persistent reference linking (see also CrossRef – http://crossref.org/). Even disposable URIs have an associated cost to mint.

    I guess top of my head the only no-cost solution to minting URIs would be data: URIs because there is no naming authority to contend with. (I’m not sure about the ethics of using someone else’s DNS name in a tag: URI.)

    In sum, bNodes are useful. Less is more.

  12. Henry Story says:

    Another thought that occurred to me. If you don’t want bnodes, then you should quickly ask for a SPARQL enhancement, so that they can incorporate the N3 :- sign. Otherwise writing SPARQL queries is going to be a little tedious.

    This is because you want to write things like

        <kkk> :rel [ :- <urn:bnode:xyz>; a foaf:Person; foaf:mbox <mailto:joe@eg.com> ] .

    instead of

        <kkk> :rel <urn:bnode:xyz> .
        <urn:bnode:xyz> a foaf:Person; foaf:mbox <mailto:joe@eg.com> .

    The :- keyword (timbl also proposed ‘is’) just gives a name to the blank node. It’s semantically equivalent to owl:sameAs on an inferring DB, but most DBs won’t be inferring. But it allows the human reader to see the structure of the graph a lot more easily, especially as the graph gets larger. Perhaps one can think of it as the equivalent of the rdf/xml ‘about’ attribute.

    Henry

  13. Chimezie says:

    I’ve reluctantly come around to the same conclusions around BNodes: they cause more harm than good. *Ahem* SPARQL

  14. szydan says:

    Another issue with bnodes I’ve had a few times: splitting large N-Triples files into chunks in order to load them into a triple store. If you have bnodes inside, be careful, because if the same bnode ends up in different chunks the triple store will create two different ones. So in general I would avoid bnodes when possible.
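One way around the chunking problem described above is to replace blank node labels with generated URIs before splitting, as comment 9 suggests a parser could do. The following Python sketch assumes one statement per line and a simplified label syntax. The base URI `http://example.org/genid/` and all function names are hypothetical, and a real loader would use a proper N-Triples parser rather than regular expressions.

```python
import re

# One N-Triples statement: subject, predicate, object, terminating dot.
TRIPLE = re.compile(r'^\s*(\S+)\s+(\S+)\s+(.*?)\s*\.\s*$')
# Simplified blank node label (an assumption; real labels allow more characters).
BNODE = re.compile(r'^_:([A-Za-z0-9_.-]+)$')


def name_term(term, base):
    """Turn a blank node label into a URI under `base`; leave other terms alone."""
    match = BNODE.match(term)
    return f"<{base}{match.group(1)}>" if match else term


def name_bnodes(line, base="http://example.org/genid/"):
    """Rewrite blank nodes in the subject or object position of one statement."""
    if line.lstrip().startswith("#"):
        return line  # comment line, nothing to rewrite
    match = TRIPLE.match(line)
    if match is None:
        return line  # blank or unrecognised line is passed through
    subj, pred, obj = match.groups()
    return f"{name_term(subj, base)} {pred} {name_term(obj, base)} ."


def split_chunks(lines, size):
    """Split a list of statements into chunks of at most `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


statements = [
    '_:b1 <http://xmlns.com/foaf/0.1/name> "Joe _:b1" .',
    '<http://example.org/doc> <http://xmlns.com/foaf/0.1/maker> _:b1 .',
]
named = [name_bnodes(s) for s in statements]
chunks = split_chunks(named, 1)  # the two statements land in different chunks
```

Because both chunks now refer to the same URI, loading them separately yields one node instead of two. Text inside a literal that merely looks like a label, such as `"Joe _:b1"`, is left untouched.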
