Google’s RDFa a Damp Squib

13 May 2009 by Ian Davis

It’s been an interesting week for embedding metadata in HTML. Yesterday I was exploring HTML5 microdata and today Google announced support for RDFa. At first this announcement seemed like a big deal – Google supporting the web of data in a big way, a real push into the world of open structured data. However, a closer look reveals that Google have basically missed the point of RDFa. The RDFa support is limited to the properties and classes defined on a hastily thrown-together site called data-vocabulary.org. There you will find classes for Person and Organization and properties for names and addresses, completely ignoring the millions of pieces of data using well-established terms from FOAF and the like. That means everyone has to rewrite all their data to use Google’s schema if they want to be featured on Google’s search engine. It’s like saying that, to be listed in their search engine, you have to write your pages using Google’s own version of HTML where all the tags have slightly different spellings!

The result is a hobbled implementation of RDFa. They’ve taken the worst part – the syntax – and thrown away the best – the decentralized vocabularies of terms. It’s like using microformats without the one thing they do well: the simplicity. This is why I believe Google missed the point. They made the mistake of treating RDFa as an alternative to microformats, which completely ignores its true strength as a structured data format.
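To make the difference concrete, here is a minimal Python sketch of what is at stake. It is not Google’s implementation or anyone else’s. The FOAF namespace is the published one; the data-vocabulary.org namespace, the `v` prefix, and the helper names `expand_curie` and `recognised` are assumptions made purely for illustration. The idea is that RDFa terms are just prefixed names that expand to full URIs, so a consumer can recognise terms from any number of namespaces, or from only one.

```python
# Prefix map of the kind a page declares with xmlns:foaf="..." and xmlns:v="...".
# The FOAF namespace is the published one; the data-vocabulary.org namespace
# is an assumption for illustration only.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "v": "http://rdf.data-vocabulary.org/#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:name' into a full URI.

    Expansion is plain concatenation of the namespace and the reference.
    """
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


def recognised(properties, supported_namespaces):
    """Keep only the properties whose URI falls in a supported namespace."""
    namespaces = list(supported_namespaces)
    return {
        uri: value
        for uri, value in properties.items()
        if any(uri.startswith(ns) for ns in namespaces)
    }


# Hypothetical page data: the same name published with both vocabularies.
page = {
    expand_curie("foaf:name"): "Ian Davis",
    expand_curie("v:name"): "Ian Davis",
}

# A consumer tied to one vocabulary sees only that vocabulary's terms.
single_vocab = recognised(page, [PREFIXES["v"]])

# A consumer that accepts several vocabularies sees both.
multi_vocab = recognised(page, PREFIXES.values())
```

In this sketch, supporting another vocabulary is a matter of adding one more namespace to the supported list, which is why a single-vocabulary restriction looks like a choice rather than a technical limit.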

As I twittered earlier: it seems odd that Google, a company that thrived on the open messy web, seeks to ignore it and go for a controlled vocabulary. I’m hoping that this is just a toe in the water and more will come. But there’s a part of me that thinks otherwise. Surely there’s no way the smart people in Google didn’t know about the existing vocabularies and data for people, places, reviews and businesses? We’ve all seen large companies claim support for key standards yet deliver partial or broken implementations, and some companies use that as a deliberate tactic to undermine the standard itself, to break interoperability or make it impossibly hard. It’s very easy for these situations to be explained away as a mistake, or as a work in progress, but we need to push and dig deeper and hold companies to their very public claims.

20 thoughts on “Google’s RDFa a Damp Squib”

  1. Robert O'Callahan says:

    On one hand, you say that the best part of RDFa is “decentralized vocabularies”. On the other hand, you complain that Google didn’t use existing vocabularies. Isn’t the whole point of decentralized vocabularies that people can make up their own? Isn’t that fragmentation exactly what you expect from RDFa?

  2. roger says:

    yes. it’s a real shame. i was starting to imagine all the things i could do with the, and then realised we have a constrained set of vocabularies. maybe (as you point out) this is just the start … let’s hope so !

  3. Ian Davis says:

    Robert, the point of decentralization is not to encourage fragmentation and isolation, but to allow people to collaborate without needing permission from a middleman. Google’s approach imposes a centralized authority.

  4. Eric Hellman says:

    For the benefit of those of us who don’t speak British, what’s a damp squib? And realistically, if you had the market power to impose a single vocabulary on the world, wouldn’t you do it, too?

  5. Laurens Holst says:

    And their own vocabulary is pretty weird, the URIs of the relations look like: “http://rdf.data-vocabulary.orgreview”. Looks like this was hastily put together by someone with only superficial understanding of RDF(a). In Dutch we have a saying: “having heard the bell ring, but not knowing where the clapper is”. Too bad, this could have been a nice win for RDF but instead it is kind of so-so. Yahoo’s SearchMonkey does things much better.

  6. Ian Davis says:

    A damp squib is something that you think is going to be wonderful but fails to deliver (a squib is a firework). And no, I wouldn’t do what you suggest. Check out what my company is doing to find out why: http://blogs.talis.com/sharedinnovation

  7. Ian Davis says:

    Laurens, I hadn’t spotted that. I hope they fix it, or better still just support existing properties.

  8. Laurens Holst says:

    As I said elsewhere, I guess I shouldn’t be too negative, it seems to be just a small start-up mistake that can be fixed relatively easily if the basic framework is in place. Adding support/aliases for other types (such as foaf:Person) should also not be terribly difficult, and definitely something they should do. Dare I even think of support for owl:sameAs statements? :) It’s just always a bit scary when such a Google-bomb drops on us unsuspecting minds that people might be starting to pick it up and we have to live with it forever. SearchMonkey also had its issues in the beginning (I just read elsewhere that they hardcoded prefixes, tsk!), I guess Google also deserves the chance to incrementally improve their support :). Overall, Google’s announcement is probably a very good thing for RDF and RDFa, also considering the recent turmoil around microdata in HTML5.

  9. R.V.Guha says:

    * Structured data on web pages have many applications. Rich Snippets and CSE are two of them.
    * For Rich Snippets, Google search needs to understand what the data means in order to render it appropriately. We will start incorporating existing vocabularies like FOAF, but there’s no way for us to have a decent user experience for brand-new vocabularies that someone defines. We also need a single place where a webmaster can come and find all the terms that Google understands. Which is why we have data-vocabulary.org.
    * For Custom search engines, grassroots vocabulary is exactly what we expect: sites who know their content will mark it up as they see fit and use that to enable the slicing and dicing of their search results.
    * Contrary to what Ian Davis says, we have absolutely no authority (centralized or decentralized) over what webmasters do!

  10. Ian Davis says:

    Guha, I just don’t understand why you needed to invent properties for names and people when the rest of the world uses foaf:name and foaf:Person – why ignore that data and expertise?

  11. Richard Cyganiak says:

    I think it’s not as dramatic as you make it sound, Ian. RDFa makes it easy to do things like: rel="foaf:name google:name". This is not ideal, but it’s something we can work with. I also think that decentralization on the Web is often misunderstood. The key point is that the technology of the Web is decentralized. But social factors push us back towards monocultures: 80% of the Web use the same web browser, the same web server, the same search engine. 80% of the Web will also end up using the same few vocabularies. What vocabularies will that be? Not the ones that two kids cooked up for a university class. For people they probably should have used FOAF, but in general I’m not surprised that Google feels a need to roll their own vocabularies. There’s also a failure here from the RDF community’s side – we haven’t bothered to push our popular vocabularies, such as FOAF and SIOC, into usage with RDFa. If there was a significant installed base of FOAF+RDFa already, then I’m sure it would have found its way into Google’s documentation. But as it stands, all our cool vocabularies live in RDF/XML and SPARQL endpoints.

  12. Ian Davis says:

    Richard, yes RDFa supports doing that but what would be the benefit to the millions of people who start adding RDFa to get better listings in Google? I think it’s minimal, therefore we should assume that foaf:name etc will be superseded by google:name. I can live with that, but I’m questioning the motivation of Google taking this approach. They could very simply have listed out the existing properties they would support and supplemented those with new ones where needed. That’s a collaborative, open approach.

  13. Tom Morris says:

    “we have absolutely no authority (centralized or decentralized) over what webmasters do!” Well, that’s true only in theory. In practice, well, SEO exists; with power comes responsibility etc.

  14. Ian Davis says:

    Tom, I agree that Google have a lot of soft power in this.

  15. Bruce D'Arcus says:

    R.V. Guha: “We also need a single place where a webmaster can come and find all the terms that Google understands. Which is why we have data-vocabulary.org.” That is NOT at all a compelling explanation. There are ways to better balance the needs here. You could, for example, simply itemize links to the vocabularies you support (foaf, dublin core, sioc, etc.). That might include some examples. Or consider the bibliographic ontology work that I’ve been involved with. We define some of our own stuff, but wherever possible use (and import, in the ontology) stuff from DC and FOAF. So I hope to see you work more collaboratively on this stuff, and not reinvent the wheel on all of it.

  16. Paul Tarjan says:

    Nice writeup, thank you. Are all these complaints valid for SearchMonkey as well? If so, let me know what you think we should be doing as well.

  17. Channy says:

    Actually we have to consider why RDFa and microformats were born. The conventional semantic web focused on specific domains and was too complex to be understood by web authors. For example, FOAF has complicated vocabularies for a person, whereas only a few things are needed on the web, as in hCard. I couldn’t make my own FOAF file without the help of a generator, but I can easily write it as a microformat in my about page. I think Google’s intent in supporting RDFa and microformats is to utilize meaningful elements in HTML, not the semantic web. So there is no change in direction by Google.

  18. Bruce D'Arcus says:

    Paul, I’ve not looked at SearchMonkey much, but on quick look … I like the way you set up the service for developers. And I like that you allow different vocabularies. But it’s still pretty constrained. If you poke around my (currently sparse) site, you’ll see a fair bit of RDFa, using pretty standard stuff (foaf and dc). But for the most part, SM doesn’t grok it. I see no good reason for that. Also, on your use of DC, good to see that, and the use of the new dcterms namespace. But, some of the details are wrong. For example, dct:creator is supposed to be a URI or blank node; not a string. The same is true of dct:subject.

  19. Evan Goer says:

    Another SearchMonkey guy here (hi Paul!). Bruce, you make an excellent point. So we do parse and store a long list of microformats, plus any RDFa we find (and even eRDF). You can get at this data and use it by going to the SearchMonkey devtool or calling the Yahoo! Search BOSS APIs. But as for our standard enhanced results (aka automated SearchMonkey, aka “SearchMonkey Lite”), this operates on a constrained list of vocabularies. The reason we don’t have automated presentations for all the popular vocabularies yet is because we have to carefully design those presentations and test them for search effectiveness first. But rest assured, they’re coming. By the way, we’re holding a VoCamp at the Yahoo! Sunnyvale campus in June right after SemTech San Jose. If you or anyone else reading this post is going to be in the Bay Area then, we’d love to have you over to discuss vocabularies with our architect and SearchMonkey engineers. Evan Goer, Yahoo! SearchMonkey Team

  20. Thai Chi says:

    Perhaps further divulgence of open source protocols will provide more and more of us — even in emerging nations — with the power to enrich lives via enriched data.
