It's Not RDF versus Microformats
Yesterday's post provoked a rather aggressive response from Tantek Çelik, leader of the Microformats community.
Since my post wasn't about microformats, I'm a bit surprised at the tone of the response. I guess it must be the CSS analogy that drew Tantek's attention. I don't like drawing comparisons between mf and semweb since I see them as complementary technologies, not opposing ones. They're trying to solve completely different problems. I've seen too much abusive behaviour from both sides in the past few years: we need a little respect around here, people. As I pointed out to the SWEO list last week:
I think it's very unproductive to pitch RDF vs Microformats and neither "side" should be arrogant enough to think they have nothing to learn from the other. We need to be building bridges with the MF community.
When I visited San Francisco back in 2005 I spent a little bit of time with Tantek and some of the other microformat people over lunch and dinner. I went there as a microformat skeptic but I kept an open mind and I paid attention to what they were saying. I learnt some stuff about hidden metadata and not repeating yourself and I wanted to bring that back to the RDF community. A few fevered weeks later I'd written Embedded RDF, but I'd never be arrogant enough to call it a microformat; the closest I'd allow is that it's microformat-like. Hijacking microformats with RDF would be incredibly disrespectful. I also happen to think the converse is true.
I'm presuming that Tantek commented on my blog to get some response and generate some debate, so I've pulled out most of his comment here.
Technology adoption is following the paths of least resistance: microformats is solving (has solved most of) the semantic web publishing and search problem with no apparent “chasm” as observed with other noted efforts.
I think Geoffrey Moore's analysis applies to the adoption of disruptive technologies. Microformats aren't disruptive, being more of a sustaining innovation and an incremental improvement over previous HTML practices. But I still think microformats are struggling to move from the early adopters to the early majority since there is still no real benefit to be had from adopting them. For most people they're solving a problem that doesn't need to be solved because they aren't experiencing pain when trying to do the things microformats are supposed to make easier. It's akin to having a business card scanner: great if you have loads of business cards and you're an obsessive salesman, but the majority of us do just as well keeping the little bit of card on our desks or perhaps transcribing the email address to our mail program. Most people don't need microformats because it's not all that difficult to note down the date of an event or the address of a company.
More difficult (learning new languages), uglier (namespaces syntactical vinegar), and simultaneously fragile (duplication of *data* in hidden files or hidden comments or hidden tags which violate the DRY principle) approaches are likely to see less and less adoption over time, in comparison to simpler, higher reliability/fidelity approaches (re-using/“salting” semantic information already being published in HTML on the web).
I don't dispute that learning a new language is more difficult. That's somewhat of a tautology anyway. I had to learn HTML once, and I distinctly remember thinking I must learn XML but it seemed so complex I put it off for ages (this was before namespaces). I've learnt CSS but I definitely can't say I've mastered it.
I also don't dispute the ugly argument and I've pointed out the same many times, sometimes to the frustration of my fellow semwebbers! I've also been heard to say that adoption of XML was the single greatest mistake in the development of RDF. There, I've said it again. I still believe that to be true.
I have a problem with the notion of fragility as characterised in Tantek's comment. His assumption is that RDF involves duplication of data. I don't see how that can be true when HTML can't express the data that can be expressed in RDF. If your world is limited to business cards, social events and reviews then I can see where you might favour expression in HTML. Humans want to read those things too. But the world is bigger than social networking and there are other types of first-class data that people want to exchange with one another. I'd prefer to use the right tool for the job where possible.
GRDDL+XMDP+microformatted web pages (read: Semantic (X)HTML transformed) is the most likely path to providing an Uppercase Semantic Web view of the existing rapidly growing lowercase semantic (HTML) web of data for those wishing to use those tools and technologies. Simultaneously, the growth of open source libraries which provide direct access to the intrinsically semantic microformatted web is providing an alternative to using a transformed intermediate abstract model or representation.
Well, I don't disagree with this and it's great to have an endorsement of GRDDL as the best way to connect the HTML web of data to the RDF one. I hope, with the mention of XMDP, that this is an implied commitment to encouraging the use of profile URIs on microformatted pages.
The growth of semantics in the existing HTML web (rather than a parallel web) and the increasing diversity of tools for accessing those semantics via a variety of models is rapidly advancing the state of the art for all semantic web approaches, now, not in 3-9 years.
Again this is true, although "state of the art" doesn't square with being mainstream. Nobody wants two webs, but many of us want the freedom to publish whatever data we have into the existing web without a centralised process.
In return, I'd like to offer Tantek some constructive suggestions (respectful, I hope). Firstly, the microformats community process needs some attention. Contrarian views appear to be suppressed and the environment is very intimidating for newcomers. I have direct experience of a similar community and it pushes away many of the people who could make a valuable contribution. I have to ask: if your views are welcome in our community, why aren't ours welcome in yours?
Secondly, the specifications need work. In my opinion, in their current form most of them are not accessible to the majority of content authors or web developers. Who, for example, is the hCalendar specification aimed at? It appears to assume the reader is familiar with RFC 2445, a particularly human-unfriendly format. What about the people who simply want to add hCalendar to their home page or their company's site template? I think that's a larger audience and the specification should reflect that, describing how the actual event information is incorporated into the HTML without the intermediate step of having to map to iCalendar first. If the community process were more tolerant I might feel empowered to help make those changes.
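To make the point about describing event information directly in the HTML a little more concrete, here is a minimal sketch, not taken from the hCalendar specification itself. It assumes a single event marked up with the hCalendar class names vevent, summary, dtstart and location, with the date carried in the title attribute of an abbr element. The event details, the PAGE string and the HCalendarSketch and extract_event names are all hypothetical, and the code is nowhere near a complete parser.

```python
from html.parser import HTMLParser

# Hypothetical event marked up with hCalendar class names. The event
# details are invented purely for illustration.
PAGE = """
<div class="vevent">
  <span class="summary">Semantic Web meetup</span> on
  <abbr class="dtstart" title="2007-03-15">15 March</abbr> at
  <span class="location">Birmingham</span>
</div>
"""

# Only a handful of properties are handled in this sketch.
PROPERTIES = {"summary", "dtstart", "location"}


class HCalendarSketch(HTMLParser):
    """Collects a few hCalendar properties from one event (not a full parser)."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        for name in PROPERTIES & classes:
            if tag == "abbr" and "title" in attrs:
                # abbr pattern: the machine-readable value is in the title
                self.event[name] = attrs["title"]
            else:
                self._current = name
                self.event[name] = ""

    def handle_data(self, data):
        if self._current:
            self.event[self._current] += data

    def handle_endtag(self, tag):
        # Assumes properties are not nested inside one another.
        self._current = None


def extract_event(html):
    """Return the recognised properties of the event as a plain dict."""
    parser = HCalendarSketch()
    parser.feed(html)
    return {key: value.strip() for key, value in parser.event.items()}


if __name__ == "__main__":
    print(extract_event(PAGE))
```

The markup in PAGE is the part a content author would actually write: ordinary HTML with a few agreed class names, readable without any reference to iCalendar. The parsing code is only there to show that the same markup can be read back by a program.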