Nov
15
2006
Greg Linden, who Sam and I had the good fortune to meet at the Web 2.0 Summit this year, has posted a link to some excellent class notes on data mining. Lots of interesting topics on clustering and relevancy analysis usefully condensed and summarised. I highly recommend Greg’s blog if you’re interested at all in large-scale search and recommendation systems in general.
Aug
05
2005
Here’s something that occurred to me whilst enjoying Shelley’s Cheap Eats at the Semantic Web Café posting again. I added tagging to this site using Jerome’s tagging plugin for WordPress and then hacked around with Tom Gilbert’s del.icio.us plugin to fetch a list of my bookmarks for each tag. Now when I click on a tag I get to see recent things I’ve written together with things I’ve bookmarked. Useful. In fact, so useful that I keep coming across things that I’d forgotten about. Sometimes I don’t even remember bookmarking them in the first place.
Here’s my revelation: I don’t actually visit del.icio.us all that often. For me, del.icio.us is a write-only environment. I fire and forget. I’m bookmarking because I might one day want to go back and find it, but in practice I rarely do. I seem to remember that the last time I did try to find something I’d bookmarked, I couldn’t remember the tags I’d used or even whether I had bookmarked it, and I ended up with Google anyway.
Part of the problem is that I’m not consistent: I seem to have things bookmarked under web-services, webservice and webservices. But I thought that was the mantra of folksonomy: no controlled vocabularies.
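To make the inconsistency concrete, here is a minimal Python sketch of how variant spellings might be grouped after the fact without imposing a controlled vocabulary up front. The function names and the plural-stripping rule are my own assumptions for illustration, not anything del.icio.us or the tagging plugin provides, and the heuristic is deliberately crude.

```python
import re
from collections import defaultdict


def normalise_tag(tag: str) -> str:
    """Reduce a tag to a rough comparison key (assumed heuristic, not a real stemmer)."""
    # Lower-case and drop anything that is not a letter or digit, e.g. the hyphen.
    key = re.sub(r"[^a-z0-9]", "", tag.lower())
    # Strip one trailing "s" so singular and plural forms share a key.
    if len(key) > 3 and key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def group_variants(tags):
    """Map each comparison key to the original tags that collapse onto it."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalise_tag(tag)].append(tag)
    return dict(groups)


variants = group_variants(
    ["web-services", "webservice", "webservices", "stats", "sidebar"]
)
for key, originals in variants.items():
    print(key, originals)
```

The three web services tags end up under a single key, while the tags as I typed them are left untouched. The plural rule will misfire on some words, so treat it as a starting point only.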
Looking at my tagcloud I can see lots of tags that never realised their full potential such as sidebar (I thought perhaps I could tag useful Firefox sidebar extensions) or stats. These languish, barely used, because I can’t remember them or why I thought they were useful.
My inconsistency and bad memory aside, the fact of the matter is that tagging systems and folksonomies are great for organising, but boy do they suck when it comes to finding something. Google still wins hands down – I just have to plug in the keywords (tags!) that are relevant to me now, rather than those that were relevant a few months ago. This is an area where controlled vocabularies win too: they’re designed for locating things quickly, not for ease of categorisation (e.g. Dewey, dmoz etc).
I think there’s a solution and it’s a Web 2.0 thing. In the same way that Google Desktop inserts results from my hard disk into any Google web search, I want all the pages I’ve bookmarked to be searched and shown first whenever I search in Google. Maybe I could do this as an extension to Google Desktop, but a better solution would be for Google to allow me to register my RSS feeds with them. Then, they could subscribe to my feeds to learn what I’ve recently read or bookmarked and show those at the top of any search results. That would be extremely cool and infinitely useful!
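The "show my bookmarks first" part of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the result list and the bookmark list are hypothetical inputs (in practice the bookmarks would come from parsing an RSS feed, which is left out here), and it only reorders results the search engine has already returned rather than searching the bookmarked pages themselves.

```python
def bookmarks_first(results, bookmarked_urls):
    """Return results with bookmarked URLs moved to the front.

    sorted() is stable, so the search engine's own ordering is kept
    within the bookmarked group and within the rest.
    """
    bookmarked = set(bookmarked_urls)
    # False sorts before True, so bookmarked URLs (key False) come first.
    return sorted(results, key=lambda url: url not in bookmarked)


# Hypothetical data for illustration only.
results = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.net/c",
]
bookmarks = ["http://example.net/c"]

reranked = bookmarks_first(results, bookmarks)
print(reranked)
```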
Aug
05
2005
I recently added tagging to this weblog. But, as Om Malik writes, all is not well in the tagging world:
So this is where I lose the plot – I tag my post, Technorati benefits, and despite all that, my tags help spammers who clog my RSS readers gain more readers. That’s absolutely rotten! So essentially the spammers can write a script, generate tags, stay high on the Technorati listings and fool people into visiting their sites. By tagging I am helping this scumbags, the RSS-link blog spammers. This is clearly not going to help Technorati (or infact anyone’s reputation) as a good search tool.
This is a variation of comment spam but instead of them visiting our sites and defacing them, they deface Technorati and then we all link to them! How perverse can we be?
My tags don’t link across to anyone partly for this reason and also partly because there are so many places I could link them to that I don’t want to favour any one of them.
While we’re on the subject, I’ve decided that I’m going to change the tagging system here. Rather than me assigning tags, I should be allowing others to tag my posts à la flickr. Think of this as microcomments – anyone will be able to rate or categorise my postings, perhaps making associations I hadn’t thought of. Because the tags only link within my own site, spammers will have no reason to abuse them.
Should be do-able with a few tweaks to the templates provided I can get the permissions right.
Jul
31
2005
It’s fun to see all these microformats being specified. I’m looking for some examples of where they’re being consumed – event aggregators, social networks, licensed works search engines? Got any good ones?
Jun
28
2005
The BBC has a prototype tagging system for BBC news:
We have built a social bookmarking tool just for BBC News that allows logged in users to tag/bookmark stories and view related stories that other users have tagged using similar terms.
If you go to any story from the front page and login as ‘guest’/‘guest’ then you can start tagging stories and see how other people have tagged the same story.
The prototype uses XML-HTTPRequest to allow users to add tags without leaving the page and it also updates the related stories box based on applied tags so that users can see what others are doing in almost real time.
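The BBC doesn’t say how the related stories box is computed, so the following Python is purely a guess at one simple approach: rank other stories by how many tags they share with the current one. The story identifiers, tags and function name are all made up for illustration.

```python
from collections import Counter


def related_stories(story_id, tags_by_story, limit=5):
    """Rank other stories by the number of tags shared with story_id."""
    target = tags_by_story.get(story_id, set())
    overlap = Counter()
    for other, tags in tags_by_story.items():
        if other == story_id:
            continue
        shared = len(target & tags)
        if shared:
            # Only stories with at least one tag in common are candidates.
            overlap[other] = shared
    return [story for story, _ in overlap.most_common(limit)]


# Hypothetical tag data: story id -> set of tags users have applied.
tags_by_story = {
    "story-1": {"election", "uk", "politics"},
    "story-2": {"election", "politics"},
    "story-3": {"uk", "weather"},
    "story-4": {"football"},
}

related = related_stories("story-1", tags_by_story)
print(related)
```

A real system would presumably weight tags by how many users applied them and recompute as new tags arrive, which is what would make the box feel close to real time.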
Jun
22
2005
Tagging for classification compared with tagging for annotation:
So here is that hypothesis – that the shift from people using ‘blogs’ to ‘blog’ represents the increasing dominance of a Flickr-style paradigm of tagging. Imagine the process of annotating a weblog – if you tag it with ‘blogs’ it seems clear that you are adding it to a collection of some kind. ‘Blogs’ is clearly the name of a folder which houses links to weblogs rather than an attempt to describe the weblog itself. But tagging something with the term ‘blog’ suggests quite the opposite – to tag a link ‘blog’ suggests that I’m attempting to describe the link not as belonging to a bin labelled ‘blogs’ but simply as a ‘blog’ in and of itself. It is my conjecture, therefore, that the folder metaphor is losing ground and the keyword one is currently assuming dominance.