Feb 14 2007

Remixing at XTech

Published by Ian Davis under Uncategorized

I’m excited because I just got back from a great long weekend away walking the ancient Wiltshire landscape and found a message in my inbox confirming that my XTech proposal had been accepted. My talk and paper are going to be on RSS Remixing – using RSS 1.0 to “augment” feeds and search results.

Here’s the abstract I submitted:

In this session I’ll demonstrate and explain a new ultra-simple protocol for augmenting search results with enrichments and related content. Like OpenSearch, the protocol is RSS-based. But instead of sending the search terms to each registered search provider, we send the results and ask the providers to inspect them and add anything else they know about the items. I’ll show how this protocol allows rich search applications to be built very simply by remixing results from several different data sources.

As Adam Bosworth pointed out, using RSS and its kin as a data transfer format makes a lot of sense when you consider the number of clients that can consume it. It is particularly useful for ordered sets of items such as the output of search engines. We used RSS to design a very simple protocol for combining results from different search engines into a single RSS feed that can be consumed by any feed client or application. In fact, because the final result is itself RSS, it can in turn be used to augment results from other searches.

One untapped feature of RSS 1.0 is the potential for merging multiple RSS feeds together. The underlying RDF nature of RSS 1.0 provides this capability for free, and with some work it could be adapted to work with other flavours of RSS and Atom. Using a range of techniques, from simple pattern matching to Sparql querying of the full feed, we can look up related items and, with a few simple conventions, mix the results into the original feed.
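To make the “for free” claim concrete: because an RSS 1.0 feed is an RDF graph, merging two feeds amounts to taking the union of their statements, and statements made about the same item URI simply accumulate. The Python sketch below is only an illustration under simplifying assumptions: feeds are taken to be already parsed into (subject, predicate, object) tuples, blank nodes are ignored (a real RDF merge has to keep them apart), and the URIs, predicate names and helper functions are all hypothetical.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples.
# For graphs without blank nodes, an RDF merge is simply a set union.
Triple = tuple[str, str, str]

def merge_graphs(*graphs: set[Triple]) -> set[Triple]:
    """Union the triples of every input graph."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged

def describe(graph: set[Triple], subject: str) -> set[tuple[str, str]]:
    """Collect every (predicate, object) pair asserted about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}

# Hypothetical data: one item from a search feed, plus an augmentation
# from a second provider that uses the same item URI.
ITEM = "http://example.org/book/1"
search_feed = {
    (ITEM, "rss:title", "Example Book"),
    (ITEM, "rss:link", ITEM),
}
augment_feed = {
    (ITEM, "ex:jacketImage", "http://images.example.org/1.jpg"),
    (ITEM, "rss:title", "Example Book"),  # repeated statement; the union keeps one copy
}

merged = merge_graphs(search_feed, augment_feed)
for predicate, obj in sorted(describe(merged, ITEM)):
    print(predicate, obj)
```

Nothing in the merge step needs to know what a jacket image is; the provider only has to describe the same item URI for its statements to land on the right item.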

One example I’ll use will show how an RSS feed of book details can be used to build a simple book search application. Other search services can match on key information such as ISBN, authors and titles to augment the book information with jacket images, reviews, author biographies and relevant links, all drawn from different data stores.

There are, perhaps, some similarities with Yahoo’s Pipes, but the main distinction is the embarrassingly parallel nature of our approach plus the focus on augmenting results rather than filtering or aggregating. We used this technique to build Cenote at Talis, which is really a tiny PHP script and some XSLT leveraging the RSS remixing services we’ve implemented in the underlying platform.
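The protocol details aren’t given here, so what follows is only a sketch of the shape of the idea, in Python, using the book example: the same result set goes to every provider at once (which is why the approach parallelises so easily), each provider returns extra fields for the items it recognises by ISBN, and the additions are folded back into the results. The provider functions, field names and the `augment` helper are hypothetical stand-ins for remote RSS services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search results, one dict per item, identified by ISBN.
results = [
    {"isbn": "isbn-1", "title": "Example Book One", "author": "A. Author"},
    {"isbn": "isbn-2", "title": "Example Book Two", "author": "B. Writer"},
]

# Hypothetical providers. Each inspects the whole result set and returns
# {isbn: {extra fields}} for the items it knows something about.
def jacket_images(items):
    return {i["isbn"]: {"jacket": f"http://images.example.org/{i['isbn']}.jpg"}
            for i in items}

def reviews(items):
    known = {"isbn-1": ["A fine read."]}  # stand-in for a review database
    return {i["isbn"]: {"reviews": known[i["isbn"]]}
            for i in items if i["isbn"] in known}

def augment(items, providers):
    """Send the same results to every provider at once, then fold in their additions."""
    # Providers never see each other's output, so the calls are independent
    # and can run concurrently: the "embarrassingly parallel" part.
    with ThreadPoolExecutor() as pool:
        additions = list(pool.map(lambda provider: provider(items), providers))
    merged = [dict(item) for item in items]  # copy so the input is left untouched
    for extra in additions:  # on a field clash, the later provider wins
        for item in merged:
            item.update(extra.get(item["isbn"], {}))
    return merged

for item in augment(results, [jacket_images, reviews]):
    print(item)
```

Because `augment` returns items in the same shape it received, its output can be passed straight into another round of augmentation, mirroring the point that a remixed feed is itself RSS.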

Feb 01 2007

Yahoo’s Marketing Platform

Published by Ian Davis under Random Stuff

John Battelle has some interesting notes about Yahoo’s push towards becoming a marketing platform, partnering for promotions by drawing on content from all over their huge network of sites, much of it user-generated:

The company will tie together its disjointed properties — such as search, groups, Flickr, Answers, avatars — to lead back to pages about a certain pop culture topic — for instance, Nintendo’s Wii.

Is this a credible alternative to the advertising platform that Google has built? They’re both aggregating eyeballs and attention but in distinctly different ways.

Feb 01 2007

Fotolog Out-Socialises Flickr

Published by Ian Davis under Uncategorized

At the end of last year I wrote:

Web 2.0 is about using the Internet for what it’s good at: building web applications that enable and benefit from human social behaviour on a massive scale.

I came across this recent post by Jason Kottke which might back this up. He notes that Alexa’s traffic stats put Fotolog at number 26 on the list of most popular sites, while Flickr, despite all of Yahoo’s backing, is down at 39. He then postulates a reason for this:

Flickr is more editorially controlled than Fotolog. The folks who run Flickr subtly and indirectly discourage poor quality photo contributions. Yes, upload your photos, but make them good. And the community reinforces that constraint to the point where it might seem restricting to some. Fotolog doesn’t celebrate excellence like that…it’s more about the social aspect than the photos.

And the conclusion he draws is:

Maybe tags, APIs, and Ajax aren’t the silver bullets we’ve been led to believe they are. Fotolog, MySpace, Orkut, YouTube, and Digg have all proven that you can build compelling experiences and huge audiences without heavy reliance on so-called Web 2.0 technologies. Whatever Web 2.0 is, I don’t think its success hinges on Ajax, tags, or APIs.

Fotolog, MySpace, Orkut, YouTube, and Digg – all intensely social applications built on (with all due respect) what many Web 2.0 pundits would brand as mediocre technology.

Dec 11 2006

Redefining Web 2.0

Published by Ian Davis under Uncategorized

Tim O’Reilly has posted yet another definition of Web 2.0. This time he has synthesised several earlier definitions into something that he hopes is clearer:

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I’ve elsewhere called “harnessing collective intelligence.”)

While I think this is still too woolly, I thought I’d pick up on the “chief” rule of “applications that harness network effects to get better the more people use them”. Now, I actually believe this is more a definition of social software than Web 2.0 in general. It’s very similar to the definition Tom Coates wrote almost two years ago:

Social Software can be loosely defined as software which supports, extends, or derives added value from, human social behaviour – message-boards, musical taste-sharing, photo-sharing, instant messaging, mailing lists, social networking.

However, O’Reilly’s definition is slightly stronger. It distinctly says that applications get “better” the more people use them. In the comments to that post, a visitor asks whether that final phrase means “to get better as more people use them” or “to get better as people use them more”. O’Reilly’s answer is:

…”both”. More people, or more usage, or both. A system with lots of people doing very little might be less powerful than one with a mid-sized group doing a lot. Most participatory systems involve both large, little-involved groups and smaller, more committed core groups

So the definition becomes: applications that harness network effects to get better as more people use them or as people use them more. O’Reilly states his definition as a set of rules that will help inform businesses how to be successful using the Internet as a platform. I thought it would be interesting to explore a few Web 2.0 applications and see how this rule applies in each case. I often find that taking things to extremes helps me better understand the underlying problem, so I imagined each candidate application having only a single user and asked whether the experience was any different from having many users.

For starters, an easy one: eBay. My guess is that eBay must have tens of millions of users buying and selling all manner of goods. What if there were only a single user? Clearly in that case the single user can’t sell to herself and there’s nothing new to buy either. eBay is definitely better with more people using it, so under O’Reilly’s definition eBay is doing a lot of the right sort of thing to be successful on the Internet.

last.fm? With only a single user this becomes simply a place to record your favourite tracks. There’s some utility in knowing your all time favourite artist and album, but it’s so much better when your tastes can be correlated with thousands of other people’s.

What about Basecamp, the lightweight project management application built by 37signals? If there were only one user then it seems to me that she could use Basecamp as effectively as when there are dozens or hundreds. Each user has their own project area, which is entirely isolated from other people’s. This is by design, but it means that Basecamp doesn’t get better the more people use it.

Stikkit was one of the startups launched at the recent Web 2.0 Summit. It’s an application that uses some smart pattern recognition software to overlay structure on your random notes. You can invite other people to contribute or comment on your notes. However, although Stikkit enables you to collaborate with others, their participation doesn’t significantly improve the experience. The vast amount of utility that comes from Stikkit is available whether there is a single user or many.

What about something that features heavily in the Web 2.0 mashup scene: Google Maps? Clearly the availability of the APIs coupled with some very interesting data makes this application a great poster-child for the Web 2.0 movement. However, adding more users makes it no better than having a single user. The reason for this is pretty easy to see: Google Maps is purely a spectator sport. There’s no participation allowed or encouraged and, as far as I know, Google doesn’t even use incidental usage data to improve the application. So, perhaps this is a different kind of Web 2.0 application, one that doesn’t follow the chief rule of getting better the more people use it.

To be successful in any medium you need to exploit the advantages that medium gives you. Would you expect televised news reporting to be successful if it were presented like a newspaper? The intrinsic advantage that television gives you is the capability to instantaneously broadcast moving pictures and sounds to millions of people. Any news reporting that doesn’t use that to the full, or treats the medium like an extension of print, is doomed to failure. O’Reilly is telling us that to be successful on the Internet you need to exploit the innate advantages of the Internet as a medium.

What are those advantages? There are a few, but the single most important one is the capability to enable almost zero cost communication and exchange of information between any number of people. In Tom Coates’ terms, it enables human social behaviour on unparalleled scales. In the same way that television encompasses the advantages of print and adds new capabilities, the Internet enables the same broadcast mode as television but adds the capability to incorporate multiway communication between the parties. Building applications that function as though the Internet were a cheaper way to broadcast moving pictures and sound is akin to newscasters simply reading from copies of that morning’s newspaper on TV!

This communication advantage is the underlying reason for O’Reilly’s chief rule which could be expanded to:

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Because the primary advantage of the Internet as a medium is that it enables almost zero cost communication and exchange of information between any number of people, the chief rule for success is to exploit this and build applications that harness network effects to get better the more people use them.

Basecamp and Stikkit are taking advantage of another capability that the Internet provides: location transparency. By storing data centrally and using the Internet to access that data, these applications eliminate the need for their users to worry about where their data is. It’s available wherever they are. However, although this is a capability the Internet provides, it’s not unique and hence isn’t a significant advantage of the Internet medium. Both applications would work as well over a private dial-in network: less convenient, but still perfectly workable. The same is true to a lesser extent of Google Maps.

Because Basecamp and Stikkit aren’t exploiting the primary advantage of the Internet as a medium, I predict that unless they embrace supporting, extending, or deriving added value from, human social behaviour they will ultimately be left behind and fail.

The idea that the Internet uniquely enables this multiway communication isn’t new. There’s a long history from Tim Berners-Lee’s original web browser that let the user edit the page, through Dave Winer’s prescient Two-Way-Web ideas right up to Richard MacManus’ Read/Write Web. It’s no mystery why the Internet’s killer app has been email.

It’s also no surprise that the Bubble generation of Internet applications failed so spectacularly. They failed to realise the real advantage that the Internet brought and treated it like a cheaper broadcast medium or, worse still, a better newspaper.

Web 2.0 is about using the Internet for what it’s good at: building web applications that enable and benefit from human social behaviour on a massive scale.

Nov 09 2006

The Great Database in the Sky

Published by Ian Davis under Uncategorized

I’m baffled.

I’ve just watched Mårten Mickos from MySQL give a 10-minute talk on what he terms the “Great Database in the Sky”, almost exactly describing our community’s vision of a “web of data” while remaining completely ignorant of the semantic web.

To start, he characterised Google as giving unstructured people access to unstructured data whereas MySQL gives structured people access to structured data, meaning that MySQL is targeted towards developers who understand how to structure data “properly”. A strange polarisation in my view, but I guess he’s trying to put clear blue water between the Google approach and the traditional database approach. At Talis, we don’t see this distinction at all and our core platform technology, Bigfoot, unifies structured and unstructured data.

He went on to describe his vision of a Skype for database access, combining my data, your data and public data into the next-generation OLAP, running a trillion transactions per day. His example was weather data: what if you could run a SQL statement across all the data sources in the world, something like SELECT CurrentWindDirection, CurrentWindSpeed FROM AllTheWorldsWeatherStations, MyOwnWeatherStation, MyFriendsWeatherStation?

It’s a noble goal, but he’s not the first to suggest it. It’s also not a future vision because you can do it today with Sparql. It’s at the heart of Bigfoot and there are many other public services that can be used to learn and experiment. You can even query across HTML pages containing embedded structured data.
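The reason this works in the RDF world is that a query runs against whatever graph you hand it, so “FROM all the world’s weather stations” becomes “merge the sources, then match a pattern”. The Python sketch below is not SPARQL; it only mimics, loosely, what a two-triple basic graph pattern does over the union of several sources. The station identifiers, property names and the `select_wind` helper are invented for the example, and the equivalent SPARQL is shown in the comment for comparison.

```python
# Roughly the SPARQL query:
#   PREFIX ex: <http://example.org/weather#>
#   SELECT ?station ?dir ?speed
#   WHERE { ?station ex:windDirection ?dir ; ex:windSpeed ?speed . }
# run over the merge of several hypothetical sources.

public_stations = {
    ("station:public1", "ex:windDirection", "SW"),
    ("station:public1", "ex:windSpeed", 12),
}
my_station = {
    ("station:mine", "ex:windDirection", "W"),
    ("station:mine", "ex:windSpeed", 5),
}
friends_station = {
    ("station:friend", "ex:windDirection", "NW"),  # no speed reading
}

def select_wind(*sources):
    """Join direction and speed on the station, over the union of all sources."""
    graph = set().union(*sources)
    # Simplification: assumes at most one reading of each kind per station;
    # SPARQL would return one row per combination instead.
    direction = {s: o for s, p, o in graph if p == "ex:windDirection"}
    speed = {s: o for s, p, o in graph if p == "ex:windSpeed"}
    # Like a basic graph pattern, only stations matching both triples survive.
    return sorted((s, direction[s], speed[s]) for s in direction.keys() & speed.keys())

for row in select_wind(public_stations, my_station, friends_station):
    print(row)
```

The friend’s station drops out because it has no speed reading, just as it would fail to match the second triple pattern in the SPARQL version.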

He followed it up by saying if this were achievable then a whole new generation of web 2.0 applications could be possible. Nothing controversial there, we share the same vision! But we think it’s closer than he does.

What else? Oh yes, he said “we may need a DNS of SQL servers” and that “routing may be an issue”. Another point of agreement: that’s why we built a directory of data collections and services, plus web services that route straight into that content.

Then, “how do you make data definitions understandable to others?”. That’s almost like a problem statement for RDF! And yet he didn’t mention it in his list of technologies that might be candidates for the solution: RSS, Atom, Jabber, HTML, HTTP, XML, SQL and SMS.

He concluded his talk with the tagline “The data is the platform” and then took a question from the audience: “How is this different from the semantic web?”.

This is where it became evident that there is a deep disconnect between the traditional database community and the semantic web community. Mårten’s response was rather vague: that this wasn’t as broad as the semantic web, and that the semweb includes unstructured data so wasn’t appropriate.

What a shame and what a failure of the semantic web community if the CEO of MySQL AB cannot see how his vision for an interconnected web of data is the same as ours! We must try harder and demonstrate at all levels the value of the semantic web approach to people like Mårten. SWEO and SWIG will help, but the convincing arguments will come from the practical applications of the semantic web being developed to solve real world problems.

Which is why I’m at Talis.

Nov 08 2006

It’s All About the Infrastructure

Published by Ian Davis under Uncategorized

One thing that strikes me about all the talks and presentations at this conference is that they all assume ubiquitous net access. Kind of ironic, then, that the wireless access here has gone the way of Oceanic Flight 815. So, since this is the web and I like to link in my posts, having two out of three page requests fail makes for very little blogging from me at the moment. Even though I’m sitting right under what looks like a huge wifi access point bolted to the ceiling and have great signal strength, it’s completely wasted when DHCP and DNS are out. You’d think the Web 2.0 conference would actually have wireless that worked, wouldn’t you?

By some ultimate form of serendipity we just had Debra Chrapaty from Microsoft with a 10 minute presentation which gave me the inspiration for this post’s title. The presentation was a rather interesting tour of the new data centres that Microsoft are building. Truly awesome investments. It also illustrated the depths of competition that Google and MS find themselves in – literally competing for electricians to kit out their data centres.

It’s interesting to remember back to the days of the last boom and the massive investments by Worldcom and others. After the crash all that overcapacity became dark fibre that is now fuelling the growth of the current generation of Internet heavyweights. Are they now making the same bets as before and building the data centre equivalent of dark fibre for some future post-crash generation? Let’s hope not.

Nov 08 2006

Day 2.0

Published by Ian Davis under Uncategorized

It’s the start of the second day. We queued for ages outside the ballroom and then, when the doors opened, people actually ran through the hall to get to the front! We strolled in, laser-focussed on one thing only: power! So we’re sitting comfortably in the middle of the hall, plugged in and awaiting the appearance of Jeff Bezos.

Nov 07 2006

Hmmm SOA

Published by Ian Davis under Uncategorized

Being more resource-oriented than service-oriented, I approach this session with trepidation.

First up is Carol Jones from IBM to talk about a trio of software patterns for Web 2.0. The first is “Software as a Service” which has the following characteristics:

  • Service, not software
  • User-driven adoption
  • Value on demand
  • Low cost of entry
  • Public infrastructure
  • Most importantly… tight feedback loop between providers and consumers

This is followed by “Community Mechanisms”:

  • Users add value
  • Recommendations
  • Social networking features
  • Tagging
  • User comments
  • Community rights management

The third pattern is “Simple User Interfaces and Data Services”:

  • Easy to use, easy to remix
  • Responsive UIs (AJAX)
  • Feeds (Atom, RSS)
  • Simple extensions
  • Mashups (REST APIs)

Her feeling is that enterprises will prefer on-premise deployment, so even when you grey out the SaaS part of Web 2.0 you still have something useful and new. All the syndication technologies work well in the enterprise because CIOs are dealing with a backlog of applications… but more simple services would enable more people to get involved in building the applications in the company. Hey, I agree!

Carol now demos a system for organising projects and activities. Some features: you can invite people to your project, though there is no strong access control; tagging; a bookmarklet to add links to external pages to the system, tagged and assigned to multiple activities; and you can post anything relevant to the activity, sort of like a blog, e.g. chat logs.

Each activity is actually an Atom feed styled as HTML. Posting content is just adding to the feed. There was no mention of APP (the Atom Publishing Protocol), but presumably this would be the main protocol. I wonder if this is using Queso? That would make it RDF under the hood :)
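If an activity really is just an Atom feed, then “posting” amounts to adding an atom:entry element to it. Here is a minimal sketch using Python’s standard `xml.etree.ElementTree`; the feed content, identifiers and the `add_entry` helper are hypothetical, and a real client would more likely POST the entry to the server via APP than edit the feed document itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace

def q(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"

def add_entry(feed, entry_id, title, content):
    """Append an entry with atom:id, atom:title and atom:updated.

    No per-entry author is added; the feed-level author covers it.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    entry = ET.SubElement(feed, q("entry"))
    ET.SubElement(entry, q("id")).text = entry_id
    ET.SubElement(entry, q("title")).text = title
    ET.SubElement(entry, q("updated")).text = now
    ET.SubElement(entry, q("content"), {"type": "text"}).text = content
    feed.find(q("updated")).text = now  # the feed's own timestamp moves too
    return entry

feed = ET.fromstring(
    f'<feed xmlns="{ATOM_NS}">'
    "<id>urn:example:activity:1</id><title>Example activity</title>"
    "<updated>2006-11-07T00:00:00Z</updated>"
    "<author><name>Example Author</name></author></feed>"
)
add_entry(feed, "urn:example:activity:1:entry:1", "Chat log", "Notes from today's call")
print(ET.tostring(feed, encoding="unicode"))
```

Because the result is still an ordinary Atom feed, any feed reader (or an XSLT stylesheet rendering it as HTML) sees the new post without knowing anything about the application behind it.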

Next up is Bob from American Express. They view Web 2.0 as primarily about improving the user experience and expanding convenience and reach by using RSS. They’re trialling wikis internally, and their approach is moving towards simplicity with REST interfaces.

He talks about SOA: it makes data more readily available, but security integration is a challenge and Web 2.0 itself creates new challenges. Mashups can bring data to life, though. They see an opportunity to simplify, enabling more functionality, more quickly and at lower cost.

He sees the following enterprise challenges for adopting Web 2.0 technology:

  • security – if we can’t secure it we won’t deploy it; protecting privacy and including identity in mashups
  • manageability, performance and scalability – overly complex integration; unpredictable throughput, capacity and difficult to monitor
  • development – fast and easy versus well designed and engineered; lack of standard development methods, tools and design patterns
  • business value – broad business value of social networking?

He emphasised that these are not criticisms but challenges that need to be overcome.

Back to Rod Smith of IBM, who puts up a slick enterprise-style slide showing SOA moving to Web 2.0 with all the tech buzzwords like Atom, RSS, XML, REST, Ajax and JSON. At the bottom was even the phrase “Web Oriented Architecture”, but he doesn’t mention it :(

What should businesses do? There are a whole bunch of applications that are not being written because current processes are slower than the speed at which businesses operate and make deals. He recommends the book Igniting the Phoenix, which informs his viewpoint on this. Situational software: build applications that meet an immediate business need and solve a small, definable problem. I hope the point he is trying to make is that by adopting Web 2.0 techniques, applications can become very cheap and easy to create, since the levels of abstraction are right for reuse of data across the enterprise.

So, I’m pleasantly surprised, which is great!

Nov 07 2006

Whose Data?

Published by Ian Davis under Uncategorized

I’m now sitting in the Whose data is it? workshop, which is just starting. It turns out that the workshop now has a new title, “Open Data Workshop”, which sits very well with our work on open data licences.

First up is Marc Hedlund, who is referring to the O’Reilly open data quote that Paul blogged on a little while back. Hmm, he’s even referring to open data licences, but only mentioning Creative Commons by name. He points out that all the big map providers use MapTech data and then moves on to describe the OpenStreetMap project, one of my favourite examples of the new open data movement.

Now, back to 1994 and a message from someone asking if there was a public archive of Usenet news. Someone replied that it would be infeasible due to the size. It turns out that it was Marc himself! He uses this to point out Google’s policies on data retention, which allow users to ask for posts to be removed.

Next, he puts up a slide of a “Data bill of rights” which highlights that export and delete are the two most important functions if you are dealing with user-contributed data. That’s something I definitely agree with, and it is a fundamental component of our platform approach.

Wesabe’s full data bill of rights is as follows:

  • You can export and/or delete your data from Wesabe whenever you want
  • Your data is your data, not ours. Our job is to help you understand and act on your data
  • We’ll keep all your data online and accessible for as long as you have an account. No “archive access” charges
  • Any data you want us to keep private, we will
  • If a question comes up not covered by these rights, we will answer it remembering that your data belongs to you.

This was followed by a piece by Stewart Butterfield of Yahoo/Flickr about how Flickr treats open data. I’m not sure how this sits with the official Yahoo! policy, which appears not to have changed since I wrote that Galway paper submission 3 years ago. For example, section 9 effectively says that you give Yahoo! a licence to use your postings to Yahoo! Groups and your uploads of photos, graphics, audio or video. This licence lasts until you tell them to withdraw the content. But for all other content you give Yahoo! a:

perpetual, irrevocable and fully sublicensable license to use, distribute, reproduce, modify, adapt, publish, translate, publicly perform and publicly display such Content (in whole or in part) and to incorporate such Content into other works in any format or medium now known or later developed.

Cool to see the first part, but it truly sucks to see Yahoo! claim rights over the remaining content. Booo!

Nov 07 2006

Yahoo!’s Web 2.0 Strategy

Published by Ian Davis under Uncategorized

The first workshop of the first day. I got into the room early and am right at the back in the corner… next to the power!! Why, oh why don’t these conferences ever sort out the power? That’s what comes of choosing a 150-year-old building to host the conference, beautiful though it is. On to the workshop…

Brief notes only…

First up is a set of slides describing Flickr’s building blocks of participation, which are very similar to our thinking at Talis:

  • user generated content – not licensed from providers but contributed by users
  • user organised content – tagging, categorising
  • user and publisher distributed content
  • user developed functionality – exposed API, etc.

The discussion moves on to tagging and how it gives social context, particularly through the recent introduction of geocoding. A search for lighthouses shows photos around the coast, and a search for Route 66 shows photos along that road on a map. Sort of inferred semantics, but there’s the obvious sloppiness inherent in this; probably OK if your only audience is humans.

Dimensions of participation…

  • WHAT: topic, tags, categories
  • WHEN: time, events, duration
  • WHO: identity, reputation, relationships
  • WHERE: location, size, surface

It’s interesting to see these being rediscovered in the Web 2.0 world; see Danny’s work.

The next presentation was on the Yahoo! brand and how it sits with user generated content. Yahoo see an evolution of the media from mass media through my media to we media. Marketing is one voice of many. You have to imagine the slide: a series of onion layers with friends and family at the centre, then peers, experts, media and marketing at the outside. In one sense I imagine that the layers depict influencing relationships and levels of trust. I’d be interested to expand this to show multiple centres: each person is influenced mostly by their own family and peers, but the experts and media affect us all.

Some description of Yahoo’s “unique” DNA. A slide was shown which looks like a four-column version of the Parthenon. The roof is labelled “yahoo user experience”, the four pillars are content, personalization, community and search, and these rest on user profiles, preferences and ratings.

There was some discussion on brand partners and how difficult it is to align these with the use of user generated content. Lots of moral issues. Nikon did it by using APIs to pull in Flickr content and encouraging people to tag and explore from a very heavily Nikon-branded site.