Wed, Oct 1, 2008

Publishing Linked Data With PHP

For a while now I've been experimenting with writing my own little PHP applications that run against the Talis Platform. Most of these have never been seen in public because they mainly just scratch whatever itch I have at the time. I've also used a lot of them to validate my own thinking around the types of services that the platform needs to provide to build interesting applications. The core of most of those applications became Moriarty, my PHP library for accessing the platform. I use Moriarty extensively now to kick-start any development I do. I'm even using it to write PHP scripts for running at the command line. I'm not sure that PHP is going to displace Perl from my toolbox, but it's certainly becoming my language of choice for working with RDF.

I've been looking carefully at the core patterns that my PHP applications have been following to see if there's anything else I could pull out. This is generally how I prefer to build new libraries: extracting them from several different projects. Assuming you know how your library is going to work before you've written any applications is almost always wrong. I like using libraries that have distilled the essence of repeated attempts at solving the same problem. That's why I never think about modularization of a codebase until I need to.

I've been gravitating towards Konstrukt because it appears to be the least intrusive of the PHP web application frameworks out there and it keeps fairly true to REST principles. I used it to build Kniblet as part of a platform tutorial. However, it has some quirks that I don't like. For example, returning anything other than HTML requires you to throw an exception. That mechanism works quite well for most applications but doesn't really suit data-rich applications that have multiple output formats.

It's with this in mind that I've started a new PHP web application framework called Paget. Calling it a framework is something of an overstatement. It's a few classes that make it easy to publish RDF as linked data. It's very primitive at the moment, but it's quite versatile.

It uses a simple configuration array that is passed to a dispatcher that handles the request. The application's default behaviour is specified using this configuration. One part sets up a series of regular expressions that match URI paths handled by the application and map them to the resources it provides. The data about each resource is obtained by using one or more "generators". These are simply classes that generate RDF for the given resource. Paget runs each generator to gather the RDF data describing the resource and then handles the serving up of that data according to linked data principles. Right now that's just enough behaviour to function as a generic linked data publishing framework.
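The shape described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class names, the interface, the layout of the configuration array and the example.org base URI are hypothetical and are not Paget's actual API. It assumes PHP 7.4 or later and represents RDF as plain [subject, predicate, object] arrays.

```php
<?php
// Hypothetical sketch: none of these names are Paget's actual API.

// A generator produces RDF triples describing one resource URI.
interface Generator
{
    /** @return array list of [subject, predicate, object] triples */
    public function generate(string $resourceUri): array;
}

// Trivial generator that labels a resource with the last segment of its URI path.
class LabelGenerator implements Generator
{
    public function generate(string $resourceUri): array
    {
        $label = basename(parse_url($resourceUri, PHP_URL_PATH));
        return [[$resourceUri, 'http://www.w3.org/2000/01/rdf-schema#label', $label]];
    }
}

class Dispatcher
{
    private array $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    // Find the first route whose pattern matches the path and run its generators.
    public function describe(string $path): ?array
    {
        foreach ($this->config['routes'] as $pattern => $generators) {
            if (preg_match($pattern, $path) === 1) {
                $uri = $this->config['base'] . $path;
                $triples = [];
                foreach ($generators as $generator) {
                    $triples = array_merge($triples, $generator->generate($uri));
                }
                return $triples;
            }
        }
        return null; // no route matched; a real application would answer with a 404
    }
}

// Assumed configuration layout: a base URI plus regex => generators routes.
$config = [
    'base'   => 'http://example.org',
    'routes' => [
        '#^/id/[a-z]+$#' => [new LabelGenerator()],
    ],
];

$dispatcher = new Dispatcher($config);
print_r($dispatcher->describe('/id/me'));
```

Keeping the routes as data rather than code is what lets a new deployment be a few lines of configuration: a different data source only needs a different generator in the array.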

I have three different deployments of Paget that are publishing three RDF data sets using different generators. Each of these was quite trivial to set up, being a few lines of configuration. For my own site's data space I wrote a generator that fetched RDF directly from one of my platform stores (this one) and served it up as HTML and various flavours of RDF. See, for example, http://iandavis.com/id/me which is the URI that identifies me.
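Serving one resource as HTML and as several flavours of RDF usually comes down to content negotiation on the Accept header, followed by a 303 redirect from the URI of the thing to a document about it. The sketch below shows one way that could look. The list of formats, the function names and the convention of appending a suffix to the thing's URI are all assumptions for illustration, not a description of how Paget does it.

```php
<?php
// Hypothetical sketch of content negotiation; not Paget's actual code.

// Media types this sketch can serve, mapped to an assumed document suffix.
const FORMATS = [
    'text/html'           => 'html',
    'application/rdf+xml' => 'rdf',
    'text/turtle'         => 'ttl',
    'application/json'    => 'json',
];

// Pick the supported media type with the highest q-value in the Accept header.
// Wildcards such as */* are ignored here and fall through to the HTML default.
function chooseSuffix(string $accept): string
{
    $best = 'html';
    $bestQ = 0.0;
    foreach (explode(',', $accept) as $range) {
        $parts = array_map('trim', explode(';', $range));
        $type = strtolower(array_shift($parts));
        $q = 1.0; // a media range without an explicit q parameter counts as 1.0
        foreach ($parts as $param) {
            if (strpos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if (array_key_exists($type, FORMATS) && $q > $bestQ) {
            $best = FORMATS[$type];
            $bestQ = $q;
        }
    }
    return $best;
}

// The URI of the document that describes a thing, in the negotiated format.
function documentUri(string $thingUri, string $accept): string
{
    return $thingUri . '.' . chooseSuffix($accept);
}

// In a web request the redirect would then be sent with something like:
//   header('Location: ' . documentUri($thingUri, $_SERVER['HTTP_ACCEPT'] ?? ''), true, 303);
echo documentUri('http://example.org/id/me', 'application/rdf+xml;q=0.9, text/html;q=0.5'), "\n";
```

With the Accept header in the last line, RDF/XML has the higher q-value, so the sketch prints http://example.org/id/me.rdf. A browser that prefers text/html would be sent to the .html document instead.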

My second deployment was for PlaceTime, a URI space that I have operated since 2003. It provides RDF data for timelike entities like instants and intervals and spacelike points. However, it hasn't been fully linked data compliant (mainly because it pre-dated the decision on httpRange-14). I wrote a generator for each type of entity that creates trivial RDF for each valid URI in the space. Some examples:

http://placetime.com/instant/gregorian/2003-05-19T08:00:00Z - an instant in time
http://placetime.com/interval/gregorian/1970-06-15T19:31:00Z/P1D - an interval in time
http://placetime.com/geopoint/wgs84/X-126.817Y46.183 - a point in space
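A generator for a space like this needs no data store, because everything it says can be derived from the URI itself. As a rough sketch of that idea for instants: the function below validates the timestamp in the URI and emits two N-Triples statements. The function name and the example.org vocabulary terms are placeholders invented for illustration; they are not the terms PlaceTime actually publishes.

```php
<?php
// Hypothetical sketch: the EX vocabulary below is a placeholder, not PlaceTime's real terms.

const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';
const XSD_DATETIME = 'http://www.w3.org/2001/XMLSchema#dateTime';
const EX = 'http://example.org/ns/time#'; // placeholder namespace

// Return N-Triples describing an instant URI, or null if the URI is not a valid instant.
function describeInstant(string $uri): ?string
{
    $pattern = '#/instant/gregorian/(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$#';
    if (preg_match($pattern, $uri, $m) !== 1) {
        return null;
    }
    // Reject impossible dates and times such as 2003-02-30 or 25:00:00.
    // checkdate() takes month, day, year in that order.
    if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])
        || (int) $m[4] > 23 || (int) $m[5] > 59 || (int) $m[6] > 59) {
        return null;
    }
    // The timestamp is the final path segment of the URI.
    $stamp = substr($uri, strrpos($uri, '/') + 1);
    return "<$uri> <" . RDF_TYPE . '> <' . EX . "Instant> .\n"
         . "<$uri> <" . EX . "at> \"$stamp\"^^<" . XSD_DATETIME . "> .\n";
}

echo describeInstant('http://placetime.com/instant/gregorian/2003-05-19T08:00:00Z');
```

Returning null for a URI that does not parse gives the caller a simple way to answer with a 404 rather than publishing data about an instant that cannot exist.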

Finally, I created a generator that reads a local RDF file. I then used it to serve up the whisky vocabulary that Tom, several others and I created at the recent VoCamp Oxford.

Admittedly, all these datasets and spaces look pretty similar, but this is still early days for Paget. I have some ideas for future development that will flesh out Paget into a fully-fledged RDF-driven application framework. For example, as well as generators, I plan to add filters, augmenters and transformers that alter the generated data in arbitrary fashions. These could be used to trim the data down, or to convert it to a more usable structure. I can imagine that it would be very useful to be able to pull in more RDF from arbitrary locations on the Web to supplement data in the initial set, e.g. with schema information or additional details. In my opinion that's one of the significant differences between the web of data and the web of documents: the web of data is going to enable more information to be brought together automatically for the user rather than forcing them to seek it out.
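One possible shape for such stages, offered purely as a sketch since none of this exists in Paget yet, is a list of callables that each take a set of triples and return a new one. The function names and the sample data are hypothetical, and triples are again plain [subject, predicate, object] arrays.

```php
<?php
// Hypothetical sketch: "stages" are plain callables that take and return a list of triples.

// Run the triples through each stage in order.
function runPipeline(array $triples, array $stages): array
{
    foreach ($stages as $stage) {
        $triples = $stage($triples);
    }
    return $triples;
}

// Filter: keep only triples whose predicate is in the allowed list.
function keepPredicates(array $allowed): callable
{
    return function (array $triples) use ($allowed): array {
        return array_values(array_filter($triples, function (array $t) use ($allowed): bool {
            return in_array($t[1], $allowed, true);
        }));
    };
}

// Augmenter: merge in extra triples, for example schema information gathered elsewhere.
function addTriples(array $extra): callable
{
    return function (array $triples) use ($extra): array {
        return array_merge($triples, $extra);
    };
}

$label = 'http://www.w3.org/2000/01/rdf-schema#label';
$data = [
    ['http://example.org/id/me', $label, 'Me'],
    ['http://example.org/id/me', 'http://example.org/ns#internalId', '42'],
];

// Trim the data down to labels, then supplement it with a label for the property itself.
$result = runPipeline($data, [
    keepPredicates([$label]),
    addTriples([[$label, $label, 'label']]),
]);
print_r($result);
```

Because every stage has the same signature, filters, augmenters and transformers can be listed in the configuration array in any order, in the same way generators are.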

Paget's HTML rendering of RDF is very primitive at the moment, making only basic attempts to make it human readable. It's still extremely tabular, which is hardly a great use of structured information. One area that I've been interested in exploring is that of dynamic user interfaces that adapt to the underlying data automatically. RDF is particularly amenable to building these kinds of interfaces because of its uniform data model. A lot of work on this was done by the Fresnel project and it would be interesting to apply some of the lessons from that project to building dynamic web applications. My goal here is to code as little specific behaviour into the application as possible, instead making the application detect patterns in the data and provide suitable user interface behaviours at runtime. This is really the only way we're going to be able to build true open world applications, i.e. those that are tolerant of missing data and can adapt to new and unanticipated data.

What I'm still experimenting with is whether these user interface additions should be server-side or passed on to the client. Some of the augmentations could make more sense when actioned by the client based on user activity.

There's lots to research here and hopefully some of these ideas will make it into Paget very soon.

Update: see my follow-up post that describes the major revisions I made to Paget after this was written.

Permalink: http://blog.iandavis.com/2008/10/publishing-linked-data-with-php/

