Mon, Nov 10, 2008

Paget Iteration 2

A few weeks ago I released a small PHP framework for publishing linked data (see my earlier post Publishing Linked Data with PHP). Since then I have made a lot of changes to the code and ended up completely changing the application flow.

Previously all the behaviour was specified by a configuration array with a dispatcher class. I found that was limiting the flexibility I needed and the "simple" configuration array was becoming decidedly complex. The Dispatcher class has been replaced by a new UriSpace class which is responsible for identifying the resources identified by a group of URIs. Applications can create classes derived from UriSpace to encapsulate the behaviour of their resources. Resources are split into three categories: documents that can be served straight up, abstract resources and descriptions of abstract resources. The last two are where the interesting bits of Paget lie. An application will typically override the get_description method to return a custom description derived from ResourceDescription. This class does all the hard work of finding triples about the requested abstract resource.

A class derived from ResourceDescription can override several methods to customise the RDF returned:

get_resources: This method returns an array of resource URIs that the description will consider when generating its RDF. The default behaviour is simply to chop the file extension off of the description's URI. So, the description at http://iandavis.com/id/me.rdf will have a resource of http://iandavis.com/id/me.
get_generators: This returns a list of generators that seed the triples in the descrpition. The ResourceDescription class calls each generator's add_triples method once for each resource returned by the get_resources method. Paget has some pre-defined generators that can read triples from a local file or from a platform store. The default behaviour is to do nothing.
get_augmentors: This returns a list of augmentors that add triples to the description. Paget comes with a few built-in augmentors to augment with RDF from a platform store, annotate properties with human readable labels and even do some limited inferencing. By default the simple property labeller is returned as an augmentor.
get_label: This just calculates a sensible label for the description that could be used in the title of a web page or a link. The default behaviour is to look for an rdfs:label, dc:title or foaf:name for the primary resource in the description (which is the first one returned by get_resources). Applications could override this to use whatever heuristics make sense for their data.
get: This is the dispatch point for HTTP GET requests. At a later date I hope to handle other methods too, but for now Paget is a read only system
get_html: This is called by the get method to generate an HTML representation of the description. By default it uses Paget's SimpleHtmlRepresentation class but this is the point at which most customisations will take place for rendering linked data.

The HTML output from Paget has been revised too. The basic layout of the page is handled by the SimpleHtmlRepresentation class but some type-specific logic has been broken out into a number of "widgets". There's one for OWL ontologies, RDF classes and properties and a general one that can render any RDF data. The html representation chooses an appropriate widget based in the type of the primary resource being rendered. I'm thinking about adding widgets for people and various other common classes. This is all very early and experimental. Ideally I would like the page to adapt itself completely dynamically based on the underlying data. Switching on the class of a resource is rather simplistic, but it will do as a starter.

Here's an example of how I'm using Paget in my personal data space http://iandavis.com/id/me. All the data is held in a Talis Platform store. I handle requests to http://iandavis.com/id/ with some .htaccess rules that ensure every request is handled by a file called index.php which contains the code hooking the space up to Paget. In index.php I create a subclass of UriSpace called StoreBackedUriSpace that maps the URIs beneath http://iandavis.com/id/ to resources and their descriptions. That class creates instances of StoreBackedResourceDescription that use a StoreDescribeGenerators to fetch the descriptions from the platform store. The entire code for index.php (less PHP includes etc) is shown here:

class StoreBackedUriSpace extends PAGET_UriSpace {
  function get_description($uri) {
    return new StoreBackedResourceDescription($uri); 
  } 
}
class StoreBackedResourceDescription extends PAGET_ResourceDescription {

function get_generators() {
return array( new PAGET_StoreDescribeGenerator("http://api.talis.com/stores/iand") );
}
}
$space = new StoreBackedUriSpace();
$space->dispatch();

That's basically the pattern for publishing data using Paget: derive a class from UriSpace and override the get_description method to return a custom ResourceDescription. I do that to publish some vocabularies on vocab.org such as Bio and Whisky. The UriSpace for those locations returns a resource description class that uses the FileGenerator class to read the schemas from local RDF documents and the simple property labeller and the simple inferencer to augment the results. My other deployment, at placetime.com, uses a custom resource description for each type of resource with custom generators that create the raw triples based on the requested URI.

So far it seems that Paget is flexible enough to deal with these varied scenarios of data publishing. The next step is to start looking at editing of the data and providing more application functionality.

Permalink: http://blog.iandavis.com/2008/11/paget-iteration-2/

Other posts tagged as data, linked-data, moriarty, paget, php, projects, projects, rdf

Internet Alchemy

Paget Iteration 2

Earlier Posts