Wed, Dec 3, 2008

Introducing OpenVocab

OpenVocab is a project I have been working on in my spare time since spring. The recent VoCamps in Oxford and Galway gave me the opportunity to focus on getting it public and usable. The idea behind it is quite simple: it's a collaborative space for building an RDF schema. It provides a simple editing interface for classes and properties plus some wiki-like characteristics that hopefully will allow many people to participate.

Often I find that people have ideas for a couple of useful properties or classes but they don't want to go to the trouble of deciding on URIs, writing the schema, fixing problems and committing to making it available for the long term. They have better things to do. The goal of OpenVocab is to remove those barriers and make it incredibly simple to turn an idea into reality. I set up vocab.org a few years back to help with the problem of persisting schemas in a stable fashion for long periods. OpenVocab adds the other parts of the equation: naming, authoring and maintenance.

Importantly all the data and the content submitted to OpenVocab is in the public domain, free and unencumbered by any restrictions. The code that runs the site is open source as are all of its dependencies and libraries. I want this project to persist beyond a single person or organisation.

The Mechanics

All the vocabulary terms share a common URI prefix: http://open.vocab.org/terms/

The term URIs are set up to perform 303 redirects to a document describing the term. That redirection uses the HTTP Accept header to pick a suitable format. For example, visiting http://open.vocab.org/terms/favouriteDrink with a web browser will normally redirect to the HTML description http://open.vocab.org/terms/favouriteDrink.html. An RDF crawler or explorer will be sent to the RDF version http://open.vocab.org/terms/favouriteDrink.rdf. There are also turtle and JSON/RDF versions.
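As a rough illustration of that redirection (a sketch in Python, not the PHP code that actually runs the site), the choice of document can be made by ranking the media types in the Accept header and appending a suffix to the term URI. The `.html` and `.rdf` suffixes come from the examples above; the `.ttl` and `.json` suffixes, the media type names and the function names are assumptions made for the example.

```python
# Media types mapped to document suffixes. Only .html and .rdf appear in the
# post; the Turtle and JSON entries are assumed for illustration.
SUFFIX_BY_MEDIA_TYPE = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",        # assumed
    "application/json": ".json",  # assumed
}


def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by order of appearance.
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def redirect_for(term_uri, accept_header, default="text/html"):
    """Return (status, location) for a request to a term URI."""
    chosen = default
    for media_type in parse_accept(accept_header):
        if media_type in SUFFIX_BY_MEDIA_TYPE:
            chosen = media_type
            break
        if media_type == "*/*":
            break  # client accepts anything, so keep the default
    return 303, term_uri + SUFFIX_BY_MEDIA_TYPE[chosen]


term = "http://open.vocab.org/terms/favouriteDrink"
print(redirect_for(term, "text/html,application/xhtml+xml,*/*;q=0.8"))
print(redirect_for(term, "application/rdf+xml"))
```

A browser-style Accept header ends up at the HTML document, while a client asking only for `application/rdf+xml` is sent to the RDF one.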

A full list of all the terms in the vocabulary can be obtained from http://open.vocab.org/terms which also uses content negotiation to pick a suitable format. The HTML version lists all the properties and classes in a simple alphabetical summary. The RDF version includes full descriptions of every term and of the vocabulary itself.

I enforce some strict naming conventions on term URIs, mainly because I like to have them be consistent. For example, terms are constrained to be alphanumeric plus hyphens. Properties start with a lower case first letter, classes with an upper case one. These are just generally accepted RDF conventions but there's no technical justification for them, just my aesthetic.
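Those naming rules are simple enough to express as a small check. The following is a minimal sketch in Python rather than the site's own validation code; the function name is made up, and the requirement that a name starts with a letter is my reading of the "first letter" rule rather than something stated outright.

```python
import re

# Alphanumeric characters plus hyphens, starting with a letter (assumed).
TERM_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_term_name(name, kind):
    """Check a term's local name; kind is either "property" or "class"."""
    if kind not in ("property", "class"):
        raise ValueError("kind must be 'property' or 'class'")
    if not TERM_NAME.fullmatch(name):
        return False
    if kind == "property":
        return name[0].islower()  # properties start with a lower case letter
    return name[0].isupper()      # classes start with an upper case letter


print(is_valid_term_name("favouriteDrink", "property"))
print(is_valid_term_name("favourite_drink", "property"))
print(is_valid_term_name("Drink", "class"))
```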

I'm also recommending that labels be written in the role noun style. I encourage this by including a little bit of JavaScript that dynamically plugs the label being typed into sample sentences to get the context of use right. For example property labels get plugged into sentences like "foo is "author" of thing" and "thing has "author" foo". I'm also encouraging the addition of plural labels and using the same kind of hints: "the "authors" of thing are foo and bar" or "foo and bar are "authors" of thing". There's obviously no validation on the values of these labels (I certainly don't want to write the code that checks that grammar) but providing hints like these could help keep the labelling of terms in the schema consistent. Once again this is an aesthetic choice, but all software should be opinionated.
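The hinting itself is nothing more than string substitution. The site does it in the browser; the sketch below shows the same idea in Python, with sentence templates modelled on the ones quoted above and a function name invented for the example.

```python
def label_hints(label, plural_label):
    """Plug a property's labels into sample sentences to show them in context."""
    return [
        f'foo is "{label}" of thing',
        f'thing has "{label}" foo',
        f'the "{plural_label}" of thing are foo and bar',
        f'foo and bar are "{plural_label}" of thing',
    ]


for sentence in label_hints("author", "authors"):
    print(sentence)
```

A label that reads awkwardly in these sentences, such as "has author" or "written by", probably isn't in the role noun style.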

OpenVocab supports a smattering of RDFS and OWL to help relate properties and classes to one another as well as to define their semantics. The subset of OWL supported is roughly what many people are calling RDFS++ or OWL Mini.

Behind The Scenes

OpenVocab is written in PHP and uses my moriarty library and Benjamin Nowack's ARC RDF library. The schema data is stored in a Talis Platform store (http://api.talis.com/stores/openvocab) which means it's searchable and SPARQLable. Currently it uses the Konstrukt PHP framework to handle all the webby stuff but I plan to port it to my paget framework. All the code for OpenVocab is open source and available from Google Code. I plan to write a more detailed account of how it works over on the n² blog.

Future Plans

I'm working on some visualisations of the vocabulary. Some of these experiments are in the Google Code Subversion repository but I'm still exploring ideas. I want to be able to see the relationships between classes and properties like an entity relationship diagram and view class and property hierarchies.

My next big plan is to add OpenID support. I'm in two minds whether to force the use of OpenID before allowing edits or retain anonymous editing. Suggestions and comments welcome on this issue.

At the moment every term created in OpenVocab is deemed to be "unstable", i.e. it is subject to change at any time by any person using the site. Clearly that makes it difficult to build applications that depend on the meanings of the terms even if you subscribe to a Wittgenstein view of the world where meaning is use. My plan is to introduce a way for terms to migrate to being stable. One idea is that once a term has survived for 6 months without any edits to its semantics then it could change status to "testing". Editing a term with this status would be more difficult and would reset the status back to "unstable". If a term survives the "testing" phase for 12 months then it could move to a "stable" status where it would be locked down and become extremely difficult to change. That's just one possible sequence and I'm looking for some suggestions here.
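To make that one possible sequence concrete, here is a small sketch in Python. Nothing like this exists in OpenVocab yet; the function names are invented, and it assumes that the only input needed is the date of the last edit to a term's semantics, so that any such edit simply restarts the clock.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def term_status(today, last_semantic_edit):
    """Derive a term's status from how long its semantics have gone unedited."""
    testing_from = add_months(last_semantic_edit, 6)  # 6 months without edits
    stable_from = add_months(testing_from, 12)        # then 12 months in testing
    if today >= stable_from:
        return "stable"
    if today >= testing_from:
        return "testing"
    return "unstable"


created = date(2008, 12, 3)
print(term_status(date(2009, 1, 1), created))
print(term_status(date(2009, 6, 3), created))
print(term_status(date(2010, 6, 3), created))
```

Deriving the status from a single date keeps the rule easy to reason about, but it doesn't capture the idea that editing a "testing" or "stable" term should be harder than editing an "unstable" one. That would need some extra policy on top.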

I have ideas for many minor additions too such as RSS feeds of the list of recent changes and just of new terms added. It would be nice when you enter the URI of a non-existent term if you were presented with a page prompting you to create it rather than a 404. I have more ideas around better usability and prompting to reduce all that typing of URIs. These are all the sort of things that I'll probably do as late night wind-downs.

I also think there's no reason why the codebase couldn't be modified to manage multiple vocabularies. Currently it assumes a single global vocabulary but this could be switchable based on the URI used to access the site. If options were added to restrict edits to logged in users then the codebase could become a general purpose schema editing tool, one that has the advantage of keeping its data natively in RDF.

Competition!

Finally, OpenVocab needs a logo. I exhausted my limited artistic talent a few years ago so I'm looking for help here. There must be someone out there who could create a nice logo for the website, something that would work in the top-left area of the header and as a favicon. Email me your ideas at nospam@iandavis.com and I'll publish them on the OpenVocab site and work out some way for the community to vote on one.

If you have any suggestions or want to get involved in this project then leave a comment here, email me or post a message to the RDF Schema Dev mailing list.

Permalink: http://blog.iandavis.com/2008/12/introducing-openvocab/


Internet Alchemy
