More Than The Minimum

The Linked Data design note lists four practices that lay the foundations of a web of connected data:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
  4. Include links to other URIs, so that they can discover more things.
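
As a rough sketch of what rules 2 and 3 look like from the client's side, here is one way a lookup might be written in Python using only the standard library. The function names, the example.org URI and the choice of Turtle as the requested format are my own assumptions for illustration; the design note doesn't prescribe any of them.

```python
from urllib.request import Request, urlopen


def build_lookup(uri, media_type="text/turtle"):
    """Build a GET request that asks for an RDF serialization of the named thing.

    Asking for Turtle via the Accept header is an assumption of this sketch;
    a server may offer other RDF formats, or none.
    """
    return Request(uri, headers={"Accept": media_type})


def look_up(uri, timeout=10):
    """Dereference the URI and return (content type, body text). Needs network access."""
    with urlopen(build_lookup(uri), timeout=timeout) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.headers.get("Content-Type"), body


# Hypothetical URI, used only to show how the request is built.
request = build_lookup("http://example.org/id/alice")
```

Whatever comes back is, with luck, the "useful information" of rule 3, and any URIs mentioned in it are the links of rule 4.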

These practices are now well known and implemented in hundreds of datasets. However, I think it is important to realise that these are the minimum requirements for a web of data about real-world things: a starter kit, if you like. They are not the final word on what will make up a rich web of data, and there are many more things we, as data publishers, could be doing.

For example, rule 3 suggests that you provide some useful RDF when someone looks up the URI of a resource. That doesn’t mean you can’t publish more RDF about that URI at a different location. If you want to assert some additional information about a resource, then just publish a document on your own site. Crucially, you don’t have to persuade the URI’s owner to add your triples to their database. There is no rule that the data found at a URI is the only relevant data about that thing; it’s just one privileged portion of the total data.
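
To make that concrete, here is a minimal sketch of generating such a document as N-Triples. Everything named in it is hypothetical: the example.org subject stands in for a URI that someone else minted, my-site.example stands in for your own site, and the two statements are made-up sample data.

```python
# Hypothetical URIs: a resource named by someone else, described on my own site.
SUBJECT = "http://example.org/id/alice"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def ntriple(subject, predicate, obj, literal=False):
    """Format one statement as a single N-Triples line."""
    if literal:
        object_term = '"' + escape_literal(obj) + '"'
    else:
        object_term = "<" + obj + ">"
    return "<{}> <{}> {} .".format(subject, predicate, object_term)


def build_document():
    """Return the text of a document making my own statements about SUBJECT."""
    lines = [
        ntriple(SUBJECT, RDFS + "comment", "Spoke at our local meetup.", literal=True),
        ntriple(SUBJECT, RDFS + "seeAlso", "http://my-site.example/photos/alice"),
    ]
    return "\n".join(lines) + "\n"


print(build_document(), end="")
```

The point to notice is that the subject of every triple lives on a different domain from the document that contains it, and nothing about the format or the rules objects to that.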

That leads nicely to another example. Rule 1 suggests that you use URIs to refer to real-world things. It says nothing about how or when you should create them. The convention so far has been to mint new URIs for things rather than try to find a pre-existing one. That’s an acceptable practice in the bootstrapping phase, when the data is sparse, but it is storing up a big integration problem for the future. I think we should be encouraging people to reuse well-known identifiers, such as those in DBpedia and GeoNames, in preference to creating new ones.
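
A "reuse first, mint second" policy could be sketched like this. The lookup table is a tiny hand-written stand-in for a real search against DBpedia or GeoNames, and the local base URI and function name are hypothetical; only the habit of checking before minting is the point.

```python
from urllib.parse import quote

# Hand-written stand-in for a real lookup service; the two entries follow
# DBpedia's resource URI pattern.
WELL_KNOWN = {
    "berlin": "http://dbpedia.org/resource/Berlin",
    "paris": "http://dbpedia.org/resource/Paris",
}

# Hypothetical namespace for identifiers minted on my own site.
LOCAL_BASE = "http://my-site.example/id/"


def uri_for(label):
    """Return (uri, reused): a well-known URI if one is known, else a newly minted one."""
    key = label.strip().lower()
    if key in WELL_KNOWN:
        return WELL_KNOWN[key], True
    # Nothing suitable found, so fall back to minting a local identifier.
    slug = quote(key.replace(" ", "-"))
    return LOCAL_BASE + slug, False


print(uri_for("Berlin"))
print(uri_for("Little Snoring"))
```

A naive label match like this will confuse things that share a name, so a real version would need some disambiguation, but even a crude check saves a good deal of reconciliation work later.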

