20 October 2009 by Ian Davis
The Linked Data design note lists four practices that lay the foundations of a web of connected data:
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
- Include links to other URIs, so that they can discover more things.
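Rules 2 and 3 together mean a client can take an HTTP URI and ask the server for RDF about the thing it names. The following is a minimal sketch, assuming the client signals its preference through the HTTP Accept header. The chosen media types, their ordering, and the helper name build_lookup_request are illustrative choices, not part of the design note. The request is built but deliberately not sent, so the sketch needs no network access.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Preferred RDF serializations for the Accept header (an illustrative choice).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_lookup_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP request that looks up a URI and asks for RDF."""
    # Rule 2: only HTTP(S) URIs can be looked up this way.
    if urlparse(uri).scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Rule 3: the server can use the Accept header to pick an RDF representation.
    return Request(uri, headers={"Accept": RDF_ACCEPT})


req = build_lookup_request("http://dbpedia.org/resource/Decentralization")
print(req.full_url, req.get_header("Accept"))
# Sending it would be urllib.request.urlopen(req); omitted here to stay offline.
```

A URN such as urn:isbn:... names a thing but offers no standard way to look it up, which is why the sketch rejects non-HTTP schemes.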
These practices are now well known and implemented in hundreds of datasets. However, I think it is important to realise that these are the minimum requirements for a web of data about real-world things, a starter-kit if you like. They are not the final word on what will make up a rich web of data and there are many more things we, as data publishers, could be doing.
For example, rule 3 suggests that you provide some useful RDF when someone looks up the URI of a resource. That doesn’t mean you can’t publish more RDF about that URI at a different location. If you want to assert some additional information about http://dbpedia.org/page/Decentralization, just publish a document on your own site. Crucially, you don’t have to persuade dbpedia.org to add your triples to its database. There is no rule that the data found at a URI is the only relevant data about that thing; it’s just one privileged portion of the total data.
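To make this concrete, here is a minimal sketch of the kind of document a third party might publish on its own site: a few triples whose subject is the DBpedia URI from the paragraph above. It hand-rolls N-Triples output with the standard library only, assuming plain IRIs and untyped literals. The comment text and the example.org link are hypothetical; rdfs:comment and rdfs:seeAlso are standard RDF Schema properties.

```python
# Subject taken from the text; everything asserted about it below is hypothetical.
SUBJECT = "http://dbpedia.org/page/Decentralization"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def iri(value: str) -> str:
    """Write an IRI in N-Triples form (assumes it needs no escaping)."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Write a plain string literal, escaping backslash first, then quotes and newlines."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """One triple per line, each terminated by ' .'."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


triples = [
    (iri(SUBJECT), iri(RDFS_COMMENT),
     literal("Notes on decentralization, published by a third party.")),
    (iri(SUBJECT), iri(RDFS_SEE_ALSO),
     iri("https://example.org/essays/decentralization")),
]
doc = to_ntriples(triples)
print(doc, end="")
```

Served from your own domain (typically as application/n-triples), this file says something about a DBpedia URI without DBpedia having to host or approve it.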
That leads nicely to another example. Rule 1 suggests that you use URIs to refer to real-world things. It says nothing about how or when you should create them. The convention so far has been to mint new URIs for things rather than try to find a pre-existing one. That’s an acceptable practice in the bootstrapping phase, where data is sparse, but it is storing up a big integration problem for the future. I think we should be encouraging people to reuse well-known identifiers such as those in dbpedia and geonames in preference to creating new ones.
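One way a publisher might put "reuse first, mint last" into practice is sketched below. It assumes a hypothetical local registry that maps labels to well-known URIs (in reality this might be populated from DBpedia or GeoNames lookups) and a hypothetical example.org namespace for anything that has no existing identifier. The function name identifier_for and the slug rules are illustrative only.

```python
import re

# Hypothetical registry of well-known URIs, keyed by lower-cased label.
WELL_KNOWN = {
    "decentralization": "http://dbpedia.org/resource/Decentralization",
}
# Hypothetical namespace used only when no well-known URI exists.
LOCAL_BASE = "http://example.org/id/"


def identifier_for(label: str, registry=WELL_KNOWN) -> tuple[str, bool]:
    """Return (uri, minted): reuse a well-known URI if known, else mint a local one."""
    key = label.strip().lower()
    if key in registry:
        return registry[key], False
    # Fall back to minting: turn the label into a simple URL-safe slug.
    slug = re.sub(r"[^a-z0-9]+", "-", key).strip("-")
    return LOCAL_BASE + slug, True


print(identifier_for("Decentralization"))     # reused from the registry
print(identifier_for("Peer Review Process"))  # minted under LOCAL_BASE
```

Returning the minted flag makes it easy to track which identifiers are local, so they can later be linked to, or replaced by, well-known ones as those become available.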