When creating RDF schemas based on existing data formats you soon hit the inevitable decision point of whether to simply produce an RDF description of the syntactic format or to interpret it and produce a semantic model of the format. If you stick close to the syntactic form of the original then conversion to RDF is often easy, usually just a simple mapping of tokens in the original format to property URIs in the RDF form. The original vCard in RDF note takes this approach – most vCard terms have simple equivalents in the RDF schema. This works but feels awkward because some of the semantics are being ignored or represented in sub-optimal ways.
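To make the transliteration approach concrete, here is a minimal Python sketch that maps vCard tokens straight onto property URIs. The namespace, the token table and the function name are hypothetical and not taken from the vCard in RDF note; triples are plain tuples rather than objects from any RDF library.

```python
# Hypothetical namespace for illustration; not the URIs from the vCard in RDF note.
VCARD = "http://example.org/vcard-translit#"

# Transliteration: each vCard token maps directly onto one property URI.
TOKEN_TO_PROPERTY = {
    "FN": VCARD + "fn",
    "EMAIL": VCARD + "email",
    "TEL": VCARD + "tel",
    "ADR": VCARD + "adr",
}


def transliterate(subject, vcard_lines):
    """Turn 'NAME;PARAMS:value' lines into (subject, property, value) triples."""
    triples = []
    for line in vcard_lines:
        head, _, value = line.partition(":")
        # Parameters such as TYPE=WORK are simply dropped here, which is
        # exactly where some of the original semantics get lost.
        name = head.split(";")[0].upper()
        prop = TOKEN_TO_PROPERTY.get(name)
        if prop is not None:
            triples.append((subject, prop, value))
    return triples


card = ["FN:Jane Doe", "ADR;TYPE=WORK:;;1 Main St;Springfield"]
for triple in transliterate("_:card", card):
    print(triple)
```

The mapping is trivial to write, but the fact that the address is a work address disappears along the way.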
Norman Walsh’s recent rework of the vCard schema takes a much more interpretive approach. It keeps most of the mapping but introduces new relations such as workAdr to specify the role of an address within an individual vCard. This interpretation makes the resulting data semantically richer and more expressive.
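By contrast, an interpretive mapping looks at the parameters of a line to decide which relation to use. This sketch assumes vCard-style TYPE parameters; workAdr is the relation mentioned above, while homeAdr, the namespace and the fallback to a plain adr property are assumptions for illustration, not a description of Walsh's schema.

```python
# Hypothetical namespace; workAdr is the relation named in the post,
# homeAdr and the plain adr fallback are assumptions for this sketch.
VCARD = "http://example.org/vcard-interp#"
ADR_ROLES = {"WORK": VCARD + "workAdr", "HOME": VCARD + "homeAdr"}


def type_params(head):
    """Collect the TYPE values from a 'NAME;TYPE=a,b;...' line head."""
    types = set()
    for param in head.split(";")[1:]:
        key, _, val = param.partition("=")
        if key.upper() == "TYPE":
            types.update(v.strip().upper() for v in val.split(","))
    return types


def interpret_adr(subject, line):
    """Map an ADR line onto role-specific relations instead of a bare adr."""
    head, _, value = line.partition(":")
    if head.split(";")[0].upper() != "ADR":
        return []
    types = type_params(head)
    triples = [(subject, prop, value)
               for role, prop in ADR_ROLES.items() if role in types]
    # No recognised role: fall back to the purely syntactic mapping.
    return triples or [(subject, VCARD + "adr", value)]


print(interpret_adr("_:card", "ADR;TYPE=WORK:;;1 Main St;Springfield"))
```

The extra work is in deciding what each parameter means; once that is done, a consumer can ask for work addresses directly instead of re-parsing parameters.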
I’m facing a similar decision with my current RDF work. I’m looking at representing MARC records in RDF. MARC originated in the 1970s and is a compact format for exchanging bibliographic and other library data. Obviously, given its age, it’s not XML or anything close. Some work on representing it in RDF has been done by a team at DERI, resulting in marcont, an ontology for MARC.
The approach they have taken is more along the lines of transliteration than interpretation. For example, MARC defines a field which, for Book records, consists of a number of data elements defined as character offsets. Character 17 indicates whether the book is a biography, and the letter (a, b, c, d, # or |) indicates what kind of biography. The marcont ontology defines a corresponding Book class with an isBiography property with a cardinality of 1 containing the letter (or some representation of it). The ontology mirrors the syntactic structure of the record, but is it expressing the semantics?
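A rough Python sketch of the difference for the biography code, assuming the code sits at character 17 as described above. The namespace, class names and function names are hypothetical (they are not marcont's); the code meanings follow the MARC 21 documentation for this element, where '#' is the documentation's way of writing a blank.

```python
# Hypothetical namespace and class names for this sketch.
EX = "http://example.org/marc#"

# '#' is how MARC documentation writes a blank; raw records contain a space.
BIOGRAPHY_CLASSES = {
    "#": None,                                 # no biographical material
    "a": EX + "Autobiography",
    "b": EX + "IndividualBiography",
    "c": EX + "CollectiveBiography",
    "d": EX + "ContainsBiographicalInformation",
    "|": None,                                 # no attempt to code
}


def transliterate_biography(record, field, pos=17):
    """Syntactic style: copy the raw code letter into an isBiography value."""
    return [(record, EX + "isBiography", field[pos])]


def interpret_biography(record, field, pos=17):
    """Interpretive style: turn the code into a class the record belongs to."""
    code = field[pos]
    if code == " ":
        code = "#"
    if code not in BIOGRAPHY_CLASSES:
        raise ValueError(f"unknown biography code {code!r}")
    cls = BIOGRAPHY_CLASSES[code]
    return [] if cls is None else [(record, "rdf:type", cls)]


print(interpret_biography("rec1", " " * 17 + "b"))
```

With the interpretive form, "find all autobiographies" becomes a type lookup rather than a test on a magic letter.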
In my current research project I’m exploring whether a more semantically rich representation of MARC would give benefits in terms of queryability via SPARQL. My suspicion is that it would, but getting the rich model is not easy. Here’s another example.
Field 008 in MARC contains a pair of dates. Character position 06 gives you the semantics of the date data held in positions 07-10 and 11-14. If the character at position 06 is ‘s’, then the first date represents a single known date and the second date area should be blank. However, if the character at position 06 is ‘q’, then it represents a questionable date: the first area gives the lower bound and the second area gives the upper bound for the range. If the character at position 06 is ‘e’, then the first date area contains the year of a definite date while the second area encodes the month and day (mmdd)!
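These rules can be captured in a small decoder. This is a minimal sketch that handles only the three type codes discussed here (s, q and e) and leaves any other code uninterpreted; the function name and the returned dictionary keys are illustrative, values are kept as strings, and the first six characters of the sample field are placeholders for positions 00-05.

```python
def parse_008_dates(field):
    """Interpret 008/06-14: a type code plus two four-character date areas."""
    if len(field) < 15:
        raise ValueError("008 field is too short to hold the date elements")
    date_type = field[6]
    date1 = field[7:11]    # positions 07-10
    date2 = field[11:15]   # positions 11-14
    if date_type == "s":
        # Single known date; the second date area should be blank.
        return {"kind": "singleKnownDate", "year": date1}
    if date_type == "q":
        # Questionable date: a lower and an upper bound.
        return {"kind": "questionableDateRange", "from": date1, "to": date2}
    if date_type == "e":
        # Definite date: year in date1, month and day (mmdd) in date2.
        return {"kind": "definiteDate", "year": date1,
                "month": date2[:2], "day": date2[2:]}
    # Other type codes are left uninterpreted in this sketch.
    return {"kind": "uninterpreted", "type": date_type,
            "date1": date1, "date2": date2}


print(parse_008_dates("060102e19981123"))
# {'kind': 'definiteDate', 'year': '1998', 'month': '11', 'day': '23'}
```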
A straight transliteration of this format might yield something like a property for dateType and two properties, date1 and date2. However, the meaning of date1 and date2 depends on dateType, so any queries across the data would have to involve all three properties. A semantic interpretation of the record might introduce separate properties for the types of dates, e.g. singleKnownDate, definiteDate or questionableDateRange. These would be easier to query and less prone to accidents involving misinterpretation of the properties (e.g. is ‘1123’ a year or mmdd?).
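To illustrate the querying argument, the sketch below stands in for a triple store with plain Python lists and answers "which records fall in the year 1123?" three ways. The records, their values and the helper names are hypothetical, the property names are the ones used in the examples above, and a real system would express these as SPARQL queries rather than Python loops.

```python
# Hypothetical triples for three records, first transliterated...
TRANSLIT = [
    ("rec1", "dateType", "e"), ("rec1", "date1", "1998"), ("rec1", "date2", "1123"),
    ("rec2", "dateType", "s"), ("rec2", "date1", "1123"), ("rec2", "date2", "    "),
    ("rec3", "dateType", "q"), ("rec3", "date1", "1100"), ("rec3", "date2", "1130"),
]
# ...then interpreted, using property names suggested in the post.
INTERP = [
    ("rec1", "definiteDate", "1998-11-23"),
    ("rec2", "singleKnownDate", "1123"),
    ("rec3", "questionableDateRange", ("1100", "1130")),
]


def value(triples, s, p):
    """Return the first object for subject s and property p."""
    return next(o for (s2, p2, o) in triples if s2 == s and p2 == p)


def naive_year_query(year):
    """Treats date1/date2 as plain years: misreads rec1's mmdd, misses rec3."""
    return sorted({s for (s, p, o) in TRANSLIT
                   if p in ("date1", "date2") and o == year})


def translit_year_query(year):
    """Correct, but every query has to decode all three properties together."""
    hits = set()
    for s in {s for (s, _, _) in TRANSLIT}:
        kind = value(TRANSLIT, s, "dateType")
        d1, d2 = value(TRANSLIT, s, "date1"), value(TRANSLIT, s, "date2")
        if kind in ("s", "e") and d1 == year:
            hits.add(s)
        elif kind == "q" and d1 <= year <= d2:
            hits.add(s)
    return sorted(hits)


def interp_year_query(year):
    """Each property carries its own meaning, so no cross-property decoding."""
    hits = set()
    for s, p, o in INTERP:
        if p in ("singleKnownDate", "definiteDate") and o[:4] == year:
            hits.add(s)
        elif p == "questionableDateRange" and o[0] <= year <= o[1]:
            hits.add(s)
    return sorted(hits)


print(naive_year_query("1123"))       # ['rec1', 'rec2']
print(translit_year_query("1123"))    # ['rec2', 'rec3']
print(interp_year_query("1123"))      # ['rec2', 'rec3']
```

The naive query trips over exactly the ‘1123’ ambiguity: rec1's month and day are read as a year, and rec3's range is missed entirely.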
There’s huge value in interpretations, but they’re costly to analyse and, because they are somewhat subjective, they can be the subject of much debate. The data also needs to go through a much more detailed conversion process. Transliteration is easier, quicker and cheaper, but far less satisfying.

Ian — MODS is in some ways a re-expression of MARC in natural-language XML, so it might not hurt to look there. I’d prefer to ditch MARC altogether and simply find a decent way to map that data into the newer, better, world of RDF, FRBR, DC, etc.
Bruce, I’ve looked at MODS too, and of course there are mappings between MARC and MODS (http://www.loc.gov/standards/mods/mods-mapping.html for those following this conversation), but much of the library world still revolves around MARC. There are vast amounts of authoritative data in MARC format and most systems are simply doing flat searches across the records. Like you, I’d like to be able to perform deeper and broader queries and mix in data from other sources such as holdings, institutional data, name authority files, FRBR databases etc.
Yes, all true. I was part of an interesting conversation on the MODS listserv a while ago that involved discussions of RDF (me: “why the hell aren’t library XML standards MODS and MADS RDF?”) and also FRBR. Barbara Tillett was explaining that library cataloguing is moving towards reusing existing metadata where possible, but placing it into a wider FRBR-based view. It struck me that the way she was talking sounded very much like a semantic web perspective. BTW, Andy Houghton at OCLC has done some work (not sure how much) on an RDF/XML representation of MARC data. IIRC, it looked like MARCXML, only RDF. You could always ping him.
I have two canonical examples, well, three really. And there are two issues: even if you capture the semantics, do you remain faithful to the modeling style of the original?
1. WSDL mapping to RDF. I tried to model the component model *exactly*, so that my descriptions were transliterations of the WSDL spec. The WG made a choice that, by and large, when you had multiple values for a property, you would model that as a single relation between the parent and a *set* of values. It’s clearly more natural in RDF & OWL to model that as *multiple properties*. The SWBP group and others strongly preferred the naturalness (and somewhat more complex mapping) to fidelity.
2. WS-Policy. (see: ) When I first tried to explain what I wanted to do, several people tried to encode the *syntax* of WS-Policy as OWL classes and properties. Thus you had an Operator superclass that had All and ExactlyOne as subclasses. Bleah! Worthless! The point was to notice that the operators mapped onto OWL connectives so that you could get interesting services via reduction to OWL inference.
3. OWL-S. The process model *has* a formal(izable) semantics, but the OWL docs just encode it in an incredibly clunky manner.
Ooh, another:
4. OWL itself (and rules languages). See this thread:
Hi! Interesting discussion… You should have a look at my work on the binding of IEEE LOM to RDF: http://kmr.nada.kth.se/el/ims/metadata.html
Mikael, the LOM work looks very interesting – there’s certainly lots to read there over the next few days. Thanks.