Refactoring BIO with Einstein Part 5: Updated Vocabulary

It's been four years since the last instalment of this series. Over the past couple of months I have revised the BIO vocabulary and have incorporated many of the changes I've been discussing over the course of this series of posts. Now I am going to revisit some of the examples from my earlier posts and bring them up to date with the new vocabulary.

The BIO Vocabulary models the biography of a person as a series of interrelated events. There are now classes for many different kinds of life event and the beginnings of a series of properties that link these events to people, places, times and other events. The vocabulary can be found at its usual location: http://vocab.org/bio/0.1/

Back in part 1 I set about trying to model this paragraph from Wikipedia:

Einstein was born at Ulm in Württemberg, Germany; about 100 km east of Stuttgart. His parents were Hermann Einstein, a featherbed salesman who later ran an electrochemical works, and Pauline, whose maiden name was Koch. They were married in Stuttgart-Bad Cannstatt. The family was Jewish (and non-observant); Albert attended a Catholic elementary school and, at the insistence of his mother, was given violin lessons.

My first attempt looked something like this (see this RDF file):

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:event>
    <bio:Birth rdf:nodeID="albert-birth">
      <rdfs:label>The birth of Albert Einstein</rdfs:label>
      <bio:date>1879-03-14</bio:date>
      <bio:place>Ulm, Württemberg, Germany</bio:place>
      <time:intDuring rdf:nodeID="hermann-and-pauline-being-married"/>
    </bio:Birth>
  </bio:event>
  <bio:event>
    <bio:Event rdf:nodeID="albert-attending-elementary-school">
      <rdfs:label>The event of Albert attending elementary school</rdfs:label>
      <time:intAfter rdf:nodeID="albert-birth"/>
    </bio:Event>
  </bio:event>
  <bio:event>
    <bio:Event rdf:nodeID="albert-taking-violin-lessons">
      <rdfs:label>The event of Albert taking violin lessons</rdfs:label>
      <time:intAfter rdf:nodeID="albert-birth"/>
    </bio:Event>
  </bio:event>
</foaf:Person>

<foaf:Person rdf:nodeID="hermann">
  <foaf:name>Hermann Einstein</foaf:name>
  <bio:event>
    <bio:Marriage rdf:nodeID="hermann-and-pauline-marriage">
      <rdfs:label>The marriage of Hermann Einstein and Pauline Koch</rdfs:label>
      <bio:place>Stuttgart-Bad Cannstatt</bio:place>
      <time:intMeets rdf:nodeID="hermann-and-pauline-being-married"/>
    </bio:Marriage>
  </bio:event>
  <bio:event rdf:nodeID="hermann-and-pauline-being-married"/>
</foaf:Person>

<foaf:Person rdf:nodeID="pauline">
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage"/>
  <bio:event rdf:nodeID="hermann-and-pauline-being-married"/>
</foaf:Person>

<bio:Event rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The time during which Hermann and Pauline were married</rdfs:label>
  <time:intContains rdf:nodeID="albert-birth"/>
</bio:Event>

The first thing that strikes me now is how primitive and unlinked the RDF is, something that the past 3 years of the Linked Data project have really hammered home. Those blank nodes are useful in the example, but quite poor for wide-scale integration. Today, I would use DBpedia URIs for the people and places mentioned in the data, especially since it is based on Wikipedia information in the first place.

What changes does the revised BIO Vocabulary make to this example? Actually there are quite a few. There are some new properties that relate events to the people playing particular roles in those events. These are:

  • bio:agent: a person, organization or group that plays a role in an event.
  • bio:employer: an agent that is involved in an event as an employer.
  • bio:officiator: a person that officiates at a ceremonial event.
  • bio:organization: an organization that plays a role in an event.
  • bio:parent: a person that takes the parent role in an event.
  • bio:partner: a person that is involved in an event as a partner in a relationship.
  • bio:principal: a person that takes the primary and most important role in an event.
  • bio:spectator: a person that is present at and observes the occurrence of at least part of an event.
  • bio:witness: a person that witnesses and can bear testimony to the occurrence of an event.

These are not designed to be comprehensive, but to cover the most common cases and provide building blocks for other types of role. The most important is bio:principal, which relates an event to the most important person in the event. In the case of the birth of Albert Einstein, Albert is the bio:principal. We can rewrite the birth event like this:

<bio:Birth rdf:nodeID="albert-birth">
  <rdfs:label>The birth of Albert Einstein</rdfs:label>
  <bio:date>1879-03-14</bio:date>
  <bio:place rdf:resource="http://dbpedia.org/resource/Ulm" />
  <bio:principal rdf:resource="http://dbpedia.org/resource/Albert_Einstein" />
  <bio:parent rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
  <bio:parent rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
</bio:Birth>

Some events, such as marriages, have no principal (this is my egalitarian viewpoint; some cultures will view marriage differently). We can rewrite Hermann and Pauline's marriage like this:

<bio:Marriage rdf:nodeID="hermann-and-pauline-marriage">
  <rdfs:label>The marriage of Hermann Einstein and Pauline Koch</rdfs:label>
  <bio:place rdf:resource="http://dbpedia.org/resource/Cannstatt" />
  <bio:partner rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
  <bio:partner rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
</bio:Marriage>
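
The remaining role properties attach to an event in the same way. The following is a minimal sketch of a made-up wedding, not a description of the Einsteins: the source paragraph names no officiator or witnesses, so every example.org URI and the node ID example-marriage are placeholders invented for illustration.

```xml
<!-- Hypothetical event: all example.org resources are invented placeholders -->
<bio:Marriage rdf:nodeID="example-marriage">
  <rdfs:label>An example marriage with supporting roles</rdfs:label>
  <bio:partner rdf:resource="http://example.org/people/partner-a" />
  <bio:partner rdf:resource="http://example.org/people/partner-b" />
  <!-- the person who officiates at the ceremony -->
  <bio:officiator rdf:resource="http://example.org/people/registrar" />
  <!-- someone who can bear testimony that the event occurred -->
  <bio:witness rdf:resource="http://example.org/people/witness" />
</bio:Marriage>
```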

I can now also explicitly say that Pauline and Hermann were the mother and father of Albert using the new bio:mother and bio:father properties:

<foaf:Person rdf:about="http://dbpedia.org/resource/Albert_Einstein">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:mother rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
  <bio:father rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
</foaf:Person>

One of the areas that was confusing in my original example was the use of the Event class for representing time periods. Although an event such as a marriage may extend for a period of time, it isn't strictly correct to say it is a time period. Rather, it has a time period. The BIO Vocabulary now has a bio:Interval class to model intervals of time and a pair of properties, bio:initiatingEvent and bio:concludingEvent, to mark the start and end of time intervals. That means we can model the period of time during which Pauline and Hermann were married as a bio:Interval like this:

<bio:Interval rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The time during which Hermann and Pauline were married</rdfs:label>
  <bio:initiatingEvent rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:concludingEvent>
    <bio:Death>
      <rdfs:label>The death of Hermann Einstein</rdfs:label>
      <bio:principal rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
    </bio:Death>
  </bio:concludingEvent>  
</bio:Interval>

Because bio:Events are not time intervals it's not appropriate to use the OWL-Time properties with them. Instead, the intervals associated with each event need to be explicitly drawn out. The bio:eventInterval property relates an event to the equivalent interval over which the event takes place. We can use this property to move the OWL-Time properties onto the relevant intervals like this:

<bio:Birth rdf:nodeID="albert-birth">
  <bio:eventInterval>
    <bio:Interval rdf:nodeID="albert-birth-interval">
      <time:intDuring rdf:nodeID="hermann-and-pauline-being-married"/>
    </bio:Interval>
  </bio:eventInterval>
</bio:Birth>
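
The same pattern should carry over to longer-running events. As a sketch only, assuming Albert's school attendance is kept as an event in its own right: the node IDs albert-school-event and albert-school-interval are invented here and are not used elsewhere in this series.

```xml
<!-- Hypothetical node IDs; only the properties come from the BIO Vocabulary -->
<bio:Event rdf:nodeID="albert-school-event">
  <rdfs:label>The event of Albert attending elementary school</rdfs:label>
  <bio:principal rdf:resource="http://dbpedia.org/resource/Albert_Einstein" />
  <bio:eventInterval>
    <!-- the OWL-Time relation sits on the interval, not on the event -->
    <bio:Interval rdf:nodeID="albert-school-interval">
      <time:intAfter rdf:nodeID="albert-birth-interval"/>
    </bio:Interval>
  </bio:eventInterval>
</bio:Event>
```

Keeping the event and its interval as separate nodes means role properties such as bio:principal stay on the event, while the temporal ordering is stated only between intervals.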

Finally, putting it all together, this is what the original example from part 1 of this series now looks like:

<foaf:Person rdf:about="http://dbpedia.org/resource/Albert_Einstein">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:mother rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
  <bio:father rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />

  <bio:event>
    <bio:Birth rdf:nodeID="albert-birth">
      <rdfs:label>The birth of Albert Einstein</rdfs:label>
      <bio:date>1879-03-14</bio:date>
      <bio:place rdf:resource="http://dbpedia.org/resource/Ulm" />
      <bio:principal rdf:resource="http://dbpedia.org/resource/Albert_Einstein" />
      <bio:parent rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
      <bio:parent rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
      <bio:eventInterval rdf:nodeID="albert-birth-interval"/>
    </bio:Birth>
  </bio:event>
</foaf:Person>

<foaf:Person rdf:about="http://dbpedia.org/resource/Hermann_Einstein">
  <foaf:name>Hermann Einstein</foaf:name>
  <bio:event>
    <bio:Marriage rdf:nodeID="hermann-and-pauline-marriage">
      <rdfs:label>The marriage of Hermann Einstein and Pauline Koch</rdfs:label>
      <bio:place rdf:resource="http://dbpedia.org/resource/Cannstatt" />
      <bio:partner rdf:resource="http://dbpedia.org/resource/Pauline_Koch" />
      <bio:partner rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
    </bio:Marriage>
  </bio:event>
</foaf:Person>

<foaf:Person rdf:about="http://dbpedia.org/resource/Pauline_Koch">
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage"/>
</foaf:Person>

<bio:Interval rdf:nodeID="albert-birth-interval">
  <rdfs:label>The time during which Albert was born</rdfs:label>
  <time:intDuring rdf:nodeID="hermann-and-pauline-being-married"/>
</bio:Interval>

<bio:Interval rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The time during which Hermann and Pauline were married</rdfs:label>
  <bio:initiatingEvent rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:concludingEvent>
    <bio:Death>
      <rdfs:label>The death of Hermann Einstein</rdfs:label>
      <bio:principal rdf:resource="http://dbpedia.org/resource/Hermann_Einstein" />
    </bio:Death>
  </bio:concludingEvent>
</bio:Interval>

<bio:Interval rdf:nodeID="albert-attending-elementary-school">
  <rdfs:label>The period of time during which Albert attended elementary school</rdfs:label>
  <time:intAfter rdf:nodeID="albert-birth-interval"/>
</bio:Interval>

<bio:Interval rdf:nodeID="albert-taking-violin-lessons">
  <rdfs:label>The period of time during which Albert took violin lessons</rdfs:label>
  <time:intAfter rdf:nodeID="albert-birth-interval"/>
</bio:Interval>
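
The fragments above omit the enclosing rdf:RDF element and the namespace declarations a parser needs. A minimal wrapper might look like the sketch below. The prefix bindings are assumptions to check against each vocabulary: the bio namespace is taken to be the purl.org form of the vocabulary's address, and the time binding in particular is a guess at the older OWL-Time entry ontology, which uses names such as intDuring.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Namespace bindings are assumptions; verify each against the vocabulary it names -->
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:bio="http://purl.org/vocab/bio/0.1/"
    xmlns:time="http://www.isi.edu/~pan/damltime/time-entry.owl#">

  <!-- one description from the example, shown in place -->
  <foaf:Person rdf:about="http://dbpedia.org/resource/Pauline_Koch">
    <foaf:name>Pauline Einstein</foaf:name>
    <bio:event rdf:nodeID="hermann-and-pauline-marriage"/>
  </foaf:Person>

  <!-- the remaining foaf:Person and bio:Interval descriptions go here -->

</rdf:RDF>
```

Because blank node identifiers are scoped to a single document, all the descriptions that share an rdf:nodeID have to sit inside the same rdf:RDF element for the references to resolve.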

I hope you'll agree that this has many advantages over the original, with better connections between people, their life events and others who take part in those events. If you want more like this, the BIO Vocabulary contains an extensive worked example of Henry VIII and his six wives.

Other posts in the “Refactoring Bio” series:

  1. Part 1: First Steps
  2. Part 2: Conditions
  3. Part 3: Temporal Invariants
  4. Part 4: Employment and Families

Permalink: http://blog.iandavis.com/2010/06/refactoring-bio-with-einstein-part-5-updated-vocabulary/

