Refactoring Bio With Einstein Part 1: First Steps

I'm going to try to describe the life of Albert Einstein using the BIO vocabulary. I expect this to be quite difficult, but I hope to understand better where the vocabulary is deficient. I'm keen to examine how events can be ordered using OWL-Time, and I'd like to enhance the BIO vocabulary so that common biographical information is easy to express.

I'm basing this micro-project on the Wikipedia biography of Einstein. That work is licensed under the GNU Free Documentation License and so I'm putting this article and associated RDF data under the same license.

Why Einstein? Being a former physicist, I admire him and his theories. He is a popular icon and the subject of dozens of biographies, and because he lived in the modern era there are photographs and movies that could also be relevant.

This exercise will use a combination of the BIO, FOAF and OWL-Time vocabularies. I'll use the namespace prefixes bio, foaf and time for these.

My approach is to follow the Wikipedia article and translate each distinct event into RDF.

According to the article introduction, Einstein was born on March 14, 1879. A few paragraphs down is the following:

Einstein was born at Ulm in Württemberg, Germany; about 100 km east of Stuttgart. His parents were Hermann Einstein, a featherbed salesman who later ran an electrochemical works, and Pauline, whose maiden name was Koch. They were married in Stuttgart-Bad Cannstatt. The family was Jewish (and non-observant); Albert attended a Catholic elementary school and, at the insistence of his mother, was given violin lessons.

Here's a skeleton document to start off:

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:event>
    <bio:Birth rdf:nodeID="albert-birth">
      <rdfs:label>The birth of Albert Einstein</rdfs:label>
      <bio:date>1879-03-14</bio:date>
      <bio:place>Ulm, Württemberg, Germany</bio:place>
    </bio:Birth>
  </bio:event>
</foaf:Person>
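As a quick structural check, here is a minimal Python sketch that wraps the skeleton in an rdf:RDF root element and reads the birth details back out with the standard library's ElementTree. The namespace URIs are my assumptions (the post only names the prefixes), and ElementTree treats the document as plain XML rather than as an RDF graph, so this checks the markup, not the RDF semantics.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the post itself only names the prefixes.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bio": "http://purl.org/vocab/bio/0.1/",
}

# The skeleton document, wrapped in rdf:RDF so the prefixes are declared.
SKELETON = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:bio="http://purl.org/vocab/bio/0.1/">
  <foaf:Person rdf:nodeID="albert">
    <foaf:name>Albert Einstein</foaf:name>
    <bio:event>
      <bio:Birth rdf:nodeID="albert-birth">
        <rdfs:label>The birth of Albert Einstein</rdfs:label>
        <bio:date>1879-03-14</bio:date>
        <bio:place>Ulm, Württemberg, Germany</bio:place>
      </bio:Birth>
    </bio:event>
  </foaf:Person>
</rdf:RDF>"""


def birth_details(xml_text):
    """Return (name, date, place) for the first foaf:Person's bio:Birth."""
    root = ET.fromstring(xml_text)
    person = root.find("foaf:Person", NS)
    birth = person.find("bio:event/bio:Birth", NS)
    return (
        person.findtext("foaf:name", namespaces=NS),
        birth.findtext("bio:date", namespaces=NS),
        birth.findtext("bio:place", namespaces=NS),
    )


print(birth_details(SKELETON))
```

Note that the date comes back as a plain string; nothing in the skeleton types it, so any date arithmetic would have to parse it first.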

That's a bit dry. I'm not expressing any of the relationship information from the original article. Here's what it could look like if I used the Relationship vocabulary:

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>
  <rel:childOf>
    <foaf:Person rdf:nodeID="hermann">
      <foaf:name>Hermann Einstein</foaf:name>
      <rel:fatherOf rdf:nodeID="albert" />
      <rel:spouseOf rdf:nodeID="pauline" />
    </foaf:Person>
  </rel:childOf>
  <rel:childOf>
    <foaf:Person rdf:nodeID="pauline">
      <foaf:name>Pauline Einstein</foaf:name>
      <rel:motherOf rdf:nodeID="albert" />
      <rel:spouseOf rdf:nodeID="hermann" />
    </foaf:Person>
  </rel:childOf>
  <bio:event>
    <bio:Birth rdf:nodeID="albert-birth">
      <bio:date>1879-03-14</bio:date>
      <bio:place>Ulm, Württemberg, Germany</bio:place>
    </bio:Birth>
  </bio:event>
</foaf:Person>

However, the problem is that the relationship vocabulary assumes a fixed point in time, whereas the bio vocabulary attempts to express different states across a period of time. Some relationships are immutable throughout time, e.g. childOf, whereas others apply for definite periods, e.g. spouseOf. One interpretation is to assume that the relationships hold for at least some period of time, but it is not safe to use them for analysis of time-sensitive data. In other words, you can ask "were these two people ever married?" but you cannot ask "were the parents of this person married when he was born?"
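The difference between those two questions can be sketched in Python. Below, relationships are held twice: once as timeless (subject, property, object) triples, the way the Relationship vocabulary states them, and once as a hypothetical interval-stamped marriage record. The time points are abstract integers chosen for illustration, not historical dates, and the names `MARRIAGES`, `BIRTHS` and `married_at` are my own.

```python
from typing import Optional

# Timeless triples, as the Relationship vocabulary would state them.
TRIPLES = {
    ("albert", "rel:childOf", "hermann"),
    ("albert", "rel:childOf", "pauline"),
    ("hermann", "rel:spouseOf", "pauline"),
}


def ever_married(a: str, b: str) -> bool:
    """Answerable from timeless triples alone: is there any spouseOf link?"""
    return (a, "rel:spouseOf", b) in TRIPLES or (b, "rel:spouseOf", a) in TRIPLES


# Hypothetical time-stamped data: the marriage holds over [start, end) in
# abstract time points (end=None means open-ended). Illustrative values only.
MARRIAGES = {frozenset({"hermann", "pauline"}): (0, None)}
BIRTHS = {"albert": 3}


def married_at(a: str, b: str, t: int) -> Optional[bool]:
    """None if no interval is recorded; otherwise whether t falls inside it."""
    interval = MARRIAGES.get(frozenset({a, b}))
    if interval is None:
        return None
    start, end = interval
    return start <= t and (end is None or t < end)


def parents_married_at_birth(child: str) -> Optional[bool]:
    """The time-sensitive question: needs both parents, a birth time and an interval."""
    parents = sorted(o for s, p, o in TRIPLES if s == child and p == "rel:childOf")
    if len(parents) != 2 or child not in BIRTHS:
        return None
    return married_at(parents[0], parents[1], BIRTHS[child])


print(ever_married("pauline", "hermann"), parents_married_at_birth("albert"))
```

`married_at` returns `None` rather than `False` when no interval is recorded, which mirrors the point above: the timeless spouseOf triple on its own simply cannot answer the second question.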

It would be possible to use a modified relationship schema where the domain of the properties is some kind of "Person At Point In Time", but that feels unnatural. A better way, in my opinion, is to explicitly represent the marriage as a time interval. I can't use the bio:Marriage class because that represents the actual marriage ceremony; instead I need to use a general Event instance:

<bio:Event rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The event of Hermann and Pauline being married</rdfs:label>
</bio:Event>

In the BIO vocabulary, Event is defined as "A general event, i.e. something that the person participated in." In other words, it can be any episode with a duration. I can also relate this event to Hermann and Pauline's marriage:

<foaf:Person rdf:nodeID="hermann">
  <foaf:name>Hermann Einstein</foaf:name>
  <bio:event>
    <bio:Marriage rdf:nodeID="hermann-and-pauline-marriage">
      <rdfs:label>The marriage of Hermann Einstein and Pauline Koch</rdfs:label>
      <bio:place>Stuttgart-Bad Cannstatt</bio:place>
      <time:intMeets rdf:nodeID="hermann-and-pauline-being-married" />
    </bio:Marriage>
  </bio:event>
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

<foaf:Person rdf:nodeID="pauline">
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

The intMeets property says that the married interval starts directly after the marriage event. I've also associated the event of being married with each person so they are explicitly participating in the event.
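OWL-Time's intMeets corresponds to "meets" in Allen's interval algebra: one interval ends exactly where the next begins, with no gap and no overlap. Here is a minimal Python sketch of that relation, with Allen's "before" alongside for contrast, assuming intervals are (start, end) pairs of comparable time points. The `Interval` class and the numbers are illustrative, not part of any vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A proper interval over abstract time points: start must precede end."""
    start: float
    end: float

    def __post_init__(self):
        if not self.start < self.end:
            raise ValueError("a proper interval needs start < end")


def meets(a: Interval, b: Interval) -> bool:
    """Allen's 'meets': a ends exactly where b begins."""
    return a.end == b.start


def before(a: Interval, b: Interval) -> bool:
    """Allen's 'before': a ends strictly before b begins, leaving a gap."""
    return a.end < b.start


# Hypothetical time points: the ceremony runs from 0 to 1 and the
# state of being married begins at 1.
ceremony = Interval(0, 1)
being_married = Interval(1, 40)
print(meets(ceremony, being_married), before(ceremony, being_married))
```

The two relations are mutually exclusive, which is why intMeets says more than "the marriage came first": it rules out any gap between the ceremony and the married state.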

I can enrich the structure of the biography by asserting that Albert was born during his parents' marriage. It seems to me that it should be possible to derive this fact, but until I understand more about this I need to state it explicitly. I can do this on the married interval:

<bio:Event rdf:nodeID="hermann-and-pauline-being-married">
  <rdfs:label>The time during which Hermann and Pauline were married</rdfs:label>
  <time:intContains rdf:nodeID="albert-birth" />

</bio:Event>

or, equivalently, in the Birth event itself:

<bio:event>
  <bio:Birth rdf:nodeID="albert-birth">
    <rdfs:label>The birth of Albert Einstein</rdfs:label>
    <bio:date>1879-03-14</bio:date>
    <bio:place>Ulm, Württemberg, Germany</bio:place>
    <time:intDuring rdf:nodeID="hermann-and-pauline-being-married" />
  </bio:Birth>
</bio:event>
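The two forms are equivalent because "during" is the inverse of "contains" in Allen's algebra, so a processor that knows the inverse pairs can infer either statement from the other. Here is a minimal Python sketch of that inference over (subject, property, object) triples. It is a hand-rolled rule, not an OWL reasoner, and the `time:intBefore` name is my assumption for the inverse of `time:intAfter`.

```python
# Inverse pairs from Allen's interval algebra, using the property names
# in this post; time:intBefore is an assumed name.
INVERSES = {
    "time:intContains": "time:intDuring",
    "time:intAfter": "time:intBefore",
}
# Make the mapping symmetric so either direction can be inverted.
INVERSES.update({v: k for k, v in INVERSES.items()})


def add_inverses(triples):
    """Return the triples plus every inverse statement they imply."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
    return inferred


stated = {("hermann-and-pauline-being-married", "time:intContains", "albert-birth")}
closed = add_inverses(stated)
print(("albert-birth", "time:intDuring", "hermann-and-pauline-being-married") in closed)
```

Stating only one direction in the data and deriving the other keeps the RDF smaller and avoids the two statements drifting out of sync.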

The Wikipedia article mentioned that Albert attended a Catholic elementary school and took violin lessons. I'm going to express those as events:

<bio:event>
  <bio:Event rdf:nodeID="albert-attending-elementary-school">
    <rdfs:label>The event of Albert attending elementary school</rdfs:label>
    <time:intAfter rdf:nodeID="albert-birth" />
  </bio:Event>
</bio:event>

<bio:event>
  <bio:Event rdf:nodeID="albert-taking-violin-lessons">
    <rdfs:label>The event of Albert taking violin lessons</rdfs:label>
    <time:intAfter rdf:nodeID="albert-birth" />
  </bio:Event>
</bio:event>

So, what have I been able to represent from that single paragraph of biography? I've represented Albert Einstein's date and place of birth; his parents' marriage and the fact that Albert was born during it; and Albert attending elementary school and taking violin lessons. Each of the events is related to another event to assist with automatic ordering.
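One way to use those links for automatic ordering is to reduce each one to an "earlier starts first" edge and topologically sort the events. Here is a minimal sketch with Python's graphlib (Python 3.9+). The reduction is my own simplification: it discards durations and keeps only which event must begin first.

```python
from graphlib import TopologicalSorter

# (earlier, later) pairs reduced from the time: properties used above:
#   marriage intMeets being-married  -> the marriage ceremony starts first
#   birth intDuring being-married    -> being-married starts first
#   school intAfter birth            -> the birth comes first
#   violin intAfter birth            -> the birth comes first
EDGES = [
    ("hermann-and-pauline-marriage", "hermann-and-pauline-being-married"),
    ("hermann-and-pauline-being-married", "albert-birth"),
    ("albert-birth", "albert-attending-elementary-school"),
    ("albert-birth", "albert-taking-violin-lessons"),
]


def order_events(edges):
    """Return one start order consistent with every (earlier, later) pair."""
    sorter = TopologicalSorter()
    for earlier, later in edges:
        sorter.add(later, earlier)  # 'later' has 'earlier' as a predecessor
    return list(sorter.static_order())  # raises CycleError on contradictions


for event in order_events(EDGES):
    print(event)
```

The first three events come out in a fixed order, but school and violin lessons can appear either way round: nothing in the data relates them to each other, only to the birth.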

What haven't I represented? His father's occupation, his mother's maiden name and the family's faith. I haven't explicitly stated that Albert is the son of Hermann and Pauline, and their participation in the marriage event isn't strong enough to state that they were actually the couple getting married (other people, such as a minister or witnesses, can participate too). Also, the events have no colour: I have dry labels describing the mechanics of each event, but nothing with personality.

I need to be able to annotate events and provide commentary. I also need to be able to resolve the roles of participants in events. I'll be thinking about those issues for part two.

Here's an RDF file that collates what I've done so far.

See also: posts in the "Refactoring Bio" series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families

Permalink: http://blog.iandavis.com/2005/04/refactoring-bio-with-einstein-part-1-first-steps/

