Refactoring Bio With Einstein Part 2: Conditions

Genealogy is mostly a detective game dealing with partial knowledge. Given a few facts and dates, the researcher makes informed estimates for related events. For example, if you know that someone was born in 1905 you might start searching for their marriage around 1925-1935 because most people marry in their twenties. If you don't find it, you might widen the search, first looking earlier, back to around 1920, and then onward from 1935. The genealogist mentally assigns a probability of success to the search and focusses on the highest probability ranges first. Lots of factors affect this estimate: fashions of the period can shift average marriage ages, as can family traditions.
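To make that search strategy concrete, here is a minimal Python sketch that turns a birth year into an ordered list of year ranges to search, most likely first. The function name, the five-year step and the age limits of 15 and 60 are my own assumptions for illustration; only the 1905 example and the "twenties first, then earlier, then later" ordering come from the paragraph above.

```python
def marriage_search_windows(birth_year, likely_ages=(20, 30), step=5,
                            min_age=15, max_age=60):
    """Return (start_year, end_year) search windows, most likely first.

    All defaults are illustrative assumptions, not researched statistics.
    """
    low_age, high_age = likely_ages
    # The most probable window: marriage somewhere in the person's twenties.
    windows = [(birth_year + low_age, birth_year + high_age)]
    early_edge, late_edge = low_age, high_age
    # Widen the search alternately backwards and forwards in time.
    while early_edge > min_age or late_edge < max_age:
        if early_edge > min_age:
            new_edge = max(min_age, early_edge - step)
            windows.append((birth_year + new_edge, birth_year + early_edge - 1))
            early_edge = new_edge
        if late_edge < max_age:
            new_edge = min(max_age, late_edge + step)
            windows.append((birth_year + late_edge + 1, birth_year + new_edge))
            late_edge = new_edge
    return windows


windows = marriage_search_windows(1905)
```

For someone born in 1905 the first three windows are 1925-1935, 1920-1924 and 1936-1940. A real tool would presumably weight the windows by period and family tradition rather than use fixed steps.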

Working the other way is important too. Often the genealogist has a marriage certificate with an age. It's natural to subtract the age from the date of marriage to look for the birth of the individual. However, this estimate is skewed once you allow for people adjusting their ages upwards because they're under the legal age of marriage, or downwards to close up a scandalous age difference with their partner! It's not unknown for people just to forget or not know how old they are, especially if they had little or no contact with their birth parents.
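The arithmetic can be sketched in Python as a range of plausible birth years rather than a single year. The function and its parameters are hypothetical, and the example figures (a 1900 marriage at a stated age of 21) are invented purely for illustration.

```python
def birth_year_range(marriage_year, stated_age, overstated_by=0, understated_by=0):
    """Return (earliest, latest) plausible birth years.

    overstated_by:  years the person may have added to their true age
    understated_by: years the person may have taken off their true age
    """
    oldest_true_age = stated_age + understated_by
    youngest_true_age = stated_age - overstated_by
    # Subtracting an age from a year is itself uncertain by one year,
    # because the birthday may fall before or after the wedding date.
    earliest = marriage_year - oldest_true_age - 1
    latest = marriage_year - youngest_true_age
    return earliest, latest


honest = birth_year_range(1900, 21)
skewed = birth_year_range(1900, 21, overstated_by=3, understated_by=5)
```

Taking the certificate at face value gives 1878-1879. Allowing for three years added or five years removed widens the search to 1873-1882.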

It is often useful to know the whereabouts of an individual to narrow down a search for information. For example, if it was known that a person was in the army and stationed in a particular country for an extended period of time then that would restrict the range of a search for the person's marriage certificate.

With these points in mind, I set about trying to understand how I can better model the time-related information used in genealogy. One of the outstanding problems I mentioned at the end of the previous part of this series was that I hadn't represented Einstein's mother's maiden name. Currently I have this description of Pauline:

<foaf:Person rdf:nodeID="pauline">
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

I could simply add her maiden name:

<foaf:Person rdf:nodeID="pauline">
  <foaf:name>Pauline Koch</foaf:name>
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

But that isn't very useful since there's no way to tell which name to use at which time. To get around this I could introduce a maidenName property to explicitly note her former name:

<foaf:Person rdf:nodeID="pauline">
  <ex:maidenName>Pauline Koch</ex:maidenName>
  <foaf:name>Pauline Einstein</foaf:name>
  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

This works for this simple example, but what about people who married more than once or who legally changed their names? It seems that name is an example of a mutable property that takes different values depending on when you ask. How could I represent it? Here's one way, by modelling the condition of the person at a point in time:

<foaf:Person rdf:nodeID="pauline">
  <bio:condition>
    <bio:Condition>
      <foaf:name>Pauline Koch</foaf:name>
    </bio:Condition>
  </bio:condition>

  <bio:condition>
    <bio:Condition>
      <foaf:name>Pauline Einstein</foaf:name>
    </bio:Condition>
  </bio:condition>

  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

Here I've introduced a new class Condition which I think of as the state of being for an individual at a particular period of time. I can ground those conditions in time by relating them to other events:

<foaf:Person rdf:nodeID="pauline">
  <bio:condition>
    <bio:Condition rdf:nodeID="pauline-maiden-name">
      <time:intMetBy rdf:nodeID="pauline-birth" />
      <time:intMeets rdf:nodeID="pauline-married-name" />
      <foaf:name>Pauline Koch</foaf:name>
    </bio:Condition>
  </bio:condition>

  <bio:condition>
    <bio:Condition rdf:nodeID="pauline-married-name">
      <time:intMetBy rdf:nodeID="hermann-and-pauline-being-married" />
      <foaf:name>Pauline Einstein</foaf:name>
    </bio:Condition>
  </bio:condition>

  <bio:event rdf:nodeID="hermann-and-pauline-marriage" />
  <bio:event rdf:nodeID="hermann-and-pauline-being-married" />
</foaf:Person>

The condition of Pauline having her maiden name starts as soon as she is born (referred to here by a new blank node 'pauline-birth') and lasts right up until the condition of her being known as Pauline Einstein, which starts at the same time as her being married. There's no end to this latter condition: as far as I know she kept her name until her death and that is how she is known today.
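One thing these interval relations buy is an ordering of the names even when no dates are known. The following Python sketch is only an illustration: it hand-copies the three time: statements from the RDF above into tuples (no RDF parser is involved), and the function names are hypothetical. It assumes the name-bearing conditions form a single unbranched chain with no cycles.

```python
# (subject, predicate, object) statements copied by hand from the RDF above.
TRIPLES = [
    ("pauline-maiden-name", "time:intMetBy", "pauline-birth"),
    ("pauline-maiden-name", "time:intMeets", "pauline-married-name"),
    ("pauline-married-name", "time:intMetBy", "hermann-and-pauline-being-married"),
]

NAMES = {
    "pauline-maiden-name": "Pauline Koch",
    "pauline-married-name": "Pauline Einstein",
}


def meets_pairs(triples):
    """Normalise to (earlier, later) pairs: 'A intMetBy B' means 'B intMeets A'."""
    pairs = set()
    for subject, predicate, obj in triples:
        if predicate == "time:intMeets":
            pairs.add((subject, obj))
        elif predicate == "time:intMetBy":
            pairs.add((obj, subject))
    return pairs


def names_in_order(triples, names):
    """Walk the chain of name-bearing conditions from earliest to latest."""
    # Keep only the links that join two conditions carrying a name.
    successor = {a: b for a, b in meets_pairs(triples) if a in names and b in names}
    followers = set(successor.values())
    # The first condition is the one that no other named condition leads into.
    current = next(c for c in names if c not in followers)
    ordered = []
    while current is not None:
        ordered.append(names[current])
        current = successor.get(current)
    return ordered


ordered_names = names_in_order(TRIPLES, NAMES)
```

Here ordered_names comes out as Pauline Koch followed by Pauline Einstein, purely from the meets relations.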

The concept of Condition may be quite useful. It looks to me as though most conditions will be bounded by events, e.g. marriage and divorce events bound the condition of being married. This suggests that a better definition of event is something that brings about a change in condition of an individual.

There is a domain issue around the usage of FOAF properties with Conditions. The domains of most of the relevant properties in FOAF are either foaf:Person or foaf:Agent. I'm not sure that I can reconcile the notion of a Condition being a Person or vice-versa. This might mean that I have to create parallel properties for some of the more interesting FOAF ones such as foaf:name and its ilk.

How far do we go with this? I think that most properties relating to a person are temporal in nature. Many of the properties used by FOAF to identify people are mutable over time: foaf:mbox, foaf:weblog, foaf:homepage. For FOAF that doesn't cause a problem. FOAF is intended to be a general description of a person, not necessarily at a point in time. However, that approach isn't appropriate for things like genealogy and biographical writing. In those activities the goal is to create a narrative of an individual's life. This is what I'm aiming at with BIO. I'd like to be able to generate a timeline of a person's life from an RDF description. It should also be possible to pick a point in time and produce a FOAF description of that person at that time.
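As a rough sketch of that last idea, here is how dated conditions could be flattened into a description of a person at a chosen year, and sorted into a timeline. The Condition class, snapshot and timeline are hypothetical names, not part of BIO or FOAF. The years 1858 and 1876 for Pauline's birth and marriage come from published biographies rather than from this post, so treat them as assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Condition:
    properties: dict            # e.g. {"foaf:name": "Pauline Koch"}
    start: int                  # first year in which the condition holds
    end: Optional[int] = None   # year it stops holding; None means open-ended


def snapshot(conditions, year):
    """Merge the properties of every condition that holds in the given year."""
    description = {}
    for condition in conditions:
        if condition.start <= year and (condition.end is None or year < condition.end):
            description.update(condition.properties)
    return description


def timeline(conditions):
    """Return the conditions ordered by the year in which they begin."""
    return sorted(conditions, key=lambda condition: condition.start)


# Assumed years: born 1858, married 1876 (not stated in this post).
pauline = [
    Condition({"foaf:name": "Pauline Einstein"}, start=1876),
    Condition({"foaf:name": "Pauline Koch"}, start=1858, end=1876),
]

in_1870 = snapshot(pauline, 1870)
in_1880 = snapshot(pauline, 1880)
first_name = timeline(pauline)[0].properties["foaf:name"]
```

Whole years are a crude granularity, and conditions that are only ordered relative to events (rather than dated) would need those relations resolved first.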

Although most attributes of a person do change, there are a couple of relationship-oriented ones that are immutable throughout the lifetime of a person: father and mother. Here's how they could be used:

<foaf:Person rdf:nodeID="albert">
  <foaf:name>Albert Einstein</foaf:name>
  <bio:father rdf:nodeID="hermann" />
  <bio:mother rdf:nodeID="pauline" />
  ...
</foaf:Person>

This looks to be a useful way to build up simplistic family trees.
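For instance, once the father and mother statements are loaded into a simple lookup table, walking up the tree is a short recursion. This Python sketch uses a hand-written dictionary keyed by the node IDs from the example; the PARENTS table and the ancestors function are hypothetical, and it assumes the data contains no cycles.

```python
# Parent links copied by hand from the RDF example above.
PARENTS = {
    "albert": {"bio:father": "hermann", "bio:mother": "pauline"},
}


def ancestors(person, parents, generation=0):
    """Yield (generation, relation, ancestor) for every known ancestor."""
    for relation, parent in sorted(parents.get(person, {}).items()):
        yield generation + 1, relation, parent
        # Recurse: the parent's own parents are one generation further back.
        yield from ancestors(parent, parents, generation + 1)


tree = list(ancestors("albert", PARENTS))
```

With only Albert's parents recorded, tree holds two first-generation entries, hermann and pauline. Adding entries for their parents would extend it without changing the code.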

So, in conclusion I now have a way to represent the relationship of Einstein to his parents and a way of representing the fact that his mother was known by different names at different points in time. Not quite what I thought I'd be focussing on last time but still progress. What's remaining? Looking at my list I have still to represent his father's occupation; the family's faith; the roles of the participants in events and annotation of events and conditions. More thought for next time...

See also: posts in the "Refactoring Bio" series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families

Permalink: http://blog.iandavis.com/2005/10/refactoring-bio-with-einstein-part-2-conditions/

