Mar 16 2006

Refactoring Bio With Einstein Part 4: Employment and Families

Published by Ian Davis under Random Stuff

I’m sure regular readers of this series read part three and thought “nice, but do I have to wait five months for the next part again?”. Well I thought I’d buck the trend and just get on with part four :)

I thought I’d go back to the original paragraph about Einstein from Wikipedia:

Einstein was born at Ulm in Württemberg, Germany; about 100 km east of Stuttgart. His parents were Hermann Einstein, a featherbed salesman who later ran an electrochemical works, and Pauline, whose maiden name was Koch. They were married in Stuttgart-Bad Cannstatt. The family was Jewish (and non-observant); Albert attended a Catholic elementary school and, at the insistence of his mother, was given violin lessons.

I just rechecked Wikipedia and this paragraph has hardly changed, which is encouraging. Presumably wiki-consensus has been reached on that aspect of Einstein’s life. Here’s what I have so far, expressed in Turtle rather than RDF/XML:


# People

_:albert
  a foaf:Person ;
  foaf:name "Albert Einstein" ;
  bio:father _:hermann ;
  bio:mother _:pauline ;
  bio:event _:albert-birth ;
  bio:event _:albert-taking-violin-lessons ;
  bio:event _:albert-attending-elementary-school .

_:hermann
  a foaf:Person ;
  foaf:name "Hermann Einstein" ;
  bio:event _:hermann-and-pauline-marriage ;
  bio:event _:hermann-and-pauline-being-married .

_:pauline
  a foaf:Person ;
  foaf:name "Pauline Einstein" ;
  bio:event _:hermann-and-pauline-marriage ;
  bio:event _:hermann-and-pauline-being-married ;
  bio:condition _:pauline-married-name ;
  bio:condition _:pauline-maiden-name .

# Events

_:albert-birth
  a bio:Birth ;
  rdfs:label "The birth of Albert Einstein" ;
  bio:date "1879-03-14" ;
  bio:place "Ulm, Württemberg, Germany" ;
  time:intDuring _:hermann-and-pauline-being-married .

_:albert-attending-elementary-school
  a bio:Event ;
  rdfs:label "The event of Albert attending elementary school" ;
  time:intAfter _:albert-birth .

_:albert-taking-violin-lessons
  a bio:Event ;
  rdfs:label "The event of Albert taking violin lessons" ;
  time:intAfter _:albert-birth .

_:hermann-and-pauline-marriage
  a bio:Marriage ;
  rdfs:label "The marriage of Hermann Einstein and Pauline Koch" ;
  bio:place "Stuttgart-Bad Cannstatt" ;
  time:intMeets _:hermann-and-pauline-being-married .

_:hermann-and-pauline-being-married
  a bio:Event ;
  rdfs:label "The time during which Hermann and Pauline were married" ;
  time:intContains _:albert-birth .

# Conditions

_:pauline-maiden-name
  a bio:Condition ;
  time:intMetBy _:pauline-birth ;
  time:intMeets _:pauline-married-name ;
  foaf:name "Pauline Koch" .

_:pauline-married-name
  a bio:Condition ;
  time:intMetBy _:hermann-and-pauline-marriage ;
  foaf:name "Pauline Einstein" .
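The OWL-Time properties in this model come in inverse pairs (intDuring/intContains, intMeets/intMetBy, intBefore/intAfter), and the data above sometimes states both directions and sometimes only one. Here is a minimal sketch of how a consumer might spot or fill in the missing halves. It assumes the triples are held as plain string tuples with no RDF library; `missing_inverses` and `close_inverses` are hypothetical helper names, not part of Bio or OWL-Time.

```python
# Inverse pairs among the OWL-Time properties used in the model.
INVERSES = {
    "time:intDuring": "time:intContains",
    "time:intContains": "time:intDuring",
    "time:intMeets": "time:intMetBy",
    "time:intMetBy": "time:intMeets",
    "time:intBefore": "time:intAfter",
    "time:intAfter": "time:intBefore",
}


def inverse(triple):
    """Return the inverse of a temporal triple, or None for other properties."""
    s, p, o = triple
    return (o, INVERSES[p], s) if p in INVERSES else None


def missing_inverses(triples):
    """Inverse triples that are implied by the data but not stated in it."""
    asserted = set(triples)
    implied = {inverse(t) for t in asserted} - {None}
    return sorted(implied - asserted)


def close_inverses(triples):
    """The asserted triples plus every implied inverse."""
    return set(triples) | set(missing_inverses(triples))


# A few of the temporal triples from the model above.
model = [
    ("_:albert-birth", "time:intDuring", "_:hermann-and-pauline-being-married"),
    ("_:hermann-and-pauline-being-married", "time:intContains", "_:albert-birth"),
    ("_:hermann-and-pauline-marriage", "time:intMeets", "_:hermann-and-pauline-being-married"),
    ("_:albert-taking-violin-lessons", "time:intAfter", "_:albert-birth"),
]

for triple in missing_inverses(model):
    print(triple)
```

The birth/being-married pair is already stated both ways, so only the marriage's intMetBy and the birth's intBefore are reported as missing.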

There are some anomalies, such as foaf:name still being applied to the foaf:Person, and perhaps some of those events should be replaced by or enhanced with conditions. But I want to examine the rest of the paragraph for what I haven’t expressed yet. I can see:

  1. Ulm is about 100km east of Stuttgart
  2. Hermann was a featherbed salesman
  3. Hermann later ran an electrochemical works
  4. The family was Jewish
  5. Albert’s elementary school was Catholic
  6. Albert’s violin lessons were at the insistence of his mother

I’m going to dismiss 1, 5 and 6 as out of scope for Bio and move straight on to 2. Bearing in mind my earlier characterisation of an Event as something that brings about a change in condition of an individual, I can model it as a Condition and two Events. The Condition is that Hermann is employed:

_:hermann-as-featherbed-salesman
  a bio:Condition ;
  rdfs:label "Working as a featherbed salesman" .

Then there is the event of Hermann starting this period of employment:

_:hermann-starts-working-as-featherbed-salesman
  a bio:Event ;
  rdfs:label "Starts work as featherbed salesman" .

and the event of him finishing this employment:

_:hermann-stops-working-as-featherbed-salesman
  a bio:Event ;
  rdfs:label "Stops work as featherbed salesman" .

I can relate the three things together like this:

_:hermann-as-featherbed-salesman
  time:intMetBy _:hermann-starts-working-as-featherbed-salesman ;
  time:intMeets _:hermann-stops-working-as-featherbed-salesman .

_:hermann-starts-working-as-featherbed-salesman
  time:intMeets _:hermann-as-featherbed-salesman ;
  time:intBefore _:hermann-stops-working-as-featherbed-salesman .

_:hermann-stops-working-as-featherbed-salesman
  time:intMetBy _:hermann-as-featherbed-salesman ;
  time:intAfter _:hermann-starts-working-as-featherbed-salesman .
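The start event, the employment condition and the stop event form a meets/before/meets chain. OWL-Time's interval relations are based on Allen's interval algebra, so one way to sanity-check the six assertions is to give the three intervals concrete endpoints and compute which Allen relation holds between each pair. This is a minimal sketch: the endpoints are hypothetical (the source gives no dates), the events are treated as short proper intervals as the model implies, and the relation names drop the `int` prefix.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A proper interval with start < end; the units are arbitrary."""
    start: float
    end: float


def allen(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "metBy"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "startedBy"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finishedBy"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    # Only partial overlaps remain at this point.
    return "overlaps" if a.start < b.start else "overlappedBy"


# Hypothetical endpoints: the source gives no dates for Hermann's job.
timeline = {
    "starts-working": Interval(0, 1),
    "as-featherbed-salesman": Interval(1, 10),
    "stops-working": Interval(10, 11),
}

# The six time: assertions from the model, with the prefixes dropped.
asserted = [
    ("as-featherbed-salesman", "metBy", "starts-working"),
    ("as-featherbed-salesman", "meets", "stops-working"),
    ("starts-working", "meets", "as-featherbed-salesman"),
    ("starts-working", "before", "stops-working"),
    ("stops-working", "metBy", "as-featherbed-salesman"),
    ("stops-working", "after", "starts-working"),
]

for subject, relation, obj in asserted:
    assert allen(timeline[subject], timeline[obj]) == relation, (subject, relation, obj)
```

One consequence worth keeping in mind: in Allen's algebra "after" implies a gap between the two intervals. If a later job began the moment an earlier one ended, the relation would be metBy rather than after; for example `allen(Interval(10, 20), Interval(1, 10))` returns `"metBy"`.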

I can use the same model for item 3 on my list:

_:hermann-as-electro-works-manager
  a bio:Condition ;
  rdfs:label "Running an electrochemical works" .

_:hermann-starts-running-electro-works
  a bio:Event ;
  rdfs:label "Starts running an electrochemical works" ;
  time:intMeets _:hermann-as-electro-works-manager ;
  time:intBefore _:hermann-stops-running-electro-works .

_:hermann-stops-running-electro-works
  a bio:Event ;
  rdfs:label "Stops running an electrochemical works" ;
  time:intMetBy _:hermann-as-electro-works-manager ;
  time:intAfter _:hermann-starts-running-electro-works .

I know that Hermann was running an electrochemical works after he was a featherbed salesman, but the description doesn’t give me any more than that:

_:hermann-as-electro-works-manager
  time:intAfter _:hermann-as-featherbed-salesman .

Since employment is a very common condition for a person to be in, it’s probably appropriate to define a class for it:

_:hermann-as-featherbed-salesman
  a bio:Employment .

_:hermann-as-electro-works-manager
  a bio:Employment .

I’d also like to be more explicit about the actual job Hermann was doing during those periods of employment. I’ve used rdfs:label to describe the events and conditions, but I’d like to structure this information in some way. My first guess is simply to introduce a bio:jobTitle property like this:

_:hermann-as-featherbed-salesman
  bio:jobTitle "Featherbed Salesman" .

_:hermann-as-electro-works-manager
  bio:jobTitle "Electrochemical Works Manager" .

Now, these events and conditions are related temporally by the OWL-Time properties I’m using, but I think it would be useful to relate them causally too. Doing this modelling is helping me understand the relationships between the concepts I’m using. If a Condition is the state of being for an individual at a particular period of time, and an Event is something that brings about a change in condition of an individual, then there’s a causal relationship between the two concepts: an event brings about a new condition, and a condition is a consequence of an event. Here’s how I could express that latter idea:

_:hermann-starts-working-as-featherbed-salesman
  bio:consequence _:hermann-as-featherbed-salesman .

_:hermann-starts-running-electro-works
  bio:consequence _:hermann-as-electro-works-manager .

Clearly events can have multiple consequences:

_:hermann-and-pauline-marriage
  a bio:Marriage ;
  bio:consequence _:hermann-married-to-pauline ;
  bio:consequence _:pauline-married-to-hermann .

_:hermann-married-to-pauline
  a bio:Condition ;
  rdfs:label "Being married to Pauline" .

_:pauline-married-to-hermann
  a bio:Condition ;
  rdfs:label "Being married to Hermann" .

But can a condition be the consequence of multiple events? I’m not sure. A literal reading of my definition of event suggests that every event brings about a change in condition of an individual without regard to any other events. I think I’m going to wait until I have more experience before deciding on this one.
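Whichever way that question is decided, it is easy to check in the data: invert the bio:consequence links and look for any condition with more than one causing event. A minimal sketch, assuming the assertions are held as a plain Python dict from event to conditions; the helper names and the `_:hypothetical-event` node used in the test are illustrative only.

```python
from collections import defaultdict


def causes_by_condition(consequences):
    """Invert an event -> [conditions] map (bio:consequence) into condition -> {events}."""
    causes = defaultdict(set)
    for event, conditions in consequences.items():
        for condition in conditions:
            causes[condition].add(event)
    return dict(causes)


def multiply_caused(consequences):
    """Conditions recorded as the consequence of more than one event."""
    causes = causes_by_condition(consequences)
    return sorted(c for c, events in causes.items() if len(events) > 1)


# The bio:consequence assertions made so far.
consequences = {
    "_:hermann-starts-working-as-featherbed-salesman": ["_:hermann-as-featherbed-salesman"],
    "_:hermann-starts-running-electro-works": ["_:hermann-as-electro-works-manager"],
    "_:hermann-and-pauline-marriage": [
        "_:hermann-married-to-pauline",
        "_:pauline-married-to-hermann",
    ],
}

print(multiply_caused(consequences))
```

In the data so far every condition has exactly one cause, so the check passes trivially; it only becomes interesting if the schema ends up allowing shared consequences.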

So that’s items 2 and 3 described; on to item 4, the family’s faith. I could apply a new bio:faith property to a condition for each person, but the paragraph clearly says that the family was Jewish (and non-observant). It makes no mention of the individual family members’ beliefs, and people can clearly be born into families of one faith without necessarily subscribing to it themselves. So perhaps I have to model a family unit. I could invent a bio:Family class and make Albert and his parents members of it. I don’t want to assume any particular social structure, but a family seems to be a universal unit of social organisation. Its extent and the degree of involvement vary considerably between cultures, but the definition of bio:Family can be kept as unconstrained as possible.

It just so happens that another schema I’ve been involved with has a class that could be useful here. The Relationship schema, in which I had a minor and mostly editorial role, defines a Relationship class, described as “A particular type of connection existing between people related to or having dealings with each other.” I added that class hoping to use it for things like family units, marriages and partnerships, and it was kept very open to enable use across all kinds of social situations. Here’s my definition of a bio:Family class, with a description drawn from the Wikipedia page on Family:

bio:Family
  a owl:Class ;
  rdfs:subClassOf rel:Relationship ;
  rdfs:label "Family"@en ;
  rdfs:comment "a domestic group of people" .

The Relationship vocabulary provides the rel:participant property to declare involvement in a Relationship. It can be used like this:

_:einstein-family
  a bio:Family ;
  rel:participant _:albert ;
  rel:participant _:hermann ;
  rel:participant _:pauline .

Now here’s an interesting thing: there was a time when this Einstein family consisted only of Hermann and Pauline. Then Albert was born and he became a member too. So families also have a state of being that changes over time, and those changes are triggered by events such as the birth of Albert. Perhaps, then, I should be modelling the family in terms of Conditions as well. Here’s how it could work:

_:einstein-family
  a bio:Family ;
  bio:condition _:family-with-hermann-and-pauline ;
  bio:condition _:family-with-hermann-and-pauline-and-albert .

_:family-with-hermann-and-pauline
  a bio:Condition ;
  rel:participant _:hermann ;
  rel:participant _:pauline ;
  time:intMetBy _:hermann-and-pauline-marriage ;
  time:intMeets _:family-with-hermann-and-pauline-and-albert .

_:family-with-hermann-and-pauline-and-albert
  a bio:Condition ;
  rel:participant _:albert ;
  rel:participant _:hermann ;
  rel:participant _:pauline ;
  time:intMetBy _:albert-birth ;
  time:intMetBy _:family-with-hermann-and-pauline .

_:hermann-and-pauline-marriage
  time:intMeets _:family-with-hermann-and-pauline ;
  bio:consequence _:family-with-hermann-and-pauline .

_:albert-birth
  time:intMeets _:family-with-hermann-and-pauline-and-albert ;
  bio:consequence _:family-with-hermann-and-pauline-and-albert .

Here I’m saying that there is a family that has two conditions (states of being): one when it consisted of Hermann and Pauline and another when it included Albert too. Of course there may be others, but I’m not considering those yet. The first condition started as soon as the marriage of Hermann and Pauline was complete and was in fact a consequence of that marriage (I’m simplifying here because they were a type of family before marriage too). This condition existed until the birth of Albert, whereupon a new condition arose. This new condition is a family consisting of Hermann, Pauline and Albert, and it is a consequence of Albert’s birth. I think that makes sense.
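Modelling the family as a sequence of conditions means membership changes can be read straight off the data: compare the rel:participant sets of consecutive conditions and attribute the difference to the event whose bio:consequence the later condition is. A minimal sketch, assuming the participants, the intMeets ordering and the consequence links have already been extracted into plain Python structures; `membership_changes` is a hypothetical helper.

```python
def membership_changes(members, order, caused_by):
    """For each consecutive pair of family conditions, report who joined and who left.

    members:   condition -> set of rel:participant values
    order:     conditions in the sequence given by their time:intMeets links
    caused_by: condition -> the event whose bio:consequence it is
    """
    changes = []
    for earlier, later in zip(order, order[1:]):
        joined = sorted(members[later] - members[earlier])
        left = sorted(members[earlier] - members[later])
        changes.append((caused_by.get(later), joined, left))
    return changes


members = {
    "_:family-with-hermann-and-pauline": {"_:hermann", "_:pauline"},
    "_:family-with-hermann-and-pauline-and-albert": {"_:albert", "_:hermann", "_:pauline"},
}
order = ["_:family-with-hermann-and-pauline", "_:family-with-hermann-and-pauline-and-albert"]
caused_by = {
    "_:family-with-hermann-and-pauline": "_:hermann-and-pauline-marriage",
    "_:family-with-hermann-and-pauline-and-albert": "_:albert-birth",
}

for event, joined, left in membership_changes(members, order, caused_by):
    print(event, "joined:", joined, "left:", left)
```

The same comparison would report departures (a death, a child leaving home) once conditions for those are added.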

Now, what about that faith property I wanted to use? How do I do that? It seems obvious now that it has to be part of a condition. All I want to say is that at some point, and certainly when Albert was born, the family was of the Jewish faith:

_:einstein-family
  bio:condition _:family-faith .

_:family-faith
  a bio:Condition ;
  bio:faith "Jewish (non-observant)" ;
  time:intContains _:albert-birth .

This doesn’t tell me much about the ordering of events. It would be good to anchor it to some event that defines the formation of the family:

_:einstein-family-forms
  a bio:Event ;
  time:intBefore _:family-faith ;
  time:intBefore _:family-with-hermann-and-pauline .

But I have no way of saying that the family didn’t exist before this event, and I have the same problem with people. There could be some small value in having properties that represent the start and end events of a family or a person, with constraints so that an entity can have at most one of each. For a start event like a birth, I’d then like to be able to say that no events or conditions can start before it – another area I believe is not supported by OWL yet.
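Until something like that is expressible in OWL, the "nothing starts before the birth" constraint could be checked by a rule outside the reasoner. A minimal sketch of one such rule, under stated assumptions: it treats intBefore and intMeets (and the inverses of intAfter and intMetBy) as "ends no later than the other starts", reads the data under a closed world, and only catches violations derivable from those chains. The function names and the bad edge in the test are hypothetical.

```python
def precedes_closure(edges):
    """Transitive closure of 'ends no later than the other starts' pairs."""
    graph = {}
    for earlier, later in edges:
        graph.setdefault(earlier, set()).add(later)

    def reachable(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {node: reachable(node) for node in graph}


def starts_before_birth(birth, attached, edges):
    """Items attached to a person that the data says precede that person's birth."""
    closure = precedes_closure(edges)
    return sorted(item for item in attached if birth in closure.get(item, set()))


# (earlier, later) pairs read off time:intBefore / time:intMeets, plus the
# inverses of time:intAfter / time:intMetBy.
edges = [
    ("_:hermann-and-pauline-marriage", "_:family-with-hermann-and-pauline"),
    ("_:family-with-hermann-and-pauline", "_:family-with-hermann-and-pauline-and-albert"),
    ("_:albert-birth", "_:family-with-hermann-and-pauline-and-albert"),
    ("_:albert-birth", "_:albert-attending-elementary-school"),
    ("_:albert-birth", "_:albert-taking-violin-lessons"),
]
albert_items = ["_:albert-attending-elementary-school", "_:albert-taking-violin-lessons"]

print(starts_before_birth("_:albert-birth", albert_items, edges))
```

Overlapping intervals that merely start earlier are not caught; handling those would need richer relations than before and meets.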

But that’s the end of the list, and I think I’ve managed to model the entire paragraph of biographical information. It remains to be seen how useful this information really is and what holes there are in the sequencing of events. That’s the end of this posting, but I already have plans for several more. Now I’m worried that I’ve jinxed them by mentioning them in public :) Remember also that all of the development I’m doing here is experimental. I haven’t made any changes to the Bio schema as yet and I may choose not to. I’m still thinking this through and trying to understand where the limits of this all-in-the-RDF-model approach lie. I’m actually pretty pleased with it so far, and I’m gaining confidence that it’s going to end up expressive enough to answer the kinds of genealogical questions I have in mind.

P.S. While writing this it occurred to me that I forgot to mention another candidate for a temporal invariant in the previous post, and it’s one that already exists in the Bio schema – bio:olb. This property was the whole reason Dave and I created Bio in the first place. It’s a simple enough concept – a one line biography of the person: a potted history, a summary of that person’s life achievements. Now, it may change over the lifetime of the person, but it does so in a special way. It’s a cumulative record of that person’s life: it doesn’t depend on a particular state of being of a person, but on all of them up to that point in time. It’s not strictly invariant, but it’s not something that gets replaced over time – it simply grows.

See also: posts in the “Refactoring Bio” series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families


Mar 14 2006

Refactoring Bio With Einstein Part 3: Temporal Invariants

Published by Ian Davis under Random Stuff

In the previous part of this series I explored how we can represent people as they change over time. At the end of the post I touched on attributes of a person that don’t change: temporal invariants. Two of these are the person’s biological mother and father. Obviously, with modern scientific techniques the absolute requirement for two distinct parents is weakening – but for now I’m going to ignore that.

Here’s how we could define these properties in OWL:

bio:mother
  a owl:ObjectProperty ;
  rdfs:label "mother"@en ;
  rdfs:comment "The biological mother of the person"@en ;
  rdfs:domain foaf:Person ;
  rdfs:range foaf:Person .

bio:father
  a owl:ObjectProperty ;
  rdfs:label "father"@en ;
  rdfs:comment "The biological father of the person"@en ;
  rdfs:domain foaf:Person ;
  rdfs:range foaf:Person .

I want to be able to say that a person has only one father property and one mother property. I can do this with an owl:Restriction like this:

[]
 a owl:Restriction ;
 owl:onProperty bio:mother ;
 owl:cardinality 1 .

[]
 a owl:Restriction ;
 owl:onProperty bio:father ;
 owl:cardinality 1 .

Those statements define two anonymous classes requiring members of each class to have exactly one value for the relevant property. Another way to constrain this would be to make bio:mother and bio:father functional properties, although strictly a functional property only says that a person has at most one value, not exactly one. The other difference is that the functional property assertion is a global constraint: it applies no matter what class the property is used with, whereas the cardinality restriction applies on a per-class basis. It would probably be valid in the context of the bio schema to make the functional property assertion, although that might exclude using bio for the genealogies of some mythical figures and deities. I’m not sure that’s at all important at the moment, so I’m going to stick with the per-class cardinality restrictions.
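The difference between "exactly one" and "at most one" is easiest to see as a pair of data checks. This sketch reads the data under a closed world, which is not how an OWL reasoner behaves: OWL assumes an open world and no unique names, so a missing mother is simply unknown rather than a violation, and two differently named mothers would be inferred to be the same individual unless they were stated to be different. The helper names are hypothetical, and `_:someone-else` in the test is an invented node.

```python
from collections import Counter


def value_counts(triples, prop):
    """Number of distinct values each subject has for prop."""
    return Counter(s for s, p, _ in set(triples) if p == prop)


def not_exactly_one(subjects, triples, prop):
    """Closed-world reading of owl:cardinality 1: subjects without exactly one value."""
    counts = value_counts(triples, prop)
    return sorted(s for s in subjects if counts[s] != 1)


def more_than_one(triples, prop):
    """Closed-world reading of owl:FunctionalProperty: subjects with two or more values."""
    counts = value_counts(triples, prop)
    return sorted(s for s, n in counts.items() if n > 1)


people = {"_:albert", "_:hermann", "_:pauline"}
triples = [
    ("_:albert", "bio:mother", "_:pauline"),
    ("_:albert", "bio:father", "_:hermann"),
]

print(not_exactly_one(people, triples, "bio:mother"))
print(more_than_one(triples, "bio:mother"))
```

Hermann and Pauline fail the exactly-one check only because their own mothers are not recorded, while nobody fails the at-most-one check.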

But how can I best apply those restrictions to the people being described? One way is to modify foaf:Person like this:

foaf:Person
  rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty bio:mother ;
    owl:cardinality 1
  ] .

Here I’m saying that foaf:Person is a subclass of a class of things with exactly one mother. I can also add in the father restriction:

foaf:Person
  rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty bio:mother ;
    owl:cardinality 1
  ] ;
  rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty bio:father ;
    owl:cardinality 1
  ] .

Which is saying that foaf:Person is also a subclass of a class of things with exactly one father. So a paraphrase would be: foaf:Person is the class of individuals that, amongst other things, have exactly one bio:mother and exactly one bio:father. Note that this is very different from using owl:equivalentClass, e.g.

foaf:Person
  owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty bio:mother ;
    owl:cardinality 1
  ] ;
  owl:equivalentClass [
    a owl:Restriction ;
    owl:onProperty bio:father ;
    owl:cardinality 1
  ] .

Using owl:equivalentClass in this way says that foaf:Person has the same set of individuals as the class of things with exactly one mother and as the class of things with exactly one father. That sounds very similar to my earlier paraphrase, but it’s subtly different. The distinction lies in the concepts of necessary and sufficient conditions. The earlier subclass form provides necessary conditions for an individual to be a member of the foaf:Person class: it must have exactly one bio:mother and exactly one bio:father. Crucially, those conditions are necessary but not sufficient to determine that the individual is a foaf:Person, because there may be other restrictions, not expressed here, that are also necessary. That is the purpose of the “amongst other things” phrase in the paraphrase above, and it is a direct consequence of the open world assumption. owl:equivalentClass, on the other hand, makes a much stronger assertion: it says that to be a member of foaf:Person an individual needs exactly one bio:mother (and exactly one bio:father), and that’s enough. In other words, it defines necessary and sufficient conditions for membership. More formally, the foaf:Person class has exactly the same members as the class of individuals with exactly one mother, no more and no less. If you also want to say that a foaf:Person must be human, then you’re out of luck.
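One way to see the necessary-versus-sufficient distinction is to treat classes as plain sets over a small fixed domain and enumerate which extensions of foaf:Person each axiom permits, taking the extension of the "exactly one mother and one father" restriction class as already known. This is a toy, closed-domain illustration of the set semantics, not a reasoner; the individuals, including `mythical-figure`, are hypothetical.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of a small finite set."""
    items = sorted(items)
    return [set(c) for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]


def person_extensions(domain, restriction, axiom):
    """Every extension of foaf:Person over the domain that the axiom allows.

    restriction is the extension of the "exactly one mother and exactly one
    father" class, taken as already known.
    """
    if axiom == "subClassOf":
        # Necessary only: foaf:Person may be any subset of the restriction class.
        return [s for s in subsets(domain) if s <= restriction]
    if axiom == "equivalentClass":
        # Necessary and sufficient: foaf:Person must be exactly the restriction class.
        return [s for s in subsets(domain) if s == restriction]
    raise ValueError(f"unknown axiom: {axiom}")


domain = {"albert", "hermann", "pauline", "mythical-figure"}
restriction = {"albert", "hermann", "pauline"}

print(len(person_extensions(domain, restriction, "subClassOf")))
print(person_extensions(domain, restriction, "equivalentClass"))
```

Under subClassOf there is room for further necessary conditions to narrow foaf:Person; under equivalentClass the extension is pinned to the restriction class, so any extra condition on foaf:Person becomes a claim about everything with one mother and one father.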

But is it ok to modify foaf:Person like this? I guess I could ask Dan and Libby to modify the FOAF specification but that would mean asking FOAF to depend on bio’s deliberately pedantic view of the world. Also bio is much less mature than FOAF and has a different change cycle and tying something that is quite stable to something in flux can’t be a good idea. I think it’s perfectly acceptable to make those assertions about foaf:Person in the bio schema. The interpretation is that if you subscribe to the bio schema’s view of the world then you have to accept the restrictions on the foaf:Person class.

That statement is stronger than it first appears, because subscribing to bio’s worldview happens as soon as you use any of its properties in your RDF. It doesn’t matter if you, or your authoring tool, can’t understand the schema, because RDF licenses the consumer to dereference any URIs in the content to discover more information. When you dereference http://purl.org/bio/0.1/father you will eventually get the axioms that the bio schema deems necessary for consistency. In theory the document referenced by that URI could simply include the assertions that directly involve bio:father, but in practice you’ll get the whole bio schema, including assertions about bio:mother. Is this a problem? Probably not, but I think the schema author has to be responsible about the granularity at which schemas are written. For example, vocab.org hosts several RDF schemas, and in theory I could combine them all into a single document and contrive that all property and class URIs return that document when dereferenced. That would mean you’d get axioms about all the schemas even if the content uses only a single property or class from one of them. That sounds like a bad idea, which is why schema documents tend to reflect a single namespace URI.

Returning to our definitions of bio:mother and bio:father, is it possible to say that no father is also a mother (in the strict biological sense, not the social sense)? It’s not possible to use a restriction in OWL to say that the values of two different properties must be different, but it is possible to define two classes to have different members using owl:disjointWith. So, I can define a class of mothers and a class of fathers and say that they never have the same members:

bio:Mother
  a owl:Class ;
  rdfs:subClassOf foaf:Person .

bio:Father
  a owl:Class ;
  rdfs:subClassOf foaf:Person ;
  owl:disjointWith bio:Mother .

Then I can relate these to the bio:mother and bio:father properties:

bio:mother
  rdfs:range bio:Mother .

bio:father
  rdfs:range bio:Father .
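With those range axioms and the disjointness axiom in place, any individual used as the value of bio:mother for one person and bio:father for another would fall into both classes, and an OWL reasoner would report the data as inconsistent. A minimal sketch of the same check over plain tuples; it compares node identifiers only and does not resolve owl:sameAs aliases. `disjointness_clashes` and `_:hypothetical-child` are illustrative names.

```python
def disjointness_clashes(triples):
    """Nodes that the range axioms would place in both bio:Mother and bio:Father."""
    mothers = {o for _, p, o in triples if p == "bio:mother"}
    fathers = {o for _, p, o in triples if p == "bio:father"}
    return sorted(mothers & fathers)


triples = [
    ("_:albert", "bio:mother", "_:pauline"),
    ("_:albert", "bio:father", "_:hermann"),
]
print(disjointness_clashes(triples))
```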

But now I have a dilemma: membership of the bio:Mother class isn’t temporally invariant. A person becomes a member of that class at a particular point in time, i.e. when they have a child. All along I’ve been modelling bio:mother as an invariant property but of course it is time bounded. A person doesn’t have a mother until they are born and in many cultures motherhood begins at conception, or somewhere between the two! Potentially, there’s an even worse problem waiting in the wings: the child doesn’t exist until conception or some short time after depending on your definition.

I think there are several ways to resolve this dilemma.

One way would be to model bio:mother in terms of bio:Condition in the same way I did for people’s names in the previous refactoring bio post.

_:child
  a foaf:Person ;
  bio:condition [
    a bio:Condition ;
    bio:mother _:mother
  ] .

This says that at some point in the person’s life they had a mother. Perhaps I can bound it by the time of their birth, but I would want to leave the upper bound undefined or specify forever. But now, of course, the rdfs:domain assertion on bio:mother applies to the bio:Condition, which I defined as the state of being for an individual at a particular period of time, so a reasoner would conclude that the condition itself is a foaf:Person. I can envisage motherhood as a state of being that some people have for part of their life, and having a biological mother as a state of being that every person has for the duration of their existence. So what’s the range of bio:mother here? Is it a foaf:Person, a bio:Mother or a bio:Condition? It would be convenient to reference a person, but I think that’s going to destroy any chance of asserting that mothers and fathers are different people.

Another way is to redefine what I mean by bio:Mother and bio:Father. These could be better named bio:MotherAtSomeTime and bio:FatherAtSomeTime indicating that members of these classes are mothers or fathers at some time in their lives but not necessarily all of it. This approach would still let me express my disjoint class axiom:

bio:MotherAtSomeTime
  a owl:Class ;
  rdfs:subClassOf foaf:Person .

bio:FatherAtSomeTime
  a owl:Class ;
  rdfs:subClassOf foaf:Person ;
  owl:disjointWith bio:MotherAtSomeTime .

and the range axioms for the properties:

bio:mother
  rdfs:range bio:MotherAtSomeTime .

bio:father
  rdfs:range bio:FatherAtSomeTime .

Is this a hack? I don’t think so; I think it’s just a different perspective on the same information. It can even work in conjunction with the condition approach.

There is a third way, and that is to step outside the RDF model entirely. I could annotate each triple with the date range for which it is valid. Any query would then have to specify a date, which would be used to select the correct triples. This is the approach taken by the researchers in this Temporal RDF paper. I’m sure it could even be supported with SPARQL and named graphs, but I think the number of graphs would rapidly grow vast, probably converging in the limit with the number of triples. It also requires a notion of absolute time, which doesn’t exist, and even if it did, it would preclude the kind of broad notions of temporal sequencing that are common in biographical and genealogical records.
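To make that third option concrete, here is a generic sketch of triples annotated with a validity range and a query that takes a time point and returns the plain triples valid at that point. It is not the scheme from the Temporal RDF paper; the `TimedTriple` type and `snapshot` function are hypothetical, and the integer time points are invented placeholders rather than historical dates.

```python
from typing import NamedTuple, Optional


class TimedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    valid_from: int
    valid_to: Optional[int]  # None means open-ended


def snapshot(triples, at):
    """The plain triples valid at time point `at` (valid_from inclusive, valid_to exclusive)."""
    return {
        (t.subject, t.predicate, t.obj)
        for t in triples
        if t.valid_from <= at and (t.valid_to is None or at < t.valid_to)
    }


# Hypothetical, abstract time points: the source gives no dates for these jobs.
data = [
    TimedTriple("_:hermann", "bio:jobTitle", "Featherbed Salesman", 1, 5),
    TimedTriple("_:hermann", "bio:jobTitle", "Electrochemical Works Manager", 6, None),
]

print(snapshot(data, 3))
print(snapshot(data, 7))
```

Every triple needs concrete `valid_from` and `valid_to` values before it can be queried at all, which is exactly the absolute-time requirement objected to above: the Wikipedia paragraph only tells us the second job came later.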

As an aside, this raises the whole issue of contexts and in particular the differing notions of what constitutes context. Most people discussing context in RDF are really talking about provenance, they want to know where each triple came from and who asserted it. However, for bio, the context would be what time period the triple was true for. There are many other notions of context I’m sure, such as why a triple was asserted.

So, what is the best approach? My feeling is that solution two with the bio:MotherAtSomeTime and bio:FatherAtSomeTime classes is a pragmatic and reasonable approach that doesn’t preclude a more formal approach using bio:Condition so I’ll probably move forward with it.

There are some other restrictions I’d like to be able to express but that I don’t think are possible in OWL. For example, I’d like to say that no person is their own father or mother. I may have to fall back to the use of rules for these kinds of constraints, but for now I think I have the foundations set to actually work on the other parts of Einstein’s biography that I promised last time!
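A constraint such as "no person is their own father or mother" is simple to express as a rule run over the data. A minimal sketch of one such rule, assuming triples as plain tuples; since RDF has no unique name assumption, it accepts an optional alias map standing in for owl:sameAs links that a real system would need to resolve first. `own_parents` is a hypothetical name.

```python
PARENT_PROPERTIES = {"bio:mother", "bio:father"}


def own_parents(triples, same_as=None):
    """People recorded as their own mother or father.

    same_as optionally maps alias nodes to a canonical node, standing in for
    owl:sameAs links that a real system would have to resolve first.
    """
    canon = same_as or {}

    def c(node):
        return canon.get(node, node)

    return sorted({c(s) for s, p, o in triples if p in PARENT_PROPERTIES and c(s) == c(o)})


einstein = [
    ("_:albert", "bio:mother", "_:pauline"),
    ("_:albert", "bio:father", "_:hermann"),
]
print(own_parents(einstein))
```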

Update: I removed the datatyping from the cardinality values in response to a comment by Dan Connolly below.

See also: posts in the “Refactoring Bio” series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families
