Refactoring Bio With Einstein Part 3: Temporal Invariants
In the previous part of this series I explored how we can represent people as they change over time. At the end of the post I touched on attributes of a person that don't change: temporal invariants. Two of these are the person's biological mother and father. Obviously with modern scientific techniques the absolute requirement for two distinct parents is becoming weakened - but for now I'm going to ignore that.
Here's how we could define these properties in OWL:
bio:mother a owl:ObjectProperty ;
    rdfs:label "mother"@en ;
    rdfs:comment "The biological mother of the person"@en ;
    rdfs:domain foaf:Person ;
    rdfs:range foaf:Person .

bio:father a owl:ObjectProperty ;
    rdfs:label "father"@en ;
    rdfs:comment "The biological father of the person"@en ;
    rdfs:domain foaf:Person ;
    rdfs:range foaf:Person .
I want to be able to say that a person has only one father property and one mother property. I can do this with an owl:Restriction like this:
[] a owl:Restriction ;
    owl:onProperty bio:mother ;
    owl:cardinality 1 .

[] a owl:Restriction ;
    owl:onProperty bio:father ;
    owl:cardinality 1 .
Those statements define two anonymous classes, each requiring its members to have exactly one value for the relevant property. Another way to limit the number of values would be to make bio:mother and bio:father functional properties, although strictly that only says there is at most one value, not exactly one. The other difference is that a functional property assertion is a global constraint: it applies no matter what class the property is used with, whereas a cardinality restriction applies on a per-class basis. It would probably be valid in the context of the bio schema to make the functional property assertion, but that would probably exclude using bio for the genealogies of some mythical figures and deities. I'm not sure if that's at all important at the moment so I'm going to stick with the weaker cardinality restrictions.
But how can I best apply those restrictions to the people being described? One way is to modify foaf:Person like this:
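For comparison, the functional property alternative might look like the sketch below. The owl: prefix is the standard W3C namespace; the bio: namespace is assumed from the property URI http://purl.org/bio/0.1/father quoted later in this post.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix bio: <http://purl.org/bio/0.1/> .

# Global constraint: any subject, whatever its class, has at most one
# value for each of these properties.
bio:mother a owl:ObjectProperty , owl:FunctionalProperty .
bio:father a owl:ObjectProperty , owl:FunctionalProperty .
```

Under this form a reasoner that sees two bio:mother values for the same subject would conclude that they name the same individual, rather than requiring that a value exists.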
foaf:Person rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty bio:mother ;
    owl:cardinality 1
] .
Here I'm saying that foaf:Person is a subclass of a class of things with exactly one mother. I can also add in the father restriction:
foaf:Person
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty bio:mother ;
        owl:cardinality 1
    ] ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty bio:father ;
        owl:cardinality 1
    ] .
Which is saying that foaf:Person is also a subclass of a class of things with exactly one father. So a paraphrase would be: foaf:Person is the class of individuals that, amongst other things, have exactly one bio:mother and exactly one bio:father. Note that this is very different from using owl:equivalentClass, for example:
foaf:Person
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty bio:mother ;
        owl:cardinality 1
    ] ;
    owl:equivalentClass [
        a owl:Restriction ;
        owl:onProperty bio:father ;
        owl:cardinality 1
    ] .
Using owl:equivalentClass in this way is saying that foaf:Person has the same set of individuals as the class of things with exactly one father and the class of things with exactly one mother. Now that sounds very similar to my earlier paraphrase but it's subtly different. The distinction lies in the concepts of necessary and sufficient conditions. The earlier subclass form provides necessary conditions for an individual to be a member of the foaf:Person class: it must have a bio:mother property and a bio:father property. Crucially those are necessary conditions but they're not sufficient to determine that the individual is a foaf:Person because there may be other restrictions not expressed here that are also necessary. That is the purpose of the "amongst other things" phrase in the paraphrase above and is a direct consequence of the open world assumption.
owl:equivalentClass, on the other hand, makes a much stronger assertion: it's saying that to be a member of foaf:Person an individual needs a bio:mother property and that's enough. In other words it defines necessary and sufficient conditions for membership. More formally the foaf:Person class has exactly the same members as the class of individuals with exactly one mother, no more and no less. If you want also to say that a foaf:Person must be human, then you're out of luck.
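To make the distinction concrete, here is a sketch of how a single necessary-and-sufficient definition covering both parents could be written as an intersection. It is an illustration only, not something the bio schema asserts; the prefixes are the standard OWL and FOAF namespaces plus the bio namespace assumed from the property URI quoted later in this post.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bio:  <http://purl.org/bio/0.1/> .

# Sketch only: foaf:Person has exactly the same members as the class of
# individuals with one bio:mother AND one bio:father, so anything meeting
# both restrictions would be classified as a foaf:Person.
# The bare 1 follows the convention used in this post; stricter tooling
# may expect "1"^^xsd:nonNegativeInteger.
foaf:Person owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        [ a owl:Restriction ; owl:onProperty bio:mother ; owl:cardinality 1 ]
        [ a owl:Restriction ; owl:onProperty bio:father ; owl:cardinality 1 ]
    )
] .
```

Replacing owl:equivalentClass with rdfs:subClassOf in this sketch gives back the necessary-only reading of the earlier subclass form.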
But is it ok to modify foaf:Person like this? I guess I could ask Dan and Libby to modify the FOAF specification but that would mean asking FOAF to depend on bio's deliberately pedantic view of the world. Also bio is much less mature than FOAF and has a different change cycle and tying something that is quite stable to something in flux can't be a good idea. I think it's perfectly acceptable to make those assertions about foaf:Person in the bio schema. The interpretation is that if you subscribe to the bio schema's view of the world then you have to accept the restrictions on the foaf:Person class.
That statement is stronger than it first appears because subscribing to bio's worldview happens as soon as you use any of its properties in your RDF. It doesn't matter if you, or your authoring tool, can't understand the schema because RDF licenses the consumer to dereference any URIs in the content to discover more information. When you dereference http://purl.org/bio/0.1/father you will eventually get the axioms that the bio schema deems necessary for consistency. In theory the document referenced by that URI could simply include only the assertions that directly involve bio:father but in practice you'll get the whole bio schema including assertions about bio:mother. Is this a problem? Probably not, but I think the schema author has to be responsible about the granularity that schemas are written to. For example, vocab.org hosts several RDF schemas and in theory I could combine them all into a single document and contrive that all property and class URIs return that document when dereferenced. That would mean you'd get axioms about all the schemas even if the content uses only a single property or class from one of them. That sounds like a bad idea, which is why schema documents tend to reflect a single namespace URI.
Returning to our definitions of bio:mother and bio:father, is it possible to say that no father is also a mother (in the strict biological sense, not the social sense)? It's not possible to use a restriction in OWL to say that the values of two different properties must be different but it is possible to define two classes to have different members using owl:disjointWith. So, I can define a class of mothers and a class of fathers and say that they never have the same members:
bio:Mother a owl:Class ;
    rdfs:subClassOf foaf:Person .

bio:Father a owl:Class ;
    rdfs:subClassOf foaf:Person ;
    owl:disjointWith bio:Mother .
Then I can relate these to the bio:mother and bio:father properties:
bio:mother rdfs:range bio:Mother .
bio:father rdfs:range bio:Father .
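As a sketch of what these axioms buy, consider some made-up data in which the same individual appears as one person's mother and another's father. The ex: namespace and the three individuals are hypothetical and exist only for illustration.

```turtle
@prefix bio: <http://purl.org/bio/0.1/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

ex:alice bio:mother ex:pat .
ex:bob   bio:father ex:pat .

# With the range axioms, a reasoner would infer:
#   ex:pat a bio:Mother .   (from the range of bio:mother)
#   ex:pat a bio:Father .   (from the range of bio:father)
# Since bio:Father is owl:disjointWith bio:Mother, the data is inconsistent.
```

An OWL reasoner should report this as an inconsistency rather than quietly accepting both statements.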
But now I have a dilemma: membership of the bio:Mother class isn't temporally invariant. A person becomes a member of that class at a particular point in time, i.e. when they have a child. All along I've been modelling bio:mother as an invariant property but of course it is time bounded. A person doesn't have a mother until they are born and in many cultures motherhood begins at conception, or somewhere between the two! Potentially, there's an even worse problem waiting in the wings: the child doesn't exist until conception or some short time after depending on your definition.
I think there are several ways to resolve this dilemma.
One way would be to model bio:mother in terms of bio:Condition in the same way I did for people's names in the previous refactoring bio post.
_:child a foaf:Person ;
    bio:condition [
        a bio:Condition ;
        bio:mother _:mother
    ] .
This says that at some point in the person's life they had a mother. Perhaps I can bound it by the time of their birth, but I would want to leave the upper bound undefined or specify forever. But now, of course, the domain assertions apply to the bio:Condition which I defined as the state of being for an individual at a particular period of time. I can envisage motherhood as a state of being that some people have for part of their life and having a biological mother is a state of being that every person has for the duration of their existence. So what's the range of bio:mother here? Is it a foaf:Person, bio:Mother or bio:Condition? It would be convenient to reference a person but I think it's going to destroy any chance of asserting that mothers and fathers are different people.
Another way is to redefine what I mean by bio:Mother and bio:Father. These could be better named bio:MotherAtSomeTime and bio:FatherAtSomeTime indicating that members of these classes are mothers or fathers at some time in their lives but not necessarily all of it. This approach would still let me express my disjoint class axiom:
bio:MotherAtSomeTime a owl:Class ;
    rdfs:subClassOf foaf:Person .

bio:FatherAtSomeTime a owl:Class ;
    rdfs:subClassOf foaf:Person ;
    owl:disjointWith bio:MotherAtSomeTime .
and the range axioms for the properties:
bio:mother rdfs:range bio:MotherAtSomeTime .
bio:father rdfs:range bio:FatherAtSomeTime .
Is this a hack? I don't think so; I think it's just a different perspective on the same information. It can even work in conjunction with the condition approach.
There is a third way and that is to step outside of the RDF model entirely. I could annotate each triple with the date range for which it is valid. Any query would then have to specify a date, which would be used to select the correct triples. This is the approach taken by the researchers in this Temporal RDF paper. I'm sure it could even be supported with SPARQL and named graphs, but I think the number of graphs would rapidly grow vast, probably converging in the limit with the number of triples. Also it requires a notion of absolute time, which doesn't exist and, if it did, would preclude the kind of broad notions of temporal sequencing that are common in biographical and genealogical records.
As an aside, this raises the whole issue of contexts and in particular the differing notions of what constitutes context. Most people discussing context in RDF are really talking about provenance: they want to know where each triple came from and who asserted it. However, for bio, the context would be what time period the triple was true for. There are many other notions of context I'm sure, such as why a triple was asserted.
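To give a feel for what annotating a triple with a validity period involves, here is one way it could be approximated in plain Turtle using the standard RDF reification vocabulary. This is a sketch, not the encoding from the paper: the ex: namespace, the ex:validFrom property, the individuals and the date are all made up for illustration.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix bio: <http://purl.org/bio/0.1/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace

# A description of the triple "ex:child bio:mother ex:mother" with a
# hypothetical start date attached. Reification only describes the triple;
# it does not assert it.
[] a rdf:Statement ;
    rdf:subject   ex:child ;
    rdf:predicate bio:mother ;
    rdf:object    ex:mother ;
    ex:validFrom  "2000-01-01"^^xsd:date .   # illustrative date, no upper bound
```

Every time-dependent triple would need a description like this, and every query would have to compare its date against the annotation, which hints at how quickly the bookkeeping grows.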
So, what is the best approach? My feeling is that solution two with the bio:MotherAtSomeTime and bio:FatherAtSomeTime classes is a pragmatic and reasonable approach that doesn't preclude a more formal approach using bio:Condition so I'll probably move forward with it.
There are some other restrictions I'd like to be able to express but that I don't think are possible in OWL. For example, I'd like to say that no person is their own father or mother. I may have to fall back to the use of rules for these kinds of constraints, but for now I think I have the foundations set to actually work on the other parts of Einstein's biography that I promised last time!
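One caveat: OWL 2 adds owl:IrreflexiveProperty, which does cover the "nobody is their own parent" case. A minimal sketch, assuming an OWL 2 aware reasoner and the bio namespace taken from the property URI quoted earlier:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix bio: <http://purl.org/bio/0.1/> .

# OWL 2 only: no individual may be related to itself by these properties.
bio:mother a owl:IrreflexiveProperty .
bio:father a owl:IrreflexiveProperty .
```

With these axioms, a statement such as one individual being its own bio:mother would make the data inconsistent. Tools limited to the original OWL vocabulary would not understand the declaration, so rules may still be needed there.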
Update: I removed the datatyping from the cardinality values in response to a comment by Dan Connolly below.
See also: posts in the "Refactoring Bio" series: Part 1: First Steps, Part 2: Conditions, Part 3: Temporal Invariants, Part 4: Employment and Families