Aug 10 2009

Representing Time in RDF Part 5

Published by Ian Davis at 11:30 am under Projects

Approach 4: N-ary Relations

In this approach a new class is created for each time-dependent predicate. This new class represents the context of the property and allows more specific predicates to be used that provide extra meaning.

Scenario 1

In the first scenario we use a new ex:NameInContext class. It comes with two predicates, ex:individual and ex:name, which link an individual to a name in a particular context.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:maria a foaf:Person .

thing:mariaUnmarried
  a ex:NameInContext ;
  ex:individual thing:maria ;
  ex:name "Maria Smith" ;
  time:start "1867" ;
  time:end "1888" .

thing:mariaMarried
  a ex:NameInContext ;
  ex:individual thing:maria ;
  ex:name "Maria Johnson" ;
  time:start "1888" ;
  time:end "9999" .

Original file: a4s1.ttl

The query is very similar to that in Approach 3:

prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#> 

select ?name where {
  ?p a ex:NameInContext .
  ?p ex:individual thing:maria .
  ?p time:start ?start .
  ?p time:end ?end .
  ?p ex:name ?name .
  filter (xsd:integer(?start) <= 1891 && xsd:integer(?end) >= 1891) .
}

Original file: a4s1.sq
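
To make the filter's behaviour concrete, here is a minimal Python sketch that applies the same inclusive start/end test to an in-memory copy of the two contexts. The NameInContext dataclass, the OPEN_END constant and the names_at function are illustrative names of my own, not part of the original files, and no RDF library is involved.

```python
from dataclasses import dataclass

OPEN_END = 9999  # the sentinel the data uses for "still current"


@dataclass(frozen=True)
class NameInContext:
    """Hypothetical in-memory stand-in for an ex:NameInContext node."""
    individual: str
    name: str
    start: int
    end: int


CONTEXTS = [
    NameInContext("thing:maria", "Maria Smith", 1867, 1888),
    NameInContext("thing:maria", "Maria Johnson", 1888, OPEN_END),
]


def names_at(contexts, individual, year):
    """Return names whose context covers `year`; both bounds are inclusive, as in the filter."""
    return [
        c.name
        for c in contexts
        if c.individual == individual and c.start <= year <= c.end
    ]


print(names_at(CONTEXTS, "thing:maria", 1891))  # ['Maria Johnson']
print(names_at(CONTEXTS, "thing:maria", 1888))  # ['Maria Smith', 'Maria Johnson']
```

Because both bounds are inclusive and 1888 appears as the end of one context and the start of the next, a query for 1888 returns both names. Whether that is desirable depends on how the boundary year should be read.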

Scenario 2

For this scenario I use a class, ex:PartOfContext, to represent the part-of relationship, with two new predicates: ex:part and ex:whole. Once again, for simplicity, I assume the place name information is timeless.

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:oxfordshire
  a ex:County ;
  foaf:name "Oxfordshire" .

thing:gloucestershire
  a ex:County ;
  foaf:name "Gloucestershire" .

thing:widford
  a ex:Parish ;
  foaf:name "Widford" .

thing:widfordInGloucestershire
  a ex:PartOfContext ;
  ex:part thing:widford ;
  ex:whole thing:gloucestershire ;
  time:start "1837" ;
  time:end "1844" .

thing:widfordInOxfordshire
  a ex:PartOfContext ;
  ex:part thing:widford ;
  ex:whole thing:oxfordshire ;
  time:start "1844" ;
  time:end "9999" .

Original file: a4s2.ttl

The query here looks like:

prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#> 

select ?name where {
  ?p a ex:PartOfContext .
  ?p ex:part thing:widford .
  ?p ex:whole ?x .
  ?p time:start ?start .
  ?p time:end ?end .
  ?x foaf:name ?name .
  filter (xsd:integer(?start) <= 1841 && xsd:integer(?end) >= 1841) .
}

Original file: a4s2.sq
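
The query has to make an extra hop: from the context node to the whole, and from the whole to its name. The following Python sketch walks the same path over a plain list of triples. It is only an illustration under my own assumptions: the TRIPLES list copies the relevant statements by hand, and the value and wholes_at helpers are hypothetical names, not part of any RDF toolkit.

```python
# Hand-copied subset of the a4s2.ttl statements as (subject, predicate, object).
TRIPLES = [
    ("thing:oxfordshire", "foaf:name", "Oxfordshire"),
    ("thing:gloucestershire", "foaf:name", "Gloucestershire"),
    ("thing:widfordInGloucestershire", "rdf:type", "ex:PartOfContext"),
    ("thing:widfordInGloucestershire", "ex:part", "thing:widford"),
    ("thing:widfordInGloucestershire", "ex:whole", "thing:gloucestershire"),
    ("thing:widfordInGloucestershire", "time:start", "1837"),
    ("thing:widfordInGloucestershire", "time:end", "1844"),
    ("thing:widfordInOxfordshire", "rdf:type", "ex:PartOfContext"),
    ("thing:widfordInOxfordshire", "ex:part", "thing:widford"),
    ("thing:widfordInOxfordshire", "ex:whole", "thing:oxfordshire"),
    ("thing:widfordInOxfordshire", "time:start", "1844"),
    ("thing:widfordInOxfordshire", "time:end", "9999"),
]


def value(subject, predicate):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def wholes_at(part, year):
    """Names of the wholes that `part` belonged to in `year` (inclusive bounds)."""
    names = []
    for s, p, o in TRIPLES:
        is_context = p == "rdf:type" and o == "ex:PartOfContext"
        if is_context and value(s, "ex:part") == part:
            if int(value(s, "time:start")) <= year <= int(value(s, "time:end")):
                # Second hop: context -> whole -> foaf:name
                names.append(value(value(s, "ex:whole"), "foaf:name"))
    return names


print(wholes_at("thing:widford", 1841))  # ['Gloucestershire']
```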

Scenario 3

For the final scenario I use ex:ResidenceContext to represent the context of someone being resident somewhere. The person and the place are referred to using new predicates ex:individual and ex:place:

@prefix bio: <http://purl.org/vocab/bio/0.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix ex: <http://example.org/ex#> .
@prefix thing: <http://example.org/thing#> .

thing:lymeRegis
  a ex:Town ;
  foaf:name "Lyme Regis" .

thing:charmouth
  a ex:Town ;
  foaf:name "Charmouth" .

thing:hastings
  a ex:Town ;
  foaf:name "Hastings" .

thing:anon a foaf:Person .

thing:anonInLymeRegis
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:lymeRegis ;
  time:intervalBefore thing:anonInCharmouth ;
  time:intervalContains "1844" .

thing:anonInCharmouth
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:charmouth ;
  time:intervalAfter thing:anonInLymeRegis ;
  time:intervalBefore thing:anonInHastings ;
  time:intervalContains "1871" .

thing:anonInHastings
  a ex:ResidenceContext ;
  ex:individual thing:anon ;
  ex:place thing:hastings ;
  time:intervalAfter thing:anonInCharmouth ;
  time:intervalContains "1881" .

Original file: a4s3.ttl

Once again the query is very similar to that in Approach 3:

prefix bio: <http://purl.org/vocab/bio/0.1/>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix time: <http://www.w3.org/2006/time#>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>
prefix ex: <http://example.org/ex#>
prefix thing: <http://example.org/thing#> 

select ?nameBefore ?nameAfter where {
  ?pBefore a ex:ResidenceContext .
  ?pBefore ex:individual thing:anon .
  ?pBefore ex:place ?placeBefore .
  ?placeBefore foaf:name ?nameBefore .

  ?pBefore time:intervalContains ?dateBefore .
  filter (xsd:integer(?dateBefore) <= 1874) .

  ?pAfter a ex:ResidenceContext .
  ?pAfter ex:individual thing:anon .
  ?pAfter ex:place ?placeAfter .
  ?placeAfter foaf:name ?nameAfter .

  ?pAfter time:intervalContains ?dateAfter .
  filter (xsd:integer(?dateAfter) > 1874) .

  ?pBefore time:intervalBefore ?pAfter .
}

Original file: a4s3.sq
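
The logic of this query is easier to see outside SPARQL: find a context observed on or before 1874 whose intervalBefore link points to a context observed after 1874. Here is a minimal Python sketch of that logic. The RESIDENCES dictionary and the residences_around function are my own illustrative names, and the data is copied by hand from the Turtle above.

```python
# Hypothetical in-memory copy of the three ex:ResidenceContext nodes:
# context id -> (place name, observed year, id of the context it is intervalBefore)
RESIDENCES = {
    "anonInLymeRegis": ("Lyme Regis", 1844, "anonInCharmouth"),
    "anonInCharmouth": ("Charmouth", 1871, "anonInHastings"),
    "anonInHastings": ("Hastings", 1881, None),
}


def residences_around(year):
    """Return (place_before, place_after) pairs that straddle `year`."""
    pairs = []
    for place_before, seen_before, next_id in RESIDENCES.values():
        # Mirrors: filter (?dateBefore <= year) plus the intervalBefore link
        if next_id is None or seen_before > year:
            continue
        place_after, seen_after, _ = RESIDENCES[next_id]
        # Mirrors: filter (?dateAfter > year)
        if seen_after > year:
            pairs.append((place_before, place_after))
    return pairs


print(residences_around(1874))  # [('Charmouth', 'Hastings')]
```

As with the query, the result depends on the explicit intervalBefore links: a year later than the last observation yields no pair at all.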

Approach 4 Conclusions

In the examples shown here, Approach 4 is identical to Approach 3 in complexity. The key difference is the use of rdf:type rather than ex:property to distinguish the different types of relationship. In this respect it seems to offer no advantage over Approach 3, and it adds the complexity of specific property names for each context relationship.

However, it does potentially offer a wider use beyond simply recording time-varying properties. A context could include other factors such as provenance or location. It could also be easier to model multi-agent contexts such as marriages, with predicates to represent the bride and groom separately. For example:

thing:marriage
  a ex:MarriedContext ;
  ex:husband thing:person1 ;
  ex:wife thing:person2 ;
  time:start "1820" ;
  time:end "1862" .
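
To illustrate how such a multi-agent context might be queried, here is a small Python sketch that looks up a person's spouse in a given year. The MarriedContext dataclass and the spouse_in function are hypothetical names of my own, and the single record is copied from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarriedContext:
    """Hypothetical in-memory stand-in for an ex:MarriedContext node."""
    husband: str
    wife: str
    start: int
    end: int


MARRIAGES = [MarriedContext("thing:person1", "thing:person2", 1820, 1862)]


def spouse_in(person: str, year: int) -> Optional[str]:
    """Return the other party to a marriage covering `year`, or None."""
    for m in MARRIAGES:
        if m.start <= year <= m.end:
            if m.husband == person:
                return m.wife
            if m.wife == person:
                return m.husband
    return None


print(spouse_in("thing:person1", 1850))  # thing:person2
print(spouse_in("thing:person2", 1870))  # None
```

Because each role has its own predicate, the same context answers the question from either participant's side.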

This post is part 5 of a series about representing time in RDF. Other posts in this series: part 1, part 2, part 3, part 4, part 6


6 Responses to “Representing Time in RDF Part 5”

  1. glenn mcdonald on 10 Aug 2009 at 5:26 pm

    I left a comment on part 6 about the modeling issues that are the real topic here, but the SPARQL here in scenario 3 is so painful that I will offer an off-topic observation:

    SPARQL is not good enough. The human question here is, given a sparse set of date-observations, what is the pair before and after a given date known to be in the middle. Your SPARQL query takes 14 lines to express this (plus six prefixes!), and would be even uglier if it weren’t for the semantically dubious (and anti-open-world-assumption) intervalAfter/intervalBefore relationships you added to the data itself.

    I submit that a good query language would let you ask this question in some way much more like this:

    @anon.Residence#RecordYear::?,RecordYear1874,?

    And, for that matter, what you’d really like here in a more general sense is a list of the person’s residences, sorted by temporal proximity to the year in question, subsorted by year. That’s a better question, because it can be answered whether the year is before, during or after whatever data you have. Is that doable in SPARQL? Via CONSTRUCT, somehow? But it should be even easier to ask than the previous question, something more like this:

    @anon.Residence#(.RecordYear._–1874),Year

    I have a blog post about this query-language, if you’re at all intrigued:

    http://www.furia.com/page.cgi?type=log&id=324

  2. Ian Davis on 10 Aug 2009 at 7:57 pm

    Glenn, your query language looks very powerful. Can you see a natural way to fit it with RDF?

  3. glenn mcdonald on 10 Aug 2009 at 10:04 pm

    The language (it’s called Thread) is basically already suited for RDF by its graph-model nature. There are three practical differences, at least, between the data-model over which Thread operates and that of “raw RDF”, but none of them are conceptually problematic:

    1. Thread nodes “have” literals, whereas in RDF nodes are connected to literals by predicates. This allows a Thread query to say, for example, “Year:=1984”. But this could already be written out more explicitly as “Year:(.Name:=1984)”, so this point is just about brevity in queries, and could be solved by mapping some common “name”-ish predicates to the syntactic shorthand. Minor.

    2. In Thread data all relationships are asserted and maintained in both directions. Thus you can ask “Artist:(.Album.Year:=1984)” to find out all the bands who put out albums in 1984, and you don’t have to worry about whether the relationship from Black Sabbath to Sabotage was explicitly asserted as “black_sabbath album sabotage .” or “sabotage artist black_sabbath .”. So you need to get those inverses all either asserted or inferred, either at data-modification-time or query-time, to make sure you’re not missing anything in query-evaluation. Not a conceptual problem, but if it’s going to be solved with inference, the RDF data would have to include the appropriate owl:inverseOf statements, at least, and I don’t usually see those in published RDF examples.

    3. Thread’s data is expressed directly in nodes, with each node’s internal data-structure including all its outbound arcs/predicates, whereas RDF implies nodes via decomposed triples. Thus most traversal operations are fast in Thread more or less by definition, and would tend to be slow or impractical if each step of query-evaluation had to rescan the entire universe of triples to re-infer the “nodes”. But this is just a matter of pre-indexing, and/or convention. The N3 notation in your examples is basically this same idea: bundle together all the statements about each node. And I suspect many RDF query-engines build indices by node-ID as preprocessing already.

    Logistically speaking, the production system for which Thread was designed is a data-aggregation/analysis/republishing platform being built at ITA Software in Cambridge, MA. It’s in private beta-testing now, and I’m hoping to have at least a small public database available for query-language demonstration ahead of public exposure of the whole system, hopefully within a small number of months…

  4. Ian Davis on 11 Aug 2009 at 1:27 am

    Glenn, I suspect the hardest part of fitting your language to RDF will be the mapping between properties and classes and your short names for them. Your examples seem very simple even for a restricted domain language. When you refer to “Year” which do you mean? The year copyright was assigned, the year the album was first published, one of the years of re-issue, year of issue in USA, UK or Japan etc?

    Also how will you deal with ambiguity such as detecting which group called Nirvana was intended in an Artist query? Artist names don’t generally have a normalised form and if your database doesn’t contain the year of formation or home country then it’s going to give some strange results.

    The traversal parts etc seem pretty trivial to overlay on a triple store. I recently implemented the Fresnel Selector Language in PHP which has some similarities (see http://www.w3.org/2005/04/fresnel-info/fsl/ and http://code.google.com/p/moriarty/source/browse/trunk/graphpath.class.php )

  5. glenn mcdonald on 11 Aug 2009 at 3:17 am

    Oh, the names of properties are a feature of the data, not the query language. If your domain models albums as having ReleaseDates, which in turn have a year and a country, then you could find all the artists who had albums that were released anywhere in 1984 like this:

    Artist:(.Album.ReleaseDate.Year:=1984)

    or, because all the relationships go both ways, you could find out the same thing in any of these ways:

    Album:(.ReleaseDate.Year:=1984).Artist
    ReleaseDate:(.Year:=1984).Album.Artist
    Year:=1984.ReleaseDate.Album.Artist

    But if you wanted to find only artists who had albums that were released in the US in 1984, you’d instead do one of these:

    Artist:(.Album.ReleaseDate:(.Country:=USA):(.Year:=1984))
    Album:(.ReleaseDate:(.Country:=USA):(.Year:=1984)).Artist
    ReleaseDate:(.Country:=USA):(.Year:=1984).Album.Artist
    Year:=1984.ReleaseDate:(.Country:=USA).Album.Artist
    Country:=USA.ReleaseDate:(.Year:=1984).Album.Artist

    As for finding the right Nirvana, that’s also a data issue, not a query-language issue. You can refer to things by ID in Thread, too, so “@nirvana_us” is a complete Thread query to find the node with that known ID. Or you could find them by whatever other criteria you’d use in any other system, like:

    Artist:=Nirvana:(.Member:@cobain_kurt)
    Artist:=Nirvana:(.Album:=Nevermind)

  6. Tom Passin on 13 Aug 2009 at 2:09 am

    Ian, have you looked at John Sowa’s approach to how to represent time, both points and intervals? Of course, he uses conceptual graphs, but the ideas ought to translate to RDF. One thing he does is to construct “situations”, which are probably similar to your context nodes, I imagine.

    Worth looking at if you haven’t already.