Graph Database Representation
Modeling and Queries
Tracking RT statements about portions of reality in a graph database requires reified structures that let us track the provenance of information and recreate historical system states. Recording the statements alone does not tell us the state of the system, however, so we should also represent the actual nodes and relationships that the RT statements describe. In a graph database, these two sets of data (a model of reality and a model of our understanding of reality) are integrated by using the same nodes as primitives for modeling both the statements and the particulars of reality. Because every search runs against the same nodes, we can run different types of searches and still expect consistent results.
The following are “state searches”, the simplest representation, relying on relationship types and their date range attributes:
- Where do we think Person X is right now?
- Where did we think Person X was on July 4th, 2001?
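A state search of this kind can be sketched in a few lines. The following is a minimal in-memory illustration in Python, not Neo4J code: the `Relation` class, the `LOCATED_AT` relationship type, the place names, and the dates are all hypothetical, and it assumes that a relation with no end date is one we still believe to hold.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Relation:
    """A relationship between two particulars, valid over a date range."""
    source: str
    rel_type: str
    target: str
    start: datetime
    end: Optional[datetime] = None  # assumption: None means "still believed to hold"


def state_search(relations: List[Relation], source: str, rel_type: str,
                 at: Optional[datetime] = None) -> List[str]:
    """Return the targets of matching relations that were valid at `at`.

    With no `at`, only open-ended relations are returned (the current state).
    """
    hits = []
    for r in relations:
        if r.source != source or r.rel_type != rel_type:
            continue
        if at is None:
            if r.end is None:
                hits.append(r.target)
        elif r.start <= at and (r.end is None or at < r.end):
            hits.append(r.target)
    return hits


# Hypothetical data: Person X moved from CityA to CityB on 2001-09-01.
relations = [
    Relation("PersonX", "LOCATED_AT", "CityA", datetime(2001, 1, 1), datetime(2001, 9, 1)),
    Relation("PersonX", "LOCATED_AT", "CityB", datetime(2001, 9, 1)),
]

print(state_search(relations, "PersonX", "LOCATED_AT"))                           # ['CityB']
print(state_search(relations, "PersonX", "LOCATED_AT", at=datetime(2001, 7, 4)))  # ['CityA']
```

Treating the end date as exclusive means that a relation closed at a given instant and its replacement opened at the same instant never both match a single query time.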
The following are “provenance searches”, a more complex representation that uses more involved traversal patterns to mine sets of related statements:
- Why do we think Person X knows Person Y?
- How did Person X end up on this list?
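As a rough illustration of what a provenance search has to do, the sketch below collects every statement about a given relation and then follows links back to the earlier statements they correct. It is plain Python rather than a graph query, and the `Statement` fields, the `corrects` link, and the sample data are assumptions made for the example, not part of the planned model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Statement:
    """A reified statement node: who asserted what, and when (hypothetical shape)."""
    statement_id: str
    subject: str
    predicate: str
    obj: str
    asserted_by: str
    recorded_at: str                 # ISO date string, for brevity
    corrects: Optional[str] = None   # id of an earlier statement this one corrects


def provenance_search(statements: Dict[str, Statement],
                      subject: str, predicate: str, obj: str) -> List[Statement]:
    """Collect statements matching the triple, plus every earlier statement
    reachable by following `corrects` links, ordered oldest first."""
    seen: Dict[str, Statement] = {}
    stack = [s for s in statements.values()
             if (s.subject, s.predicate, s.obj) == (subject, predicate, obj)]
    while stack:
        s = stack.pop()
        if s.statement_id in seen:
            continue
        seen[s.statement_id] = s
        if s.corrects is not None and s.corrects in statements:
            stack.append(statements[s.corrects])
    return sorted(seen.values(), key=lambda s: s.recorded_at)


# Hypothetical data: s2 corrects an earlier statement s1; s3 is unrelated.
statements = {
    "s1": Statement("s1", "PersonX", "KNOWS", "PersonZ", "A1", "2001-03-01"),
    "s2": Statement("s2", "PersonX", "KNOWS", "PersonY", "A2", "2001-05-10", corrects="s1"),
    "s3": Statement("s3", "PersonW", "KNOWS", "PersonY", "A1", "2001-04-02"),
}

trail = provenance_search(statements, "PersonX", "KNOWS", "PersonY")
print([s.statement_id for s in trail])  # ['s1', 's2']
```

The result answers "why do we think Person X knows Person Y?" with the full trail of statements, including the superseded one, rather than only the current relation.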
The following are “reasoned searches”, which must integrate both state and provenance with ontological reasoning or artificial intelligence:
- Why did we think Person X would be at Location Y within the last 24 hours?
- If there is a dead drop within X meters of Location Y, who might be involved?
As metadata corrections are added to the system, our statement count and connectedness should increase, and our representation should change enough that incorrect information does not show up in a “current state search”. In a state search you might see only one relation between two particulars, but historically you might find that it represents a chain of any number of relations that were invalidated and redirected. Meta changes must not only record themselves properly, but must also alter End Dates on invalidated relations and insert new relations with the proper Start Date (in the case of a U1 error type). [Ideally, all searches can be made date-specific, and any search without a chronological range returns the current state. Based on a review of documentation and examples, I expect to do this through Start and End TimeStamp attributes on relations, but I don’t have working examples of my own yet.]
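The bookkeeping described above can be sketched as follows. This is a hedged, in-memory Python illustration only: `Relation`, `Correction`, `redirect`, and the sample data are hypothetical names, and the sketch assumes that the replacement relation starts at the moment the correction is recorded unless a different start date is supplied.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Relation:
    source: str
    rel_type: str
    target: str
    start: datetime
    end: Optional[datetime] = None  # None means the relation is still current


@dataclass
class Correction:
    """Meta statement recording that one relation replaced another."""
    invalidated: Relation
    replacement: Relation
    recorded_at: datetime


def redirect(relations: List[Relation], corrections: List[Correction], old: Relation,
             new_target: str, recorded_at: datetime,
             new_start: Optional[datetime] = None) -> Relation:
    """Close `old`, insert its replacement, and record the change itself."""
    old.end = recorded_at
    new = Relation(old.source, old.rel_type, new_target,
                   start=new_start if new_start is not None else recorded_at)
    relations.append(new)
    corrections.append(Correction(old, new, recorded_at))
    return new


def current_state(relations: List[Relation]) -> List[Relation]:
    """Only relations without an end date appear in a current state search."""
    return [r for r in relations if r.end is None]


def history(corrections: List[Correction], relation: Relation) -> List[Relation]:
    """Walk back through corrections to recover the chain behind `relation`."""
    chain = [relation]
    while True:
        prior = next((c.invalidated for c in corrections
                      if c.replacement is chain[-1]), None)
        if prior is None:
            break
        chain.append(prior)
    chain.reverse()
    return chain


# Hypothetical data: one relation is redirected twice.
relations = [Relation("PersonX", "LOCATED_AT", "CityA", datetime(2001, 1, 1))]
corrections: List[Correction] = []
r2 = redirect(relations, corrections, relations[0], "CityB", datetime(2001, 6, 1))
r3 = redirect(relations, corrections, r2, "CityC", datetime(2001, 8, 1))

print([r.target for r in current_state(relations)])     # ['CityC']
print([r.target for r in history(corrections, r3)])     # ['CityA', 'CityB', 'CityC']
```

Nothing is deleted: the current state shows a single relation, while the correction records keep the whole chain of invalidated relations reachable.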
Implementations
Neo4J is under consideration. It is a Node/Relation model with a "key value store" for both Nodes and Relations (sometimes called an "attributable graph"). It is not RDF, although the data should be sharable as RDF if the model is implemented with those principles in mind (using only ontologically relevant "key value" pairs).
The values of attributes can be indexed, and multiple indexes provide entry points from which traversal algorithms can be applied. A traversal can be as simple as a Manchester Syntax expression, simpler still when written in a basic graph language, or as complicated as a software class module.
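To make the idea of an index as an entry point concrete, here is a small sketch in plain Python. It does not use the Neo4J API; the `Graph` class, its method names, and the sample nodes are hypothetical. An attribute index finds the starting node, and a breadth-first traversal then follows only the chosen relation types.

```python
from collections import defaultdict, deque


class Graph:
    """Toy node/relation store with key-value properties on nodes."""

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.edges = defaultdict(list)     # node id -> [(rel_type, target id)]
        self.index = defaultdict(lambda: defaultdict(set))  # key -> value -> node ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        for key, value in props.items():
            self.index[key][value].add(node_id)

    def add_edge(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def lookup(self, key, value):
        """Index lookup: ids of nodes whose property `key` equals `value`."""
        return sorted(self.index.get(key, {}).get(value, set()))

    def traverse(self, start, rel_types, max_depth):
        """Breadth-first traversal from `start` along `rel_types` only."""
        seen = {start}
        order = []
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for rel_type, target in self.edges[node]:
                if rel_type in rel_types and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append((target, depth + 1))
        return order


# Hypothetical data.
g = Graph()
g.add_node("p1", name="Person X")
g.add_node("p2", name="Person Y")
g.add_node("p3", name="Person Z")
g.add_node("l1", name="Location Y")
g.add_edge("p1", "KNOWS", "p2")
g.add_edge("p2", "KNOWS", "p3")
g.add_edge("p1", "LOCATED_AT", "l1")

start = g.lookup("name", "Person X")[0]
print(g.traverse(start, {"KNOWS"}, max_depth=2))  # ['p2', 'p3']
```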
Template Representations
A slideshow of these images is available: tmpls_slideshow.pptx
Planned Neo4J Model Images
- A & D, asserted and recorded by A1 @ t1
- A & D, asserted by A1 @ t1, recorded by A2 @ t2
- A & D, Error Type A1
- PtoP & D, asserted and recorded by A1 @ t1
- PtoP & D, Error Type U1
- PtoU & D, asserted and recorded by A1 @ t1
- Toy Onto New Instance
- Toy Onto Changed Data