Taking AIM at Data Fusion - Using the Ontological Approach - technical details.

Oct 15, 2024

Previous blogs in this ‘Taking AIM at Data Fusion’ series provide an understanding of how Advanced Information Modelling (AIM) applies what is known as an ontological approach to address the challenge of data fusion faced by many businesses. This part dives into the technical detail of using such approaches.

Conventional Ontological Approaches

With the benefits made clear, it is worth being open about the challenges businesses face in implementing such ontological approaches, by unpacking how they differ from current practice. How we currently ‘do’ data and information management is based largely on data stores with bespoke structures and what’s known as a ‘3D’ representation of the data being described, e.g. a common relational database holds data on things like entities, types, relationships and individuals.

AIM provides next generation ontologies for data fusion

The UK research community is currently advancing our understanding of the importance of an ontological approach over bespoke structures, going beyond the specific aspects of data modelled by 3D ontologies. The UK AIM community has been pioneering the application of ‘4D’ ontological models, which model not only all the things contained in a 3D model but also the temporal states of those things; it is this added dimension of time that leads us to describe such models as ‘4D’ ontologies.

For example, if we think in ‘4D’, an entity can have a particular relationship with another entity (e.g. an employee within an organisation) and each of the entities and the relationship itself can have a ‘state’ that persists for a specific period of time. A 4D ontology seeks to model all of those aspects including the state of the object (or associated relationship). It is important to be clear that any type of ontological approach (3D or 4D) can be used for data fusion, but different benefits can be realised depending on the option used.

3D versus 4D? Benefits in practice

Nearly all organisations today use conventional databases for their data and information. These, based on 3D ontologies, are commonly considered to be the default option for businesses and government in the UK digital infrastructure. Such 3D databases and document stores are built from well defined, hard coded schema and they are highly efficient and well supported for well established static world models. This is a key benefit of the 3D approach: it is conventional, cheap and easy to access.

However, as we continue to understand the application and viability of new information processing technologies such as AI and graph databases, it is increasingly being shown that interoperability between different 3D databases is a key issue. As more data is held in 3D formats, and as more cross-comparison and referencing between different databases and data stores is undertaken, the need to convert such sources into consistent formats is becoming an increasingly time-consuming and inefficient step.

As a result, the core processes that organisations use to conduct data fusion with 3D ontologies are, by nature, more time-consuming and challenging. In practice, human knowledge and manual processing are the workaround for these interoperability issues, which costs more time and effort.

With AIM we are looking to further unlock the benefits of the ontological approach by the continued research and application of 4D ontologies, to allow us to generate greater benefits for interoperability at source. This is not easy, but the research and associated tools and techniques spinning out from this are already realising benefits across the UK government and for early adopters in the business community. By developing ontological thinking at the source of analysis and industrial use cases, we are improving how we address the data fusion challenge. Using an ontological approach means data from disparate sources follows predictable patterns. This greatly reduces the need for bespoke processing where integration or interoperability is required, and helps address the real pain points we see in data fusion today.

Although they might seem like a lot of up-front work, such ontological approaches are flexible enough to represent any conceivable concept, thus allowing the modelling of any source of data. Despite the varying nature of the data that describes a problem, the structure of the ontology guides similar concepts to be represented in a similar manner, so they can be interchanged without significant processing.

An Example (going back to Dyno-Freight, our fictional use case from earlier in this series)

‘Dyno-Freight’ has a database of its IT infrastructure, including an estate of PC-based terminals at static positions throughout their depots, used for tracking freight as it passes through. Each terminal is named according to the depot and loading bay in which it is positioned e.g. SO-LB13 for Southampton depot, loading bay 13. The naming system is sufficient for uniquely identifying the machine on the network and for employees to reference in IT support queries.

Periodically, a terminal may be removed for repair or upgrade and, to keep the freight operation running smoothly, is immediately swapped with another. The replacement machine takes on the same name as the original machine and, once the repair or upgrade has occurred, the previous terminal becomes available to be swapped into any other depot.

Current ‘3D’ Asset Tracking

The IT department internally tracks the hardware itself via a unique serial number so they can identify it for repair and support with the supplier. A simplified 3D representation in a relational database table that links the hardware to its name and location may look like the following:

serial_number	machine_name	site
SN234257-ADE-B38	SO-LB13	Southampton Depot

This representation allows the IT department to locate the terminal hardware but because it is only a 3D representation, there is no temporal data. We only know this information to be true for the present moment.
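To make the limitation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `terminal`, the helper functions and the replacement serial number `SN-EXAMPLE-0002` are assumptions made purely for illustration; they are not taken from any real Dyno-Freight system. The sketch shows that when a terminal is swapped, the update overwrites the only record of which hardware used to sit at that loading bay.

```python
import sqlite3


def build_3d_table() -> sqlite3.Connection:
    """Create an in-memory table mirroring the simplified 3D representation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE terminal ("
        " serial_number TEXT NOT NULL,"
        " machine_name TEXT NOT NULL,"
        " site TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO terminal VALUES (?, ?, ?)",
        ("SN234257-ADE-B38", "SO-LB13", "Southampton Depot"),
    )
    return conn


def swap_terminal(conn: sqlite3.Connection, machine_name: str, new_serial: str) -> None:
    """Record a hardware swap the way a 'present moment only' table would."""
    # The old serial number is overwritten; nothing records that it was ever here.
    conn.execute(
        "UPDATE terminal SET serial_number = ? WHERE machine_name = ?",
        (new_serial, machine_name),
    )


def serials_at(conn: sqlite3.Connection, site: str) -> list:
    """Return the serial numbers currently recorded at a site."""
    rows = conn.execute(
        "SELECT serial_number FROM terminal WHERE site = ? ORDER BY serial_number",
        (site,),
    )
    return [row[0] for row in rows]
```

Calling `swap_terminal(conn, "SO-LB13", "SN-EXAMPLE-0002")` and then `serials_at(conn, "Southampton Depot")` returns only the replacement serial number. The table can say where hardware is now, but not where it has been.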

Tracking a security breach

Now, consider the scenario where a security breach is discovered to have occurred 6 months ago at the Southampton depot. The IT security team decide they wish to examine not only all of the current hardware in Southampton, but any hardware that has been located there in the last 6 months. With the above 3D representation, this information would not be available. There is nothing preventing developers from modifying the relational database to hold historic records of hardware installation, but doing so now would be too late for the above scenario; the need would have had to be anticipated when the database was designed.

A ‘4D’ alternative to asset tracking

With a 4D approach, temporal data is fundamental to the representation of physical entities within the system. A 4D ontological data model uses the concepts of "spatio-temporal extents" and "states" to express the existence of physical entities and changes to their characteristics over time. Spatio-temporal extents define the existence of a non-abstract entity by its material presence over a period of time. States represent time periods within an entity's lifetime where a particular set of characteristics hold true, e.g. the machine name and location in our example above.

Where developers express physical entities, they are mandated to supply timestamps that bound the entities' existence and any changes to their properties, so a history of change is baked into the model. Additionally, these changes are captured at a semantic level, potentially providing richer information than simply timestamping database record changes.
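One way to picture mandatory time bounds is the small Python sketch below. It is a loose illustration and not a formal 4D ontology: the class names `PhysicalEntity` and `State`, their fields, and the validation rules are all assumptions chosen for this example. The point is only that a state cannot be created without timestamps, and that each state must sit inside the lifetime of the entity it describes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class State:
    """A period in an entity's lifetime during which some characteristics hold."""
    start: datetime
    end: Optional[datetime]  # None means the state is still current
    characteristics: dict    # e.g. {"machine_name": ..., "site": ...}


@dataclass
class PhysicalEntity:
    """A physical thing, bounded in time (a stand-in for a spatio-temporal extent)."""
    identifier: str
    exists_from: datetime
    exists_until: Optional[datetime] = None
    states: list = field(default_factory=list)

    def add_state(self, state: State) -> None:
        """Attach a state, rejecting any that falls outside the entity's lifetime."""
        if state.end is not None and state.end <= state.start:
            raise ValueError("a state must end after it starts")
        if state.start < self.exists_from:
            raise ValueError("a state cannot begin before the entity exists")
        if self.exists_until is not None and (
            state.end is None or state.end > self.exists_until
        ):
            raise ValueError("a state cannot outlive the entity")
        self.states.append(state)

    def state_at(self, moment: datetime) -> Optional[State]:
        """Return the state that holds at the given moment, if any."""
        for state in self.states:
            if state.start <= moment and (state.end is None or moment < state.end):
                return state
        return None
```

With a structure like this, a terminal swap is recorded by closing one state and opening another, so the earlier machine name and site remain available through `state_at` for any past moment.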

A 4D ontological representation of the Dyno-Freight IT estate would contain spatio-temporal extents and states that connect the serial number to the physical machine and changes to machine name and site for all recorded time periods. A simplified version of this data model may look like this:

Now we have enough information to answer the question “Which hardware was installed in Southampton over the last 6 months?” and the best part is that we did not have to anticipate this query during system design.
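The query can be sketched in a few lines of Python, assuming the states have been flattened into simple records. Everything in this example is hypothetical: the `InstallationState` record, the `hardware_at_site` function, all dates, the Bristol depot, and every serial number other than the one from the table earlier. The technique is a plain interval-overlap test between each state's time period and the window of interest.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class InstallationState(NamedTuple):
    """One time-bounded state linking hardware to a machine name and site."""
    serial_number: str
    machine_name: str
    site: str
    start: datetime
    end: Optional[datetime]  # None means the state is still current


def hardware_at_site(states, site, window_start, window_end):
    """Return serial numbers of hardware present at a site during the window."""
    found = set()
    for state in states:
        # Two periods overlap if each begins before the other ends.
        overlaps = state.start < window_end and (
            state.end is None or state.end > window_start
        )
        if state.site == site and overlaps:
            found.add(state.serial_number)
    return found


# Hypothetical history: the original terminal is swapped out in June 2024.
EXAMPLE_STATES = [
    InstallationState("SN-EXAMPLE-0003", "SO-LB13", "Southampton Depot",
                      datetime(2022, 1, 1), datetime(2023, 1, 10)),
    InstallationState("SN234257-ADE-B38", "SO-LB13", "Southampton Depot",
                      datetime(2023, 1, 10), datetime(2024, 6, 1)),
    InstallationState("SN-EXAMPLE-0002", "SO-LB13", "Southampton Depot",
                      datetime(2024, 6, 1), None),
    InstallationState("SN234257-ADE-B38", "BR-LB02", "Bristol Depot",
                      datetime(2024, 7, 1), None),
]
```

Asking for Southampton between 15 April 2024 and 15 October 2024 returns both the terminal that was removed in June and its replacement, while the machine that left in early 2023 is excluded. No change to the data structure was needed to support the question.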

Conclusion

By promoting the inclusion of temporal data as a core part of the data model, 4D ontological representations, whilst novel, can provide us with a more accurate view of the problem domain and implicitly enable historic event tracking. Time is inescapable so let us embrace it in our data models!

About us

What is AIM?

Over the past 30 years, the UK research community has pioneered a new database technology built upon our understanding and collective research on Advanced Information Models (AIM). As a partner in this research, Elemendar has played a key role in bringing together and working with this rich community of operators who have built specific AIM techniques, and pioneering the application and development of specific 3D and 4D Ontologies for Cyber Threat Intelligence.

M. West, Developing High Quality Data Models. Burlington, MA: Morgan Kaufmann, 2011.

Part 1. Taking AIM at Data Fusion - Why Advanced Information Modelling (AIM) matters to the businesses of today (and tomorrow)

Part 2. Taking AIM at Data Fusion - Data fusion for Cyber Threat Intelligence

Acknowledgements

This blog post was authored by:

Chris Evett, Head Of AIM

Ross Marwood, AIM - Technical Architect