Finding patterns with rules (2024)

Machine learning algorithms are now synonymous with finding patterns in data but not all patterns are suitable for statistics based data-driven techniques, for example when these patterns don’t have explicitly labelled targets to learn from.

In some cases, these patterns can be expressed precisely as a rule. Reasoning is the process of matching rule-based patterns or verifying that they don’t exist in a graph. Because these patterns are found with deductive logic they can be found more efficiently and interpreted more easily than Machine Learning patterns which are induced from the data.

This article will introduce some common patterns and how you can express them in the rule language, Datalog, using RDFox, a knowledge graph and semantic reasoning engine developed by Oxford Semantic Technologies. RDFox is a standards based, RDF-triple store which we will query with SPARQL. If you are not yet familiar with knowledge graphs and reasoning, you can read an introduction published on Towards Data Science here. If you don’t have RDF data, read our simple tutorial for importing CSVs and SQL data into an RDF graph.

The “located in” relation is intuitively transitive but might not be completely expressed in the graph. For example, a graph might contain the following triples:

@prefix : <https://oxfordsemantic.tech/RDFox/tutorial/> .:oxford :located_in :oxfordshire .
:oxfordshire :located_in :england .
:england :located_in :uk .

A SPARQL query will not return that Oxford is located in England because it is missing the triple:

:oxford :located_in :england .

This triple could be added on a one-off basis, but this would soon become impractical on larger graphs where other towns located in Oxfordshire might also be missing the located in England edge.

In this case, a transitive relation rule can effortlessly draw the relevant :located_in edges automatically:

[?x, :located_in, ?z] :- 
[?x, :located_in, ?y],
[?y, :located_in, ?z] .

The first part is the head of the rule which will materialise ?x :located_in ?z triples if the pattern following the symbol :- is found in the graph.

In our example, the rule will bind the variable ?x to :oxford, variable ?y to :oxfordshire and variable ?z to :england, and then as a logical consequence of the rule being satisfied, create the :oxford :located_in :england triple by replacing ?x with :oxford and ?z with :england.

Finding patterns with rules (4)

RDFox will materialise the rule incrementally on the fly whenever new data points are added or removed from the graph which makes it an efficient solution for dynamic data sources.

Transitive closure patterns help materialise transitive relations which don’t yet exist in the graph but could potentially. For example, some Twitter followers can be represented in the following graph:

:alice :follows :bob .
:bob :follows :charlie .
:diana :follows :alice .
Finding patterns with rules (5)

When recommending follow suggestions to the users we might wish to compute the missing connections as :follows_closure edges using the following rules:

[?x, :follows_closure, ?y] :- 
[?x, :follows, ?y] .
[?x, :follows_closure, ?z] :-
[?x, :follows, ?y],
[?y, :follows_closure, ?z] .

The first rule specifies that the new relation :follows_closure is an extension of the relation :follows. The second rule implements the closure by saying that if a person ?x directly follows ?y and ?y (directly or indirectly) follows person ?z, then ?x (indirectly) follows ?z.

Finding patterns with rules (6)

The new :follows_closure relations which were not originally a :follows relation are therefore:

:diana :charlie .
:alice :charlie .
:diana :bob .

These simple rules can be enhanced to include user interests, geography, language, common followers etc.

An important practical use of knowledge graphs is to power Open Query Answering (Open QA) applications or chatbots, where the user asks a question in natural language, which is then automatically answered against the graph. Open QA systems often struggle to interpret questions that involve several “hops” in the graph. For instance, consider the graph consisting of the triples given next.

:douglas_adams :born_in :uk .
:uk rdf:type :country.

A user may ask in which country Douglas Adams was born. To obtain this information, the system would need to construct a query involving two hops in the graph. In particular, the SPARQL query

select ?c where {
:douglas_adams :born_in ?c .
?c rdf:type :country .
}

would return :uk as an answer.

The results of the open QA system would be greatly enhanced if the desired information had been available in just a single hop. RDFox rules can be used to provide a clean solution in this situation. In particular, we can use rules to define a new :country_of_birth relation that provides a “shortcut” for directly accessing the desired information.

[?x, :country_of_birth, ?y] :- 
[?x, :born_in, ?y],
[?y, rdf:type, :country] .

The rule says that, if a person ?x is born in a place ?y, and that place ?y is a :country, then ?y is the country of birth of ?x.

As a result, RDFox would derive that Douglas Adams’ country of birth is the UK. The Open QA system would now only need to construct the following simpler query, which involves a single hop in the graph, to obtain the desired information.

select ?x ?y where {?x :country_of_birth ?y}

As we have already seen rules can build on the results of other rules so this pattern combined with the the first, transitive relation, can cater for data which includes more the full detail of where Douglas Adams was born. Such as:

:douglasAdams :born_in :cambridge .
:cambridge :located_in :cambridgeshire .
:cambridgeshire :located_in :england .
:england :located_in :uk .
:uk rdf:type :country .

When the rules are applied to this data they produce the graph visualised below.

Finding patterns with rules (7)

A common task in knowledge graphs is to identify cyclic relationships. For instance, partonomy relations are typically acyclic (e.g., if an engine is part of a car, we wouldn’t expect the car to be part of the engine as well!). In these cases, cycle detection may be needed to detect errors in the graph and thus provide data validation. Cycle detection is also useful for detecting fraud or insider trading by identifying for example communication relations in the network which shouldn’t exist.

Consider the following graph with “part of” relations:

:piston :part_of :engine .
:engine :part_of :car .
:car :part_of :piston .

The graph contains a cyclic path :piston -> :engine -> :car -> :piston. via the :part_of relation.

Finding patterns with rules (8)

The relationship is naturally transitive and can be define with the following rule:

[?x, :part_of, ?z] :- 
[?x, :part_of, ?y],
[?y, :part_of, ?z] .

The following SPARQL query will return the elements which are part of others (directly or indirectly)

select ?x ?y where {?x :part_of ?y}

Which gives us the following results

:piston :piston .
:car. :car .
:engine :engine .
:piston :car .
:car :piston .
:engine :car .
:piston :engine .

A cycle manifests itself by the presence of self-loops (e.g. :piston is derived to be a part of itself).

Finding patterns with rules (9)

Hence, it is possible to detect cyclic part of relations with the following SPARQL query.

ask {?x :part_of ?x}

Alternatively, we could define cyclic relations with the following rule:

[:part_of, rdf:type, :cyclic_relation] :- 
[?x, :part_of, ?x] .

Which tells us that if any object is determined to be a part of itself, then the partonomy relation is cyclic.

We can now easily retrieve the list of cyclic relations in the graph.

select ?x where {?x rdf:type :cyclic_relation}

To obtain :part_of as a result.

Many relations naturally imply some sort of order, and in such cases, we might be interested in finding the first and last elements of such orders.

For instance, consider the managerial structure of a company.

:alice :manages :bob .
:bob :manages :jeremy .
:bob :manages :emma .
:emma :manages :david .
:jeremy :manages :monica .

We would like to recognise which individuals in the company are “top-level managers”. We can use a rule to define a top-level manager as a person who manages someone and is not managed by anyone else.

[?x, rdf:type, :top_level_manager] :-
[?x, :manages, ?y],
not exists ?z in ([?z, :manages, ?x]) .
Finding patterns with rules (10)

The query

select ?x where {?x rdf:type :top_level_manager}

asking for the list of top level managers gives :alice as the answer.

We can now use a rule to define “junior employees” as those who have a manager but who themselves do not manage anyone else.

[?x, rdf:type, :junior_employee] :- 
[?y, :manages, ?x],
not exists ?z in ([?x, :manages, ?z]) .

The query for junior employees is then

select ?x where {?x rdf:type :junior_employee}

This returns :monica and :david as answers.

This was a short introduction to a sample of rule-based patterns which will be expanded with more examples and applications in my next articles. Imagine what is possible with the combination of these with other rules and when run at scale over your data.

If you would like to search for rule-based patterns yourself visit RDFox’s getting started guide.

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data extensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from ground up with reasoning in mind. Oxford Semantic Technologies is a spin out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.

Photo by Paweł Czerwiński on Unsplash

Finding patterns with rules (2024)
Top Articles
Latest Posts
Article information

Author: Jeremiah Abshire

Last Updated:

Views: 5713

Rating: 4.3 / 5 (74 voted)

Reviews: 81% of readers found this page helpful

Author information

Name: Jeremiah Abshire

Birthday: 1993-09-14

Address: Apt. 425 92748 Jannie Centers, Port Nikitaville, VT 82110

Phone: +8096210939894

Job: Lead Healthcare Manager

Hobby: Watching movies, Watching movies, Knapping, LARPing, Coffee roasting, Lacemaking, Gaming

Introduction: My name is Jeremiah Abshire, I am a outstanding, kind, clever, hilarious, curious, hilarious, outstanding person who loves writing and wants to share my knowledge and understanding with you.