Knowledge Representation and Reasoning (KRR)

Knowledge representation and reasoning (KRR) is about converting information from an area of knowledge into machine-understandable form, and then enabling a machine, such as a computer running software, to process that information as well as a human could have performed the task or process, or even better.

For example, suppose some task or process currently performed by humans would, if measured, achieve a sigma level of 3, which is a defect rate of about 6.7% (about 67,000 defects per million opportunities). That task or process would be improved to achieve a sigma level of 6, which is a defect rate of 0.00034% (3.4, or about 4, defects per million opportunities).
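The sigma figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming the conventional Six Sigma 1.5-sigma long-term shift (an assumption that the text above does not state); the name dpmo_for_sigma_level is made up for illustration.

```python
from statistics import NormalDist

# Assumption: the conventional Six Sigma 1.5-sigma long-term shift.
SHIFT = 1.5


def dpmo_for_sigma_level(sigma_level: float) -> float:
    """Return defects per million opportunities for a given sigma level."""
    # One-sided tail probability beyond (sigma_level - SHIFT) standard deviations.
    defect_rate = NormalDist().cdf(-(sigma_level - SHIFT))
    return defect_rate * 1_000_000


for level in (3, 6):
    dpmo = dpmo_for_sigma_level(level)
    # Dividing DPMO by 10,000 converts it to a percentage.
    print(f"sigma {level}: {dpmo:,.1f} DPMO ({dpmo / 10_000:.5f}% defect rate)")
```

Under that assumption, a sigma level of 3 works out to roughly 66,807 defects per million opportunities and a sigma level of 6 to roughly 3.4, which matches the rounded figures quoted above.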

You are hearing me right: defects go from a whopping 67,000 down to 4. Think I am joking or on drugs? The Federal Deposit Insurance Corporation (FDIC) call report collection system went from 18,000 defects (reporting errors) down to 0 defects when the FDIC modernized its call report system to make use of XBRL. That was in 2003, and that was an easier forms-based system, but it still had around 18,000 reporting errors every quarter before the change. Now the same results can be obtained for a customizable reporting system.

So how do you make all this work? How do you get a machine to perform work better, faster, and/or cheaper than humans? The answer is: very carefully, very deliberately. Here are some things that you need to consider.

Knowledge
Knowledge representation approach
Acquiring knowledge to represent
Approach to reasoning on the represented knowledge
Technical implementation of software for selected reasoning approach
Operator of implemented software

The sections below look into the choices you have to make for each of these areas in order to get knowledge representation and reasoning to work effectively.

Knowledge

Knowledge is a form of familiarity with information from some specific area or corpus. Knowledge is often understood to be awareness of facts, having learned skills, or having gained experience using the things and the state of affairs (situations) within some area of knowledge. An area of knowledge (corpus) is a highly organized socially constructed aggregation of shared knowledge for a distinct subject matter. An area of knowledge has a specialized insider vocabulary, underlying assumptions (axioms, theorems, constraints), and persistent open questions that have not necessarily been resolved (i.e. flexibility is necessary). You can think about an area of knowledge as being characterized in a spectrum with two extremes:

Kind area of knowledge: clear rules, lots of repetitive patterns, and unchanging tasks.
Wicked area of knowledge: obscure data, few or no rules, constant change, and abstract ideas.

Sensemaking is the process of determining the deeper meaning or significance or essence of the collective experience for those within an area of knowledge or corpus. System stakeholders need to be in agreement as to an undisputed core knowledge of a system. The Cynefin Framework provides a tool for understanding and categorizing knowledge and rules within a corpus. Per the Cynefin Framework, knowledge can be categorized as being:

Best practice (obvious)
Good practice (only obvious if you have the right skills and experience)
Emergent practice (tends to require more skills and experience; principles can then be used to group alternatives)
Novel practice (tends to be unique, but describable)

Knowledge of facts is distinct from opinion or guesswork by virtue of justification or proof. Knowledge is objective. Opinions and guesswork are subjective. In our case we are talking about certain specific knowledge and the facts that make up that knowledge; about being able to create a proof showing that the knowledge graph system is complete, consistent, and precise; and about putting all of this logic into a form a machine can read, so that the machine can reach a conclusion as to whether the information in the knowledge graph is functioning properly. Effectively, a machine can read the knowledge represented in a knowledge graph and mimic understanding of it. The information available to a human reader and to a machine reader would be the same, and therefore the human and the machine should reach the same conclusion.
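To make the idea of a machine checking facts against rules a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the reported facts, the REQUIRED list, the rule name, and the check_report function. The single consistency rule used is the accounting equation (assets = liabilities + equity), and the completeness check simply confirms that every required fact was reported.

```python
from typing import Callable, Dict, List

# Hypothetical reported facts (concept -> value); not real data.
facts = {"Assets": 1000, "Liabilities": 600, "Equity": 400}

# Hypothetical completeness requirement: these concepts must be reported.
REQUIRED = ("Assets", "Liabilities", "Equity")

# Machine-readable rules: rule name -> test applied to the facts.
RULES: Dict[str, Callable[[dict], bool]] = {
    "accounting_equation": lambda f: f["Assets"] == f["Liabilities"] + f["Equity"],
}


def check_report(f: dict) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = [f"missing fact: {concept}" for concept in REQUIRED if concept not in f]
    if problems:
        # Rules cannot be evaluated reliably on incomplete facts.
        return problems
    problems += [f"rule violated: {name}" for name, rule in RULES.items() if not rule(f)]
    return problems


print(check_report(facts))
print(check_report({"Assets": 1000, "Liabilities": 600, "Equity": 300}))
```

A human reading the same three facts and the same rule would flag the second report for the same reason the machine does, which is the point of putting the rule into machine-readable form.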

Knowledge must be managed. Machine-readable knowledge needs to be curated to keep it current. This curation and management is valuable because the machine-readable rules are valuable, but it takes effort.

Knowledge representation approach

There are a number of different approaches that a knowledge representation might take. Each approach has a different level of expressivity, and together they form a knowledge representation spectrum. A logical theory is the most powerful approach in terms of expressive power.

Acquiring knowledge to represent

There tend to be three approaches to acquiring knowledge for some area of knowledge:

Handcrafted knowledge: Skilled and experienced subject matter experts for some area of knowledge create/construct the knowledge representation. This approach can be costly and take time, but it also yields the highest quality result if done correctly.
Statistical learning: Also referred to as machine learning, of which there are various forms, all of which are based on probability and statistics. While this approach can cost less, the quality can be significantly lower. When the machine learns without human guidance, this tends to be referred to as unsupervised learning.
Combining handcrafted knowledge approach and statistical learning approach: Combining both approaches, which is called supervised statistical learning here, is where humans and machines work together to achieve the highest quality result with the least expense and time involved.

As any craftsman or craftswoman knows, you need to use the right tool for the job. A tool offers a basket of capabilities, PROs and CONs. No tool is only PROs or only CONs. For example, statistical learning works best if the system you are creating has a high tolerance for error. These types of systems work best for:

capturing associations or discovering regularities within a set of patterns;
situations where the volume, number of variables, or diversity of the data is very great;
situations where relationships between variables are vaguely understood; or
situations where relationships are difficult to describe adequately with conventional approaches.

Statistical learning uses probability, statistics, and correlations. This is not to say that statistical learning is a bad thing. It is not; statistical learning (a.k.a. machine learning) is a tool. Using the wrong tool for the job will leave you unsatisfied. Ultimately, what you create will either work or it will not work to achieve your objectives. The craftsman's or craftswoman's task is to figure that out.

Important terms and associations

Important terms and associations between terms of an area of knowledge need to be represented. Not every term, only important terms for the system of interest. Even a small knowledge graph can provide massive value.
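As a rough illustration of how few terms and associations are needed to get something useful, here is a minimal sketch of a tiny knowledge graph stored as (subject, association, object) triples. The terms, the association names, and the helper functions build_index and subjects are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical (subject, association, object) triples for a tiny knowledge graph.
TRIPLES = [
    ("Cash", "is-a", "Asset"),
    ("Inventory", "is-a", "Asset"),
    ("Asset", "part-of", "Balance Sheet"),
    ("Liability", "part-of", "Balance Sheet"),
]


def build_index(triples):
    """Index subjects by (association, object) so lookups are direct."""
    index = defaultdict(list)
    for subject, association, obj in triples:
        index[(association, obj)].append(subject)
    return index


def subjects(index, association, obj):
    """Return, in sorted order, every term that has the association to obj."""
    return sorted(index.get((association, obj), []))


index = build_index(TRIPLES)
print(subjects(index, "is-a", "Asset"))
print(subjects(index, "part-of", "Balance Sheet"))
```

Even with four triples, the graph can answer questions such as "which terms are assets?" without that answer having been written down anywhere as a list.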

Approach to reasoning on the represented knowledge

Logic is a formal system that defines the rules of correct reasoning. Logic involves logical reasoning. Inferences are the steps in reasoning. There are three types of logical reasoning or types of steps in inference: deductive reasoning, inductive reasoning, and abductive reasoning. This forms what is sometimes referred to as a "triad of reasoning approaches" or reasoning types. Those reasoning approaches are different tools that have different sets of capabilities, different sets of PROs and CONs.
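The difference between the three reasoning types can be sketched with a toy example. This is a deliberately simplified illustration, not a real reasoner: a rule is just a pair of strings meaning "if case then result", and the function names deduce, induce, and abduce are made up for this sketch.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (case, result), read as "if case then result"


def deduce(rule: Rule, case: str) -> Optional[str]:
    """Rule + case -> result. Certain, provided the rule and the case are true."""
    premise, conclusion = rule
    return conclusion if case == premise else None


def induce(observations: List[Tuple[str, str]]) -> List[Rule]:
    """Observed (case, result) pairs -> candidate rules. Probable, not certain."""
    results_by_case = {}
    for case, result in observations:
        results_by_case.setdefault(case, set()).add(result)
    # Generalize only where every observation of a case had the same result.
    return sorted(
        (case, next(iter(results)))
        for case, results in results_by_case.items()
        if len(results) == 1
    )


def abduce(rule: Rule, result: str) -> Optional[str]:
    """Rule + result -> plausible case. A best guess; other explanations may exist."""
    premise, conclusion = rule
    return premise if result == conclusion else None


rule = ("it rained", "the grass is wet")
print(deduce(rule, "it rained"))
print(abduce(rule, "the grass is wet"))
print(induce([("it rained", "the grass is wet"), ("it rained", "the grass is wet")]))
```

Note that only the deductive step is guaranteed: the induced rule could be overturned by a later observation, and the abduced explanation is just one candidate (a sprinkler would explain wet grass equally well).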

A hybrid system can be created that combines all three approaches into one single tool that leverages the best of each approach. Again, a craftsman's or craftswoman's task is to figure that out.

Technical implementation of software for selected reasoning approach

There tend to be three primary groups of problem solving tools for implementing knowledge representation and reasoning against the representation:

Semantic web stack of technologies
Graph databases
Logic programming

All three implementation approaches can work; each has a basket of PROs and CONs that should be considered.
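To give a feel for what reasoning against a representation looks like in the logic programming style, here is a minimal sketch in plain Python. It imitates a single Datalog-style rule with naive forward chaining; it does not use a real logic programming engine, semantic web reasoner, or graph database, and the facts and the function name forward_chain are hypothetical.

```python
def forward_chain(facts):
    """Apply is_a(X, Z) :- is_a(X, Y), is_a(Y, Z) until no new facts appear."""
    derived = set(facts)
    while True:
        # Join every pair of facts that share a middle term.
        new = {
            (x, z)
            for (x, y1) in derived
            for (y2, z) in derived
            if y1 == y2
        } - derived
        if not new:
            return derived
        derived |= new


# Hypothetical is_a facts, written as (narrower term, broader term).
IS_A = {
    ("Cash", "Current Asset"),
    ("Current Asset", "Asset"),
    ("Asset", "Balance Sheet Item"),
}

for fact in sorted(forward_chain(IS_A)):
    print(fact)
```

Three stated facts yield three more derived facts, including that Cash is a Balance Sheet Item. A semantic web reasoner or a graph database query could reach the same conclusion by different means, which is why the choice between them comes down to their respective PROs and CONs.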

Operator of implemented software

There tend to be two primary groups of users of the software used to implement knowledge representation and reasoning:

Technical professionals
Nontechnical professionals (business professionals)

Irreducible complexity (a.k.a. essential complexity) is a term used to describe a characteristic of complex systems whereby the complex system needs all of its individual component systems in order to effectively function.

In other words, it is impossible to reduce the complexity of a system (or to further simplify a system) by removing any of its component parts and still maintain its functionality objective because all those component parts are essential to the proper functioning of the system. So for example, consider a simple mechanism such as a mousetrap. If you remove a piece, the mousetrap will not be able to function properly.

The Law of Conservation of Complexity states that: Every software application has an inherent amount of irreducible or essential complexity. The question is who will have to deal with that complexity:

the application developer,
the developer of the platform that the software runs on, or
the software user.

A kludge is an engineering/computer science term that defines what is best described as a workaround or quick-and-dirty solution that is typically clumsy, inelegant, inefficient, difficult to extend and hard to maintain; but it gets the job done. By contrast, elegance is beauty that shows unusual effectiveness and simplicity.

The trick is to create the right tool for the job and include only essential complexity, not accidental complexity that is not necessary in the system.
