Revisiting the Power of Classification

This is a second take on a prior blog post, Understanding the Power of Classification.

The Greek philosopher Aristotle (384-322 B.C.) first came up with the idea of classifying plants and animals by type, essentially creating the notion of a hierarchy or taxonomy.  The idea was to group types of plants and animals according to their similarities, thus forming something that looked like a "tree", a shape with which most people are familiar.  People tend to understand the notion of a "tree", but they tend to be less familiar with the notion of what is known as a "graph".

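A tree, or hierarchy of things, is actually a type of graph.  You can differentiate the notion of a tree from the notion of a graph in your mind as follows:  A "tree" has, well, only ONE TREE: a single root, with every other item sitting under exactly one parent.  A graph can have many trees, because the same item can be connected to many other items at once.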
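To make the distinction concrete, here is a minimal Python sketch. The item names and the `is_tree` helper are hypothetical, made up purely for illustration. It treats a classification as a list of (parent, child) pairs and checks whether those pairs form one single tree.

```python
from collections import defaultdict

# Hypothetical classification, written as (parent, child) pairs.
tree_edges = [
    ("Living thing", "Animal"),
    ("Living thing", "Plant"),
    ("Animal", "Dog"),
    ("Animal", "Bat"),
]

# The same items, plus a second hierarchy that also claims "Bat".
graph_edges = tree_edges + [("Things that fly", "Bat")]


def is_tree(edges):
    """Return True if the edges form exactly one rooted tree."""
    parents = defaultdict(set)
    nodes = set()
    for parent, child in edges:
        parents[child].add(parent)
        nodes.update((parent, child))

    # A tree has exactly one item with no parent: the root.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False
    root = roots[0]

    # Every other item must have exactly one parent.
    if any(len(parents[n]) != 1 for n in nodes if n != root):
        return False

    # Walking upward from any item must reach the root (no loops).
    for node in nodes:
        seen = set()
        while node != root:
            if node in seen:
                return False
            seen.add(node)
            (node,) = parents[node]
    return True


print(is_tree(tree_edges))   # one hierarchy
print(is_tree(graph_edges))  # two overlapping hierarchies
```

The first call reports a tree. The second does not, because "Bat" now sits under two parents and there are two roots. That is still a perfectly good graph; it simply holds more than one tree.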

Classification is about organizing knowledge.  Categorization is a synonym of classification.  The only thing better than classifications is standard classifications.  One well-known classification system is the Dewey Decimal Classification used by libraries.  Imagine what it would be like if every library had a different classification system for organizing its books.

Trees and graphs are used to classify things.  Taxonomies and ontologies are tools for organizing knowledge into trees or graphs.  Another term for the trees and graphs that make up a taxonomy or ontology is the knowledge graph.  Taxonomies and ontologies are just ways to represent knowledge, the classifications, as machine-readable graphs.

Why would one do this?

Artificial intelligence is about bringing taxonomies and ontologies to life.  What that means to me is that if you have an artificial intelligence software application but no machine-readable taxonomy or ontology for the AI to use, what you get will not be that interesting.  But if you have both artificial intelligence software and a taxonomy or ontology, magical things can be the result.

Here is an example of the classifications in a small financial reporting scheme:

That set of classifications looks really hard for a human to read, and it is.  But if you filter the relations, it becomes easier for humans to read: balance sheet, income statement, cash flow statement, roll forwards, trial balance, transaction groups.


There are many different types of categorization, but they can be distilled down into two fundamental groups: "is-a" (types) and "has-a" (parts).  Other terms for these relations include "part-whole", "generalization-specialization", association/aggregation/composition (UML), and "broader-narrower".
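The two groups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accounting items, the triple layout, and the helper names `filter_relations` and `broader_types` are all hypothetical and do not come from any standard. Each relation is stored as a (subject, relation, object) triple, which is one common way to hold a small knowledge graph.

```python
# Hypothetical relations, stored as (subject, relation, object) triples.
triples = [
    ("Cash", "is-a", "Current asset"),
    ("Inventory", "is-a", "Current asset"),
    ("Current asset", "is-a", "Asset"),
    ("Balance sheet", "has-a", "Assets section"),
    ("Balance sheet", "has-a", "Liabilities section"),
]


def filter_relations(triples, relation):
    """Keep only the (subject, object) pairs for one kind of relation."""
    return [(s, o) for s, r, o in triples if r == relation]


def broader_types(item, triples):
    """Follow "is-a" links upward and collect every broader type."""
    parents_of = {}
    for subject, obj in filter_relations(triples, "is-a"):
        parents_of.setdefault(subject, []).append(obj)

    found = []
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in parents_of.get(current, []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


print(filter_relations(triples, "has-a"))  # the parts of each whole
print(broader_types("Cash", triples))      # every type that Cash belongs to
```

Filtering by relation is also what makes a dense graph readable: looking at only the "has-a" triples, or only the "is-a" triples, gives a much smaller picture than looking at everything at once.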

You can get an idea of the power to express associations if you have a look at the W3C's OWL 2 Web Ontology Language Primer and the OWL 2 Quick Reference.  All those gory details will be buried deep, deep within software.  It seems that GQL, which is expected to become an ISO standard within a few years, will factor into this as well.

XBRL International publishes arcroles (higher level classification capabilities) in a standard Link Role Registry.  Arcroles can be used to represent different types of associations, and the registry includes arcroles related to accounting and financial reporting.  That metadata will contribute to the great transmutation of accounting, reporting, auditing, and analysis.

An excellent book that discusses classification is Everything is Miscellaneous by David Weinberger. (Here is a preview.) The book points out that there are three orders of order:
  1. First order of order. Putting books on shelves is an example of the first order of order.
  2. Second order of order. Creating a list of books on the shelves you have is an example of second order of order. This can be done on paper or it can be done in a database.
  3. Third order of order. Adding even more information to information is an example of third order of order. Using the book example, you can classify books by genre, best sellers, featured books, bargain books, or books which one of your friends has read; basically, there are countless ways to organize something.
Weinberger points out in his book that the third order of order removes the limitations which people seem to assume exist when it comes to organizing information. Weinberger says this about the third order of order:
"In fact, the third-order practices that make a company's existing assets more profitable, increase customer loyalty, and seriously reduce costs are the Trojan horse of the information age. As we all get used to them, third-order practices undermine some of our most deeply ingrained ways of thinking about the world and our knowledge of it."
Weinberger's book points out two important things:
  1. That every classification scheme ever devised inherently reflects the biases of those who constructed the classification system.
  2. The role metadata plays in allowing you to create your own custom classification system so you can have the view of something that you want.
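The role of metadata in building custom views can be shown with a short Python sketch. Everything in it is hypothetical: the book titles, the `genre` and `tags` fields, and the `classify` helper are assumptions made only for illustration. The point is that the same records, once they carry metadata, can be regrouped in as many ways as you like without moving a single "book".

```python
from collections import defaultdict

# Hypothetical catalog: each record carries metadata about a book.
books = [
    {"title": "Book A", "genre": "History", "tags": {"bestseller"}},
    {"title": "Book B", "genre": "Science", "tags": {"bargain", "bestseller"}},
    {"title": "Book C", "genre": "History", "tags": set()},
]


def classify(items, labels_for):
    """Group titles under every label that labels_for(item) returns."""
    view = defaultdict(list)
    for item in items:
        for label in labels_for(item):
            view[label].append(item["title"])
    return dict(view)


# Two different views of the same books, driven only by metadata.
by_genre = classify(books, lambda b: [b["genre"]])
by_tag = classify(books, lambda b: sorted(b["tags"]))

print(by_genre)
print(by_tag)
```

A physical book can sit on only one shelf, but here "Book B" shows up under both "bargain" and "bestseller". Adding a new view is just a matter of passing a different function to `classify`.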
As we move from "atoms" to "bits", people drag along the rules which apply to atoms and try to apply those rules to solve problems in the world of bits. This, of course, does not work.  What is occurring is a paradigm shift.  The new paradigm has different rules.

Adding "classification" is like refining crude oil into gasoline or even into high-octane racing fuel. Machine readable knowledge is valuable.

This is why I do all my experimentation: to figure out and understand those new rules.  In the hands of a master craftsman, what can be achieved using these new capabilities will seem like magic.
