Solving the Problems of "Accidental Taxonomists" and "Data Janitors"

There is an article and a presentation by Mike Dillinger that point out very important information about real people working with knowledge graphs.  Those two pieces of work are:

The first article provides a summary, a "foundation for success", that includes three key points that would lead to better "taxonomies" or knowledge graphs.  Those three key points, paraphrased a little, are:
  • People: We need to move beyond leaving knowledge creation to naïve non-practitioners (area of knowledge subject matter experts); semantic training programs are necessary to create the essential talent required to create quality knowledge representations.
  • Process: We need to move beyond simply improvising; we need industry good/best practices to follow. We need to create tested frameworks, built on good/best practices, that are proven to provide reliable results over, and over, and over again.
  • Tools: We need to move beyond only crafting manufactured knowledge using machines; we also need good tools that subject matter experts within an area of knowledge can use effectively to represent the terms and rules of their subject matter.
To Mike Dillinger's list I would add a fourth point.  That fourth point could be called "Simplify" or maybe "Specialize".  What I mean is that providing a "general" tool such as Protégé and then expecting a non-practitioner to use that general-purpose tool effectively is an unrealistic expectation.  If high-level models of an area of knowledge are created first, and the subject matter experts of that area then work within the boundaries of this "specialized" system (in contrast to having to work within the broader, more general system), I contend that there is a significantly higher probability of success.  Or maybe that fourth bullet is better called "Theory".  And if there is a theory, you also need to test that theory, which gives a fifth bullet:
  • Theory: We need to understand how the specialized system really works within the general model; that is, the logic of the system that ties together the people, the processes they follow, and the tools they use.
  • Proof: We need to move beyond only building systems and go further by testing to provide proof that the system actually works as would be expected. And because we are not working in "silos" the proof is necessary so that everyone in the chain agrees that the theory is correct and implemented as would have been expected.
A specific example can help make my point.  When trying to wrap my head around XBRL-based digital financial reporting, I created a logical theory, Logical Theory Describing Financial Report; represented that logical theory visually as best I could; represented the logical theory as a machine-readable knowledge graph using XBRL (see section titled "Technical"); and then built software using that high-level model specifically for the area of knowledge of my focus.  Pesseract, Auditchain Luca, and Auditchain Pacioli are examples of that software.  Then I built a PROOF to show that the system is working as expected.  A conformance suite provides more testing and proof that the system is working.

Basically, software that is specialized for one specific area of knowledge lets the subject matter experts of that area focus on the logic of their area of knowledge, and it reduces the skills they need in order to perform knowledge representation tasks effectively.  It does this by burying the complexity in the software and platform rather than requiring the software user to figure out all aspects of knowledge representation.  The specialized software follows general knowledge engineering good/best practices, but the software users don't have to deal with knowledge representation at that general level.
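
The idea of burying complexity in a specialized layer can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not how Pesseract or the Auditchain tools are actually built: the names GeneralGraph, SpecializedReportTool, ALLOWED_CONCEPTS, and the predicates "reports" and "hasValue" are all hypothetical.  The general store accepts any triple at all, while the specialized tool exposes one domain-level operation and checks it against a small high-level model.

```python
from dataclasses import dataclass, field

# Hypothetical high-level model for one area of knowledge: the only concepts
# a subject matter expert may report, each with its expected value type.
ALLOWED_CONCEPTS = {"Assets": float, "Liabilities": float, "Equity": float}


@dataclass
class GeneralGraph:
    """General-purpose store: accepts any (subject, predicate, object) triple."""
    triples: list = field(default_factory=list)

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))


class SpecializedReportTool:
    """Specialized layer: one domain-level operation; triples stay hidden."""

    def __init__(self, report_id: str):
        self.report_id = report_id
        self.graph = GeneralGraph()

    def report_fact(self, concept: str, value) -> None:
        # Reject anything outside the high-level model before it reaches the graph.
        if concept not in ALLOWED_CONCEPTS:
            raise ValueError(f"Unknown concept: {concept!r}")
        expected_type = ALLOWED_CONCEPTS[concept]
        # The knowledge representation details are handled here, not by the user.
        self.graph.add(self.report_id, "reports", concept)
        self.graph.add(concept, "hasValue", expected_type(value))


if __name__ == "__main__":
    tool = SpecializedReportTool("report-1")
    tool.report_fact("Assets", 100)
    print(tool.graph.triples)
```

The user of the specialized tool only ever says "report this fact"; a misspelled or unknown concept is stopped at the boundary instead of quietly becoming part of the graph.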

This brings us to the notion of the "accidental taxonomist", a term borrowed from Heather Hedden's book The Accidental Taxonomist.  An accidental taxonomist is a subject matter expert in an area of knowledge who is generally well-intentioned and intelligent, but a naïve novice non-practitioner when it comes to knowledge representation.

It seems to me that now every subject matter expert within every area of knowledge is expected to be a knowledge engineer.  Technologists tried to take these subject matter experts out of the loop using tools like machine learning and ChatGPT.  Well, how is that working out?  People are eventually going to figure out that you need to use the right tool for the job.  While tools like machine learning and ChatGPT can supplement knowledge creation, the probability of machine learning or ChatGPT correctly untangling every aspect of a designed system created by humans is ZERO!  This is not to say that machine learning and ChatGPT are not useful.  They are very useful tools when applied correctly.

But that said, I can understand where the computer scientists are coming from when they try to take the subject matter experts out of the loop.  I have been an accounting information systems professional for 40+ years.  I can tell you that the average accountant cannot even set up a proper chart of accounts for an accounting system.  But now we want them to set up knowledge graphs?  Contemplate that.

Think about something.  Do we require people who want to drive cars to also be able to design those cars, build them, maintain them, or fix them when they break?  Of course not.  The subject matter experts for an area of knowledge can be grouped into "builders" of systems and "users" of systems.  It is very rare for a system user to also be good at building systems, or for a system builder to use a system day, after day, after day.  Building systems and operating systems are two different skill sets.

How did we end up with the systems that we have now?  Is that approach really the best approach going forward? Do alternatives exist?

Because accidental taxonomies cause, well, a lot of accidents, we have ended up at a point where people like accountants have become, as Mike Atkins of Semantic Arts describes it, little more than "data janitors" in an era of very poor "semantic hygiene practices."

The way I see it, what is going on in the area of knowledge representation is the same mistake made by the automotive and aerospace industries, which was ultimately solved by using the quality control techniques, practices, and principles of what is now called Lean Six Sigma.  For example, one principle of Lean Six Sigma is the 1-10-100 rule as explained by Inspectorio, which compares the relative costs of preventing a problem, correcting a problem, and failing to detect a problem.  In relative terms, fixing a system to prevent a problem costs about $1, having to correct a problem after it occurs costs about $10, and dealing with the failure that results from not detecting the problem costs about $100.
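
The arithmetic behind the 1-10-100 rule is simple enough to work through.  The following Python sketch uses the relative costs stated above; the count of 50 errors is a made-up number chosen only for illustration, and the stage names and the function total_cost are hypothetical.

```python
# Relative cost per error under the 1-10-100 rule.
# These are relative units, not real dollar amounts.
COST_PER_ERROR = {"prevent": 1, "correct": 10, "fail": 100}


def total_cost(errors: int, stage: str) -> int:
    """Relative cost of dealing with `errors` problems at the given stage."""
    return errors * COST_PER_ERROR[stage]


if __name__ == "__main__":
    errors = 50  # hypothetical number of data errors in a reporting cycle
    for stage in ("prevent", "correct", "fail"):
        print(f"{stage:>8}: {total_cost(errors, stage)}")
```

For the same 50 errors, prevention costs 50 units, correction costs 500, and undetected failure costs 5,000.  The gap grows with every error that slips further down the chain.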

Information and knowledge representation is a construction type problem.  That problem should not be viewed from the perspective of the individual silos that make up the system.  Rather, this system is more like a "chain".  And as is said, a chain is only as strong as its weakest link.

Currently, one of the weak links is the "accidental taxonomists" who are also the "data janitors".  Another weak link is the very poor "semantic hygiene practices".  Another weak link is the software applications that are being created to enable the accidental taxonomists who are also currently acting as data janitors to correct problems after the problem has already occurred.  Another weak link is the lack of world class good/best practices and tested, proven frameworks that make all this work effectively.

There are examples of groups of stakeholders of systems putting the pieces together correctly.  For example, the group behind HL7's FHIR is demonstrating leadership.  I think those behind the Financial Data Transparency Act of 2022, like the Data Foundation, are showing leadership.  ACTUS is demonstrating leadership.

Change is a process.  Evolution takes time.  Sensemaking takes time.  The most important thing is to recognize that a paradigm shift has occurred and to change your mental map because the territory has changed.  Using older, outdated maps to navigate this new territory will not work.  Many things will be tried; most attempted solutions will fail.  But some will succeed.  Some will figure out the puzzle.  Personally, I believe that the accidental taxonomists and the data janitors will be the ones to figure this out.  They are not the problem; they are the solution to the problem.  After all, it is their problem.

Very few people seem to think about this change strategically.  New industrial strength systems will be created.  The technologies needed exist today.
