There is an elephant in the room.
Accountants, auditors, and analysts have, for thousands of years, performed work related to creating information, attesting to the trustworthiness of that information, and making decisions and providing other supporting services that put that information to use. The medium used to convey that information has been the "physical hard copy": the paper-based document; the paper-based spreadsheet. Those paper-based documents were the sources of data used by accountants, auditors, and analysts, having evolved from earlier mediums including physical objects, clay tablets, papyrus, and other early forms of physical hard copies. Those physical, paper-based, document oriented artifacts literally drove the universal technology of accountability for thousands of years.
The internetworked computer (a.k.a. the computer hardware, plus the software, plus the internet) offers new approaches, new mediums, for conveying those sources of data and the information and knowledge they provide, to the end of performing work.
And yet, many accountants, auditors, and analysts are still constrained in their thinking by those "physical hard copy" based mediums of information exchange. Rather than physical hard copy documents and document oriented electronic spreadsheets, they are thinking in terms of digital copies of documents and document oriented electronic spreadsheets. And when computers and software have a hard time making use of that data and information because the computer based processes cannot interpret it, they invent approaches that attempt to get computers and software to somehow interpret those digital copies of documents and document oriented electronic spreadsheets.
But that document oriented approach is a dead end. Why? As I have pointed out before, computers are dumb beasts.
What is necessary is an information oriented approach that both humans and machines can interpret and work with. Think of this as universal, industrial-strength information plug-and-play. Put the information in machine interpretable form first; then use computer based processes to convert that machine interpretable information into something that is also interpretable by humans.
Don't get me wrong; documents and document oriented spreadsheets are incredibly useful tools. We just need to put things in the right order and starting with the human readable version is doing things in the wrong order.
We need to approach this by doing things in the proper order.
An internetworked computer is an extremely useful tool; it can store, retrieve, process, and provide instant access to data and information. But there are obstacles which must be overcome to make effective use of that tool. I will not even get into the technical obstacles. Here are the business oriented obstacles:
- Business professionals use different terminologies to refer to exactly the same thing.
- Business professionals have inconsistent understandings of an area of knowledge (a.k.a. area of interest, community of practice, field, domain, subject domain, universe of discourse, society).
Artificial intelligence is a tool, it is not magic.
Metadata is structured information that describes, explains, categorizes, gives context to, or otherwise enhances the usability of other information so that humans and machines can understand and use that information effectively.
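To make this concrete, here is a minimal sketch of what that definition means in practice. The field names below are illustrative (loosely inspired by XBRL-style reporting metadata, not any actual standard): a bare number is meaningless to a machine, but the same number wrapped in metadata can be interpreted, checked, and rendered automatically.

```python
# A bare number like 2225.0 is uninterpretable by a machine.
# Wrapping it in metadata makes it unambiguous. These field names
# are illustrative, not taken from any actual standard.
fact = {
    "value": 2225.0,
    "concept": "CashAndCashEquivalents",  # what is being reported
    "entity": "Example Company, Inc.",    # who is reporting it
    "period": "2024-12-31",               # when it applies
    "unit": "USD",                        # unit of measure
    "scale": 1_000_000,                   # the value is stated in millions
}

def describe(fact):
    """Render machine interpretable metadata into a human readable sentence."""
    amount = fact["value"] * fact["scale"]
    return (f'{fact["entity"]} reported {fact["concept"]} of '
            f'{amount:,.0f} {fact["unit"]} as of {fact["period"]}')

print(describe(fact))
```

Note the direction of travel: the machine interpretable form comes first, and the human readable sentence is derived from it, never the reverse.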
In his book Everything is Miscellaneous, David Weinberger explains that there are three orders of order. Understanding these three orders of order can help you understand the value of metadata.
- Putting books on shelves in some sort of order is an example of the first order of order.
- Creating a list of the books on your shelves is an example of the second order of order. This can be done on paper or it can be done in a database.
- Adding even more information to information is an example of the third order of order. Using the book example: classifying books by genre, best sellers, featured books, bargain books, or books which one of your friends has read; basically, there are countless ways to organize something.
The third-order practices that make a company's existing assets more profitable, increase customer loyalty, and seriously reduce costs are the Trojan horse of the information age. As we all get used to them, third-order practices undermine some of our most deeply ingrained ways of thinking about the world and our knowledge of it.
The power of a computer based system is proportional to the quantity and quality of the metadata available to that system. What is not in dispute is the need for a "thick metadata layer" and the benefits of that metadata in terms of getting a computer to perform useful and meaningful work.
But what is sometimes disputed is how to most effectively and efficiently get that thick metadata layer. There are two basic approaches to getting this thick metadata layer:
- Have the computer figure out what the metadata is: This approach uses artificial intelligence, machine learning, and other high-tech approaches to detecting patterns and figuring out the metadata.
- Tell the computer what the metadata is: This approach leverages talented, skilled, and experienced business domain experts and knowledge engineers to piece together the metadata so that the metadata becomes available to the computer based system.
Because acquiring this knowledge (a process called knowledge acquisition) can be slow and tedious, much of the future of internetworked computer based systems depends on breaking the metadata acquisition bottleneck and on codifying and representing a large knowledge infrastructure. However, this is not an "either/or" question. Manual and automated knowledge acquisition methods can be used together: manually created metadata primes the pump; then machine learning builds on that foundation. Humans and machines can work together to curate this important metadata.
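A small sketch of what "priming the pump" might look like. The concept names and synonyms below are made up for illustration, and simple string similarity from Python's standard library stands in for a real machine learning model: a human curates a small, trusted mapping; the machine then suggests how new, unseen terminology might map onto it, and a human confirms or rejects each suggestion.

```python
import difflib

# Manually curated metadata "primes the pump": a small, human-verified
# mapping from preferred concept names to known synonyms (illustrative).
curated = {
    "Revenues": ["Revenues", "Sales", "Turnover", "Net sales"],
    "NetIncomeLoss": ["Net income", "Net earnings", "Profit for the year"],
}

def suggest_concept(label, curated, cutoff=0.6):
    """Machine-assisted step: suggest which curated concept a new, unseen
    label most likely refers to. Simple string similarity stands in here
    for a real machine learning model; a human reviews each suggestion."""
    candidates = {syn: concept for concept, syns in curated.items() for syn in syns}
    matches = difflib.get_close_matches(label, candidates.keys(), n=1, cutoff=cutoff)
    return candidates[matches[0]] if matches else None

print(suggest_concept("Net Sales", curated))                      # maps to a curated concept
print(suggest_concept("Property, plant and equipment", curated))  # no match; a human must decide
```

The leverage described above applies here: once a mapping is confirmed and added to the curated set, every subsequent user of the system benefits from it.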
Machine learning or deep learning systems work best if the system you are using them to model has a high tolerance for error. These types of systems work best for things like:
- capturing associations or discovering regularities within a set of patterns;
- handling data whose volume, number of variables, or diversity is very great;
- modeling relationships between variables that are only vaguely understood; or,
- modeling relationships that are difficult to describe adequately with conventional approaches.
Machine learning basically uses probability, statistics, and correlations. This is not to say that machine learning is a bad thing. Machine learning is a tool, and any craftsman knows that you need to use the right tool for the job. Using the wrong tool will leave you unsatisfied. Ultimately, what you create will either work or not work to achieve your objectives or the objectives of system stakeholders.
There are no shortcuts. Again, no one really disputes the need for this thick layer of metadata to get a computer to perform work effectively. This metadata also provides leverage similar to the leverage created by software code: metadata, like software, only has to be created once, and then millions can use it, just as many people can use the same software application.
Accounting does not have a high tolerance for error. The tolerance for error in many aspects of accounting, reporting, auditing, and analysis is ZERO. So, the threat of inaccuracy needs to be managed.
Epistemic risk is manageable.
Accounting information wants to be connected, to be linked. Accounting has built-in transparency and traceability mechanisms. Accounting information also has built-in quality control mechanisms like double entry and articulation. However, when you start at the "end of the chain" or in the middle, much of the traceability provided by the linking is lost.
If metadata is missing, linking might still be available, but the functionality is less than it could be.
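The quality control mechanisms mentioned above can be checked mechanically once the underlying information is machine interpretable. Here is a deliberately simplified sketch; the account names and figures are made up, and real checks would of course cover far more than these two rules.

```python
# Simplified, illustrative checks of two of accounting's built-in
# quality control mechanisms. Account names and figures are made up.
journal_entries = [
    {"account": "Cash",    "debit": 500, "credit": 0},
    {"account": "Revenue", "debit": 0,   "credit": 500},
]

def double_entry_holds(entries):
    """Double entry: total debits must equal total credits."""
    return sum(e["debit"] for e in entries) == sum(e["credit"] for e in entries)

def articulation_holds(beginning, changes, ending):
    """Articulation: a roll forward must tie; the beginning balance
    plus the changes during the period must equal the ending balance."""
    return beginning + sum(changes) == ending

print(double_entry_holds(journal_entries))
print(articulation_holds(1000, [500, -200], 1300))
```

When the statement is only available as a document, none of this can be verified mechanically; when it is available as structured, metadata-rich information, both checks are a few lines of code.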
Remember the 1-10-100 rule: it costs a relative $1 to prevent a mistake by fixing a broken process, $10 to correct the mistake once it has been made, and $100 to deal with the ramifications of a mistake that goes uncorrected.
If we want interconnected computer based systems to work better, why don't we create the inputs that would give us better outputs?
We need to consciously build a new paradigm rather than try to fix the current one. While computers seem smart, they are actually quite dumb. We need to provide things in a form they can grasp, rather than forcing them to decipher the messy situation we have created over the past 50 years. We humans need to fix the mess we created. Making this investment will yield significant dividends.
* * *
Popular artificial intelligence implementations such as LLMs have a fundamental problem when it comes to understanding things like a financial statement. When these types of artificial intelligence try to interpret something like a detailed financial statement provided in the form of a document, they get about 87% of the information right. This fundamental architectural limitation makes them dangerously unreliable: you don't know which 87% is right and which 13% is wrong.
LLMs are incredibly good at predicting the next word in a sentence. That is their core capability. But that is not what you need to interpret 100% of a financial statement. To do that, you need logical reasoning; there is a big difference between "guessing" and "proving". Another type of artificial intelligence is symbolic artificial intelligence, also known as rules-based artificial intelligence. Not even rules-based artificial intelligence can understand 100% of a financial statement. But what rules-based artificial intelligence does understand, you can be certain is right, to the extent of the rules it is using. Humans still need to fill the gap between what artificial intelligence understands and what it is not capable of understanding. And so, it is very important to understand the line between the tasks artificial intelligence is performing and the additional work that needs to be performed by a human collaborator.
A financial statement is not natural language. A financial statement is encoded financial logic, expressed in the language of accounting, wearing a natural language costume to make the financial statement representation (i.e. the encoding) look like a document. This is particularly true nowadays, when financial statements are prepared using XBRL.
Underneath the words and numbers in a financial statement is robust structure, an intentionally designed substrate of logic. That is why rules-based artificial intelligence is so good at interpreting that logic and proving that the logic is consistent, coherent, and provable. Making a statistical guess about what a financial statement means is not how you interpret a financial statement.
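A minimal sketch of what that rules-based checking looks like, assuming simplified machine-readable facts and rules (the concept names, figures, and rules below are illustrative, not drawn from any actual filing or rule set). The point is that each rule either passes deterministically or is flagged as an inconsistency; there is no guessing involved.

```python
# Illustrative machine-readable facts for a fictional company.
facts = {
    "Assets": 5000,
    "Liabilities": 3000,
    "Equity": 2000,
    "Revenues": 7000,
    "Expenses": 6000,
    "NetIncome": 1000,
}

# Illustrative machine-readable rules: each pairs a human readable
# description with a deterministic check against the facts.
rules = [
    ("Assets = Liabilities + Equity",
     lambda f: f["Assets"] == f["Liabilities"] + f["Equity"]),
    ("NetIncome = Revenues - Expenses",
     lambda f: f["NetIncome"] == f["Revenues"] - f["Expenses"]),
]

def prove(facts, rules):
    """Evaluate every rule against the facts. Each result is certain
    to the extent of the rules used; nothing is a statistical guess."""
    return {name: check(facts) for name, check in rules}

print(prove(facts, rules))
```

Everything the rules cover is proven; everything they do not cover is explicitly outside the bright line and remains work for a human (or, with appropriate caution, an LLM).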
When it comes to things like financial statements, this is a high-stakes game. Misinterpreting one word can lead to an incorrect interpretation that results in a million-dollar mistake. Sure, an LLM can be helpful; but there need to be bright lines between what you know for sure because trustworthy rules-based artificial intelligence reached a conclusion, the work that a human did, the trustworthiness of that human (e.g. their knowledge and skill), and plausible but potentially incorrect information provided by an LLM. You cannot delegate responsibility to a machine.
A single educated guess by an LLM that turns out to be wrong could result in a catastrophic liability. This goes for the creation of the financial statement, the verification of the financial statement, and the interpretation and use of the information from the financial statement.
When a financial statement is first encoded (i.e. represented) as a provable machine-readable knowledge graph (a.k.a. graph first, model-driven), and an algorithm then converts that knowledge graph into a human readable "costume" for human consumption, it is crystal clear where the bright lines lie between work performed by reliable and trustworthy artificial intelligence, work performed by a skilled and experienced human, and work product provided by an LLM that could be very helpful but could also be wrong. Provability is critically important.
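A toy sketch of the graph-first idea, under obviously simplified assumptions: the statement lives as machine-readable triples, and a small rendering algorithm derives the human readable presentation from the graph. All names and figures are illustrative.

```python
# Graph first: the statement is a set of machine-readable triples
# (subject, predicate, object). Names and figures are illustrative.
graph = [
    ("BalanceSheet", "hasLine", "Assets"),
    ("BalanceSheet", "hasLine", "Liabilities"),
    ("BalanceSheet", "hasLine", "Equity"),
    ("Assets", "hasValue", 5000),
    ("Liabilities", "hasValue", 3000),
    ("Equity", "hasValue", 2000),
]

def render(graph, statement):
    """Derive the human readable 'costume' from the knowledge graph,
    never the other way around."""
    lines = [o for s, p, o in graph if s == statement and p == "hasLine"]
    values = {s: o for s, p, o in graph if p == "hasValue"}
    return "\n".join(f"{line:<12}{values[line]:>8,}" for line in lines)

print(render(graph, "BalanceSheet"))
```

Because the graph is the single source of truth, anything proven against the graph is proven for every rendering of it; the document becomes a disposable view rather than the thing itself.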
Financial statements submitted to a regulator with specific compliance requirements require proof. Known logic in the form of machine-readable facts and rules, plus rules-based artificial intelligence, can not only provide a guaranteed result; they can do the work at lower cost and offload monotonous, repetitive work that is highly susceptible to error to a more reliable machine. Skilled and experienced humans can then focus on the higher value work that only they can perform.