Electronic Spreadsheets

Electronic spreadsheets have been referred to as "the backbone of the enterprise" and "the quiet engines of a modern economy". Gartner points out (page 2) that the typical Fortune 1000 company uses 800 spreadsheets to prepare its regulatory compliance report. Those 800 spreadsheets are connected together with what amounts to duct tape, baling wire, and Band-Aids.

Spreadsheets are systems, not documents. Spreadsheets just look like documents. As pointed out by Matt Wood of PwC in his article, The Inversion, (paraphrasing) a spreadsheet is a "network of dependencies" in which "meaning lives in the relationships between cells not in any sequence of words", made up of "thousands of logical gates that connect input to output". But to a software application that does not understand the information clothed in that document, the information remains "dark matter" that is "present, consequential, but invisible". Software agents cannot effectively pull the information out of the document that clothes it.
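To make the "network of dependencies" idea concrete, here is a minimal sketch of a spreadsheet treated as a dependency graph rather than a grid. The four cells, their formulas, and the `evaluate` helper are hypothetical illustrations and are not taken from The Inversion or from any spreadsheet product.

```python
from graphlib import TopologicalSorter

# Hypothetical four-cell sheet: each cell maps to (precedent cells, formula).
cells = {
    "A1": ((), lambda: 100.0),                     # input, e.g. revenue
    "A2": ((), lambda: 60.0),                      # input, e.g. expenses
    "A3": (("A1", "A2"), lambda a1, a2: a1 - a2),  # formula: =A1-A2
    "A4": (("A3",), lambda a3: a3 * 0.25),         # formula: =A3*0.25
}

def evaluate(cells):
    """Compute every cell in dependency order, precedents first."""
    graph = {name: set(deps) for name, (deps, _) in cells.items()}
    values = {}
    for name in TopologicalSorter(graph).static_order():
        deps, formula = cells[name]
        values[name] = formula(*(values[d] for d in deps))
    return values

values = evaluate(cells)
print(values)
```

The meaning of A4 is carried entirely by the edges A1 -> A3 -> A4. Reading the grid as text, left to right and top to bottom, does not reveal that structure.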

This is particularly true of something like a financial statement, a financial analysis model, accounting working papers, or audit working papers represented within a spreadsheet. If you look beneath the document oriented spreadsheet "costume" of workbooks, sheets, columns, rows, and cells, what you see is a rigorously designed logical substrate.

There are so many spreadsheet systems disguised as documents within large enterprises that PwC has created a mechanism to deal with them. As described in an article by Will Hodges, PwC becomes first to deliver reliable AI reasoning for enterprise-grade spreadsheets, the mechanism uses OpenAI, specifically an OpenAI Frontier agent and the From Rows to Reasoning (FRTR) agent, to decompose the spreadsheet document and separate the information from the document. The result is then measured against the FRTR Benchmark to see how well the mechanism has done.

What PwC seems to have created would be a very, very valuable mechanism given the massive number of spreadsheets that exist within organizations. PwC says that they have achieved a "3x higher accuracy" per the benchmark. However, my reading of the results is that an accuracy rate of 24% was increased to an accuracy rate of 74%; that is the 3x. I think they are also saying that an accuracy rate of 87% was achieved with an improved OpenAI model (i.e. GPT 5).
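The arithmetic behind that reading can be checked in a few lines. This is only a sketch: the 24%, 74%, and 87% figures are my reading of the article as stated above, not independently verified numbers.

```python
# Accuracy figures as read from the article (an assumption, see the text above).
baseline = 0.24
improved = 0.74
best = 0.87

ratio = improved / baseline        # relative improvement, roughly "3x"
gain_points = improved - baseline  # absolute gain in accuracy

print(f"ratio: {ratio:.2f}x")
print(f"gain: {gain_points * 100:.0f} percentage points")
print(f"still wrong at 74% accuracy: {(1 - improved) * 100:.0f}%")
print(f"still wrong at 87% accuracy: {(1 - best) * 100:.0f}%")
```

A relative claim such as "3x" and an absolute accuracy rate answer different questions, which is why both are worth stating.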

While a 74% accuracy rate or even an 87% accuracy rate would be acceptable for some use cases, for other use cases that is not even in the ballpark. For example, if a regulated company is submitting a compliance report to the regulator, an accuracy rate of 87% will not do. What I have personally been working to achieve is an accuracy rate of 99.99966%, which corresponds to sigma level 6 (about 3.4 defects per million opportunities) and which is only achievable using a rules-based system. A neuro-symbolic system is even better.
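A rough sketch of what these accuracy rates mean in practice follows. The function names and the 100-fact report size are hypothetical, and the second function assumes that errors are independent of one another, which real reports will not satisfy exactly.

```python
def defects_per_million(accuracy):
    """Expected number of wrong items per one million items."""
    return (1.0 - accuracy) * 1_000_000

def chance_all_correct(accuracy, items):
    """Probability that every item is right, ASSUMING independent errors."""
    return accuracy ** items

SIX_SIGMA = 0.9999966  # 99.99966%

for label, accuracy in [("74%", 0.74), ("87%", 0.87), ("six sigma", SIX_SIGMA)]:
    dpm = defects_per_million(accuracy)
    clean = chance_all_correct(accuracy, 100)  # hypothetical 100-fact report
    print(f"{label:>9}: {dpm:>11,.1f} defects per million, "
          f"P(100-fact report has no errors) = {clean:.6f}")
```

Under these assumptions, a 100-fact report produced at 87% accuracy is almost never completely right, while the same report produced at sigma level 6 is almost always completely right.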

As Georg Philip Krog points out in his article, Why AI Can’t Read Your Contracts — And What To Do About It, LLMs, which are probability-based tools, can never provide certainty about whether something is right or wrong. If something is 87% correct, you do not know WHICH PARTS are right and which parts are wrong. Therefore, it is unwise to trust the work without rechecking 100% of the work.

There is a dilemma here.

On the one hand, as the PwC article points out, for decades now the information behind the most important business decisions has lived inside electronic spreadsheets: thousands of rows, many columns, dense formulas, many dependencies, multimodal inputs. In the age of artificial intelligence, for this information to be inaccessible to that artificial intelligence is definitely not optimal. There are millions and millions of these electronic spreadsheets.

On the other hand, remembering that a computer is a dumb beast, wouldn't it be better to give these dumb beasts a chance at success and represent information in a manner that a computer can actually work with effectively? Why would it not be better to represent information first in a manner that a machine can interpret and then convert that machine representation into a presentation format that is also consumable by humans? This is explained in my Core Pattern article. This is doable today using a global open industry standard format. What you get is a universal industrial strength "plug-and-play" format for information. This may, or may not, work for all information, but it absolutely works for financial accounting information.

So, is there an acceptable alternative to the currently available traditional electronic spreadsheet? To be absolutely honest, I have not YET seen that alternative instantiated in the form of working software. I have seen glimpses of what could exist. I know what that alternative would look like. This is my wish list (read the link). What we currently have is effectively a kludge. It is a dead end. It was a stepping stone. A "new normal" needs to be created. A paradigm shift is necessary. But that paradigm shift cannot occur until new software exists that checks enough of the boxes to be that new normal.

Think about it this way. We cannot redo all those millions, if not billions, of traditional electronic spreadsheets which already exist. But what about the electronic spreadsheet you will create tomorrow or the day after that? Should we continue building what amount to very important systems using documents? Should we continue to connect those important systems using duct tape?

There are two specific paradigms that I am aware of for thinking about and working with spreadsheets:

1. Document model or presentation oriented paradigm: With a document oriented paradigm, you make use of the notion of an enhanced table model such as the CALS Table Model and view or present information in the form of a table or set of tables which humans can easily read. This is the model of today's electronic spreadsheet.
2. Logical model or meaning representation oriented paradigm: With a model oriented paradigm, you make use of a global open industry standard high level logical model of meaning, such as XBRL International's Open Information Model (OIM) or Object Management Group's Standard Business Report Model (SBRM), as the logic to represent a multidimensional information model which is easily interpretable by machines. You then take that single machine representation and convert it into a natural or neutral presentation so that humans can also interpret the same single representation. This is the model of my vision: a global open industry standards-based, model-driven, semantic-powered, artificial intelligence enabled logical spreadsheet.
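The difference between the two paradigms can be sketched in a few lines of code. This is a deliberately simplified illustration and NOT the OIM or SBRM data model: the `Fact` class, the concept names, the single rule, and the `render` function are all hypothetical. The point is only that the facts and rules come first, and the human-readable table is derived from them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """One reported value plus the characteristics that give it meaning."""
    concept: str
    period: str
    value: float

PERIOD = "2024-12-31"  # hypothetical reporting date
facts = [
    Fact("Assets", PERIOD, 1000.0),
    Fact("Liabilities", PERIOD, 400.0),
    Fact("Equity", PERIOD, 600.0),
]

def lookup(facts, concept, period):
    """Return the single value reported for a concept and period."""
    matches = [f.value for f in facts if f.concept == concept and f.period == period]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one {concept} fact for {period}")
    return matches[0]

def failed_rules(facts, period):
    """Name every rule that does not hold, so it is clear WHICH part is wrong."""
    failures = []
    assets = lookup(facts, "Assets", period)
    liabilities = lookup(facts, "Liabilities", period)
    equity = lookup(facts, "Equity", period)
    if abs(assets - (liabilities + equity)) > 0.005:
        failures.append("Assets = Liabilities + Equity")
    return failures

def render(facts, period):
    """Derive a human-readable table from the machine representation."""
    lines = [f"{'Concept':<12}{period:>12}"]
    for f in facts:
        if f.period == period:
            lines.append(f"{f.concept:<12}{f.value:>12,.2f}")
    return "\n".join(lines)

print(render(facts, PERIOD))
print("failed rules:", failed_rules(facts, PERIOD))
```

In this sketch the table is disposable and can be regenerated at any time. In the document paradigm the table is the only representation there is, and the rule exists only in the head of whoever built the sheet.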

The problems of the document model oriented approach to creating electronic spreadsheets tend to be well understood. Such spreadsheets tend to have more errors than are acceptable, the errors are hard to find, linking mechanisms are brittle and unreliable, and standardization is extremely hard to impossible. As such, document model oriented electronic spreadsheets are hard, if not impossible, to scale. The advantages of the logical model oriented approach to creating electronic spreadsheets are compelling: it is scalable, modular, and standardizable, and artificial intelligence software can understand the information reliably.

What if all future spreadsheets could be built using approach #2 above, the logical model oriented paradigm? What would it take for that to be possible? Well, the first thing one needs is that software.

Admittedly, my use case of a quality level of 99.99966% is different from the use case PwC might be working to achieve. But it cannot be argued that my zero tolerance for error use case is not valid.

Is it even possible to build such software? Well, I know it can be built for accounting, reporting, audit, and analysis. My superpower is accounting information systems. The real question is whether this can be based on a global open industry standard. Can we come together? It is worth trying. The other question is whether this type of approach can also be used for other domains in business.

Think about something. There are two perspectives that one might take. One perspective is to digitize your existing processes. But another perspective is to rethink how processes might work in an information rich connected world where we have artificial intelligence to leverage. Trying to automate bad processes is not a good idea.

An electronic spreadsheet is a document pretending to be infrastructure. Infrastructure allows two different people, two different systems, or two different software agents to reach the same consistent conclusion for the same reason.

The opportunity cost of continuing to follow traditional practices is simply too great.
