DataBook

June 18, 2026

As explained by Kurt Cagle and Chloe Shannon in their article, DataBooks: Markdown as Semantic Infrastructure, a DataBook is effectively a microdatabase.

A DataBook is a document a human can read, a data file that a computer can process, and a toolbox that carries its own instructions. It is a technique that lets data and its explanation travel together through a data pipeline.

One important part of the magic of DataBook files is that LLMs can also read and interpret them easily.

Another part of the magic of the DataBook is that everything travels together within one file including:

data
meaning
rules
queries
documentation

Finally, DataBook files can easily be versioned on GitHub and GitLab. Both platforms support Markdown (.md) files, and both of their Markdown dialects are based on CommonMark. So there are different "flavors" of Markdown, but they are close.

This technique uses a Markdown file (.md) as the container that holds everything, so there are no separate files to lose and no context to forget. A Markdown file gives both technical and non-technical users a powerful yet straightforward way to write plain-text documents that can be rendered richly as HTML and also read easily by software.

Within the Markdown file you can provide fenced blocks (a.k.a. sections) of structured data in formats such as YAML, RDF/Turtle, JSON-LD, SPARQL, SHACL, SQL, CSV, and other well-understood structured formats.
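To make the idea of fenced blocks concrete, here is a minimal Python sketch of how a program might pull the typed blocks out of a DataBook. The sample text, the names SAMPLE_DATABOOK and extract_blocks, and the regular expression are all assumptions made for illustration; they are not taken from the DataBook article, and a real tool would more likely use a proper Markdown parser.

```python
import re

# The fence marker is built programmatically so that this example can
# itself be placed inside a fenced block without confusing a renderer.
FENCE = "`" * 3

# Hypothetical miniature DataBook, invented for this sketch.
SAMPLE_DATABOOK = "\n".join([
    "# Example DataBook",
    "",
    "Prose for human readers.",
    "",
    FENCE + "turtle",
    "@prefix ex: <http://example.org/> .",
    "ex:alice ex:knows ex:bob .",
    FENCE,
    "",
    FENCE + "sparql",
    "SELECT ?s WHERE { ?s ?p ?o }",
    FENCE,
    "",
])

# An opening fence with a language label, the block content, then a closing fence.
BLOCK_RE = re.compile(
    r"^" + re.escape(FENCE) + r"([\w-]+)[^\n]*\n(.*?)^" + re.escape(FENCE) + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_blocks(text):
    """Return a list of (language, content) pairs, one per labelled fenced block."""
    return [(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(text)]


for language, content in extract_blocks(SAMPLE_DATABOOK):
    print(language, "->", len(content.splitlines()), "line(s)")
```

Each pair can then be handed to the tool that understands that format, for example a Turtle parser for the first block and a SPARQL engine for the second.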

That’s it; one physical file that contains many formally fenced off layers. Here are some of the different fenced blocks that can be provided within a DataBook:

Markdown: The primary format of the file; it holds the human-readable text and all of the other structured information in the fenced blocks.
YAML frontmatter: Used at the top of the file for metadata such as title, author, version, and provenance. YAML is a Unicode-based data serialization language that is broadly useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization.
RDF/Turtle: Used to store a graph of data, a.k.a. linked data.
JSON-LD: Popular alternative or complementary linked-data format.
SPARQL: Queries and updates embedded directly in the file.
SHACL: Validation rules for the data.
Other typed structured blocks: Depending on the use case, a DataBook may also include:

CSV
XML fragments
XSLT transforms
XML schemas
XML Schema Datatypes
SVG
XBRL

XBRL instance
XBRL taxonomy schema
XBRL linkbase
XBRL formula

SQL
GQL
Python
PROLOG
JSON
LLM prompts
Manifests
Encryption blocks
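The YAML frontmatter layer listed above can be separated from the rest of the file with very little code. The sketch below is an assumption-laden illustration: the function name split_frontmatter, the sample document, and the author value are invented here, and the parsing handles only flat "key: value" lines. Real frontmatter can be nested, so a full YAML parser (for example PyYAML's yaml.safe_load) would be the safer choice in practice.

```python
def split_frontmatter(text):
    """Split a document into (metadata dict, body text).

    Only flat `key: value` lines are understood; this is a simplification
    for illustration, not a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text  # unterminated header: treat everything as body
    meta = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body


# Hypothetical document; the field names follow the metadata examples
# mentioned above (title, author, version), the values are placeholders.
SAMPLE_WITH_HEADER = """\
---
title: Example DataBook
author: Jane Doe
version: 0.1.0
---
# Example DataBook

Body text.
"""

meta, body = split_frontmatter(SAMPLE_WITH_HEADER)
print(meta["title"], meta["version"])
```

Keeping the metadata as a plain dictionary makes it easy to check fields such as version or provenance before any of the fenced blocks are processed.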

Here is a very simple, basic example of a DataBook. You can read the DataBook file here on GitHub, and a machine can be sent the raw version of the same file from GitHub.

Note that a set of DataBooks can be tied together to form a graph if you use an appropriate YAML header.
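As a rough illustration of how headers could tie DataBooks into a graph, the sketch below works on headers that have already been parsed into dictionaries. The keys "id" and "depends_on", the three sample headers, and both function names are hypothetical; they are assumptions made for this example and are not taken from the DataBook convention itself.

```python
# Hypothetical, already-parsed headers. The keys "id" and "depends_on" are
# illustrative assumptions, not names defined by the DataBook convention.
HEADERS = [
    {"id": "taxonomy", "depends_on": []},
    {"id": "report-2025", "depends_on": ["taxonomy"]},
    {"id": "analysis", "depends_on": ["report-2025", "taxonomy"]},
]


def build_graph(headers):
    """Map each DataBook id to the sorted list of ids it points at."""
    return {h["id"]: sorted(h.get("depends_on", [])) for h in headers}


def reachable(graph, start):
    """Return every id reachable from `start` by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


graph = build_graph(HEADERS)
print(sorted(reachable(graph, "analysis")))
```

With a structure like this, a pipeline could work out which other DataBooks need to be loaded before a given one is processed.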

As I understand it, work is under way to make DataBook a W3C standard, apparently through the holon working group. Even if DataBook does not become a global standard, it is a useful convention and might even be considered a best practice.

Google has created what it calls the Open Knowledge Format (OKF). This is explained in the article Introducing the Open Knowledge Format. OKF can be found on GitHub here.

