DataBook
As explained by Kurt Cagle and Chloe Shannon in their article "DataBooks: Markdown as Semantic Infrastructure," a DataBook is effectively a microdatabase.
A DataBook is a document a human can read, a data file a computer can process, and a toolbox that carries its own instructions. It is a technique that lets data and its explanation travel together through a data pipeline.
An important part of the magic of DataBook files is that LLMs can also easily read and interpret them.
Another part of the magic is that everything travels together within one file, including:
- data
- meaning
- rules
- queries
- documentation
Finally, DataBook files can easily be versioned with Git and hosted on GitHub or GitLab. Both GitHub and GitLab render Markdown (.md) files, and both base their Markdown dialects on CommonMark, so although there are different "flavors" of Markdown, they are close.
This technique uses a Markdown file (.md) as the container that holds everything: no separate files to lose, no context to forget. Markdown provides a powerful yet straightforward way for users, both technical and non-technical, to write plain-text documents that can be rendered richly as HTML but are also easily read by software applications.
Within the Markdown file you can include fenced blocks (sections) of structured data formats such as YAML, RDF/Turtle, JSON-LD, SPARQL, SHACL, SQL, CSV, and other well-understood structured formats.
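As a rough illustration of why fenced blocks are easy for software to work with, here is a minimal Python sketch (standard library only) that pulls each block and its language tag out of a DataBook. The function name extract_fenced_blocks and the sample content are hypothetical, and the sketch follows CommonMark's fence rules only loosely (it keeps indentation as-is and skips edge cases); it is not part of any official DataBook tooling.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper, a sketch only: an opening or closing fence is up to three
# spaces of indentation followed by three or more backticks or tildes.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def extract_fenced_blocks(markdown: str) -> List[Tuple[str, str]]:
    """Return (language, body) pairs for every fenced block, in document order."""
    blocks: List[Tuple[str, str]] = []
    fence: Optional[str] = None  # fence string that opened the current block
    lang = ""
    body: List[str] = []
    for line in markdown.splitlines():
        match = FENCE_RE.match(line)
        if fence is None:
            if match:
                fence = match.group(1)
                info = match.group(2).strip()
                # The first word of the info string is the block's language tag.
                lang = info.split()[0].lower() if info else ""
                body = []
        elif (
            match
            and match.group(1)[0] == fence[0]      # same fence character
            and len(match.group(1)) >= len(fence)  # at least as long as the opener
            and not match.group(2).strip()         # closing fence has no info string
        ):
            blocks.append((lang, "\n".join(body)))
            fence = None
        else:
            body.append(line)  # indentation is kept as-is in this sketch
    if fence is not None:
        # Like CommonMark, an unclosed block runs to the end of the document.
        blocks.append((lang, "\n".join(body)))
    return blocks


# Built with "`" * 3 only so this sample can sit inside a code block.
TICKS = "`" * 3
sample = "\n".join([
    "# Tiny DataBook (hypothetical sample)",
    "Prose for human readers goes here.",
    TICKS + "turtle",
    "@prefix ex: <http://example.org/> .",
    'ex:book1 ex:title "Example" .',
    TICKS,
    TICKS + "sparql",
    "SELECT ?title WHERE { ?s <http://example.org/title> ?title }",
    TICKS,
])

for lang, body in extract_fenced_blocks(sample):
    print(lang, len(body.splitlines()))  # prints "turtle 2", then "sparql 1"
```

Each language tag (turtle, sparql, and so on) tells a pipeline which tool could handle that block, for example an RDF parser for Turtle or a SPARQL engine for queries.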
That's it: one physical file that contains many formally fenced-off layers. Here are some of the different layers that can be provided within a DataBook:
- Markdown: The primary format of the file; it holds the human-readable text and all of the other structured information in fenced blocks.
- YAML frontmatter: Used at the top of the file for metadata such as title, author, version, and provenance. YAML is a Unicode-based data serialization language that is broadly useful for programming needs ranging from configuration files to internet messaging to object persistence to data auditing and visualization.
- RDF/Turtle: Used to store a graph of data (a.k.a. linked data).
- JSON‑LD: Popular alternative or complementary linked‑data format.
- SPARQL: Queries and updates embedded directly in the file.
- SHACL: Validation rules for the data.
- Other typed structured blocks: Depending on the use case, a DataBook may also include other formats, such as SQL or CSV.
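The YAML frontmatter layer is the easiest one to read programmatically. The sketch below, a hypothetical helper named split_frontmatter written with the Python standard library only, separates a leading block delimited by --- lines from the Markdown body. It assumes flat key: value metadata; a real pipeline would more likely hand the frontmatter text to a full YAML parser such as PyYAML's yaml.safe_load.

```python
from typing import Dict, Tuple


def split_frontmatter(markdown: str) -> Tuple[Dict[str, str], str]:
    """Split leading '---' ... '---' frontmatter from the Markdown body.

    Sketch only: understands flat 'key: value' lines; nested YAML, lists and
    multi-line values would need a real YAML parser.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no frontmatter: the whole file is body
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, markdown  # opening '---' never closed: treat it all as body
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")  # split on the first colon only
            meta[key.strip()] = value.strip().strip("\"'")  # drop simple quotes
    return meta, "\n".join(lines[end + 1:])


# Hypothetical DataBook header using the metadata fields named above.
doc = "\n".join([
    "---",
    "title: Example DataBook",
    "author: Example Author",
    "version: 0.1.0",
    "provenance: hand-written sample",
    "---",
    "Human-readable text starts here.",
])

meta, body = split_frontmatter(doc)
print(meta["title"], meta["version"])  # Example DataBook 0.1.0
print(body)                            # Human-readable text starts here.
```

Keeping metadata in the frontmatter means a pipeline can check a DataBook's title, version, or provenance without parsing any of the data blocks.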
Here is a very simple, basic example of a DataBook. You can read the DataBook file on GitHub, and a machine can be sent the raw version of the same file from GitHub.
Additional Information:
- Introducing DataBooks for Small Semantic Graphs
- Introducing Databooks: A Toolkit for Semantic Data Pipelines
- DataBooks, Part I: Markdown as Semantic Infrastructure
- DataBooks, Part II: The Semantic Execution Layer
- Data on the Web Best Practices
- GitHub Repository for DataBook
- Knowledge Products Offer New Business Models
- YAML Cheat Sheet
- GitHub Markdown (MD) Cheat Sheet
- SHACL 1.2 Overview
- CommonMark
- Markdown Syntax
- Markdown Basics
