TSV (streaming tables) | Rbbt

Rbbt uses TSV (tab-separated value) files as a first-class data format.

The tooling focuses on:

large files (millions of rows)
streaming (don’t load everything into memory)
semantics (columns and “value types” are meaningful)
indexing / persistence (fast random access)
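To make the streaming point concrete, here is a minimal sketch in plain Ruby that reads a tab-separated file one row at a time. It deliberately uses only the standard library, not Rbbt's own TSV API. The helper name `each_tsv_row`, the sample data, and the convention that header lines start with `#` are assumptions made for illustration.

```ruby
require "tempfile"

# Hypothetical helper (not part of Rbbt): yields [key, values] for each row,
# reading one line at a time so the whole file is never held in memory.
def each_tsv_row(path)
  return enum_for(:each_tsv_row, path) unless block_given?

  File.foreach(path, chomp: true) do |line|
    # Assumption: blank lines and lines starting with "#" carry no data rows.
    next if line.empty? || line.start_with?("#")

    key, *values = line.split("\t", -1) # -1 keeps trailing empty fields
    yield [key, values]
  end
end

# Build a small sample file (made-up data) for demonstration.
sample = Tempfile.new(["genes", ".tsv"])
sample.write("#Gene\tName\tChromosome\n")
sample.write("ENSG01\tTP53\t17\n")
sample.write("ENSG02\tBRCA1\t17\n")
sample.write("ENSG03\tKRAS\t12\n")
sample.close

# Count the rows on chromosome 17 without materializing the table.
chr17_count = each_tsv_row(sample.path).count { |_key, values| values[1] == "17" }
puts chr17_count
```

Because the helper returns an enumerator when called without a block, it composes with `count`, `select`, or `lazy` while still touching only one line at a time.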

TSV “types” (value shapes)

Historically, Rbbt classifies TSVs by the shape of the value column(s):

single: one value per key
list: list of values per key, one value per column
double: list-of-lists per key (multiple columns, each a list)
flat: a single list of values per key, all representing the same aspect (e.g. the transcripts of a gene)
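The four shapes can be pictured as plain Ruby values built from one parsed row. This is only an illustrative sketch, not Rbbt's parser: the row contents are made up, and the use of `|` to separate multiple values inside a cell is an assumption here.

```ruby
# One hypothetical row: a key plus two columns (Name, Transcripts).
# Assumption: "|" separates multiple values inside a single cell.
key   = "ENSG01"
cells = ["TP53|P53", "ENST01|ENST02"]

# single: one value per key (first value of the first column)
single = cells.first.split("|").first

# list: one value per column
list = cells.map { |cell| cell.split("|").first }

# double: one list per column, each holding every value in that cell
double = cells.map { |cell| cell.split("|") }

# flat: one list of values for a single aspect (here, the transcripts)
flat = cells.last.split("|")

puts({ key => single }.inspect)
puts({ key => list }.inspect)
puts({ key => double }.inspect)
puts({ key => flat }.inspect)
```

The same row therefore yields a scalar, a flat array, or a nested array depending on the declared shape, which is why the shape has to be known before the values can be interpreted.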

When to use it

Interchange format between tasks
Cached artifacts (fast to inspect, easy to diff)
Lightweight data tables for workflows

Deeper docs

Legacy deep dive: TSV tutorial
TSV traversal / map-reduce: Legacy map-reduce tutorial