TOON: Efficient Data Format for LLMs
TOON (Token-Oriented Object Notation) is a compact, human-readable serialization of JSON for LLM prompts. It minimizes tokens while maintaining structure.
Key Features
- 30-60% fewer tokens than JSON for uniform arrays.
- Explicit array lengths and field headers, which let an LLM (or a parser) validate the data it reads.
- Indentation-based nesting; brackets and braces appear only in array headers.
- Tabular arrays: Headers once, rows as data.
- Optional dotted paths for nested keys.
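The tabular-array and explicit-length features above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `encode_table` and `check_table` are hypothetical helper names, and the sketch assumes flat rows whose values are numbers or simple strings that need no quoting (no commas, colons, or surrounding whitespace).

```python
def encode_table(key, rows):
    """Encode a uniform list of flat dicts as a TOON-style tabular array.

    Hypothetical helper: assumes every value is safe to write unquoted.
    """
    if not rows:
        raise ValueError("this sketch only handles non-empty arrays")
    fields = list(rows[0].keys())
    # The tabular form only applies when every row has the same fields.
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows are not uniform")
    # Header declares the length [N] and the field names {a,b,c} once.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


def check_table(text):
    """Return True if the declared length and field count match the rows."""
    header, *body = text.splitlines()
    declared = int(header[header.index("[") + 1 : header.index("]")])
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    rows = [line.strip().split(",") for line in body if line.strip()]
    return len(rows) == declared and all(len(r) == len(fields) for r in rows)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
toon = encode_table("users", users)
print(toon)
print(check_table(toon))
```

Because the header carries both the row count and the field list, a truncated or padded table can be detected without knowing anything else about the data.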
Syntax Basics
- Objects: key: value (indented nesting).
- Arrays: [N] for length, {fields} for headers.
- Values: Unquoted unless needed.
- Example comparison:
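JSON:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
The same data in TOON, following the array rules above:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
Comparisons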
- JSON: Verbose, repeating every key in every object; TOON saves 30-60% of tokens on uniform arrays.
- YAML: Less efficient than TOON for tabular data, since keys are still repeated for each row.
- XML: Far more verbose.
- Protobuf: Binary, not human-readable; TOON is text-based.
- CSV: Compact for flat tables but carries no nesting or length information; TOON adds 5-10% overhead over CSV in exchange for explicit structure.
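As a rough way to see the ordering described above, the sketch below serializes the same two-row example as JSON, as a hand-built TOON-style table, and as CSV, then compares sizes. It measures characters, not tokens, so it is only a proxy for the token figures quoted here, and the ratios on such a tiny sample are not representative of larger tables. The TOON text is built by hand under the same assumptions as the syntax rules above (flat rows, values that need no quoting).

```python
import csv
import io
import json

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
fields = ["id", "name", "role"]

# JSON: every key is repeated in every object.
json_text = json.dumps({"users": users})

# CSV: one header row, then data rows, with no length or nesting information.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
writer.writeheader()
writer.writerows(users)
csv_text = buf.getvalue()

# TOON-style table: a header with the key, length, and fields, then indented rows.
toon_lines = [f"users[{len(users)}]{{{','.join(fields)}}}:"]
toon_lines += ["  " + ",".join(str(u[f]) for f in fields) for u in users]
toon_text = "\n".join(toon_lines) + "\n"

sizes = {"json": len(json_text), "toon": len(toon_text), "csv": len(csv_text)}
for name, size in sizes.items():
    print(f"{name}: {size} characters")
```

To measure actual token counts, the character lengths would need to be replaced with counts from the tokenizer of the target model.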
TOON suits LLM workflows: fewer tokens mean lower prompt costs, and explicit lengths and field headers help reduce parsing errors.