TOON: Efficient Data Format for LLMs

TOON (Token-Oriented Object Notation) is a compact, human-readable encoding of the JSON data model, designed for LLM prompts. It reduces token count while keeping the structure of the data explicit.

Key Features

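  • Roughly 30-60% fewer tokens than JSON for uniform arrays of objects.
  • Explicit array lengths and field lists, so a reader (or the model) can check that no rows or values are missing.
  • Indentation-based nesting, with no braces or brackets wrapping each object and array.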
  • Tabular arrays: Headers once, rows as data.
  • Optional dotted paths for nested keys.

Syntax Basics

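  • Objects: key: value, with nested objects indented under their key.
  • Arrays: key[N] declares the length; key[N]{field1,field2}: adds a header row of field names for tabular arrays.
  • Values: Strings are unquoted unless quoting is needed to avoid ambiguity, for example when a string contains the delimiter or a colon, or would otherwise read as a number, boolean, or null.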
  • Example comparison. JSON:
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

The same data in TOON:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
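
The tabular form described above (length in brackets, field names in braces, one row per object) can be sketched in a few lines of Python. This is a minimal illustration, not the reference encoder: the function names `format_value` and `encode_tabular` are hypothetical, the quoting check is a simplified approximation of the real rules, and only flat, uniform arrays of objects are handled.

```python
import json
import re

# Strings that look like numbers must be quoted so they stay strings.
_NUMERIC = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def format_value(value, delimiter=","):
    """Render one scalar; quote strings only when they would be ambiguous."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    needs_quotes = (
        text == ""
        or text != text.strip()
        or text in ("true", "false", "null")
        or _NUMERIC.fullmatch(text) is not None
        or any(ch in text for ch in (delimiter, ":", '"', "\\", "\n"))
    )
    # json.dumps gives a double-quoted, escaped string (simplified assumption).
    return json.dumps(text, ensure_ascii=False) if needs_quotes else text


def encode_tabular(key, rows, delimiter=","):
    """Encode a list of flat dicts sharing the same keys as one tabular array."""
    if not rows:
        return f"{key}[0]:"
    fields = list(rows[0].keys())
    for row in rows:
        if list(row.keys()) != fields:
            raise ValueError("rows must share the same fields in the same order")
    # Header: key, declared length, and the field names stated once.
    lines = [f"{key}[{len(rows)}]{{{delimiter.join(fields)}}}:"]
    for row in rows:
        cells = (format_value(row[name], delimiter) for name in fields)
        lines.append("  " + delimiter.join(cells))
    return "\n".join(lines)


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_tabular("users", users))
```

The token savings come from the header line: field names appear once instead of once per object, and the rows carry only values.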

Comparisons

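  • JSON: Verbose, since keys repeat in every object; TOON saves 30-60% of tokens on uniform arrays.
  • YAML: Also indentation-based, but it repeats keys for every row, so it is less token-efficient than TOON for tabular data.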
  • XML: Far more verbose.
  • Protobuf: Binary, not human-readable; TOON is text-based.
  • CSV: Compact for flat tables but carries no nesting, lengths, or types; TOON adds roughly 5-10% token overhead over CSV in exchange for explicit lengths and field headers.
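
As a rough illustration of the ordering in this list, the sketch below serializes the same two records as JSON, TOON, and CSV and compares character counts. Character count is only a crude proxy for tokens (real counts depend on the tokenizer), and the TOON string is written by hand for this fixed example, so treat the result as indicative rather than as a benchmark.

```python
import csv
import io
import json

records = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]

# Pretty-printed JSON, as it commonly appears in prompts.
as_json = json.dumps({"users": records}, indent=2)

# CSV via the standard library: header row plus one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "role"], lineterminator="\n")
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue().strip()

# TOON tabular form, hand-written for this example (not produced by a library).
as_toon = "users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"

sizes = {"json": len(as_json), "toon": len(as_toon), "csv": len(as_csv)}
for name, size in sizes.items():
    print(f"{name}: {size} characters")
```

For this input CSV is the shortest, TOON is slightly longer because of the key, length, and indentation, and JSON is the longest because every key is repeated per object.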

TOON suits LLM workflows: fewer tokens lower prompt cost, and the explicit lengths and field lists make truncated or malformed data easier to detect.
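
The kind of check that explicit lengths and field lists make possible can be sketched as follows. This is a hypothetical helper, `check_tabular_block`, not part of any TOON library: it assumes a single tabular array with a comma delimiter, and it splits rows naively, so quoted values containing commas are not handled.

```python
import re

# Matches a tabular header such as: users[2]{id,name,role}:
HEADER = re.compile(r"^(?P<key>[^\[\s]+)\[(?P<n>\d+)\]\{(?P<fields>[^}]*)\}:$")


def check_tabular_block(text):
    """Return a list of problems found in one tabular block (empty if none)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["empty input"]
    match = HEADER.match(lines[0].strip())
    if match is None:
        return ["first line is not a tabular array header"]

    declared = int(match["n"])
    fields = match["fields"].split(",")
    rows = lines[1:]

    problems = []
    # The declared length lets us detect missing or extra rows.
    if len(rows) != declared:
        problems.append(f"declared {declared} rows, found {len(rows)}")
    # The field list lets us detect rows with missing or extra values.
    for index, row in enumerate(rows, start=1):
        cells = row.strip().split(",")  # naive split: ignores quoting
        if len(cells) != len(fields):
            problems.append(
                f"row {index}: expected {len(fields)} values, found {len(cells)}"
            )
    return problems


truncated = "users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"
print(check_tabular_block(truncated))
```

Running a check like this on model output is one way to catch a response that was cut off or that dropped a column, which plain JSON or CSV would not reveal on its own.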

Sources

toon-format