abhilaksh-arora

🚀 TOON (Token-Oriented Object Notation): The Smarter, Lighter JSON for LLMs

When building AI and LLM-based applications, one of the biggest hidden costs often comes from something simple: the format of your data.

Every {}, [], and " in a JSON payload adds to the token count when you send it to a Large Language Model (LLM).

With big payloads or complex structured data, this can burn through tokens (and money) fast. ⚡️

That's where TOON (Token-Oriented Object Notation) steps in: a format designed specifically for LLMs to make structured data compact, readable, and token-efficient.


💡 What Is TOON?

TOON stands for Token-Oriented Object Notation: a modern, lightweight data format optimized for LLMs.

Think of it as:

"JSON, reimagined for token efficiency and human readability."

It trims the excess (no curly braces, square brackets, or quotes) and uses indentation plus tabular patterns instead.

The result is a format that models (and humans) can parse easily, while using far fewer tokens.


⚙️ Why TOON Matters

When you send JSON to an LLM:

  • Every punctuation mark adds to the token count.
  • Repeated keys in long arrays multiply the cost.
  • The verbosity doesn't actually help model understanding.

TOON solves this by:

  • Declaring keys once per table-like block
  • Replacing commas/braces with indentation
  • Maintaining data clarity while cutting syntactic noise

💰 The result: 30–60% fewer tokens on average.


🧠 Example: TOON in Action

JSON

{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" }
  ]
}

TOON

users[2]{id,name}:
  1,Alice
  2,Bob

Same structure.

Same meaning.

Roughly half the tokens.


🧰 Encode JSON → TOON in TypeScript

Try it yourself using the official TOON package.

Installation

npm install @toon-format/toon
# or
pnpm add @toon-format/toon

Example Code

import { encode, decode } from "@toon-format/toon";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

const toon = encode(data);
console.log("TOON Format:\n", toon);

// Decode back to JSON if needed
const parsed = decode(toon);
console.log("Decoded JSON:\n", parsed);

Output

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
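
If you want to confirm that nothing is lost in the conversion, a quick round-trip check is enough. This is a minimal sketch, assuming decode() returns plain objects and arrays that serialize the same way as the original input:

import { encode, decode } from "@toon-format/toon";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

// Encode to TOON, decode back, and compare the serialized structures.
// Assumption: decode() yields plain objects/arrays, so JSON.stringify is a fair comparison.
const roundTrip = decode(encode(data));
console.log("Round trip OK:", JSON.stringify(roundTrip) === JSON.stringify(data));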

⚖️ JSON vs TOON

Feature | JSON | TOON
Purpose | Universal data format (APIs, configs, storage) | Token-efficient format for LLMs
Syntax | Verbose ({}, [], ") | Compact (indentation, tabular style)
Readability | Moderate | High (human + model friendly)
Token Usage | High 🔥 | Up to 60% fewer
Best Use Case | APIs, persistence | LLM prompts, structured outputs
Nested Objects | Excellent | ⚠️ Inefficient for deep nesting
Ecosystem | Mature, universal | Emerging, growing fast

⚠️ When Not to Use TOON

TOON shines for flat, tabular JSON objects, but it's not ideal for deeply nested structures.

In those cases, the extra indentation and repeated context can actually increase the token count.

Example:

{
  "company": {
    "departments": [
      {
        "name": "Engineering",
        "employees": [{ "id": 1, "name": "Alice" }]
      }
    ]
  }
}

➡️ Converting this to TOON can be longer, not shorter.

✅ Best suited for

  • Flat lists (users, products, messages)
  • Prompt templates (see the sketch after these lists)
  • Model training or evaluation datasets

โŒ Avoid for

  • Deeply nested hierarchies
  • Complex relational data
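
Here's what the prompt-template case can look like in practice. A minimal sketch: the product data and the prompt wording are made up for illustration, and how you send the final string to a model (OpenAI SDK, LangChain, etc.) is up to you:

import { encode } from "@toon-format/toon";

// Hypothetical flat dataset destined for a prompt.
const catalog = {
  products: [
    { id: 101, name: "Keyboard", price: 49 },
    { id: 102, name: "Mouse", price: 29 },
  ],
};

// Embed the TOON block in the prompt instead of raw JSON.
// The header row names the fields; each following line is one record.
const prompt = [
  "You are given a product catalog in TOON format:",
  "",
  encode(catalog),
  "",
  "Question: Which product is the cheapest?",
].join("\n");

console.log(prompt);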

📊 Token Efficiency Snapshot

Dataset | JSON Tokens | TOON Tokens | Savings
User list | 150 | 82 | −45%
Product catalog | 320 | 180 | −44%
Nested data | 410 | 435 | ❌ +6%

🧩 TL;DR

TOON (Token-Oriented Object Notation) is a lightweight, token-efficient alternative to JSON, built for AI and LLM workloads.

✅ Cleaner syntax

✅ Human-readable

✅ Up to 60% fewer tokens

But remember: it works best for flat JSON objects, not deeply nested structures.

If you're building LLM pipelines, prompt templates, or structured AI datasets, TOON can save tokens, reduce cost, and keep your data clean.


🧪 Bonus: Benchmark Token Count (JSON vs TOON)

Here's a quick Node.js script you can use to compare token usage between JSON and TOON using OpenAI's tiktoken tokenizer.

Install Dependencies

npm install @toon-format/toon tiktoken

Script

import { encode } from "@toon-format/toon";
import { encoding_for_model } from "tiktoken";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
    { id: 3, name: "Charlie", role: "editor" },
  ],
};

const jsonData = JSON.stringify(data, null, 2);
const toonData = encode(data);

// Use the gpt-4o-mini tokenizer (you can swap in another supported model name)
const tokenizer = encoding_for_model("gpt-4o-mini");

const jsonTokens = tokenizer.encode(jsonData).length;
const toonTokens = tokenizer.encode(toonData).length;

console.log("📊 Token Comparison");
console.log("-------------------");
console.log("JSON tokens:", jsonTokens);
console.log("TOON tokens:", toonTokens);
console.log("Savings:", (((jsonTokens - toonTokens) / jsonTokens) * 100).toFixed(2) + "%");

tokenizer.free();

Example Output

📊 Token Comparison
-------------------
JSON tokens: 84
TOON tokens: 32
Savings: 61.90%

You can tweak this for your own datasets; for flat, tabular data you'll typically see token savings in the 30–60% range.
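
To run the same comparison over several datasets at once, you could wrap the logic above in a small helper. This is a sketch using the same two packages; compareTokens is a hypothetical name, not part of either library:

import { encode } from "@toon-format/toon";
import { encoding_for_model } from "tiktoken";

// Compare token counts for any JSON-serializable value.
function compareTokens(label: string, data: any): void {
  const tokenizer = encoding_for_model("gpt-4o-mini");
  const jsonTokens = tokenizer.encode(JSON.stringify(data, null, 2)).length;
  const toonTokens = tokenizer.encode(encode(data)).length;
  const savings = (((jsonTokens - toonTokens) / jsonTokens) * 100).toFixed(1);
  console.log(`${label}: JSON=${jsonTokens}, TOON=${toonTokens}, savings=${savings}%`);
  tokenizer.free();
}

// Flat, tabular data: TOON should come out ahead.
compareTokens("flat users", {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
});

// Deeply nested data: TOON may not win (see the "When Not to Use TOON" section).
compareTokens("nested company", {
  company: {
    departments: [{ name: "Engineering", employees: [{ id: 1, name: "Alice" }] }],
  },
});

The nested call uses the same shape as the example from the "When Not to Use TOON" section, so one run shows both ends of the spectrum.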


💬 Final Thoughts

The ecosystem around LLMs is evolving fast, and even small optimizations, like switching from JSON to TOON, can add up to significant cost and performance improvements at scale.

Try it out, benchmark it, and see how many tokens (and dollars) you save! 🚀


Tags: #AI #LLM #PromptEngineering #JSON #TOON #AIOptimization #OpenAI #DataCompression #DeveloperTools

Top comments (5)

Ali Farhat

Interesting take. I've built a lightweight JSON → TOON Converter for quick benchmarking between both formats.
Sharing here in case others want to experiment:
scalevise.com/json-toon-converter

JaRo

This is great, thanks a lot!

"Roughly half the tokens."

That's just awesome :)

Rakesh Gajjar

Good article, would appreciate it if you could include more quantified numbers on token optimization for various file formats like programming code (.py, .cpp files), CSV, and flat text files.

Kush Jaiswal

This is available in their GitHub README, which shows the token usage of each format for different LLM models.
