abhilaksh-arora

🚀 TOON (Token-Oriented Object Notation): The Smarter, Lighter JSON for LLMs

When building AI and LLM-based applications, one of the biggest hidden costs often comes from something simple: the format of your data.

Every {}, [], and " in a JSON payload adds to the token count when you send it to a Large Language Model (LLM).

With big payloads or complex structured data, this can burn through tokens (and money) fast. ⚡️

That's where TOON (Token-Oriented Object Notation) steps in: a format designed specifically for LLMs to make structured data compact, readable, and token-efficient.


💡 What Is TOON?

TOON stands for Token-Oriented Object Notation: a modern, lightweight data format optimized for LLMs.

Think of it as:

"JSON, reimagined for token efficiency and human readability."

It trims the excess (no curly braces, square brackets, or quotes) and uses indentation plus tabular patterns instead.

The result is a format that models (and humans) can parse easily, while using far fewer tokens.


⚙️ Why TOON Matters

When you send JSON to an LLM:

  • Every punctuation mark adds to the token count.
  • Repeated keys in long arrays multiply the cost.
  • The verbosity doesn't actually help model understanding.

TOON solves this by:

  • Declaring keys once per table-like block
  • Replacing commas/braces with indentation
  • Maintaining data clarity while cutting syntactic noise

💰 The result: 30–60% fewer tokens on average.


🧠 Example: TOON in Action

JSON

{
  "users": [
    { "id": 1, "name": "Alice" },
    { "id": 2, "name": "Bob" }
  ]
}

TOON

users[2]{id,name}:
  1,Alice
  2,Bob

Same structure.

Same meaning.

Roughly half the tokens.


🧰 Encode JSON → TOON in TypeScript

Try it yourself using the official TOON package.

Installation

npm install @toon-format/toon
# or
pnpm add @toon-format/toon

Example Code

import { encode, decode } from "@toon-format/toon";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

const toon = encode(data);
console.log("TOON Format:\n", toon);

// Decode back to JSON if needed
const parsed = decode(toon);
console.log("Decoded JSON:\n", parsed);

Output

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
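
If you want to confirm that nothing is lost in the conversion, a quick round-trip check is enough. This is a minimal sketch, assuming decode() returns plain objects and arrays that serialize the same way as the original input:

import { encode, decode } from "@toon-format/toon";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
};

// Encode to TOON, decode back, and compare the serialized structures.
// Assumption: decode() yields plain objects/arrays, so JSON.stringify is a fair comparison.
const roundTrip = decode(encode(data));
console.log("Round trip OK:", JSON.stringify(roundTrip) === JSON.stringify(data));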

⚖️ JSON vs TOON

Feature | JSON | TOON
Purpose | Universal data format (APIs, configs, storage) | Token-efficient format for LLMs
Syntax | Verbose ({}, [], ") | Compact (indentation, tabular style)
Readability | Moderate | High (human + model friendly)
Token Usage | High 🔥 | Up to 60% fewer
Best Use Case | APIs, persistence | LLM prompts, structured outputs
Nested Objects | Excellent | ⚠️ Inefficient for deep nesting
Ecosystem | Mature, universal | Emerging, growing fast

⚠️ When Not to Use TOON

TOON shines for flat, tabular JSON objects, but it's not ideal for deeply nested structures.

In those cases, the extra indentation and repeated context can actually increase the token count.

Example:

{
  "company": {
    "departments": [
      {
        "name": "Engineering",
        "employees": [{ "id": 1, "name": "Alice" }]
      }
    ]
  }
}

➡️ Converting this to TOON can be longer, not shorter.

✅ Best suited for

  • Flat lists (users, products, messages)
  • Prompt templates (see the sketch after these lists)
  • Model training or evaluation datasets

โŒ Avoid for

  • Deeply nested hierarchies
  • Complex relational data
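
Here's what the prompt-template case can look like in practice. A minimal sketch: the product data and the prompt wording are made up for illustration, and how you send the final string to a model (OpenAI SDK, LangChain, etc.) is up to you:

import { encode } from "@toon-format/toon";

// Hypothetical flat dataset destined for a prompt.
const catalog = {
  products: [
    { id: 101, name: "Keyboard", price: 49 },
    { id: 102, name: "Mouse", price: 29 },
  ],
};

// Embed the TOON block in the prompt instead of raw JSON.
// The header row names the fields; each following line is one record.
const prompt = [
  "You are given a product catalog in TOON format:",
  "",
  encode(catalog),
  "",
  "Question: Which product is the cheapest?",
].join("\n");

console.log(prompt);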

📊 Token Efficiency Snapshot

Dataset | JSON Tokens | TOON Tokens | Savings
User list | 150 | 82 | −45%
Product catalog | 320 | 180 | −44%
Nested data | 410 | 435 | ❌ +6%

🧩 TL;DR

TOON (Token-Oriented Object Notation) is a lightweight, token-efficient alternative to JSON, built for AI and LLM workloads.

✅ Cleaner syntax

✅ Human-readable

✅ Up to 60% fewer tokens

But remember: it works best for flat JSON objects, not deeply nested structures.

If you're building LLM pipelines, prompt templates, or structured AI datasets, TOON can save tokens, reduce cost, and keep your data clean.


🧪 Bonus: Benchmark Token Count (JSON vs TOON)

Here's a quick Node.js script you can use to compare token usage between JSON and TOON using OpenAI's tiktoken tokenizer.

Install Dependencies

npm install @toon-format/toon tiktoken

Script

import { encode } from "@toon-format/toon";
import { encoding_for_model } from "tiktoken";

const data = {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
    { id: 3, name: "Charlie", role: "editor" },
  ],
};

const jsonData = JSON.stringify(data, null, 2);
const toonData = encode(data);

// Use the gpt-4o-mini tokenizer (you can swap in another supported model name)
const tokenizer = encoding_for_model("gpt-4o-mini");

const jsonTokens = tokenizer.encode(jsonData).length;
const toonTokens = tokenizer.encode(toonData).length;

console.log("📊 Token Comparison");
console.log("-------------------");
console.log("JSON tokens:", jsonTokens);
console.log("TOON tokens:", toonTokens);
console.log("Savings:", (((jsonTokens - toonTokens) / jsonTokens) * 100).toFixed(2) + "%");

tokenizer.free();

Example Output

📊 Token Comparison
-------------------
JSON tokens: 84
TOON tokens: 32
Savings: 61.90%

You can tweak this for your own datasets; for flat, tabular data you'll typically see token savings in the 30–60% range.
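
To run the same comparison over several datasets at once, you could wrap the logic above in a small helper. This is a sketch using the same two packages; compareTokens is a hypothetical name, not part of either library:

import { encode } from "@toon-format/toon";
import { encoding_for_model } from "tiktoken";

// Compare token counts for any JSON-serializable value.
function compareTokens(label: string, data: any): void {
  const tokenizer = encoding_for_model("gpt-4o-mini");
  const jsonTokens = tokenizer.encode(JSON.stringify(data, null, 2)).length;
  const toonTokens = tokenizer.encode(encode(data)).length;
  const savings = (((jsonTokens - toonTokens) / jsonTokens) * 100).toFixed(1);
  console.log(`${label}: JSON=${jsonTokens}, TOON=${toonTokens}, savings=${savings}%`);
  tokenizer.free();
}

// Flat, tabular data: TOON should come out ahead.
compareTokens("flat users", {
  users: [
    { id: 1, name: "Alice", role: "admin" },
    { id: 2, name: "Bob", role: "user" },
  ],
});

// Deeply nested data: TOON may not win (see the "When Not to Use TOON" section).
compareTokens("nested company", {
  company: {
    departments: [{ name: "Engineering", employees: [{ id: 1, name: "Alice" }] }],
  },
});

The nested call uses the same shape as the example from the "When Not to Use TOON" section, so one run shows both ends of the spectrum.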


💬 Final Thoughts

The ecosystem around LLMs is evolving fast, and even small optimizations, like switching from JSON to TOON, can add up to significant cost and performance improvements at scale.

Try it out, benchmark it, and see how many tokens (and dollars) you save! 🚀


Tags: #AI #LLM #PromptEngineering #JSON #TOON #AIOptimization #OpenAI #DataCompression #DeveloperTools

Top comments (5)

Ali Farhat

Interesting take. I've built a lightweight JSON → TOON Converter for quick benchmarking between both formats.
Sharing here in case others want to experiment:
scalevise.com/json-toon-converter

JaRo

This is great, thanks a lot!

"Roughly half the tokens."

That's just awesome :)

Rakesh Gajjar

Good article, would appreciate it if you could include more quantified numbers on token optimization for various file formats like programming code (.py, .cpp files), CSV, and flat text files.

Kush Jaiswal

This is available in their GitHub README, which shows the token usage of each format for different LLM models.
