If you’re building anything with large language models today, there’s a good chance you’re overpaying for every single API call, and you might not even know it. The culprit? JSON.
For years, JSON has been the default. It’s reliable, readable, and works everywhere. But in the world of generative AI, it’s quietly bleeding budgets dry. We call it the JSON Tax: you’re literally paying for every curly brace, every quote, every repeated key name that an LLM has to process.
After our own API bills started looking more like rent payments, we went looking for a solution. We found it in an unlikely place: a format called TOON (Token-Oriented Object Notation). It didn’t just lower our costs; it changed how we build.
What Is TOON? (And Why It Feels Like Finding Extra RAM)
TOON isn’t just another data format. It’s JSON rethought for the token economy. While JSON was designed for machines to parse, TOON is designed for one particular machine to think with: the large language model.
It looks like what might happen if YAML and a spreadsheet had a minimalist love child. The brackets and repetitive keys disappear. What’s left is the pure data, structured in a way that feels almost tabular.
Here’s the best part. Take a simple list of employees:
In JSON (What We All Use Now):
{
  "employees": [
    {"id": 101, "name": "Maria Chen", "department": "Engineering"},
    {"id": 102, "name": "David Park", "department": "Design"}
  ]
}
In TOON (What We Switched To):
employees[2]{id,name,department}:
  101,Maria Chen,Engineering
  102,David Park,Design
The difference seems aesthetic until you count tokens. The JSON version consumes about 54 tokens; the TOON version, around 22. That’s roughly a 60% reduction before you’ve even asked your question.
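Counts like these are easy to sanity-check yourself. Here’s a quick sketch using OpenAI’s tiktoken library (our tooling choice, not part of TOON; exact numbers vary by tokenizer and formatting):

import json

import tiktoken  # OpenAI's tokenizer library: pip install tiktoken

# cl100k_base is one common encoding; other models tokenize differently
enc = tiktoken.get_encoding("cl100k_base")

json_payload = json.dumps(
    {
        "employees": [
            {"id": 101, "name": "Maria Chen", "department": "Engineering"},
            {"id": 102, "name": "David Park", "department": "Design"},
        ]
    },
    indent=2,  # pretty-printed, like the block above
)

toon_payload = (
    "employees[2]{id,name,department}:\n"
    "  101,Maria Chen,Engineering\n"
    "  102,David Park,Design"
)

print("JSON tokens:", len(enc.encode(json_payload)))
print("TOON tokens:", len(enc.encode(toon_payload)))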
Why This Isn’t Just About Saving Money (But Yes, You’ll Save Money)
1. The Direct Cost Slash
Our benchmarks were consistent: 30–60% fewer tokens per call. For a high-volume agent processing thousands of support tickets or analyzing hundreds of documents daily, this wasn’t a minor optimization. It effectively doubled our runway.
One of our internal tools, a contract review agent, saw its monthly OpenAI bill drop from ~$1,800 to under $900, with no change in capability.
2. Fitting More Into Every Window
Context windows are getting larger, but they’re never free. With TOON, we could suddenly fit twice as many product reviews into a single sentiment analysis prompt, or include three extra support tickets in a summarization task.
It felt like unlocking hidden capacity.
3. The Accuracy Surprise
Here’s what we didn’t expect: our models started performing better. In structured data extraction tasks, like pulling invoice details from emails, our accuracy nudged up from about 70% with JSON to 74% with TOON.
Why? Our theory is that by removing the syntactic “noise” (all those brackets and quotes), the model focuses more on the actual data patterns. It’s less distracted.
How We Use It: Real Examples From Our Stack
We didn’t stop at prototypes. We adapted TOON across our entire ecosystem.
- Customer Support Triager: Instead of feeding JSON logs of past tickets, we stream TOON-formatted rows: timestamp, customer_id, issue_category, first_response_time. The model identifies patterns faster (see the sketch after this list).
- E-commerce Product Summarizer: We pipe database rows (product ID, review text, rating) as a compact TOON block. It can process hundreds of reviews in one go where JSON would have hit token limits.
- Internal Report Generator: Financial data, sales figures, regional performance, weekly growth, now flows in TOON. What was a nested JSON nightmare became a flat, model-friendly table.
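For a sense of what the triager actually sees, here’s a hypothetical slice of that stream (only the field names come from our schema above; the ticket values are invented for illustration):

tickets[3]{timestamp,customer_id,issue_category,first_response_time}:
  2024-06-03T09:14Z,C-2041,billing,12m
  2024-06-03T09:31Z,C-1187,login,8m
  2024-06-03T10:02Z,C-3302,shipping,25m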
Even our non-AI internal tools started using it for config files. It’s just easier on the eyes.
The Catch (Because There Always Is One)
TOON isn’t a magic bullet. We learned this quickly.
TOON is Perfect For:
- RAG Pipelines: Feeding retrieved database rows or document chunks to the LLM.
- Batch Analysis: Large, flat datasets like logs, survey results, or transaction histories.
- Chat History Compression: Turning verbose JSON message logs into lean, chronological streams.
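To make that last item concrete, here’s a hypothetical three-message log flattened into TOON (the role/content schema is our own illustration, not a fixed convention):

messages[3]{role,content}:
  user,Where is my order?
  assistant,It shipped yesterday and should arrive by Friday.
  user,Can I change the delivery address?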
Not Ideal For:
- Deeply Nested Data: If your data looks like a Russian nesting doll, JSON is still your friend.
- Public APIs: The web runs on JSON. Don’t break compatibility for external users.
- Tiny, One-off Objects: The savings are in scale. A simple {"status": "ok"} doesn’t need it.
Making the Switch: Our Practical Advice
Don’t rewrite your entire backend. The power move is using TOON at the last mile: the precise moment you serialize data for the LLM.
The libraries are refreshingly simple. Here’s how we do it in our Python services:
from toon_format import encode

# We keep our internal data as normal Python dicts
internal_data = {
    "orders": [
        {"order_id": "A123", "amount": 249.99, "priority": True},
        {"order_id": "A124", "amount": 89.50, "priority": False}
    ]
}

# Right before the API call, we compress
llm_ready_payload = encode(internal_data)

# Send `llm_ready_payload` to GPT/Claude/etc.
# Bills shrink. Performance stays high.
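From there, the payload is just a string in the prompt. A minimal sketch with the OpenAI Python SDK (the model name and prompt wording are our own choices, nothing TOON-specific):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You triage e-commerce orders."},
        {"role": "user", "content": f"Flag the high-priority orders:\n\n{llm_ready_payload}"},
    ],
)
print(response.choices[0].message.content)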
The ecosystem is young but growing. The official TOON GitHub hosts libraries for all the big languages, and the spec is straightforward enough that we had it running in a day.
The Bottom Line
JSON built the interactive web. But the AI era has a different set of constraints: tokens, context, and cost per call. TOON addresses those constraints directly.
For us, adopting TOON was a no-brainer. It cut costs, improved performance, and honestly, made our prompts cleaner to read. It’s one of those rare shifts that feels both technically clever and immediately practical.
If your AI bills are keeping you up at night, maybe it’s not the model’s fault. Maybe it’s the format. Ditching the curly braces might be the most impactful thing you do this quarter. For us, it was.