Toon: Token-Oriented Object Notation Slashes LLM Data Costs by Optimizing Context Efficiency
A new data format called Toon targets repeated structures such as database query results. In a sample analytics query it used about 55% fewer tokens than XML and 23% fewer than minified JSON. It is integrated into UC AI, a PL/SQL package for Oracle Database, which can convert JSON to Toon directly in queries. Fewer tokens could make RAG pipelines and AI agents cheaper and faster, and may help curb hallucinations.

As large language models (LLMs) power ever more sophisticated AI assistants, the battle for effective context management intensifies. Techniques like Retrieval Augmented Generation (RAG), natural language to SQL, and ad-hoc tools aim to deliver precise data without bloating the input. Yet tokens, the fundamental units of LLM processing, govern everything from cost and latency to output quality. Extraneous syntax in formats like JSON or XML spends tokens without adding information, inflating costs and potentially degrading output quality and raising hallucination risk.
Token-Oriented Object Notation (Toon), devised by Johann Schopplich, addresses this head-on. By merging CSV-like rows with YAML-inspired structure, Toon converts JSON into a lean format tailored for LLMs, prioritizing content over syntactic overhead. Launched in November 2025, it promises substantial gains for developers feeding database analytics or RAG outputs into models.
The Token Toll of Traditional Formats
Standard JSON and XML, while ubiquitous and natively supported by databases, spend tokens on braces, quotes, and tags. For a real-world Oracle query that aggregates top products and customers, minified JSON takes 150 tokens and XML 258 (exact counts depend on the tokenizer). Pretty-printing would only add whitespace tokens without giving the model any additional information, so the comparison uses minified output.
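Minification is usually just a serializer option. As a small illustration in Python (used here only for demonstration; the article's pipeline builds its JSON in SQL), the standard-library `json.dumps` call below drops all optional whitespace when given compact `separators`. The sample data is the first two products from this article's query; character length is only a rough proxy for token count.

```python
import json

# Sample data: the first two products from the article's analytics result.
data = {
    "top_products": [
        {"name": "Liquid Designer", "total_sales": 90000},
        {"name": "Symmetric 1000", "total_sales": 82200},
    ]
}

# Pretty-printed: newlines and indentation add characters (and usually tokens).
pretty = json.dumps(data, indent=2)

# Minified: separators without trailing spaces remove all optional whitespace.
minified = json.dumps(data, separators=(",", ":"))

print(len(pretty), len(minified))
print(minified)
```

Both strings parse to the same object; only the whitespace differs.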
Here's the SQL:
SELECT JSON_OBJECT(
  'top_products' VALUE (
    SELECT JSON_ARRAYAGG(
             JSON_OBJECT('name' VALUE product_name, 'total_sales' VALUE total_sales)
             ORDER BY total_sales DESC  -- keep the ranking inside the array
           )
    FROM (
      SELECT p.product_name, SUM(dp.quote_price) AS total_sales
      FROM EBA_SALES_DEAL_PRODUCTS dp
      JOIN EBA_SALES_PRODUCTS p ON dp.product_id = p.id
      GROUP BY p.product_name
      ORDER BY total_sales DESC
      FETCH FIRST 5 ROWS ONLY
    )
  ),
  'top_customers' VALUE (
    SELECT JSON_ARRAYAGG(
             JSON_OBJECT('name' VALUE customer_name, 'total_sales' VALUE total_sales)
             ORDER BY total_sales DESC  -- keep the ranking inside the array
           )
    FROM (
      SELECT c.customer_name, SUM(d.deal_amount) AS total_sales
      FROM EBA_SALES_DEALS d
      JOIN EBA_SALES_CUSTOMERS c ON d.customer_id = c.id
      GROUP BY c.customer_name
      ORDER BY total_sales DESC
      FETCH FIRST 5 ROWS ONLY
    )
  )
) AS analytics
FROM dual;  -- optional on Oracle Database 23ai, required on earlier releases
Minified JSON (150 tokens):
{"top_products":[{"name":"Liquid Designer","total_sales":90000},{"name":"Symmetric 1000","total_sales":82200},{"name":"Osprey Enterprise Edition","total_sales":65010},{"name":"Symmetric 2100","total_sales":47300},{"name":"Peregrine Enterprise Edition","total_sales":35000}],"top_customers":[{"name":"Asymmetrical Antibiotics Inc","total_sales":246285},{"name":"Madison Materials","total_sales":198474},{"name":"Turbo Charged Migration Systems","total_sales":38500},{"name":"Acme Department of Transportation","total_sales":25500},{"name":"Acme Department of Taxation","total_sales":14580}]}
Toon trims it to 116 tokens:
top_products[5]{name,total_sales}:
  Liquid Designer,90000
  Symmetric 1000,82200
  Osprey Enterprise Edition,65010
  Symmetric 2100,47300
  Peregrine Enterprise Edition,35000
top_customers[5]{name,total_sales}:
  Asymmetrical Antibiotics Inc,246285
  Madison Materials,198474
  Turbo Charged Migration Systems,38500
  Acme Department of Transportation,25500
  Acme Department of Taxation,14580
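Array lengths are declared explicitly, since LLMs are unreliable at counting, and each array's field names appear once in its header ({name,total_sales}) instead of being repeated in every row. Indentation conveys structure without braces, quotes, or closing tags.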
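To make the tabular layout concrete, here is a minimal Python sketch of how a uniform array of objects might be written in this style. It is not UC AI's `uc_ai_toon` code or the reference Toon encoder: `encode_table` and `fmt` are hypothetical helpers, and the quoting rule is a simplified assumption rather than the full specification.

```python
import json

def fmt(value):
    """Render one cell; the quoting rule is a simplified assumption."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Quote strings that could clash with the comma delimiter or the syntax.
        if value == "" or value != value.strip() or any(c in value for c in ',:"\\\n'):
            return json.dumps(value, ensure_ascii=False)
        return value
    return str(value)

def encode_table(key, rows):
    """Write a uniform list of flat dicts as a Toon-style tabular array."""
    if not rows:
        raise ValueError("this sketch needs at least one row")
    fields = list(rows[0])
    for row in rows:
        # Tabular form only works when every row has the same primitive fields.
        if set(row) != set(fields):
            raise ValueError("rows do not share the same keys")
        if any(isinstance(v, (dict, list)) for v in row.values()):
            raise ValueError("tabular rows may only hold primitive values")
    # Header: array name, explicit length, and the field names declared once.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    # Each row lists values only, in header order, indented one level.
    lines += ["  " + ",".join(fmt(row[f]) for f in fields) for row in rows]
    return "\n".join(lines)

products = [
    {"name": "Liquid Designer", "total_sales": 90000},
    {"name": "Symmetric 1000", "total_sales": 82200},
    {"name": "Osprey Enterprise Edition", "total_sales": 65010},
    {"name": "Symmetric 2100", "total_sales": 47300},
    {"name": "Peregrine Enterprise Edition", "total_sales": 35000},
]
print(encode_table("top_products", products))
```

With the five products from the query, it reproduces the top_products block: the field list appears once in the header and each row carries only values. Rows with differing keys or nested values are rejected, which mirrors why the tabular form only suits uniform data.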
Savings are largest for uniform arrays of objects, where every element shares the same primitive fields; deeply nested or irregular data may be encoded more compactly as JSON.
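The headline percentages follow directly from the token counts reported in this article. A quick check in Python, where `savings` is a hypothetical helper computing the reduction relative to a baseline encoding:

```python
def savings(baseline_tokens, toon_tokens):
    """Percent reduction of toon_tokens relative to baseline_tokens."""
    return (baseline_tokens - toon_tokens) / baseline_tokens * 100

# Token counts reported in this article (they depend on the tokenizer used).
print(f"analytics vs. JSON: {savings(150, 116):.1f}%")  # 22.7%
print(f"analytics vs. XML:  {savings(258, 116):.1f}%")  # 55.0%
print(f"glossary vs. JSON:  {savings(100, 112):.1f}%")  # -12.0% (Toon is larger)
```

The negative result for the nested glossary example (112 Toon tokens against 100 for minified JSON) puts the caveat above into numbers.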
Oracle Integration: From Query to LLM in One Step
UC AI, a PL/SQL SDK for Oracle-based AI workflows, now embeds Toon via uc_ai_toon.to_toon(). Wrap a JSON-generating expression directly in a query:
SELECT uc_ai_toon.to_toon(JSON_OBJECT(...)) AS analytics FROM dual;
Or in PL/SQL:
set serveroutput on
declare
  l_json clob := '{"employees":[{"name":"Alice","age":30,"department":"HR"},{"name":"Bob","age":25,"department":"Engineering"},{"name":"Charlie","age":28,"department":"Marketing"}]}';
  l_toon clob;
begin
  -- Convert the JSON document to Toon and print it
  l_toon := uc_ai_toon.to_toon(l_json);
  dbms_output.put_line(l_toon);
end;
/
Yields:
employees[3]{name,age,department}:
  Alice,30,HR
  Bob,25,Engineering
  Charlie,28,Marketing
Grab the package from the UC AI GitHub repo's packages folder (Source: hartenfeller.dev).
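Toon also handles nested objects, irregular arrays, and mixed types, but the savings depend on the shape of the data. In this deeply nested example, which has no tabular arrays to compress, Toon actually needs more tokens than minified JSON (112 vs. 100):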
glossary:
  title: example glossary
  GlossDiv:
    title: S
    GlossList:
      GlossEntry:
        ID: SGML
        SortAs: SGML
        GlossTerm: Standard Generalized Markup Language
        Acronym: SGML
        Abbrev: "ISO 8879:1986"
        GlossDef:
          para: "A meta-markup language, used to create markup languages such as DocBook."
          GlossSeeAlso[2]: GML,XML
        GlossSee: markup
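The indentation rules for nested objects can be sketched the same way. The hypothetical `encode` function below is a simplified Python illustration, not the UC AI or reference implementation: objects become indented `key:` blocks, primitive arrays are written inline with their length (as in GlossSeeAlso[2]: GML,XML), and strings containing a comma or colon are quoted under an assumed, simplified quoting rule. Arrays of objects are deliberately left out.

```python
import json

def fmt(value):
    """Render a primitive value; the quoting rule is a simplified assumption."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Quote empty strings and strings with delimiters, colons, quotes,
        # backslashes, newlines, or surrounding spaces.
        if value == "" or value != value.strip() or any(c in value for c in ',:"\\\n'):
            return json.dumps(value, ensure_ascii=False)
        return value
    return str(value)

def encode(obj, depth=0):
    """Encode a dict as indented Toon-style lines (simplified sketch)."""
    pad = "  " * depth
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object: key on its own line, children one level deeper.
            lines.append(f"{pad}{key}:")
            lines.extend(encode(value, depth + 1))
        elif isinstance(value, list):
            if any(isinstance(v, (dict, list)) for v in value):
                raise NotImplementedError("arrays of objects are out of scope here")
            # Primitive array: explicit length, values inline.
            items = ",".join(fmt(v) for v in value)
            lines.append(f"{pad}{key}[{len(value)}]:" + (f" {items}" if value else ""))
        else:
            lines.append(f"{pad}{key}: {fmt(value)}")
    return lines

glossary = {
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"],
                    },
                    "GlossSee": "markup",
                }
            },
        },
    }
}
print("\n".join(encode(glossary)))
```

Run on the glossary JSON, it prints the structure above. Every nesting level costs its own line and indentation while saving no repeated keys, which is why this shape gains nothing over minified JSON.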
Toon won't replace JSON across the board; it is aimed at LLM pipelines where token thrift matters most. As the specification evolves, Toon gives developers a way to stretch context windows further while balancing relevance, cost, and reliability in AI-driven analytics.
