A new data format called Toon cuts LLM token usage by up to 55% compared with XML and 23% compared with JSON, with the biggest gains on repeated structures such as database query results. The UC AI PL/SQL package for Oracle Database now includes a Toon encoder, so JSON produced in a query can be converted directly. This could make RAG pipelines and AI agents cheaper and faster, and may help curb hallucinations.

Toon: Token-Oriented Object Notation Slashes LLM Data Costs by Optimizing Context Efficiency


As large language models (LLMs) power ever more sophisticated AI assistants, the battle for effective context management intensifies. Techniques like Retrieval Augmented Generation (RAG), natural language to SQL, and ad-hoc tools aim to deliver precise data without bloating the input. Yet tokens, the fundamental units of LLM processing, govern everything from cost and latency to output quality. Syntax that carries no content, such as JSON's braces and quotes or XML's closing tags, still consumes tokens, which raises costs and dilutes the context, increasing the risk of hallucinations.

Token-Oriented Object Notation (Toon), devised by Johann Schopplich, addresses this head-on. By combining CSV-like rows with YAML-inspired indentation, Toon represents JSON data in a lean format tailored for LLMs, prioritizing content over syntactic overhead. Launched in November 2025, it promises substantial gains for developers feeding database analytics or RAG outputs into models.

The Token Toll of Traditional Formats

Standard JSON and XML, while ubiquitous and natively supported by databases, spend tokens on braces, quotes, and tags. A real-world Oracle query aggregating top products and customers yields 150 tokens as minified JSON and 258 as XML (exact counts vary by tokenizer). Testing suggests LLMs gain nothing from pretty-printed versions, so minified output serves as the baseline.

Here's the SQL:

SELECT JSON_OBJECT(
  'top_products' VALUE (
    -- ORDER BY inside JSON_ARRAYAGG fixes the array order; the inner
    -- ORDER BY only decides which five rows FETCH FIRST keeps
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT('name' VALUE product_name, 'total_sales' VALUE total_sales)
      ORDER BY total_sales DESC
    )
    FROM (
      SELECT p.product_name, SUM(dp.quote_price) AS total_sales
      FROM EBA_SALES_DEAL_PRODUCTS dp
      JOIN EBA_SALES_PRODUCTS p ON dp.product_id = p.id
      GROUP BY p.product_name
      ORDER BY total_sales DESC
      FETCH FIRST 5 ROWS ONLY
    )
  ),
  'top_customers' VALUE (
    SELECT JSON_ARRAYAGG(
      JSON_OBJECT('name' VALUE customer_name, 'total_sales' VALUE total_sales)
      ORDER BY total_sales DESC
    )
    FROM (
      SELECT c.customer_name, SUM(d.deal_amount) AS total_sales
      FROM EBA_SALES_DEALS d
      JOIN EBA_SALES_CUSTOMERS c ON d.customer_id = c.id
      GROUP BY c.customer_name
      ORDER BY total_sales DESC
      FETCH FIRST 5 ROWS ONLY
    )
  )
) AS analytics
FROM dual;  -- FROM dual is optional from Oracle 23ai onward

Minified JSON (150 tokens):

{"top_products":[{"name":"Liquid Designer","total_sales":90000},{"name":"Symmetric 1000","total_sales":82200},{"name":"Osprey Enterprise Edition","total_sales":65010},{"name":"Symmetric 2100","total_sales":47300},{"name":"Peregrine Enterprise Edition","total_sales":35000}],"top_customers":[{"name":"Asymmetrical Antibiotics Inc","total_sales":246285},{"name":"Madison Materials","total_sales":198474},{"name":"Turbo Charged Migration Systems","total_sales":38500},{"name":"Acme Department of Transportation","total_sales":25500},{"name":"Acme Department of Taxation","total_sales":14580}]}
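As a rough illustration of why minified JSON is the fair baseline, the Python sketch below (Python is used only for illustration; the article's own code is Oracle SQL) serializes two rows from the result above both ways with the standard `json` module. Character counts stand in for tokens here, which is an assumption: actual token counts depend on the model's tokenizer.

```python
import json

# Two rows taken from the top_products result above.
data = {
    "top_products": [
        {"name": "Liquid Designer", "total_sales": 90000},
        {"name": "Symmetric 1000", "total_sales": 82200},
    ]
}

# Pretty-printed: newlines plus two-space indentation on every level.
pretty = json.dumps(data, indent=2)

# Minified: no whitespace after ',' or ':' (the style of the result above).
minified = json.dumps(data, separators=(",", ":"))

# Both strings decode to the same data; only the whitespace differs.
assert json.loads(pretty) == json.loads(minified)

# Character counts as a crude proxy for tokens (an assumption, see above).
print(len(pretty), len(minified))
print(minified)
```

The extra characters in the pretty-printed form carry no information, which is why the comparison that follows starts from the minified output.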

Toon trims the same data to 116 tokens:

top_products[5]{name,total_sales}:
  Liquid Designer,90000
  Symmetric 1000,82200
  Osprey Enterprise Edition,65010
  Symmetric 2100,47300
  Peregrine Enterprise Edition,35000

top_customers[5]{name,total_sales}:
  Asymmetrical Antibiotics Inc,246285
  Madison Materials,198474
  Turbo Charged Migration Systems,38500
  Acme Department of Transportation,25500
  Acme Department of Taxation,14580

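The `[5]` marker states each array's length up front, which helps because LLMs are unreliable at counting. The `{name,total_sales}` header declares the field names once instead of repeating them in every row, and indentation conveys structure without braces or closing tags.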
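The tabular layout can be sketched in a few lines of Python. This is a simplified, hypothetical encoder (`encode_table` is illustrative and not part of any Toon library): it assumes a non-empty list whose rows all share the same keys, and values that need no quoting, which holds for the analytics data above.

```python
def encode_table(key, rows):
    """Encode a non-empty list of flat dicts sharing the same keys as a Toon-style tabular array.

    Simplified sketch: values are written with str() and never quoted, so it
    only suits values without commas, colons, quotes, or edge whitespace.
    """
    fields = list(rows[0])
    # Tabular form only works when every row has exactly the same keys.
    if any(set(row) != set(fields) for row in rows):
        raise ValueError("all rows must have the same keys")
    # Header: key, explicit row count, and the field names declared once.
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    # One indented, comma-separated line per row, in header field order.
    for row in rows:
        lines.append("  " + ",".join(str(row[f]) for f in fields))
    return "\n".join(lines)


top_products = [
    {"name": "Liquid Designer", "total_sales": 90000},
    {"name": "Symmetric 1000", "total_sales": 82200},
    {"name": "Osprey Enterprise Edition", "total_sales": 65010},
    {"name": "Symmetric 2100", "total_sales": 47300},
    {"name": "Peregrine Enterprise Edition", "total_sales": 35000},
]

print(encode_table("top_products", top_products))
```

Printing the result reproduces the top_products block shown earlier. The token saving is visible in the header line: each field name appears once rather than once per row.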

Figure: Token usage

Savings are largest for uniform arrays of objects, where every row shares the same fields; deeply nested or irregular data can end up larger in Toon than in minified JSON.
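Working out the relative change from the token counts quoted in this article makes the trade-off concrete: the tabular analytics example measured against XML and JSON, and the nested glossary example further down (112 Toon tokens vs. 100 for minified JSON).

```latex
\[
1 - \frac{116}{258} \approx 0.55 \;(\text{vs. XML}),
\qquad
1 - \frac{116}{150} \approx 0.23 \;(\text{vs. JSON}),
\qquad
\frac{112}{100} - 1 = 0.12 \;(\text{nested: Toon larger})
\]
```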

Oracle Integration: From Query to LLM in One Step

UC AI, a PL/SQL SDK for building AI workflows on Oracle Database, now includes a Toon encoder, uc_ai_toon.to_toon(). Pass the JSON generated by a query straight to it:

SELECT uc_ai_toon.to_toon(JSON_OBJECT(...)) AS analytics FROM dual;

Or in PL/SQL:

declare
  l_json clob := '{"employees":[{"name":"Alice","age":30,"department":"HR"},{"name":"Bob","age":25,"department":"Engineering"},{"name":"Charlie","age":28,"department":"Marketing"}]}';
  l_toon clob;
begin
  l_toon := uc_ai_toon.to_toon(l_json);
  dbms_output.put_line(l_toon);
end;
/

Yields:

employees[3]{name,age,department}:
  Alice,30,HR
  Bob,25,Engineering
  Charlie,28,Marketing
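Because the header declares both the row count and the field names, the tabular form converts back to JSON without loss. The Python sketch below is a hypothetical decoder (`decode_table` is illustrative, not part of UC AI or any Toon library) that handles a single tabular array with unquoted values and turns integer-looking values back into numbers; run on the output above, it reproduces the employees JSON from the PL/SQL example.

```python
import json
import re

# Matches a tabular header such as "employees[3]{name,age,department}:".
HEADER = re.compile(r"^(\w+)\[(\d+)\]\{([^}]*)\}:$")


def decode_table(text):
    """Parse one Toon-style tabular array into {key: [row dicts]}.

    Simplified sketch: no quoted values, no nesting; integer-looking
    values become int, everything else stays a string.
    """
    lines = text.strip().splitlines()
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError("not a tabular array header")
    key, count = match.group(1), int(match.group(2))
    fields = match.group(3).split(",")
    rows = []
    for line in lines[1:]:
        values = line.strip().split(",")
        if len(values) != len(fields):
            raise ValueError(f"row has {len(values)} values, header has {len(fields)} fields")
        rows.append({f: int(v) if re.fullmatch(r"-?\d+", v) else v
                     for f, v in zip(fields, values)})
    # The declared length lets a reader detect truncated or extra rows.
    if len(rows) != count:
        raise ValueError(f"header declares {count} rows, found {len(rows)}")
    return {key: rows}


toon = """employees[3]{name,age,department}:
  Alice,30,HR
  Bob,25,Engineering
  Charlie,28,Marketing"""

print(json.dumps(decode_table(toon), separators=(",", ":")))
```

The row-count check shows one practical benefit of the explicit `[3]`: a truncated block is caught instead of silently yielding fewer records.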

Grab the package from the UC AI GitHub repo's packages folder (Source: hartenfeller.dev).

Toon also handles nested objects, irregular arrays, and mixed types:

Nested example (Toon: 112 tokens vs. 100 for minified JSON, so here Toon is the larger format):

glossary:
  title: example glossary
  GlossDiv:
    title: S
    GlossList:
      GlossEntry:
        ID: SGML
        SortAs: SGML
        GlossTerm: Standard Generalized Markup Language
        Acronym: SGML
        Abbrev: "ISO 8879:1986"
        GlossDef:
          para: "A meta-markup language, used to create markup languages such as DocBook."
          GlossSeeAlso[2]: GML,XML
        GlossSee: markup
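Nested objects use YAML-like `key: value` lines with indentation, and arrays of primitives go inline behind a length marker. The Python sketch below is a hypothetical, simplified encoder for just those two cases. Its quoting rule (quote strings that are empty, have edge whitespace, look like numbers or true/false/null, or contain a comma, colon, quote, or backslash) is an approximation rather than the full Toon specification, but it is enough to reproduce the glossary output above.

```python
import json
import re

# Strings that would otherwise read back as numbers must be quoted.
NUMERIC = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def needs_quotes(s):
    # Approximate quoting rule (an assumption, not the full spec).
    return (s == "" or s != s.strip() or s in ("true", "false", "null")
            or NUMERIC.fullmatch(s) is not None
            or any(c in s for c in ',:"\\'))


def fmt(value):
    if isinstance(value, str):
        # json.dumps adds the double quotes and escapes '"' and '\'.
        return json.dumps(value, ensure_ascii=False) if needs_quotes(value) else value
    # Numbers, booleans, and None use their JSON spelling (true/false/null).
    return json.dumps(value)


def encode(obj, indent=0):
    """Return Toon-style lines for a dict of primitives, dicts, and primitive lists."""
    pad = "  " * indent
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            # Nested object: key on its own line, children indented one level.
            lines.append(f"{pad}{key}:")
            lines.extend(encode(value, indent + 1))
        elif isinstance(value, list):
            if any(isinstance(v, (dict, list)) for v in value):
                raise NotImplementedError("only arrays of primitives are handled here")
            # Primitive array: inline, comma-separated, with an explicit length.
            items = ",".join(fmt(v) for v in value)
            lines.append(f"{pad}{key}[{len(value)}]:" + (f" {items}" if value else ""))
        else:
            lines.append(f"{pad}{key}: {fmt(value)}")
    return lines


glossary = {"glossary": {"title": "example glossary", "GlossDiv": {"title": "S", "GlossList": {
    "GlossEntry": {
        "ID": "SGML", "SortAs": "SGML",
        "GlossTerm": "Standard Generalized Markup Language",
        "Acronym": "SGML", "Abbrev": "ISO 8879:1986",
        "GlossDef": {
            "para": "A meta-markup language, used to create markup languages such as DocBook.",
            "GlossSeeAlso": ["GML", "XML"],
        },
        "GlossSee": "markup",
    }}}}}

print("\n".join(encode(glossary)))
```

Every nesting level costs a line with only a key and an indent, which is why deep, non-repetitive structures like this one gain nothing over minified JSON.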

Toon won't dethrone JSON universally; it targets LLM pipelines where token thrift is paramount. As the spec evolves, Toon equips developers to stretch context windows further, refining the delicate balance of relevance, cost, and reliability in AI-driven analytics.