Beyond Text-to-SQL: The Architecture Behind Building a Production-Ready AI Data Analyst
#AI


LavX Team
3 min read

Discover why traditional text-to-SQL approaches fail for real business analytics and how semantic layers, multi-agent systems, and instruction-tuned retrieval transform raw data into actionable insights. Learn battle-tested patterns from Findly's journey building an AI analyst that handles complex workflows.

For years, text-to-SQL promised to democratize data access by translating natural language into database queries. Yet in production, teams at companies like Findly discovered a harsh truth: real business questions rarely map to single SQL statements. Questions like "Compare these cohorts over the last four releases and explain the variance" or "Create a trading strategy" demand multi-step workflows combining data retrieval, transformation, validation, and narrative explanation—far beyond what vanilla text-to-SQL delivers.

Why Text-to-SQL Alone Fails

Real-world analytics requires orchestration:

  1. Planning: Decomposing problems into steps
  2. Execution: Targeted SQL + Python for transformations
  3. Validation: Sanity checks and data quality guards
  4. Narrative: Explaining results with drill-down paths

Without this scaffolding, systems generate brittle queries that crumble on ambiguous joins, complex logic, or missing context. The solution? Treat text-to-SQL as one tool in an AI analyst’s toolbox—not the entire product.

The Semantic Layer: Your Business Logic Compiler

Findly’s breakthrough came from encoding business meaning explicitly via a semantic layer. Using Malloy—an open-source modeling language—they defined:

  • Dimensions (e.g., region, order_date)
  • Measures (e.g., total_revenue is sum(price * quantity))
  • Relationships (e.g., orders → customers)
  • Metadata (descriptions, units, constraints)
# Malloy model snippet
source: orders extend {
  measure: total_revenue is sum(price * quantity)
  dimension: order_date is cast(order_timestamp as date)
  join_one: customers with customer_id
}

This acts as a "business logic compiler" that:

  • Shrinks hallucination risk by restricting LLMs to validated concepts
  • Enables pre-execution checks, e.g., catching invalid field references (see the sketch after this list)
  • Grounds code generation in single-source-of-truth definitions
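
A minimal sketch of what such a pre-execution check could look like, assuming the semantic layer can be exported as a plain dictionary of dimensions and measures (SEMANTIC_MODEL and validate_fields below are illustrative, not Findly's actual API):

# Hypothetical pre-execution check against a semantic-layer export
SEMANTIC_MODEL = {
    "orders": {
        "dimensions": {"region", "order_date"},
        "measures": {"total_revenue"},
    },
}

def validate_fields(source: str, requested: list[str]) -> None:
    """Reject any field the semantic layer does not define for this source."""
    model = SEMANTIC_MODEL.get(source)
    if model is None:
        raise ValueError(f"unknown source: {source}")
    known = model["dimensions"] | model["measures"]
    unknown = [f for f in requested if f not in known]
    if unknown:
        raise ValueError(f"invalid field reference(s): {unknown}")

validate_fields("orders", ["region", "total_revenue"])   # passes
# validate_fields("orders", ["profit_margin"])            # raises before any SQL runs

The point is that a hallucinated column fails loudly before any SQL is generated, rather than producing a silently wrong result.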

Multi-Agent Workflows: The Orchestration Engine

Complex questions trigger a squad of specialized AI agents:

  1. Planner: Breaks down the problem
  2. Retriever: Fetches context via precision RAG
  3. Coder: Generates SQL/Python
  4. Validator: Runs sanity tests
  5. Narrator: Explains results


Multi-agent planning isolates failures and maintains accountability across steps.
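
A minimal orchestration sketch, with each agent reduced to a plain function and a dict standing in for whatever hand-off format a real planner/retriever/coder/validator/narrator pipeline would use (all names here are illustrative):

# Hypothetical hand-off between specialized agents
def planner(question: str) -> dict:
    # Decompose the question into an ordered plan the downstream agents execute
    return {"question": question, "steps": ["retrieve", "code", "validate", "narrate"]}

def run_pipeline(question: str, agents: dict) -> dict:
    state = planner(question)
    for step in state["steps"]:
        state = agents[step](state)      # each agent enriches the shared state
        if state.get("error"):           # stop at the failing step so blame is attributable
            state["failed_step"] = step
            break
    return state

# Stub agents; real ones would call the LLM, the semantic layer, and the sandbox
agents = {
    "retrieve": lambda s: {**s, "context": "relevant semantic-layer definitions"},
    "code":     lambda s: {**s, "sql": "SELECT ..."},
    "validate": lambda s: {**s, "checks_passed": True},
    "narrate":  lambda s: {**s, "answer": "narrative summary with drill-down paths"},
}
result = run_pipeline("Compare these cohorts over the last four releases", agents)

Because every step writes to (and can fail in) its own slot of the shared state, errors surface at the agent that caused them instead of disappearing into one monolithic prompt.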

Retrieval: Recommendation Systems for LLMs

Treating retrieval as a recommendation pipeline was pivotal. Findly uses:

  1. Keyword search: For exact acronyms/terms
  2. Embeddings: For semantic matches
  3. Fine-tuned rerankers: Instruction-optimized models (e.g., Voyage AI’s rerank-2.5) that boost relevance by 8%+


Instruction-following rerankers understand task-specific context like "filter to Q3 metrics"—critical for precision.
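
A minimal sketch of that cascade, assuming precomputed document embeddings and treating the reranker as an opaque scoring callable (rerank_scores stands in for an instruction-tuned model such as rerank-2.5; it is not Voyage AI's actual client API):

# Hypothetical hybrid retrieval: keyword filter -> embedding recall -> rerank
import numpy as np

def keyword_hits(query: str, docs: list[str]) -> set[int]:
    # Exact-term matching so acronyms and field names are never lost
    terms = set(query.lower().split())
    return {i for i, d in enumerate(docs) if terms & set(d.lower().split())}

def embedding_recall(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 20) -> set[int]:
    # Cosine similarity over precomputed embeddings for semantic matches
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return set(np.argsort(-sims)[:k])

def retrieve(query, query_vec, docs, doc_vecs, rerank_scores, top_n=5):
    candidates = sorted(keyword_hits(query, docs) | embedding_recall(query_vec, doc_vecs))
    # rerank_scores(query, texts) -> relevance scores from the instruction-tuned reranker
    scores = rerank_scores(query, [docs[i] for i in candidates])
    best = sorted(zip(candidates, scores), key=lambda t: -t[1])[:top_n]
    return [docs[i] for i, _ in best]

Exact-term hits keep acronyms from being drowned out by fuzzy semantic neighbors, and the reranker applies its task-specific instruction only to the merged candidate pool.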

Python’s Strategic Role

SQL alone can’t handle:

  • Statistical tests
  • Time-series transforms
  • Custom calculations

Sandboxed Python execution enables:

# Example: Currency conversion using semantic layer tags
# df holds the query result; get_exchange_rate is a pre-built sandbox helper
df['revenue_eur'] = df['total_revenue_usd'] * get_exchange_rate('USD', 'EUR')

By composing pre-built functions (e.g., convert_currency()), LLMs generate concise, maintainable code.
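
A sketch of what composing such a helper might look like; convert_currency is the example name from the paragraph above, while its implementation and the rate table are assumptions:

# Hypothetical pre-built, reviewed helper exposed to the sandbox
import pandas as pd

_RATES = {("USD", "EUR"): 0.92}  # assumption; a real helper would hit a rates service

def convert_currency(series: pd.Series, from_ccy: str, to_ccy: str) -> pd.Series:
    """Convert a monetary column using a known exchange rate."""
    return series * _RATES[(from_ccy, to_ccy)]

# The kind of code an LLM generates: compose the helper, don't re-derive the math
df = pd.DataFrame({"total_revenue_usd": [1200.0, 830.5]})
df["revenue_eur"] = convert_currency(df["total_revenue_usd"], "USD", "EUR")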

Failure Modes & Fixes

Key production lessons:

  • Ambiguous joins? Enforce grain/joins in semantic layer
  • Slow responses? Route simple queries to faster models (e.g., o4-mini), complex ones to Gemini 2.5 Pro (see the routing sketch below)
  • Silent errors? Add validation hooks and citation requirements
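
A minimal sketch of that routing, assuming a crude complexity heuristic and a generic call_model wrapper (both are assumptions; only the model names come from the article):

# Hypothetical router: cheap heuristic picks a fast or a reasoning model
FAST_MODEL = "o4-mini"
REASONING_MODEL = "gemini-2.5-pro"

def needs_reasoning(question: str, plan_steps: int) -> bool:
    # Multi-step plans or analytical wording go to the slower reasoning model
    keywords = ("compare", "explain", "why", "variance", "strategy")
    return plan_steps > 2 or any(k in question.lower() for k in keywords)

def route(question: str, plan_steps: int, call_model) -> str:
    model = REASONING_MODEL if needs_reasoning(question, plan_steps) else FAST_MODEL
    return call_model(model, question)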

The Road to Adaptive Analytics

The future lies in systems that:

  • Dynamically switch between fast/reasoning LLMs
  • Autonomously critique and refine outputs
  • Continuously harvest business logic from user interactions

As Pedro Nascimento notes: "Context isn’t just data—it’s the product." It is architectural design, not prompt engineering, that moves AI analysts from demos to daily drivers.


Source: Lessons on building an AI data analyst by Pedro Nascimento (ex-Google/Twitter ML lead, Findly co-founder).
