Beyond Text-to-SQL: The Architecture Behind Building a Production-Ready AI Data Analyst
For years, text-to-SQL promised to democratize data access by translating natural language into database queries. Yet in production, teams at companies like Findly discovered a harsh truth: real business questions rarely map to single SQL statements. Questions like "Compare these cohorts over the last four releases and explain the variance" or "Create a trading strategy" demand multi-step workflows combining data retrieval, transformation, validation, and narrative explanation—far beyond what vanilla text-to-SQL delivers.
Why Text-to-SQL Alone Fails
Real-world analytics requires orchestration:
1. Planning: Decomposing problems into steps
2. Execution: Targeted SQL + Python for transformations
3. Validation: Sanity checks and data quality guards
4. Narrative: Explaining results with drill-down paths
Without this scaffolding, systems generate brittle queries that crumble on ambiguous joins, complex logic, or missing context. The solution? Treat text-to-SQL as one tool in an AI analyst’s toolbox—not the entire product.
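To make the toolbox framing concrete, here is a minimal sketch of a dispatcher that treats text-to-SQL as one tool among several. All names are illustrative; this is not Findly's actual interface.

# Sketch: text-to-SQL as one tool in a dispatch table (names are illustrative)
def text_to_sql(question: str) -> str:
    return f"SELECT ...  -- SQL derived from: {question}"

def python_transform(question: str) -> str:
    return f"# pandas code derived from: {question}"

def narrate(question: str) -> str:
    return f"Plain-language summary for: {question}"

TOOLS = {"sql": text_to_sql, "python": python_transform, "narrate": narrate}

def answer(question: str, plan: list[str]) -> list[str]:
    # In production a planner would produce `plan`; here it is hard-coded.
    return [TOOLS[step](question) for step in plan]

print(answer("Compare cohorts over the last four releases",
             ["sql", "python", "narrate"]))

The point is structural: a planner chooses tools, so text-to-SQL never has to carry the whole question alone.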
The Semantic Layer: Your Business Logic Compiler
Findly’s breakthrough came from encoding business meaning explicitly via a semantic layer. Using Malloy—an open-source modeling language—they defined:
- Dimensions (e.g., region, order_date)
- Measures (e.g., total_revenue is sum(price * quantity))
- Relationships (e.g., orders → customers)
- Metadata (descriptions, units, constraints)
# Malloy model snippet
source: orders extend {
  measure: total_revenue is sum(price * quantity)
  dimension: order_date is cast(order_timestamp as date)
  join_one: customers with customer_id
}
This acts as a "business logic compiler" that:
- Shrinks hallucination risk by restricting LLMs to validated concepts
- Enables pre-execution checks, such as catching invalid field references (see the sketch after this list)
- Grounds code generation in single-source-of-truth definitions
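As an illustration of those pre-execution checks, here is a minimal sketch, assuming the semantic layer can export its valid field names. The ALLOWED_FIELDS set and the naive tokenizer are stand-ins, not Findly's validator.

# Sketch: reject generated queries that reference fields the model doesn't define
import re

# In production this set would be exported from the Malloy model
ALLOWED_FIELDS = {"orders", "region", "order_date", "total_revenue"}
SQL_KEYWORDS = {"select", "from", "where", "group", "by", "order", "as"}

def unknown_fields(generated_sql: str) -> list[str]:
    """Return identifiers not defined in the semantic layer."""
    tokens = set(re.findall(r"[a-z_]+", generated_sql.lower()))
    return sorted(tokens - SQL_KEYWORDS - ALLOWED_FIELDS)

bad = unknown_fields("SELECT region, total_revnue FROM orders")
print(bad)  # ['total_revnue'] -> caught before the query ever runs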
Multi-Agent Workflows: The Orchestration Engine
Complex questions trigger a squad of specialized AI agents:
1. Planner: Breaks down the problem
2. Retriever: Fetches context via precision RAG
3. Coder: Generates SQL/Python
4. Validator: Runs sanity tests
5. Narrator: Explains results
Multi-agent planning isolates failures and maintains accountability across steps.
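A minimal sketch of that shape, with each agent stubbed out. Real agents would each wrap an LLM call; none of these function names are Findly's.

# Sketch: linear agent pipeline with a per-step audit trace
def planner(q):      return f"plan for: {q}"
def retriever(q):    return f"context for: {q}"
def coder(ctx):      return "SELECT ..."  # would generate SQL/Python from ctx
def validator(code): return code.startswith("SELECT")
def narrator(code):  return f"Ran `{code}` and explained the result."

def run(question: str) -> str:
    trace = {"plan": planner(question)}      # step 1: decompose
    trace["context"] = retriever(question)   # step 2: targeted retrieval
    trace["code"] = coder(trace["context"])  # step 3: generate code
    if not validator(trace["code"]):         # step 4: sanity check
        raise RuntimeError(f"Failed at validation; trace so far: {trace}")
    return narrator(trace["code"])           # step 5: explain

print(run("Compare these cohorts and explain the variance"))

Because each step is a separate unit with its own trace entry, a failure points at one agent rather than one opaque generation.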
Retrieval: Recommendation Systems for LLMs
Treating retrieval as a recommendation pipeline was pivotal. Findly uses:
1. Keyword search: For exact acronyms/terms
2. Embeddings: For semantic matches
3. Fine-tuned rerankers: Instruction-optimized models (e.g., Voyage AI’s rerank-2.5) that boost relevance by 8%+
Instruction-following rerankers understand task-specific context like "filter to Q3 metrics"—critical for precision.
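A sketch of that pipeline, with toy scoring functions and a stub where the reranker call would go. In production the stages would be BM25-style search, embedding similarity, and an instruction-following reranker; none of the function bodies below are Findly's.

# Sketch: keyword + embedding retrieval, fused, then reranked
def keyword_scores(query, docs):
    # Stand-in for exact-match search (BM25 or similar in production)
    terms = query.lower().split()
    return {d: sum(t in d.lower() for t in terms) for d in docs}

def embedding_scores(query, docs):
    # Toy proxy for cosine similarity over real embeddings
    return {d: len(set(query.lower()) & set(d.lower())) for d in docs}

def fuse(rankings, k=60):
    # Reciprocal rank fusion: merge rankings without calibrating score scales
    fused = {}
    for scores in rankings:
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranked):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

def rerank(query, docs, instruction=""):
    # Stub for an instruction-following reranker (e.g., rerank-2.5 via its API)
    return docs

docs = ["Q3 revenue metrics by region", "HR onboarding guide", "definition of ARR"]
candidates = fuse([keyword_scores("quarterly revenue", docs),
                   embedding_scores("quarterly revenue", docs)])
top = rerank("quarterly revenue", candidates, instruction="filter to Q3 metrics")
print(top[0])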
Python’s Strategic Role
SQL alone can’t handle:
- Statistical tests
- Time-series transforms
- Custom calculations
Sandboxed Python execution fills that gap:
# Example: currency conversion using semantic layer tags
# get_exchange_rate is a pre-built helper, not code the LLM writes from scratch
df['revenue_eur'] = df['total_revenue_usd'] * get_exchange_rate('USD', 'EUR')
By composing pre-built functions (e.g., convert_currency()), LLMs generate concise, maintainable code.
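A sketch of what such a pre-built helper might look like. convert_currency and the hard-coded rate are illustrative; a real helper would read unit tags from the semantic layer and pull rates from a live source.

# Sketch: a semantically tagged helper the LLM composes instead of reimplementing
import pandas as pd

RATES = {("USD", "EUR"): 0.92}  # illustrative; production would query a rate service

def convert_currency(series: pd.Series, src: str, dst: str) -> pd.Series:
    """Convert a monetary column whose unit is known from semantic-layer tags."""
    return series * RATES[(src, dst)]

df = pd.DataFrame({"total_revenue_usd": [1200.0, 850.0]})
df["revenue_eur"] = convert_currency(df["total_revenue_usd"], "USD", "EUR")
print(df)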
Failure Modes & Fixes
Key production lessons:
- Ambiguous joins? Enforce grain/joins in semantic layer
- Slow responses? Route simple queries to faster models (e.g., o4-mini), complex ones to Gemini 2.5 Pro (see the routing sketch after this list)
- Silent errors? Add validation hooks and citation requirements
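For the latency fix, a minimal routing sketch, assuming a crude difficulty heuristic. The model names come from the article; the heuristic and the wiring around it are illustrative.

# Sketch: route simple questions to a fast model, complex ones to a reasoning model
FAST_MODEL = "o4-mini"
REASONING_MODEL = "gemini-2.5-pro"

COMPLEX_CUES = ("compare", "explain", "why", "strategy", "variance")

def pick_model(question: str) -> str:
    q = question.lower()
    return REASONING_MODEL if any(cue in q for cue in COMPLEX_CUES) else FAST_MODEL

print(pick_model("total revenue last month"))                  # o4-mini
print(pick_model("Compare cohorts and explain the variance"))  # gemini-2.5-pro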
The Road to Adaptive Analytics
The future lies in systems that:
- Dynamically switch between fast/reasoning LLMs
- Autonomously critique and refine outputs
- Continuously harvest business logic from user interactions
As Pedro Nascimento notes: "Context isn’t just data—it’s the product." It is architecture, not prompt engineering, that turns AI analysts from demos into daily drivers.
Source: Lessons on building an AI data analyst by Pedro Nascimento (ex-Google/Twitter ML lead, Findly co-founder).