BigQuery's SQL-Native AI Inference Brings Hugging Face Models to Data Warehouses
#AI

Rust Reporter
4 min read

Google BigQuery now lets data teams run Hugging Face models directly through SQL, eliminating the need for separate ML infrastructure and making AI accessible to analysts without engineering overhead.

Google has unveiled a major expansion to BigQuery's AI capabilities, introducing SQL-native managed inference for Hugging Face models. This new feature allows data teams to deploy and run thousands of open models directly through BigQuery's familiar SQL interface, eliminating the need for separate machine learning infrastructure.

The capability, currently in preview, represents a significant shift in how organizations can leverage AI within their existing data workflows. Rather than requiring data scientists to spin up Kubernetes clusters, configure endpoints, and manage multiple tools, BigQuery handles the entire lifecycle through SQL statements.

Simplifying AI Deployment

The process is remarkably straightforward. Users create a model with a single CREATE MODEL statement that specifies a Hugging Face model ID, such as sentence-transformers/all-MiniLM-L6-v2 for embeddings or any of the 170,000+ text generation models available. BigQuery automatically provisions the necessary compute resources with sensible defaults, typically completing deployment in just 3-10 minutes depending on model size.
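
Concretely, the announcement's workflow maps to a statement along the lines of the sketch below. Only CREATE MODEL and the Hugging Face model ID are confirmed wording; the option names and the dataset path are illustrative placeholders, so the preview documentation should be treated as the source of truth.

  -- Hedged sketch: deploy a Hugging Face embedding model entirely from SQL.
  -- Both option names below are assumptions; the model ID is the one cited above.
  CREATE MODEL `my_project.my_dataset.minilm_embedder`
  OPTIONS (
    model_provider = 'HUGGING_FACE',
    hugging_face_model_id = 'sentence-transformers/all-MiniLM-L6-v2'
  );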

Once deployed, inference becomes equally simple. The platform provides two dedicated functions: AI.GENERATE_TEXT for language models and AI.GENERATE_EMBEDDING for embedding models. These functions can query data directly from BigQuery tables, meaning analysts can apply AI to their existing datasets without data movement or complex ETL pipelines.
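
As a hedged illustration, the queries below apply both functions to a hypothetical product_reviews table. The function names come from the announcement; the table-valued argument shapes are assumptions modeled on existing BigQuery ML functions, and llama_generator stands in for any deployed text generation model.

  -- Sketch: embed every review in an existing table, with no data movement.
  SELECT *
  FROM AI.GENERATE_EMBEDDING(
    MODEL `my_project.my_dataset.minilm_embedder`,
    (SELECT review_text AS content
     FROM `my_project.my_dataset.product_reviews`)
  );

  -- Sketch: summarize each review with a deployed text generation model.
  SELECT *
  FROM AI.GENERATE_TEXT(
    MODEL `my_project.my_dataset.llama_generator`,
    (SELECT CONCAT('Summarize this review: ', review_text) AS prompt
     FROM `my_project.my_dataset.product_reviews`)
  );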

Resource management is handled automatically through the endpoint_idle_ttl option, which shuts down idle endpoints to prevent unnecessary charges. For batch jobs, users can manually undeploy endpoints with ALTER MODEL statements when processing completes. When a model is no longer needed, a simple DROP MODEL statement automatically cleans up all associated Vertex AI resources.
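
That lifecycle maps to statements along these lines; ALTER MODEL and DROP MODEL are named in the announcement, but the specific undeploy clause shown here is an assumption rather than documented syntax.

  -- Sketch: manually undeploy the endpoint once a batch job finishes
  -- (the SET OPTIONS clause below is assumed, not confirmed preview syntax).
  ALTER MODEL `my_project.my_dataset.minilm_embedder`
  SET OPTIONS (deploy_model = FALSE);

  -- Sketch: remove the model; BigQuery cleans up the associated Vertex AI resources.
  DROP MODEL `my_project.my_dataset.minilm_embedder`;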

Production-Ready Customization

While the defaults work well for experimentation, the system offers granular control for production workloads. Users can specify machine types, replica counts, and endpoint idle times directly in the CREATE MODEL statement. For consistent performance, Compute Engine reservations can lock in GPU instances.
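
A production-tuned deployment might look like the following sketch; apart from endpoint_idle_ttl, which the announcement names, the machine type, replica count, and interval options are assumed stand-ins for whatever the preview actually exposes.

  -- Sketch: illustrative production configuration for a text generation model.
  CREATE MODEL `my_project.my_dataset.llama_generator`
  OPTIONS (
    hugging_face_model_id = 'meta-llama/Llama-3.1-8B-Instruct',  -- example model
    machine_type = 'g2-standard-12',        -- assumed option name
    min_replica_count = 1,                  -- assumed option name
    max_replica_count = 4,                  -- assumed option name
    endpoint_idle_ttl = INTERVAL 1 HOUR     -- named in the announcement; value format assumed
  );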

Google describes this as providing "granular resource control" and "automated resource management," allowing teams to balance performance and cost without leaving the SQL environment. This is particularly valuable for organizations that need to optimize spending while maintaining service levels.

Real-World Performance

Early benchmarks demonstrate compelling economics. An earlier blog post from September 2025 showed processing 38 million rows for roughly $2-3 using similar patterns with open-source embedding models. This cost-effectiveness, combined with the elimination of infrastructure overhead, could make AI capabilities accessible to organizations that previously found them prohibitively expensive or complex.

The feature supports over 13,000 Hugging Face text embedding models and 170,000+ text generation models, including popular families like Meta's Llama series and Google's Gemma models. All models must comply with Vertex AI Model Garden deployment requirements, including regional availability and quota limits.

Impact Across Teams

Virinchi T, writing about the launch, highlighted the cross-functional benefits:

  • Data Analysts can now experiment with ML models without leaving their SQL environment or waiting for engineering resources
  • Data Engineers can build ML-powered data pipelines far more simply, with no separate ML infrastructure to maintain
  • Organizations can make AI capabilities accessible to teams that previously lacked the skills or resources to implement them

This democratization of AI could accelerate adoption across organizations, as business analysts and data engineers gain the ability to prototype and deploy AI solutions without specialized ML expertise.

Competitive Landscape

The launch positions BigQuery against competitors like Snowflake's Cortex AI and Databricks' Model Serving, both of which offer SQL-accessible ML inference. BigQuery's potential advantage lies in its direct integration with Hugging Face's massive model catalog within the data warehouse environment.

For organizations already running on Google Cloud, this tight integration could provide a compelling reason to consolidate AI workloads alongside their existing data analytics. The ability to go from data to AI-powered insights without leaving the SQL environment represents a significant productivity gain.

Getting Started

Documentation and tutorials are available for common use cases, including text generation with Gemma models and embedding generation. The preview is available in supported regions, with Google likely expanding availability based on early feedback.

This move reflects a broader trend in the industry toward making AI more accessible to mainstream data professionals. By reducing the barrier to entry from "requires ML engineering team" to "requires SQL knowledge," Google is opening up new possibilities for how organizations can leverage AI in their data workflows.

As data teams increasingly need to incorporate AI into their analytics pipelines, solutions like this could become essential tools in the modern data stack. The question is no longer whether organizations can afford the infrastructure to run AI models, but whether they can afford not to leverage the insights these models can provide.
