Inside Instacart's Postgres-Powered Search Revolution: How pgvector Transformed Retail Discovery
The Database Double Play: How Instacart Unified Search on Postgres
For Instacart, handling billions of grocery items across thousands of retailers demands a search infrastructure that balances speed, relevance, and real-time accuracy. When traditional solutions buckled under scale, engineers executed a radical consolidation: migrating both text and semantic search into Postgres using pgvector and custom ranking functions. The results? Fewer failed searches, simplified operations, and a masterclass in database-driven innovation.
Why Elasticsearch Stumbled at Grocery Scale
Instacart's initial Elasticsearch implementation faltered under extreme write loads:
"Frequent partial writes to update billions of items for price and inventory changes caused indexing loads so severe that fixing erroneous data took days."
The denormalized data model forced 10x more writes than necessary. As ML features expanded, the system became operationally untenable—prompting a counterintuitive pivot.
The Postgres Pivot: Text Search Reborn
By migrating full-text search to sharded Postgres, engineers exploited relational advantages:
- Radical write reduction via normalized data models
- GIN indexes + a modified ts_rank for high-performance text matching
- Co-located ML features in separate tables with optimized write frequencies
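A minimal sketch of what such a setup could look like in Postgres (table and column names here are illustrative, not Instacart's actual schema):

```sql
-- Illustrative only: a normalized items table with a precomputed tsvector
ALTER TABLE items
    ADD COLUMN text_search tsvector
    GENERATED ALWAYS AS (to_tsvector('english', name || ' ' || description)) STORED;

-- GIN index for fast full-text matching
CREATE INDEX items_text_search_idx ON items USING GIN (text_search);

-- ML features live in a separate table so their high-frequency updates
-- never rewrite the main item rows
CREATE TABLE item_ml_features (
    item_id    bigint PRIMARY KEY REFERENCES items (item_id),
    ctr_score  real,
    updated_at timestamptz
);
```

Separating the write-hot feature columns from the item rows is what enables the write reduction the normalized model promises.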
Crucially, pushing computation to the data layer proved transformative:
```sql
-- Simplified hybrid retrieval query structure
SELECT items.*
FROM items
JOIN item_embeddings USING (item_id)
WHERE
    availability = true
    AND (
        text_search @@ plainto_tsquery('organic milk')
        OR embedding <=> '[0.12, 0.34, ...]' < 0.8
    )
ORDER BY combined_score DESC
LIMIT 100;
```
Network call consolidation and NVMe storage slashed latency, proving that sometimes "old" databases offer new efficiencies.
The FAISS Detour: Semantic Search Complications
Initial semantic search used HuggingFace's MiniLM-L3-v2 embeddings with FAISS indexes, which delivered relevance gains but created operational headaches:
- Overfetching: No attribute filtering forced broad document retrieval
- Synchronization headaches: Separate datastores caused consistency drift
- Application-layer merging: Limited algorithmic flexibility for hybrid ranking
Maintaining hundreds of retailer-specific FAISS indexes became unsustainable.
The Hybrid Breakthrough: pgvector Unifies Stacks
Postgres offered an unexpected solution: pgvector for ANN search alongside text retrieval. Migration required ingenious tuning:
| Metric | FAISS | pgvector |
|---|---|---|
| Recall@100 (large retailer) | 0.76 | 0.82 |
| Latency (ms) | 38 | 42 |
| Index Maintenance | 300+ indexes | Unified clusters |
Key optimizations included:
1. Hybrid retailer-clustered indexes instead of per-retailer
2. Aggressive parallelization (max_parallel_workers=8)
3. Inline embedding storage to avoid TOAST overhead
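These optimizations might be expressed roughly as follows; the index type, list count, and object names are assumptions, since the article does not specify them:

```sql
-- Illustrative pgvector tuning sketch, not Instacart's exact configuration

-- One index over embeddings clustered by retailer group,
-- replacing 300+ per-retailer FAISS indexes
CREATE INDEX item_embeddings_ivf_idx
    ON item_embeddings USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 1000);

-- Allow parallel workers to speed up index builds and scans
SET max_parallel_workers = 8;
SET max_parallel_workers_per_gather = 8;

-- Keep embeddings inline in the heap page instead of TOAST storage
ALTER TABLE item_embeddings
    ALTER COLUMN embedding SET STORAGE PLAIN;
```

Inline (PLAIN) storage avoids the extra fetch and decompression that TOASTed embedding values would incur on every distance computation.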
Production A/B tests revealed a 6% drop in zero-result searches—translating to measurable revenue gains from improved discoverability.
Beyond Keywords: The Attribute Filtering Advantage
Unified retrieval unlocked previously impossible optimizations:
"Real-time inventory availability became a pre-filter for semantic search—dramatically reducing compute waste versus post-processing."
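In SQL terms, pre-filtering means the availability predicate constrains the nearest-neighbor scan rather than being applied to its results afterward; a hedged sketch, reusing the hypothetical names from the earlier query:

```sql
-- Availability acts as a pre-filter on the ANN scan (illustrative)
SELECT i.item_id
FROM items i
JOIN item_embeddings e USING (item_id)
WHERE i.availability = true              -- filter first...
ORDER BY e.embedding <=> '[0.12, 0.34, ...]'  -- ...then rank by distance
LIMIT 100;
```

With FAISS, the same result required fetching a much larger candidate set and discarding unavailable items in the application layer.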
Future iterations will leverage brand/category attributes for precision tuning, demonstrating how consolidation breeds innovation.
Why This Matters Beyond Grocery Aisles
Instacart's journey shatters assumptions about "modern" search infrastructure. Their normalized Postgres approach delivered:
- Cost efficiency: 10x write reduction versus denormalized systems
- Relevance fusion: Keyword + semantic understanding in one query
- Operational simplicity: One datastore, one sync pipeline, one scaling profile
As vector search matures, the lesson is clear: sometimes the most revolutionary infrastructure isn't the newest—it's the most cohesively engineered.
Source: How Instacart Built a Modern Search Infrastructure on Postgres by Ankit Mittal et al.