The Database Double Play: How Instacart Unified Search on Postgres

For Instacart, handling billions of grocery items across thousands of retailers demands a search infrastructure that balances speed, relevance, and real-time accuracy. When traditional solutions buckled under scale, engineers executed a radical consolidation: migrating both text and semantic search into Postgres using pgvector and custom ranking functions. The results? Fewer failed searches, simplified operations, and a masterclass in database-driven innovation.

Why Elasticsearch Stumbled at Grocery Scale

Instacart's initial Elasticsearch implementation faltered under extreme write loads:

"Frequent partial writes to update billions of items for price and inventory changes caused indexing loads so severe that fixing erroneous data took days."

The denormalized data model forced 10x more writes than necessary. As ML features expanded, the system became operationally untenable—prompting a counterintuitive pivot.

The Postgres Pivot: Text Search Reborn

By migrating full-text search to sharded Postgres, engineers exploited relational advantages:

  • Radical write reduction via normalized data models
  • GIN indexes + modified ts_rank for high-performance text matching
  • Co-located ML features in separate tables with optimized write frequencies
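The write-reduction claim is easy to see in miniature. The sketch below (a hypothetical schema, not Instacart's actual data model) counts the writes needed to update one shared catalog attribute in a denormalized store, where each item-retailer pair embeds its own copy, versus a normalized store, where the attribute lives in a single shared row:

```python
# Hypothetical sketch: one shared catalog attribute changes (e.g. an item
# name). In a denormalized store every item-retailer document embeds a
# copy; in a normalized store a single catalog row holds it.

def denormalized_writes(num_retailers: int) -> int:
    # One attribute change touches every retailer-specific copy.
    return num_retailers

def normalized_writes(num_retailers: int) -> int:
    # Retailer rows reference the catalog row by key, so one change
    # means exactly one write regardless of retailer count.
    return 1

# With 10 retailers carrying the same item, the update shrinks from
# 10 writes to 1 -- the source of the "10x more writes" amplification.
print(denormalized_writes(10), normalized_writes(10))
```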

Crucially, pushing computation to the data layer proved transformative:

-- Simplified hybrid retrieval query structure
SELECT items.*,
       ts_rank(text_search, plainto_tsquery('organic milk'))
         - (embedding <=> '[0.12, 0.34, ...]'::vector) AS combined_score
FROM items
JOIN item_embeddings USING (item_id)
WHERE
  availability = true
  AND (
    text_search @@ plainto_tsquery('organic milk')
    OR embedding <=> '[0.12, 0.34, ...]'::vector < 0.8
  )
ORDER BY combined_score DESC
LIMIT 100;

Network call consolidation and NVMe storage slashed latency, proving that sometimes "old" databases offer new efficiencies.

The FAISS Detour: Semantic Search Complications

Initial semantic search used HuggingFace's MiniLM-L3-v2 embeddings with FAISS indexes—delivering relevance gains but operational headaches:

  • Overfetching: No attribute filtering forced broad document retrieval
  • Synchronization headaches: Separate datastores caused consistency drift
  • Application-layer merging: Limited algorithmic flexibility for hybrid ranking

Maintaining hundreds of retailer-specific FAISS indexes became unsustainable.
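The "application-layer merging" pain point is worth making concrete. With keyword and vector results living in separate systems, the service tier has to fuse two independently ranked ID lists itself. A common heuristic for this is reciprocal rank fusion (RRF), shown below purely as an illustration of the pattern, not as Instacart's specific algorithm:

```python
# Sketch of the merging a split stack forces on the application layer:
# fuse a keyword-ranked list and a vector-ranked list with reciprocal
# rank fusion (RRF), a standard fusion heuristic used here for illustration.

from collections import defaultdict

def rrf_merge(keyword_ids, vector_ids, k=60):
    """Fuse two ranked ID lists; a higher fused score ranks earlier."""
    scores = defaultdict(float)
    for ranked in (keyword_ids, vector_ids):
        for rank, item_id in enumerate(ranked):
            # Items near the top of either list contribute more.
            scores[item_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

merged = rrf_merge(["milk_2", "milk_1", "milk_9"],
                   ["milk_1", "milk_4", "milk_2"])
print(merged)  # "milk_1" ranks first: high in both lists
```

Every fusion tweak like this required an application deploy; moving ranking into the database turned it into a query change.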

The Hybrid Breakthrough: pgvector Unifies Stacks

Postgres offered an unexpected solution: pgvector for ANN search alongside text retrieval. Migration required ingenious tuning:

  Metric                        FAISS          pgvector
  Recall@100 (large retailer)   0.76           0.82
  Latency (ms)                  38             42
  Index Maintenance             300+ indexes   Unified clusters

Key optimizations included:
1. Hybrid retailer-clustered indexes instead of per-retailer
2. Aggressive parallelization (max_parallel_workers=8)
3. Inline embedding storage to avoid TOAST overhead

Production A/B tests revealed a 6% drop in zero-result searches—translating to measurable revenue gains from improved discoverability.

Beyond Keywords: The Attribute Filtering Advantage

Unified retrieval unlocked previously impossible optimizations:

"Real-time inventory availability became a pre-filter for semantic search—dramatically reducing compute waste versus post-processing."

Future iterations will leverage brand/category attributes for precision tuning, demonstrating how consolidation breeds innovation.

Why This Matters Beyond Grocery Aisles

Instacart's journey shatters assumptions about "modern" search infrastructure. Their normalized Postgres approach delivered:

  • Cost efficiency: 10x write reduction versus denormalized systems
  • Relevance fusion: Keyword + semantic understanding in one query
  • Operational simplicity: One datastore, one sync pipeline, one scaling profile

As vector search matures, the lesson is clear: sometimes the most revolutionary infrastructure isn't the newest—it's the most cohesively engineered.

Source: How Instacart Built a Modern Search Infrastructure on Postgres by Ankit Mittal et al.