
For developers managing container ecosystems, Docker registries like Docker Hub and Quay have long been black boxes. While they excel at storing and serving container images, their object-storage backbone makes simple queries—like listing all repositories or finding the largest layers—prohibitively expensive operations. As one developer laments: "Querying arbitrary information about stored images is either impossible or requires scanning massive S3 buckets."

Enter Reg, an experimental open-source OCI registry that replaces traditional metadata storage with SQLite. By decoupling metadata management from blob storage, Reg enables the kind of rich querying that DevOps teams have long desired while maintaining compatibility with existing S3-backed infrastructure.

The Metadata Bottleneck

The OCI distribution specification defines an HTTP API for pushing and pulling images, but implementations like Docker Distribution are commonly deployed on S3-compatible storage that holds everything, from image layers to manifest data. This design delivers scalability and durability at the cost of query flexibility. As the Reg developer explains: "It's cheap, it's persistent, it scales to infinity... but there's one major flaw."

Any non-trivial metadata operation requires scanning entire bucket prefixes, a process that becomes increasingly slow and expensive as registries grow. Want to find which images reference a specific layer? Prepare for a full bucket scan. Need to identify repositories with the most tags? Another scan.

SQLite as Metadata Engine

Reg's innovation lies in its hybrid architecture:

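-- Simplified, illustrative Reg metadata schema
CREATE TABLE manifests (id INTEGER PRIMARY KEY, repo TEXT, digest TEXT);
CREATE TABLE tags (id INTEGER PRIMARY KEY, repo TEXT, name TEXT, manifest_id INTEGER REFERENCES manifests(id));
CREATE TABLE blobs (digest TEXT PRIMARY KEY, size INTEGER);
-- Join table linking each manifest to the layer blobs it references
CREATE TABLE manifest_blobs (manifest_id INTEGER REFERENCES manifests(id), blob_digest TEXT REFERENCES blobs(digest), PRIMARY KEY (manifest_id, blob_digest));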
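With relationships stored in tables, the layer lookup that needed a bucket scan becomes a single indexed join. The sketch below assumes a variant of the simplified schema with a `repo` column and a `manifest_blobs` join table. Those additions, the index name, and the helper functions are illustrative assumptions rather than Reg's actual schema or code.

```python
import sqlite3

# Illustrative schema (assumed): adds a repo column and a manifest_blobs join table.
SCHEMA = """
CREATE TABLE manifests (id INTEGER PRIMARY KEY, repo TEXT NOT NULL, digest TEXT NOT NULL);
CREATE TABLE tags (id INTEGER PRIMARY KEY, repo TEXT NOT NULL, name TEXT NOT NULL,
                   manifest_id INTEGER NOT NULL REFERENCES manifests(id));
CREATE TABLE blobs (digest TEXT PRIMARY KEY, size INTEGER NOT NULL);
CREATE TABLE manifest_blobs (
    manifest_id INTEGER NOT NULL REFERENCES manifests(id),
    blob_digest TEXT NOT NULL REFERENCES blobs(digest),
    PRIMARY KEY (manifest_id, blob_digest)
);
-- Index so "which manifests use this layer?" avoids a full table scan.
CREATE INDEX manifest_blobs_by_blob ON manifest_blobs (blob_digest);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def manifests_using_layer(db, layer_digest):
    """Answer 'which images reference this layer?' with one indexed join."""
    rows = db.execute(
        """
        SELECT m.repo, m.digest
        FROM manifest_blobs AS mb
        JOIN manifests AS m ON m.id = mb.manifest_id
        WHERE mb.blob_digest = ?
        ORDER BY m.repo, m.digest
        """,
        (layer_digest,),
    )
    return rows.fetchall()


db = open_db()
db.executemany("INSERT INTO blobs VALUES (?, ?)",
               [("sha256:aaa", 10), ("sha256:bbb", 20)])
db.executemany("INSERT INTO manifests VALUES (?, ?, ?)",
               [(1, "app", "sha256:m1"), (2, "web", "sha256:m2")])
db.executemany("INSERT INTO manifest_blobs VALUES (?, ?)",
               [(1, "sha256:aaa"), (2, "sha256:aaa"), (2, "sha256:bbb")])
print(manifests_using_layer(db, "sha256:aaa"))
```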

When images are pushed:
1. Blobs (image layers) go directly to S3
2. Metadata updates write to SQLite first
3. Changes propagate to S3 for durability
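The three push steps can be sketched as follows. This is a minimal illustration under stated assumptions, not Reg's implementation: `object_store` is a hypothetical dict standing in for S3, and `push_blob`, `push_manifest`, and the key layout are invented for illustration. The metadata writes for a manifest happen in one SQLite transaction, and the copy to object storage follows the commit.

```python
import hashlib
import json
import sqlite3

# Hypothetical in-memory stand-in for an S3 bucket (object key -> bytes).
object_store = {}

# Minimal illustrative metadata tables (not Reg's actual schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE manifests (id INTEGER PRIMARY KEY, repo TEXT, digest TEXT, UNIQUE (repo, digest));
CREATE TABLE tags (repo TEXT, name TEXT, manifest_id INTEGER, PRIMARY KEY (repo, name));
CREATE TABLE blobs (digest TEXT PRIMARY KEY, size INTEGER);
""")


def sha256_digest(data):
    return "sha256:" + hashlib.sha256(data).hexdigest()


def push_blob(data):
    """Step 1: layer bytes go straight to object storage, keyed by digest."""
    digest = sha256_digest(data)
    object_store[f"blobs/{digest}"] = data
    with db:  # record the blob's size so it can be queried later
        db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?)", (digest, len(data)))
    return digest


def push_manifest(repo, tag, manifest_bytes):
    """Steps 2 and 3: commit metadata to SQLite, then copy it to object storage."""
    digest = sha256_digest(manifest_bytes)
    with db:  # single transaction: manifest row and tag row commit together
        db.execute("INSERT OR IGNORE INTO manifests (repo, digest) VALUES (?, ?)",
                   (repo, digest))
        (manifest_id,) = db.execute(
            "SELECT id FROM manifests WHERE repo = ? AND digest = ?", (repo, digest)
        ).fetchone()
        db.execute("INSERT OR REPLACE INTO tags VALUES (?, ?, ?)", (repo, tag, manifest_id))
    # Step 3: propagate the committed metadata to object storage for durability.
    object_store[f"manifests/{repo}/{digest}"] = manifest_bytes
    object_store[f"tags/{repo}/{tag}"] = digest.encode()
    return digest


layer = push_blob(b"layer bytes")
manifest = json.dumps(
    {"schemaVersion": 2, "layers": [{"digest": layer, "size": 11}]}
).encode()
push_manifest("app", "latest", manifest)
print(db.execute("SELECT repo, name FROM tags").fetchall())
```

In this sketch, step 3 runs after the SQLite commit. A crash between the two would leave the object-store copy briefly behind SQLite, so a real implementation would need some way to reconcile the two.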

This "write-through" pattern makes SQLite the system of record for metadata while keeping S3 as the canonical blob store and a durable copy of the metadata. The magic? SQLite doubles as a high-performance query engine that understands the relationships between images, tags, and layers.

Bootstrap and Recovery

Reg also addresses the bootstrapping problem: it can rebuild its SQLite database from an existing registry's S3 bucket through a one-time scan. The scan is slow for massive registries, but it lets Reg act as a drop-in replacement for existing implementations. For production resilience, Reg leverages:

  • Turso's embedded replicas for SQLite synchronization
  • Litestream for continuous backup
  • Traditional rsync workflows
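The one-time rebuild amounts to walking the bucket and replaying what it finds into SQLite. The sketch below assumes a hypothetical, simplified key layout (`manifests/<repo>/<digest>` and `tags/<repo>/<tag>`) and an in-memory dict as the bucket. Docker Distribution's real on-disk layout differs, and this is not Reg's rebuild code. The manifest bodies follow the OCI image manifest shape, with `config` and `layers` descriptors that each carry a `digest` and `size`.

```python
import json
import sqlite3

# Hypothetical bucket contents, using a simplified and illustrative key layout:
#   manifests/<repo>/<digest> -> OCI image manifest JSON
#   tags/<repo>/<tag>         -> digest of the tagged manifest
BUCKET = {
    "manifests/app/sha256:m1": json.dumps({
        "schemaVersion": 2,
        "config": {"digest": "sha256:cfg", "size": 5},
        "layers": [{"digest": "sha256:aaa", "size": 10},
                   {"digest": "sha256:bbb", "size": 20}],
    }).encode(),
    "tags/app/latest": b"sha256:m1",
    "tags/app/v1": b"sha256:m1",
}


def rebuild(bucket):
    """One-time scan: repopulate the metadata tables from bucket contents."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE manifests (id INTEGER PRIMARY KEY, repo TEXT, digest TEXT, UNIQUE (repo, digest));
    CREATE TABLE tags (repo TEXT, name TEXT, manifest_id INTEGER, PRIMARY KEY (repo, name));
    CREATE TABLE blobs (digest TEXT PRIMARY KEY, size INTEGER);
    CREATE TABLE manifest_blobs (manifest_id INTEGER, blob_digest TEXT,
                                 PRIMARY KEY (manifest_id, blob_digest));
    """)
    with db:
        # Pass 1: each manifest, plus the config and layer blobs it references.
        for key, body in bucket.items():
            if not key.startswith("manifests/"):
                continue
            head, digest = key.rsplit("/", 1)  # repo names may contain "/"
            repo = head[len("manifests/"):]
            manifest = json.loads(body)
            cur = db.execute("INSERT INTO manifests (repo, digest) VALUES (?, ?)",
                             (repo, digest))
            descriptors = [manifest["config"]] if "config" in manifest else []
            descriptors += manifest.get("layers", [])
            for desc in descriptors:
                db.execute("INSERT OR IGNORE INTO blobs VALUES (?, ?)",
                           (desc["digest"], desc["size"]))
                db.execute("INSERT OR IGNORE INTO manifest_blobs VALUES (?, ?)",
                           (cur.lastrowid, desc["digest"]))
        # Pass 2: tags, resolved against the manifests inserted in pass 1.
        for key, body in bucket.items():
            if not key.startswith("tags/"):
                continue
            head, tag = key.rsplit("/", 1)
            repo = head[len("tags/"):]
            row = db.execute("SELECT id FROM manifests WHERE repo = ? AND digest = ?",
                             (repo, body.decode())).fetchone()
            if row is not None:  # skip tags that point at missing manifests
                db.execute("INSERT INTO tags VALUES (?, ?, ?)", (repo, tag, row[0]))
    return db


db = rebuild(BUCKET)
print(db.execute("SELECT COUNT(*) FROM tags").fetchone()[0])
print(db.execute("SELECT SUM(size) FROM blobs").fetchone()[0])
```

Once rebuilt, the resulting database file is what tools like Litestream or Turso's embedded replicas would keep replicated or backed up.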

Unleashing SQL Superpowers

With metadata in SQLite, previously impractical queries become one-liners:

-- Top 10 repositories by tag count
SELECT repo, COUNT(*) AS tag_count 
FROM tags GROUP BY repo 
ORDER BY tag_count DESC 
LIMIT 10;

-- Most reused layers across images
SELECT blob_digest, COUNT(DISTINCT manifest_id) AS usage_count
FROM manifest_blobs 
GROUP BY blob_digest
ORDER BY usage_count DESC;

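Against an indexed SQLite database, queries like these typically finish in milliseconds, whereas answering the same questions by scanning a large bucket can take hours. That speed opens up new visibility into container ecosystems.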
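To try the queries above without a registry, the sketch below loads a few rows of made-up sample data into an in-memory SQLite database and runs them. The second query gains a tie-breaker on `blob_digest` for deterministic output. The third query, listing the largest layers, is an additional illustration of the "finding the largest layers" question from the introduction and is not taken from Reg.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tags (repo TEXT, name TEXT, manifest_id INTEGER);
CREATE TABLE blobs (digest TEXT PRIMARY KEY, size INTEGER);
CREATE TABLE manifest_blobs (manifest_id INTEGER, blob_digest TEXT);
""")
# Made-up sample data for illustration only.
db.executemany("INSERT INTO tags VALUES (?, ?, ?)",
               [("app", "latest", 1), ("app", "v1", 1), ("app", "v2", 2),
                ("web", "latest", 3)])
db.executemany("INSERT INTO blobs VALUES (?, ?)",
               [("sha256:base", 500), ("sha256:app", 40), ("sha256:web", 90)])
db.executemany("INSERT INTO manifest_blobs VALUES (?, ?)",
               [(1, "sha256:base"), (1, "sha256:app"), (2, "sha256:base"),
                (2, "sha256:app"), (3, "sha256:base"), (3, "sha256:web")])

TOP_REPOS = """
SELECT repo, COUNT(*) AS tag_count FROM tags
GROUP BY repo ORDER BY tag_count DESC LIMIT 10
"""
REUSED_LAYERS = """
SELECT blob_digest, COUNT(DISTINCT manifest_id) AS usage_count FROM manifest_blobs
GROUP BY blob_digest ORDER BY usage_count DESC, blob_digest
"""
# Additional illustrative query: largest layers and how many manifests use each.
LARGEST_LAYERS = """
SELECT b.digest, b.size, COUNT(DISTINCT mb.manifest_id) AS used_by
FROM blobs AS b LEFT JOIN manifest_blobs AS mb ON mb.blob_digest = b.digest
GROUP BY b.digest ORDER BY b.size DESC LIMIT 10
"""

for name, sql in [("top repos", TOP_REPOS), ("reused layers", REUSED_LAYERS),
                  ("largest layers", LARGEST_LAYERS)]:
    print(name, db.execute(sql).fetchall())
```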

The Path Forward

Reg remains experimental but already supports core OCI operations:
- Image pushing/pulling via Docker clients
- Basic repository/tag listing
- S3-compatible storage backend
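Basic tag listing maps naturally onto the metadata store. The OCI distribution spec's `GET /v2/<name>/tags/list` endpoint returns a JSON body of the form `{"name": ..., "tags": [...]}` and describes lexically ordered results with optional `n` and `last` pagination parameters. The hypothetical `tags_list_body` helper below builds that body from SQLite. HTTP routing and the pagination `Link` header are omitted, and this is a sketch rather than Reg's handler.

```python
import json
import sqlite3


def tags_list_body(db, repo, n=None, last=None):
    """Build the JSON body for GET /v2/<name>/tags/list from SQLite.

    Tags are returned in lexical order; `last` and `n` mirror the
    distribution spec's pagination parameters.
    """
    sql = "SELECT name FROM tags WHERE repo = ?"
    params = [repo]
    if last is not None:
        sql += " AND name > ?"  # resume after the last tag the client saw
        params.append(last)
    sql += " ORDER BY name"
    if n is not None:
        sql += " LIMIT ?"  # cap the page size
        params.append(n)
    tags = [row[0] for row in db.execute(sql, params)]
    return json.dumps({"name": repo, "tags": tags})


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tags (repo TEXT, name TEXT, manifest_id INTEGER)")
db.executemany("INSERT INTO tags VALUES (?, ?, ?)",
               [("app", "v2", 2), ("app", "latest", 1), ("app", "v1", 1),
                ("web", "latest", 3)])
print(tags_list_body(db, "app"))
print(tags_list_body(db, "app", n=1, last="latest"))
```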

The project currently lacks HTTPS support, so clients must skip TLS verification (for example, --tls-verify=0 in tools such as Podman or Skopeo). Its MIT-licensed codebase welcomes contributors. As container registries evolve beyond simple storage endpoints, Reg shows how rethinking the metadata layer can turn an opaque blob store into infrastructure that teams can query directly, a reminder that powerful improvements sometimes come from revisiting foundational layers.

Source: Write That Blog