Reducing Onboarding From 48 Hours to 4: Inside Amazon Key’s Event-Driven Platform

Amazon Key migrated from a monolithic architecture to an event-driven system built on Amazon EventBridge, cutting onboarding time for new event consumers from 48 hours to 4 hours while processing 2,000 events per second at 99.99% reliability.

Amazon Key, the system that enables secure in-garage deliveries and property access management, underwent a fundamental architectural transformation to overcome scalability and reliability constraints in its monolithic design. The previous architecture was tightly coupled, so service failures propagated across components. Event routing was manual with limited filtering, and schema validation was inefficient. These limitations restricted the platform to a small number of subscribers and made onboarding new consumers prohibitively slow.

The redesign implemented a centralized event backbone using Amazon EventBridge. At its core is a multi-account pattern where:

A primary EventBridge bus in a dedicated core account ingests all domain events
Routing rules evaluate event patterns and forward matching events to subscriber accounts
Each subscriber account maintains isolated processing logic and targets

This structure provides service autonomy while preserving centralized governance over routing policies, IAM permissions, and compliance controls. Teams deploy independently while sharing a common event infrastructure.
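
The routing step can be illustrated with a minimal Python sketch. It assumes a simplified subset of EventBridge pattern semantics (a list of allowed values per field, with nested objects matched recursively). The rule names, event sources, and account IDs are hypothetical and do not come from Amazon Key's actual configuration.

```python
from typing import Any

def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Return True if every field named in the pattern matches the event.

    Simplified on purpose: a list means "any of these values", and a
    nested dict recurses into the corresponding nested event object.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Hypothetical rules on the core bus, each forwarding to one subscriber account.
RULES = [
    {
        "name": "delivery-events",
        "pattern": {
            "source": ["key.delivery"],
            "detail": {"region": ["us-east-1", "us-west-2"]},
        },
        "target_account": "111111111111",
    },
    {
        "name": "access-events",
        "pattern": {"source": ["key.access"]},
        "target_account": "222222222222",
    },
]

def route(event: dict[str, Any], rules: list[dict[str, Any]]) -> list[str]:
    """Return the subscriber accounts whose rule pattern matches the event."""
    return [r["target_account"] for r in rules if matches(r["pattern"], event)]

event = {
    "source": "key.delivery",
    "detail-type": "DeliveryCompleted",
    "detail": {"region": "us-east-1"},
}
print(route(event, RULES))  # ['111111111111']
```

Because the core bus only evaluates patterns and forwards events, a subscriber can change its own processing logic without touching the producer or the other subscribers.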

Schema management received a significant overhaul through:

A centralized schema registry enforcing version-controlled contracts
A custom client library validating and serializing events against schemas pre-publication
Identical validation/deserialization at subscriber endpoints

This eliminated integration errors from inconsistent payloads and enabled structured validation beyond basic field checks. The validation flow ensures contract compliance across producers and consumers.
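
A minimal sketch of that validation flow is shown below. It assumes a tiny in-memory registry keyed by event name and version. The registry layout, the `SchemaError` class, and the event fields are hypothetical stand-ins, not the actual client library or schema format used by Amazon Key.

```python
import json
from typing import Any

# Hypothetical registry: (event name, version) -> required fields and types.
SCHEMA_REGISTRY: dict[tuple[str, int], dict[str, type]] = {
    ("DeliveryCompleted", 1): {"delivery_id": str, "property_id": str},
    ("DeliveryCompleted", 2): {
        "delivery_id": str,
        "property_id": str,
        "duration_s": int,
    },
}

class SchemaError(ValueError):
    """Raised when a payload does not satisfy its registered contract."""

def validate(name: str, version: int, payload: dict[str, Any]) -> None:
    """Check the payload against the registered schema for name/version."""
    schema = SCHEMA_REGISTRY.get((name, version))
    if schema is None:
        raise SchemaError(f"unknown schema {name} v{version}")
    missing = schema.keys() - payload.keys()
    extra = payload.keys() - schema.keys()
    if missing or extra:
        raise SchemaError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected_type in schema.items():
        if not isinstance(payload[field], expected_type):
            raise SchemaError(f"{field} must be {expected_type.__name__}")

def serialize(name: str, version: int, payload: dict[str, Any]) -> str:
    """Producer side: validate first, then serialize into a JSON envelope."""
    validate(name, version, payload)
    return json.dumps({"name": name, "version": version, "payload": payload})

def deserialize(raw: str) -> dict[str, Any]:
    """Subscriber side: parse the envelope and apply the same validation."""
    envelope = json.loads(raw)
    validate(envelope["name"], envelope["version"], envelope["payload"])
    return envelope["payload"]

raw = serialize("DeliveryCompleted", 1, {"delivery_id": "d-1", "property_id": "p-9"})
print(deserialize(raw))
```

Running the same `validate` function on both sides is the point of the sketch: a payload that passes at the producer cannot fail the contract at the subscriber unless it was altered in transit.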

Infrastructure provisioning was automated using AWS CDK constructs that:

Configure event buses and routing rules
Establish cross-account IAM permissions
Deploy standardized monitoring and alerting

These reusable components reduced manual configuration and enforced consistent observability practices.
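
CDK constructs synthesize to CloudFormation, so the effect of such a reusable component can be approximated with a plain Python function that emits a CloudFormation-style resource map. This is a stand-in sketch, not the team's CDK code. The function name, logical IDs, bus name, ARNs, and alarm threshold are all hypothetical.

```python
import json

def subscriber_route(core_bus: str, rule_name: str, pattern: dict,
                     subscriber_bus_arn: str, role_arn: str) -> dict:
    """Build resources for one cross-account route plus a standard alarm."""
    return {
        "CoreBus": {
            "Type": "AWS::Events::EventBus",
            "Properties": {"Name": core_bus},
        },
        "ForwardRule": {
            "Type": "AWS::Events::Rule",
            "Properties": {
                "Name": rule_name,
                "EventBusName": {"Ref": "CoreBus"},
                "EventPattern": pattern,
                # The role lets the core account put events on the
                # subscriber's bus in another account.
                "Targets": [{
                    "Id": "subscriber-bus",
                    "Arn": subscriber_bus_arn,
                    "RoleArn": role_arn,
                }],
            },
        },
        "FailedInvocationsAlarm": {
            "Type": "AWS::CloudWatch::Alarm",
            "Properties": {
                "Namespace": "AWS/Events",
                "MetricName": "FailedInvocations",
                "Dimensions": [{"Name": "RuleName", "Value": rule_name}],
                "Statistic": "Sum",
                "Period": 60,
                "EvaluationPeriods": 1,
                "Threshold": 1,  # hypothetical threshold
                "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            },
        },
    }

resources = subscriber_route(
    core_bus="key-core-bus",  # hypothetical names and ARNs throughout
    rule_name="delivery-events",
    pattern={"source": ["key.delivery"]},
    subscriber_bus_arn="arn:aws:events:us-east-1:111111111111:event-bus/subscriber",
    role_arn="arn:aws:iam::000000000000:role/forward-to-subscriber",
)
print(json.dumps({"Resources": resources}, indent=2))
```

Packaging the bus, rule, permissions, and alarm in one function means every new subscriber gets the same monitoring by default instead of relying on manual setup.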

Quantifiable outcomes include:

Throughput: 2,000 events/sec sustained
Reliability: 99.99% success rate
Latency: p90 of ~80ms from ingestion to target
Onboarding: Reduced from 48 hours to 4 hours
Integration: Time to connect a new service reduced from 40 hours to 8 hours
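
To put these figures in proportion, the short calculation below derives the speed-up factors and a rough daily ceiling. It assumes, purely for illustration, that the 2,000 events/sec rate holds for a full day, which the source does not claim.

```python
def reduction(before_hours: int, after_hours: int) -> tuple[float, float]:
    """Return (speed-up factor, percent of time saved)."""
    factor = before_hours / after_hours
    saved_pct = (before_hours - after_hours) / before_hours * 100
    return factor, saved_pct

onboarding_factor, onboarding_saved = reduction(48, 4)    # 12x faster
integration_factor, integration_saved = reduction(40, 8)  # 5x faster

# Upper bound only: assumes the sustained rate is held for all 86,400 seconds.
events_per_day_ceiling = 2_000 * 86_400

# 99.99% success leaves at most 1 event in 10,000 unsuccessful.
max_failed_per_day = events_per_day_ceiling // 10_000

print(onboarding_factor, round(onboarding_saved, 1))
print(integration_factor, round(integration_saved, 1))
print(events_per_day_ceiling, max_failed_per_day)
```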

The platform now handles millions of daily events while maintaining predictable performance. This architectural shift demonstrates how centralized event governance combined with decentralized processing can resolve scaling bottlenecks in complex service ecosystems.

#Infrastructure #Cloud #Event-driven #AWS #DevOps
