Discord has open-sourced Osprey, a high-performance event stream decisions engine capable of evaluating 2.3 million rules per second. Built with a Rust coordinator and stateless Python workers, Osprey provides a scalable architecture for real-time threat detection and mitigation that's already being adopted by networks like Bluesky and Matrix.org.

Discord Open Sources Osprey: A High-Performance Rules Engine for Real-Time Safety Systems

Introduction

Discord has recently open-sourced Osprey, an internal event stream decisions engine that evaluates 2.3 million rules per second across 400 million daily actions. The move turns a proprietary tool into a configurable resource for the broader engineering community, offering a horizontally scalable architecture for real-time threat detection and mitigation. The project, developed in partnership with the ROOST organization and internet.dev, has already seen early adoption by networks like Bluesky and Matrix.org, which suggests it can serve as a foundational component for large-scale platforms.

Architecture Overview

Osprey operates as an event stream decisions engine that investigates real-time platform activity and executes automated responses. At its core, the system employs a polyglot architecture combining the performance of Rust with the flexibility of Python. This approach has become increasingly common in high-throughput systems, where Rust serves as the data plane managing network traffic and memory allocation, while Python acts as the control plane handling business logic and user APIs.

The architecture consists of two primary components:

Rust Coordinator: A service written in Rust that manages asynchronous event streams from message queues and prioritizes synchronous gRPC requests. This design helps maintain stable latency even under high concurrency, leveraging Rust's performance characteristics for critical path operations.
Stateless Python Workers: Nodes that handle the actual rule evaluation. These workers are containerized and stateless, allowing organizations to horizontally scale processing capacity to accommodate traffic spikes.
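
To make the prioritization idea concrete, here is a minimal sketch in Python of a dispatch queue that always serves synchronous requests before queued asynchronous events. It is an illustration only: the real coordinator is written in Rust, and the class name, priority constants, and payload shape below are assumptions, not Osprey's actual interfaces.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any

# Hypothetical priority levels: the lower number is served first.
SYNC_PRIORITY = 0   # synchronous request, a caller is waiting on the answer
ASYNC_PRIORITY = 1  # asynchronous event pulled from a message queue


@dataclass(order=True)
class QueuedAction:
    priority: int
    sequence: int                       # tie-breaker: keeps FIFO order within a priority
    payload: Any = field(compare=False)


class Coordinator:
    """Toy in-memory stand-in for the coordinator's dispatch queue."""

    def __init__(self) -> None:
        self._heap: list[QueuedAction] = []
        self._counter = itertools.count()

    def submit_sync(self, payload: Any) -> None:
        heapq.heappush(self._heap, QueuedAction(SYNC_PRIORITY, next(self._counter), payload))

    def submit_async(self, payload: Any) -> None:
        heapq.heappush(self._heap, QueuedAction(ASYNC_PRIORITY, next(self._counter), payload))

    def next_action(self) -> Any:
        """Return the payload that a free worker should evaluate next."""
        return heapq.heappop(self._heap).payload


coordinator = Coordinator()
coordinator.submit_async({"id": "event-1"})
coordinator.submit_async({"id": "event-2"})
coordinator.submit_sync({"id": "request-1"})   # arrives last, is served first

dispatch_order = [coordinator.next_action()["id"] for _ in range(3)]
print(dispatch_order)  # ['request-1', 'event-1', 'event-2']
```

Because the queue holds all the ordering logic, workers can stay stateless: any free worker simply asks for the next action.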

Rule Processing Engine

Osprey evaluates JSON-formatted event payloads, called Actions, against dynamically loadable rules. These rules are written in SML (Safety Markup Language), a domain-specific language with Python syntax that supports static validation. This design choice provides accessibility for security analysts while remaining extensible for software engineers.
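
One benefit of a Python-syntax DSL is that rules can be checked before they ever run. The sketch below shows the general technique using Python's standard ast module: parse the rule, then walk the tree and reject anything outside an allow-list. It is not SML's real validator; the allow-list, the function name, and the field names are assumptions chosen for illustration.

```python
import ast

# Hypothetical allow-list: the only AST node types a rule condition may contain.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def validate_rule(source: str, known_fields: set[str]) -> list[str]:
    """Return a list of problems found in a rule, without evaluating it."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            problems.append(f"unsupported construct: {type(node).__name__}")
        elif isinstance(node, ast.Name) and node.id not in known_fields:
            problems.append(f"unknown field: {node.id}")
    return problems


fields = {"account_age_days", "message_count"}
print(validate_rule("account_age_days < 7 and message_count > 100", fields))  # []
print(validate_rule("acount_age_days < 7", fields))  # ['unknown field: acount_age_days']
```

A typo in a field name or a disallowed construct such as a function call is caught when the rule is loaded, rather than when live traffic hits it.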

The rule processing pipeline follows these steps:

Rule Compilation: At startup, Python workers parse SML rules into an Abstract Syntax Tree (AST). This front-loading of compilation cost minimizes per-event processing time, which is crucial for maintaining high throughput.
Rule Distribution: Rules are distributed to workers via etcd, enabling dynamic updates in production without redeploying the application. This flexibility allows operators to adapt quickly to emerging threats.
Event Processing: When an Action is received, the engine evaluates it against the loaded rules, tracking state across specific targets known as Entities. This state tracking allows operators to apply labels and classifications based on historical behavior.
Verdict Generation: After processing an Action, the engine generates verdicts or effects that are routed to configurable output sinks.
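
The steps above can be sketched in a few dozen lines of Python. This is a simplified model under stated assumptions, not Osprey's implementation: the rules are plain Python expressions rather than real SML, the rule names, field names, and labels are hypothetical, an in-memory dictionary stands in for entity state, and a plain callable stands in for an output sink. Note that eval with empty builtins is used only for brevity and is not a security sandbox.

```python
import ast
import json
from collections import defaultdict

# Hypothetical rule set: rule name -> (condition in Python syntax, label to apply).
RULE_SOURCES = {
    "new_account_burst": ("account_age_days < 7 and messages_last_minute > 30", "suspected_spam"),
    "flagged_repeat": ("has_label_suspected_spam and messages_last_minute > 5", "rate_limited"),
}


def compile_rules(sources):
    """Parse each rule into an AST and compile it once, at startup."""
    compiled = {}
    for name, (condition, label) in sources.items():
        tree = ast.parse(condition, mode="eval")
        compiled[name] = (compile(tree, filename=name, mode="eval"), label)
    return compiled


class Engine:
    def __init__(self, sources, sink):
        self.rules = compile_rules(sources)     # compilation cost paid up front
        self.entity_labels = defaultdict(set)   # in-memory stand-in for entity state
        self.sink = sink                        # callable that receives each verdict

    def process(self, raw_action: str) -> None:
        action = json.loads(raw_action)
        entity = action["user_id"]
        features = dict(action["data"])
        # Expose the entity's history (labels from earlier actions) to the rules.
        features["has_label_suspected_spam"] = "suspected_spam" in self.entity_labels[entity]
        for name, (code, label) in self.rules.items():
            # Not a sandbox: empty builtins only keeps this example small.
            if eval(code, {"__builtins__": {}}, features):
                self.entity_labels[entity].add(label)
                self.sink({"entity": entity, "rule": name, "label": label})


verdicts = []
engine = Engine(RULE_SOURCES, sink=verdicts.append)
engine.process(json.dumps({"user_id": "u1", "data": {"account_age_days": 2, "messages_last_minute": 45}}))
engine.process(json.dumps({"user_id": "u1", "data": {"account_age_days": 2, "messages_last_minute": 10}}))
for verdict in verdicts:
    print(verdict)
```

The second action would not trigger the first rule by itself, but it does trigger the second rule because the entity already carries the suspected_spam label from the earlier action. That is the role entity state plays in the pipeline.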

Extensibility and Integration

Osprey is designed to be highly extensible through several mechanisms:

User Defined Functions (UDFs): Developers can expand the engine using UDFs written in standard Python. These functions define the standard library for Osprey and enable external API calls or machine learning model integrations.
Output Sinks: The open-source release utilizes the Pluggy Python library to provide integration points for output sinks, replacing internal Discord dependencies. This design allows organizations to customize how verdicts are acted upon.
Standard Deployment Pattern: A typical deployment utilizes Apache Kafka to route results into an Apache Druid cluster, which powers real-time analysis through the Osprey UI. However, the system is designed to support various output configurations.
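
The two extension points described above, UDFs and output sinks, can be illustrated with a small standard-library sketch. It deliberately does not reproduce Pluggy's or Osprey's actual interfaces: the registry, the decorator, the sink classes, and the verdict shape are all hypothetical names chosen to show the general shape of a plugin design.

```python
from typing import Callable, Protocol

# Hypothetical registry mapping a UDF name to its plain Python implementation.
UDF_REGISTRY: dict[str, Callable] = {}


def udf(func: Callable) -> Callable:
    """Register a function so that rules could refer to it by name."""
    UDF_REGISTRY[func.__name__] = func
    return func


@udf
def contains_link(text: str) -> bool:
    # A real UDF might call an external API or a machine learning model here.
    return "http://" in text or "https://" in text


class OutputSink(Protocol):
    """Anything with a push method can act as a sink in this sketch."""

    def push(self, verdict: dict) -> None: ...


class ListSink:
    """Keeps verdicts in memory, for example for tests."""

    def __init__(self) -> None:
        self.received: list[dict] = []

    def push(self, verdict: dict) -> None:
        self.received.append(verdict)


class CountingSink:
    """Counts verdicts per label, a stand-in for a metrics or analytics sink."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def push(self, verdict: dict) -> None:
        label = verdict["label"]
        self.counts[label] = self.counts.get(label, 0) + 1


def dispatch(verdict: dict, sinks: list[OutputSink]) -> None:
    """Fan one verdict out to every configured sink."""
    for sink in sinks:
        sink.push(verdict)


message = "free gift at https://example.invalid"
list_sink, counting_sink = ListSink(), CountingSink()
if UDF_REGISTRY["contains_link"](message):
    dispatch({"entity": "user-1", "label": "link_spam"}, [list_sink, counting_sink])
print(list_sink.received, counting_sink.counts)
```

The engine only depends on the push contract, so swapping a Kafka-backed sink for a webhook or a test double would not touch rule evaluation.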

Performance Characteristics

The figures and properties cited for Osprey are:

Evaluates 2.3 million rules per second
Processes 400 million daily actions
Maintains stable latency through prioritized gRPC requests
Horizontally scales to accommodate traffic spikes

These capabilities make Osprey suitable for large-scale platforms that need real-time safety systems without compromising performance.
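
A quick back-of-the-envelope calculation puts the two headline numbers in relation to each other. It assumes, which the source does not state, that both figures describe the same workload and that traffic is spread evenly across the day, so the result is a rough average rather than a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_actions = 400_000_000      # figure quoted in the article
rules_per_second = 2_300_000     # figure quoted in the article

# Average rate if actions were spread evenly over the day (an assumption).
actions_per_second = daily_actions / SECONDS_PER_DAY
rules_per_action = rules_per_second / actions_per_second

print(f"{actions_per_second:,.0f} actions per second on average")   # 4,630
print(f"{rules_per_action:.0f} rule evaluations per action")        # 497
```

Under those assumptions, each action is checked against roughly 500 rules, and real traffic peaks would push the per-second action rate well above the daily average.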

Use Cases and Adoption

Osprey's design makes it suitable for a variety of use cases requiring real-time event evaluation:

Content Moderation: Automatically detecting and responding to policy violations in user-generated content.
Threat Detection: Identifying potential security threats or malicious behavior patterns.
Rate Limiting: Enforcing usage policies across platform services.
Compliance Monitoring: Ensuring platform activities adhere to regulatory requirements.

The system has already found early adoption in other networks like Bluesky and Matrix.org, indicating its applicability beyond Discord's specific use case. This broader adoption suggests that Osprey could become a standard component for decentralized communication platforms.

Trade-offs and Considerations

Osprey's performance and flexibility come at a cost. Implementing and maintaining such a system involves several trade-offs:

Operational Complexity: The polyglot architecture requires expertise in both Rust and Python, along with understanding of distributed systems concepts.
Resource Requirements: High-performance rule evaluation requires significant computational resources, particularly for large-scale deployments.
Rule Management: As the number of rules grows, managing their interactions and ensuring they don't conflict becomes increasingly complex.
State Management: While the workers are stateless, the system must still maintain state across Entities, which requires careful design to avoid consistency issues.
Learning Curve: Security analysts need to learn the SML language, and developers need to understand the system's architecture to create effective UDFs.

The Rust/Python Pattern in Modern Systems

Osprey exemplifies a broader pattern in modern system design: using Rust for performance-critical components and Python for business logic and extensibility. The two languages play complementary roles:

Rust provides memory safety, concurrency, and performance, making it ideal for data plane components.
Python offers rapid development, extensive libraries, and readability, making it suitable for control plane logic.

This pattern is not unique to Osprey. Other successful projects have adopted similar approaches:

Polars DataFrame library uses a Rust core for compute-heavy operations while exposing a Python API.
Hugging Face tokenizers rely on a Rust implementation for performance while providing Python bindings.

The success of these projects suggests that the Rust/Python polyglot architecture is a sustainable pattern for high-performance systems that also require rapid iteration and extensibility.

Future Directions

As an open-source project, Osprey's future will be shaped by its community of contributors. Potential areas for development include:

Enhanced Machine Learning Integration: More sophisticated UDF patterns for common ML operations.
Advanced Rule Analysis: Tools for understanding rule interactions and potential conflicts.
Performance Optimization: Further improvements in throughput and latency for specialized workloads.
Extended Visualization: Enhanced UI components for understanding system behavior and rule effectiveness.

Conclusion

Discord's open-sourcing of Osprey represents a significant contribution to the field of real-time safety systems. By combining the performance of Rust with the flexibility of Python, the project offers a compelling architecture for platforms that need to evaluate millions of rules per second. The system's adoption by other networks like Bluesky and Matrix.org demonstrates its broad applicability beyond Discord's specific use case.

For organizations considering real-time safety systems, Osprey provides a well-architected foundation that balances performance, scalability, and extensibility. While implementing and maintaining such a system requires significant expertise, the open-source nature of the project means that organizations can benefit from community contributions and avoid vendor lock-in.

As platforms grow, real-time decision engines like Osprey are likely to matter more for maintaining safety at scale. The project's combination of high throughput and an extensible architecture makes it a useful tool for the broader engineering community.
