A Python Web Framework with a Rust Runtime, Three Times Faster than FastAPI!

david 28/11/2025

Robyn is a high-performance Python web framework with a Rust runtime. It aims to deliver near-native Rust throughput while letting you keep writing your application code in Python. It has over 200k installations on PyPI.

Robyn is comparable to frameworks like Flask, FastAPI, and Django, as well as the web servers typically paired with them. One of Robyn’s main advantages is that it doesn’t require an external web server for production, making it more efficient and streamlined.

Since other popular Python frameworks run entirely on CPython, their concurrency is constrained by the notorious GIL and by Python’s comparatively slow execution speed. Robyn uses a Rust runtime and a built-in web server, working around the GIL to improve runtime performance in several ways. While a decoupled web server has its advantages, Robyn’s integrated web server gives it tighter control over execution and performance.

We love seeing performance data; performance is one of our core goals. We are excited to share another milestone from our continuous efforts: a 4x reduction in write latency for our data pipeline, dropping from 120ms to 30ms!
This improvement resulted from transitioning from a C library accessed via a Python application to a fully Rust-based implementation.

This is a straightforward introduction to our architectural changes, the tangible results, and their impact on system performance and user experience.


Switching from Python to Rust

So, why did we switch from Python to Rust? Our data pipeline is used by all services!

Our data pipeline is the backbone of our real-time communication platform. Our team is responsible for replicating event data from all APIs to all internal systems and services. This includes data processing, event storage and indexing, connection state, and more. Our primary goal is to ensure the accuracy and reliability of real-time communication.

Before the migration, the old pipeline used a C library, accessed via a Python service, which buffered and batched data. This was indeed a key factor contributing to our latency. We wanted to optimize this and knew it was achievable.

We explored transitioning to Rust because we had previously seen benefits from its performance, memory safety, and concurrency capabilities. It was time to do it again!


Rust’s Performance and Async IO Advantages

Rust excels in performance-intensive environments, especially when combined with async IO libraries like Tokio. Tokio is a multithreaded, non-blocking runtime for writing asynchronous applications with the Rust programming language. Migrating to Rust allowed us to fully leverage these features, achieving high throughput and low latency—all with compile-time memory and concurrency safety.
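
As a minimal, illustrative sketch (not production code), the snippet below shows what non-blocking async IO buys us: two simulated waits are polled concurrently on Tokio’s runtime, so the total elapsed time is roughly that of the longer wait rather than the sum of both.

use tokio::time::{sleep, Duration, Instant};

#[tokio::main]
async fn main() {
    let start = Instant::now();

    // Both waits are polled concurrently on the Tokio runtime,
    // so this takes roughly 200ms in total rather than 400ms.
    tokio::join!(
        sleep(Duration::from_millis(200)),
        sleep(Duration::from_millis(200)),
    );

    println!("elapsed: {:?}", start.elapsed());
}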

Memory and Concurrency Safety

Rust’s ownership model provides compile-time guarantees for memory and concurrency safety, thereby avoiding the most common issues like data races, memory leaks, and invalid memory accesses. This was highly advantageous for us.
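
As a small, hypothetical illustration (not taken from our codebase), the counter below only compiles because the shared state is wrapped in Arc and Mutex; remove either wrapper and the compiler rejects the program, which is how whole classes of data races are ruled out before the code ever runs.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared state must be wrapped so the compiler can prove thread safety:
    // Arc provides shared ownership, Mutex provides exclusive access.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("total: {}", *counter.lock().unwrap());
}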

Looking ahead, we can confidently manage the lifecycle of our codebase. If needed later, we can perform ruthless refactoring. And there will always be a “later need.”


Technical Implementation of Architectural Changes using MPSC and Tokio for Service-to-Service Messaging

The previous architecture relied on a service-to-service messaging system, which introduced considerable overhead and latency. A Python service used a C library to buffer and batch data. Latency accumulated as messages were exchanged between multiple services, which also added complexity to the system. The buffering mechanism in the C library was a significant bottleneck, contributing to an end-to-end latency of approximately 120ms. From the old Python service’s perspective this looked acceptable, since its average latency per event was only 40 microseconds, but downstream systems paid the price during the debatching process, which drove up overall latency.

When we deployed the new service, the average latency per event rose from the original 40 microseconds to 100 microseconds, which at first glance did not look like an improvement.

In hindsight, the reason is clear: downstream services can now consume events individually and immediately, without a debatching step.

This is how the overall end-to-end latency was able to drop so dramatically, from 120ms to 30ms.

The new Rust application can trigger events concurrently and immediately.

This approach was not feasible in Python, as using a different concurrency model would also require a rewrite. We could have rewritten it in Python. But if we were going to rewrite, we might as well do the best rewrite possible—with Rust!
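
Here is a rough sketch of what “concurrently and immediately” looks like in practice (the dispatch function is hypothetical, not our production handler): each event is handed to its own Tokio task as soon as it arrives, with no batching step in between.

use tokio::task::JoinSet;
use tokio::time::{sleep, Duration};

#[derive(Debug)]
struct Event {
    id: u32,
}

// Hypothetical per-event handler: stands in for the real dispatch logic.
async fn dispatch(event: Event) {
    sleep(Duration::from_millis(10)).await; // simulate downstream I/O
    println!("dispatched event {}", event.id);
}

#[tokio::main]
async fn main() {
    let events = (1..=5).map(|id| Event { id });

    // Spawn one task per event so events are handled concurrently,
    // as soon as they arrive, instead of waiting in a batch.
    let mut tasks = JoinSet::new();
    for event in events {
        tasks.spawn(dispatch(event));
    }

    // Wait for every in-flight event to finish.
    while let Some(result) = tasks.join_next().await {
        result.expect("dispatch task panicked");
    }
}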

Resource Reduction: CPU and Memory

Our Python service consumed over 60% of a CPU core. In contrast, the new Rust service consumes less than 5% across multiple cores. The memory reduction is just as significant: the Rust service occupies about 200MB, compared to the gigabytes the Python service required.


The New Rust-Based Architecture

The new architecture leverages Rust’s powerful concurrency mechanisms and async IO capabilities.

Service-to-service messaging was replaced with multiple instances utilizing Multi-Producer, Single-Consumer (MPSC) channels.

Tokio, built for efficient asynchronous operations, reduces blocking and increases throughput.

Our data flow is simplified by eliminating the need for intermediate buffering stages, opting instead for concurrency and parallelism.

These measures enhance both performance and efficiency.


Rust Application Example

This code is not a direct copy; it’s merely an illustrative example simulating the functionality of our production code. Also, this code shows only one MPSC channel, whereas our production system uses multiple channels.

  • Cargo.toml: We need to include dependencies for Tokio and any other crates we might use (e.g., an async channel for events).
  • Event Definition: The Event type is used in the code but not fully defined, as we have many types not shown in this example.
  • Event Stream: event_stream is referenced in the code, but streams can be created in many different ways depending on your approach, so the example keeps it simple.

Here is a Rust example with code and a Cargo.toml file, including event definition and event stream initialization.

Cargo.toml


[package]
name = "tokio_mpsc_example"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { version = "1", features = ["full"] }

main.rs


use tokio::sync::mpsc;
use tokio::task::spawn;
use tokio::time::{sleep, Duration};

// Define the Event type
#[derive(Debug)]
struct Event {
    id: u32,
    data: String,
}

// Function to handle each event
async fn handle_event(event: Event) {
    println!("Processing event: {:?}", event);
    // Simulate processing time
    sleep(Duration::from_millis(200)).await;
}

// Function to process data received by the receiver
async fn process_data(mut rx: mpsc::Receiver<Event>) {
    while let Some(event) = rx.recv().await {
        handle_event(event).await;
    }
}

#[tokio::main]
async fn main() {
    // Create a channel with a buffer size of 100
    let (tx, rx) = mpsc::channel(100);

    // Spawn a task to process the received data, keeping its handle
    // so we can wait for it to finish before exiting.
    let consumer = spawn(process_data(rx));

    // Simulate an event stream with dummy data for demonstration
    let event_stream = vec![
        Event { id: 1, data: "Event 1".to_string() },
        Event { id: 2, data: "Event 2".to_string() },
        Event { id: 3, data: "Event 3".to_string() },
    ];

    // Send events through the channel
    for event in event_stream {
        if tx.send(event).await.is_err() {
            eprintln!("Receiver dropped");
            break;
        }
    }

    // Drop the sender so the receiver sees the channel as closed,
    // then wait for the consumer task to drain the remaining events.
    drop(tx);
    consumer.await.expect("consumer task panicked");
}

Rust Example Files

  • Cargo.toml:
    • Specifies the package name, version, and edition.
    • Includes the tokio dependency with the “full” feature set, which this example requires.
  • main.rs:
    • Defines the Event struct.
    • Implements the handle_event function to process each event.
    • Implements the process_data function to receive and process events from the channel.
    • Creates an event_stream with dummy data for demonstration purposes.
    • Uses the Tokio runtime to spawn a task that processes events, sends events through the channel from the main function, and waits for the consumer task to finish.

Benchmarks

To validate our performance improvements, we conducted extensive benchmarking in development and staging environments. We used tools like hyperfine and criterion.rs to gather latency and throughput metrics. We simulated various scenarios to mimic production-like loads, including peak traffic periods and edge cases.
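
For reference, a benchmark harness along these lines can be written with criterion.rs. The snippet below is a simplified sketch, not our actual benchmark suite: the handle_event name is a placeholder, and it assumes the file lives in benches/pipeline.rs with criterion added under [dev-dependencies] and harness = false set for the bench target in Cargo.toml.

use criterion::{criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;

fn bench_handle_event(c: &mut Criterion) {
    // Build a Tokio runtime once and reuse it for every iteration.
    let rt = Runtime::new().expect("failed to build Tokio runtime");

    c.bench_function("handle_event", |b| {
        b.iter(|| {
            rt.block_on(async {
                // Stand-in for the real event-handling path being measured.
                tokio::task::yield_now().await;
            })
        })
    });
}

criterion_group!(benches, bench_handle_event);
criterion_main!(benches);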


Production Validation

To assess real-world performance in production, we implemented continuous monitoring using Grafana and Prometheus. This setup allows tracking key metrics such as write latency, throughput, and resource utilization. Additionally, alerts and dashboards were configured for timely identification of any deviations or bottlenecks in system performance, ensuring potential issues could be addressed promptly. Of course, we deployed cautiously over several weeks to a low percentage of traffic initially. The charts you see are from the full deployment after our validation phase.
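
As an illustration of the kind of instrumentation involved (the metric name below is hypothetical, and in a real service the encoded output would be exposed over an HTTP endpoint for Prometheus to scrape), the prometheus crate makes recording a write-latency histogram straightforward.

use prometheus::{register_histogram, Encoder, Histogram, TextEncoder};

fn main() {
    // Register a histogram for end-to-end write latency, in seconds.
    let write_latency: Histogram = register_histogram!(
        "pipeline_write_latency_seconds",
        "End-to-end write latency of the data pipeline"
    )
    .expect("failed to register metric");

    // Record an observation around a write (30ms used here as a placeholder).
    write_latency.observe(0.030);

    // Render all registered metrics in the Prometheus text exposition format.
    let mut buffer = Vec::new();
    TextEncoder::new()
        .encode(&prometheus::gather(), &mut buffer)
        .expect("failed to encode metrics");
    println!("{}", String::from_utf8(buffer).unwrap());
}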


Benchmarks Alone Are Not Enough

Load testing demonstrated the improvements. Testing alone doesn’t prove success, but it provides strong evidence. Write latency was consistently reduced from 120ms to 30ms, response times improved, and end-to-end data availability was accelerated. These advancements significantly improved overall performance and efficiency.


Before and After

Before (Old System):
Service-to-service messaging was done via a C library for buffering. This involved multiple services in the messaging loop, and the C library added latency through event buffering. The Python service added an extra layer of latency due to Python’s Global Interpreter Lock (GIL) and its inherent operational overhead. These factors contributed to high end-to-end latency, complex error handling and debugging processes, and limited scalability due to bottlenecks introduced by event buffering and the Python GIL.

After (Rust Implementation):
Messaging via direct channels eliminated intermediary services. Tokio enabled non-blocking async IO, significantly increasing throughput. Rust’s strict compile-time guarantees led to fewer runtime errors, and we gained robust performance. The observed improvements included:

  • End-to-end latency reduced from 120ms to 30ms.
  • Enhanced scalability through efficient resource management.
  • Improved error handling and debugging thanks to Rust’s strict type and error handling models.

It’s hard to argue for using anything else besides Rust.

Deployment and Operations

Minimal Operational Changes

The deployment process required minimal modifications to accommodate the migration from Python to Rust. The same deployment and CI/CD pipelines were used. Configuration management continued to leverage existing tools like Ansible and Terraform, facilitating seamless integration. This allowed us a smooth transition without disrupting existing deployment processes. This is a common approach—you want to make as few changes as possible during a migration. This way, if issues arise, we can isolate the footprint and identify problems faster.

Monitoring and Maintenance

Our application integrates seamlessly with the existing monitoring stack, including Prometheus and Grafana, enabling real-time metric monitoring. Rust’s memory safety features and reduced runtime errors significantly lower maintenance overhead, resulting in a more stable and efficient application. It’s great to see our build system working correctly, and even better, we can catch errors locally during development, before pushing commits that could cause build failures.


Practical Impact on User Experience

Improved Data Availability
Faster write operations enable near-instantaneous data reading and index readiness, enhancing the user experience. These enhancements include reduced data retrieval latency, leading to more efficient and responsive applications. Real-time analytics and insights are also improved, giving businesses up-to-date information for informed decision-making. Furthermore, faster propagation of updates across all user interfaces ensures users always have access to the latest data, enhancing collaboration and productivity for teams using the APIs we provide. From an external perspective, the reduced latency is noticeable: across our combined APIs, data is simply available sooner.

Increased System Scalability and Reliability
Businesses that invest in Rust gain significant advantages: they can analyze vast amounts of data without slowing down the system, which means keeping up with user load. And let’s not forget the added benefits of a more resilient system with less downtime. We operate a business with billions of connected devices; outages are absolutely not allowed, and continuous operation is a must.


Conclusion

Transitioning to Rust not only significantly reduced latency but also laid a solid foundation for future enhancements in performance, scalability, and reliability. We are committed to delivering the best possible experience for our users.

Rust aligns with our commitment to providing the best API services for billions of users. Our experience empowers us to meet and exceed the demands of real-time communication, both now and in the future.