Make FastAPI run calls in parallel instead of serial

David Y.

March 15, 2024

The Problem

I’ve written the following code to test out FastAPI’s parallelism:

import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    time.sleep(5)
    print("Goodbye")
    return {"PING": "PONG!"}

When I test this out by calling the /ping API route multiple times, I get the following output:

Hello
Goodbye
Hello
Goodbye

This indicates that the server is processing my /ping requests one after the other, i.e. in serial. If it were processing them in parallel, I would expect to see this output:

Hello
Hello
Goodbye
Goodbye

My ping function is defined as async, so I’m uncertain why it is still being executed serially. How do I achieve parallelization?

The Solution

Serial vs. parallel processing is often conflated with synchronous vs. asynchronous processing, but they are quite different concepts in practice, even if they sometimes produce similar results.

Serial processing: tasks are run one at a time, in order.
Parallel processing: multiple tasks are run at the same time, using different threads or processors.
Synchronous processing: tasks are run in order, and each blocking operation, such as waiting for user input or time.sleep(5) in our code above, holds up everything after it until it completes.
Asynchronous processing: tasks are put in an event loop, and each one runs until it reaches an awaited operation that is not ready yet (such as a network call or a timer), at which point execution switches to another task.

These four types of processing can be combined in various ways depending on your application’s architecture. By default, a simple FastAPI project like the one above runs as a serial asynchronous application: a single event loop on a single thread handles every request. Asynchronous processing is a good way to make efficient use of a single thread.

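However, to get the full benefits of asynchronous processing, we must mark the points in our code where it waits on a slow operation with the await keyword, so that the event loop can switch to another task in the meantime. We also need to ensure that the operation we’re invoking is awaitable. Most standard Python functions were written with synchronous code in mind and are not compatible with await, and this includes time.sleep. Calling time.sleep(5) inside an async function freezes the entire event loop for five seconds, so the second /ping request cannot even start until the first has finished. That is why we saw the serial output above.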
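To reproduce this without FastAPI, here is a minimal sketch using only the standard library. The names `handler`, `run_two` and `DELAY` are hypothetical, and a 0.2-second delay stands in for the five-second one. Two coroutines are scheduled on the same event loop with `asyncio.gather`; when each calls `time.sleep` directly, the log comes out serial, exactly as in the question. The `offload=True` branch swaps in a technique not used in the answer below: `asyncio.to_thread` (Python 3.9+), which runs a blocking function in a worker thread and returns an awaitable. It is one option when a blocking call has no async equivalent.

```python
import asyncio
import time

DELAY = 0.2  # short stand-in for the 5-second sleep in the question


async def handler(name, log, offload):
    log.append(f"Hello {name}")
    if offload:
        # asyncio.to_thread runs the blocking call in a worker thread and
        # returns an awaitable, so the event loop can switch to other tasks.
        await asyncio.to_thread(time.sleep, DELAY)
    else:
        # A plain blocking call: the event loop is frozen until it returns.
        time.sleep(DELAY)
    log.append(f"Goodbye {name}")


async def run_two(offload):
    log = []
    # Schedule two "requests" on the same event loop.
    await asyncio.gather(handler("A", log, offload), handler("B", log, offload))
    return log


print(asyncio.run(run_two(offload=False)))
# ['Hello A', 'Goodbye A', 'Hello B', 'Goodbye B']  (serial, as in the question)
print(asyncio.run(run_two(offload=True)))
# Both "Hello" entries now appear before either "Goodbye".
```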

Fortunately, Python’s built-in asyncio library provides an awaitable sleep function, asyncio.sleep, that we can use. Here’s how we can modify the code above to work asynchronously:

import asyncio  # changed import
from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    await asyncio.sleep(5)  # added await and changed time to asyncio
    print("Goodbye")
    return {"PING": "PONG!"}

If we run this code and send two separate requests to /ping, we should see this output in our terminal:

Hello
Hello
Goodbye
Goodbye

This happens because our server now pauses processing of the first /ping request when it reaches await asyncio.sleep(5) and moves on to the next task in the event loop, which is our second /ping request. When the second request reaches its own await asyncio.sleep(5), the event loop has nothing else to run, so it waits until the first request’s sleep finishes and then resumes that request. As a result, the print("Hello") line in both requests executes before the print("Goodbye") line in either of them.
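The same interleaving can be observed without a web server. This sketch (hypothetical names, standard library only, with 0.2 seconds standing in for 5) runs two copies of the `asyncio.sleep` handler on one event loop and also measures the total time: because the two sleeps overlap, the pair should finish in roughly one delay rather than two.

```python
import asyncio
import time

DELAY = 0.2  # short stand-in for the 5-second sleep


async def ping(name, log):
    log.append(f"Hello {name}")
    # Suspends only this coroutine; the event loop runs the other one meanwhile.
    await asyncio.sleep(DELAY)
    log.append(f"Goodbye {name}")


async def main():
    log = []
    start = time.perf_counter()
    await asyncio.gather(ping("A", log), ping("B", log))
    return log, time.perf_counter() - start


log, elapsed = asyncio.run(main())
print(log)  # both "Hello" entries come before either "Goodbye"
print(f"{elapsed:.2f}s")  # close to DELAY, not 2 * DELAY
```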

Note that we may still see the old behavior if we issue the requests from two browser tabs in the same session: many browsers hold back a second request for a URL that is already being fetched until the first one completes, so that they can reuse its response. To test our endpoint reliably, we should issue the requests with a non-browser tool such as curl or wget, running each command in its own terminal at roughly the same time:

curl http://localhost:8000/ping

wget http://localhost:8000/ping
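To send the requests at the same instant from a single script instead, here is a hedged sketch using only the standard library. It assumes the server is listening on `http://localhost:8000` (Uvicorn’s default host and port); `fetch`, `fire_concurrently` and its `fetch_fn` parameter are hypothetical names. Each request gets its own thread, so the client itself never serializes them.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/ping"  # assumes Uvicorn's default host and port


def fetch(url):
    """Send one GET request and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def fire_concurrently(url, n=2, fetch_fn=fetch):
    """Issue n requests at once (one thread each); return bodies and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map returns results in the same order as the inputs.
        bodies = list(pool.map(fetch_fn, [url] * n))
    return bodies, time.perf_counter() - start
```

With the server running, `fire_concurrently(URL)` should take roughly 10 seconds against the `time.sleep` version of the endpoint and roughly 5 seconds against the `asyncio.sleep` version, since the second request no longer waits for the first to finish.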

Even with this change, our application still executes only one piece of Python code at a time on a single thread; it just interleaves the requests so that neither one wastes time waiting. For simple, low-traffic APIs, this should be sufficient. To achieve true parallelism, we need to run our FastAPI project on an ASGI server such as Uvicorn with more than one worker process configured, so that separate requests can be handled by separate processes at the same time. FastAPI provides documentation on running a server with Uvicorn. For more information about how many Uvicorn workers should be configured for a given system, see this answer.
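One way to confirm that the `asyncio.sleep` version is concurrent but not parallel is to record which OS thread runs each step of each coroutine. In this minimal sketch (hypothetical names, standard library only), every recorded id is the same, because `asyncio.run` drives a single event loop in the calling thread. Running several Uvicorn worker processes is what allows two requests to execute Python code at the same moment.

```python
import asyncio
import threading


async def ping(thread_ids):
    thread_ids.append(threading.get_ident())  # thread running this step
    await asyncio.sleep(0.1)                  # lets the other coroutine run
    thread_ids.append(threading.get_ident())  # thread running the resumed step


async def main():
    thread_ids = []
    await asyncio.gather(ping(thread_ids), ping(thread_ids))
    return thread_ids


ids = asyncio.run(main())
print(len(set(ids)))  # 1: every step ran on the same thread
```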
