I’ve written the following code to test out FastAPI’s parallelism:
import time

from fastapi import FastAPI, Request

app = FastAPI()


@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    time.sleep(5)
    print("Goodbye")
    return {"PING": "PONG!"}
When I test this out by calling the /ping API route multiple times, I get the following output:
Hello
Goodbye
Hello
Goodbye
This indicates that the server is processing my /ping requests one after the other, i.e. in serial. If it were processing them in parallel, I would expect to see this output:
Hello
Hello
Goodbye
Goodbye
My ping function is defined as async, so I’m uncertain why it is still being executed serially. How do I achieve parallelization?
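Serial vs. parallel processing is often conflated with synchronous vs. asynchronous processing, but these are quite different concepts in practice, even if they sometimes produce similar results. Serial processing executes one operation at a time, while parallel processing executes several at once using multiple threads or processes. Synchronous processing finishes each operation before starting the next, while asynchronous processing can pause an operation that is waiting on something and run other operations in the meantime. A blocking operation is one that makes everything else wait, such as the time.sleep(5) in our code above.
These four types of processing can be combined in various ways depending on your application’s architecture. A simple FastAPI project like the one above will run as a serial asynchronous application by default. Asynchronous processing is a good way to make efficient use of a single thread.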
However, to get the full benefits of asynchronous processing, we must indicate the points in our code where blocking functionality is invoked using the await keyword. We also need to ensure that the operation we’re invoking is awaitable. Most standard Python functions were written with synchronous code in mind and are not compatible with await; this includes time.sleep.
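To see the effect of a non-awaitable call in isolation, here is a minimal, framework-free sketch. It does not use FastAPI; blocking_ping is a hypothetical stand-in for the endpoint, the order list stands in for the terminal output, and the sleep is shortened to 0.1 seconds so the script finishes quickly.

```python
import asyncio
import time

order = []  # records what each simulated request "prints"


async def blocking_ping():
    order.append("Hello")
    time.sleep(0.1)  # not awaitable: the event loop is stuck here until it returns
    order.append("Goodbye")


async def main():
    # Schedule two simulated requests on the same event loop.
    await asyncio.gather(blocking_ping(), blocking_ping())


asyncio.run(main())
print(" ".join(order))  # prints "Hello Goodbye Hello Goodbye"
```

Because the coroutine never reaches an await, the event loop has no opportunity to start the second simulated request until the first has finished.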
Fortunately, Python’s built-in asyncio library provides an awaitable sleep function that we can use. Here’s how we can modify the code above to work asynchronously:
import asyncio  # changed import

from fastapi import FastAPI, Request

app = FastAPI()


@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    await asyncio.sleep(5)  # added await and changed time to asyncio
    print("Goodbye")
    return {"PING": "PONG!"}
If we run this code and send two separate requests to /ping, we should see this output in our terminal:
Hello
Hello
Goodbye
Goodbye
This happens because our server now knows to pause processing of the first /ping request once it reaches await asyncio.sleep(5) and move on to the next operation in the event loop, which is our second /ping request. The second request is paused in turn when it reaches its own await asyncio.sleep(5), and the event loop resumes the first request once its five seconds have elapsed. This causes the print("Hello") line in both requests to be executed before the print("Goodbye") line in either of them.
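The hand-off described above can be reproduced without FastAPI. The following is a rough sketch rather than the server's actual request handling: the ping coroutine and its request_id parameter are hypothetical stand-ins for two incoming requests, and the sleep is shortened to 0.1 seconds.

```python
import asyncio

trace = []  # (message, request_id) pairs in the order they were recorded


async def ping(request_id):
    trace.append(("Hello", request_id))
    await asyncio.sleep(0.1)  # pauses this coroutine and lets the other one run
    trace.append(("Goodbye", request_id))


async def main():
    # Two simulated requests sharing one event loop on one thread.
    await asyncio.gather(ping(1), ping(2))


asyncio.run(main())
for message, request_id in trace:
    print(f"request {request_id}: {message}")
```

Both Hello entries are recorded before either Goodbye entry, even though only one thread is doing the work.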
Note that we may still get the old behavior if we issue the requests by opening two browser tabs in the same session. Some browsers hold back a second request for a URL that is already being fetched until the first one completes, so that a cached response can be reused. To test our endpoint correctly, we should issue the requests at the same time from separate terminals using a non-browser tool such as wget or curl:
curl http://localhost:8000/ping
wget http://localhost:8000/ping
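If juggling two terminals is inconvenient, the requests can also be fired from a short script. This is only a sketch using the standard library; timed_get and fire_concurrently are hypothetical helper names, and it assumes the server is reachable at whatever URL you pass in.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url):
    """Fetch url once and return (status code, seconds taken)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start


def fire_concurrently(url, count=2):
    """Send `count` GET requests at roughly the same time, one per thread."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(timed_get, [url] * count))
```

With the application running locally, calling fire_concurrently("http://localhost:8000/ping") returns one (status, seconds) pair per request. If the server interleaves the requests, both should take roughly five seconds; if it handles them one after the other, the second should take closer to ten.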
Even with this change, our application is still processing operations serially on a single thread, just in a more efficient order. For simple, low-traffic APIs, this should be sufficient. To achieve true parallelization, we need to run our FastAPI project on an ASGI server such as Uvicorn with more than one worker process configured, for example with uvicorn main:app --workers 4 (assuming the app object lives in main.py). FastAPI provides documentation on running a server with Uvicorn. For more information about how many Uvicorn workers should be configured for a given system, see this answer.