The uvloop project is great, delivers amazing performance, and is a good replacement for the default asyncio event loop if Linux is used. Unfortunately, uvloop is not available for Windows. Out of curiosity I wanted to know how a multithreaded version compares with uvloop, which is single threaded. Is comparable performance possible, or is a solution with multiple threads even faster? So I took the echoserver.py example from the uvloop project and extended it with support for fastthreadpool.

Here is a simplified code example of a socket server with fastthreadpool:
```python
from socket import (socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR,
                    IPPROTO_TCP, TCP_NODELAY)

import fastthreadpool

def pool_echo_server(address, threads, size):
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    sock.bind(address)
    sock.listen(threads)
    with sock:
        while True:
            client, addr = sock.accept()
            # Each client connection is handled by a worker thread of the pool.
            pool.submit(pool_echo_client, client, size)

def pool_echo_client(client, size):
    client.setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)
    b = bytearray(size)  # preallocated receive buffer, reused for every message
    bl = [b]
    with client:
        try:
            while True:
                client.recvmsg_into(bl)
                client.sendall(b)
        except OSError:
            pass

pool = fastthreadpool.Pool(8)
# Example bind address; uvloop's echoserver.py uses 127.0.0.1:25000 by default.
pool.submit(pool_echo_server, ('127.0.0.1', 25000), 8, 4096)
pool.join()
```
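The matching client is not shown here; a minimal sketch of one round trip against such an echo server could look like the following (the function name `echo_once` and the fixed message size are mine, not from the benchmark code):

```python
from socket import socket, AF_INET, SOCK_STREAM, IPPROTO_TCP, TCP_NODELAY

def echo_once(address, size):
    # Connect, send one fixed-size message, and read the full echo back.
    sock = socket(AF_INET, SOCK_STREAM)
    sock.setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)
    with sock:
        sock.connect(address)
        msg = b'x' * size
        sock.sendall(msg)
        received = bytearray()
        while len(received) < size:
            chunk = sock.recv(size - len(received))
            if not chunk:  # server closed the connection early
                break
            received.extend(chunk)
    # True if the server echoed the message back unchanged.
    return bytes(received) == msg
```

The real echoclient.py in examples/bench runs several such workers in parallel and measures messages per second.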
For a complete example, please have a look at the examples/bench directory.

The following benchmarks were executed on a Ryzen 7 with Linux Mint 18.3, kernel 4.13.0-37-generic and Python 3.6. echoserver.py and echoclient.py were executed on the same machine. echoclient.py was always run with 5 parallel workers; only the message size was modified for the different tests. For uvloop I only compared the simple protocol variant with the multithreaded version.
Module | Server buffer size | Message size | Messages/s | MB/s
---|---|---|---|---
uvloop | n/a | 1000 bytes | 128220 | 122.28
uvloop | n/a | 4kB | 108581 | 424.14
uvloop | n/a | 64kB | 26004 | 1625.25
threads | 4kB | 1000 bytes | 423226 | 403.62
threads | 4kB | 4kB | 112033 | 437.63
threads | 8kB | 4kB | 256438 | 1001.71
threads | 16kB | 4kB | 381320 | 1489.53
threads | 4kB | 64kB | 6690 | 418.13
threads | 64kB | 64kB | 66672 | 4167
threads | 128kB | 64kB | 62236 | 3889.75
threads | 256kB | 64kB | 60292 | 3768.25
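As a plausibility check, the MB/s column matches Messages/s multiplied by the message size, interpreting MB as MiB. A small sketch (the function name is mine, not from the benchmark code):

```python
def mib_per_s(messages_per_s, message_bytes):
    # Throughput = messages per second times bytes per message, in MiB/s.
    return messages_per_s * message_bytes / (1024 * 1024)

# uvloop row with 1000-byte messages from the table above:
print(round(mib_per_s(128220, 1000), 2))  # 122.28, as listed
```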
The following benchmarks were executed on a Core i5-8250U with Linux Mint 18.3, kernel 5.2.3-050203-generic and Python 3.7.2. echoserver.py and echoclient.py were executed on the same machine. echoclient.py was always run with 5 parallel workers; only the message size was modified for the different tests.
Module | Server buffer size | Message size | Messages/s | MB/s
---|---|---|---|---
uvloop | n/a | 1000 bytes | 34628 | 33.21
uvloop | n/a | 4kB | 35975 | 140.53
uvloop | n/a | 64kB | 2170 | 135.68
uvloop/streams | n/a | 1000 bytes | 65588 | 62.55
uvloop/streams | n/a | 4kB | 61856 | 241.63
uvloop/streams | n/a | 64kB | | 1625.25
uvloop/protocol | n/a | 1000 bytes | 34628 | 33.21
uvloop/protocol | n/a | 4kB | 35975 | 140.53
uvloop/protocol | n/a | 64kB | 26004 | 1625.25
asyncio | n/a | 1000 bytes | 128220 | 122.28
asyncio | n/a | 4kB | 108581 | 424.14
asyncio | n/a | 64kB | 26004 | 1625.25
asyncio/streams | n/a | 4kB | 108581 | 424.14
asyncio/streams | n/a | 64kB | 26004 | 1625.25
asyncio/protocol | n/a | 4kB | 108581 | 424.14
asyncio/protocol | n/a | 64kB | 26004 | 1625.25
threads | 4kB | 1000 bytes | 423226 | 403.62
threads | 4kB | 4kB | 112033 | 437.63
threads | 8kB | 4kB | 256438 | 1001.71
threads | 16kB | 4kB | 381320 | 1489.53
threads | 4kB | 64kB | 6690 | 418.13
threads | 64kB | 64kB | 66672 | 4167
threads | 128kB | 64kB | 62236 | 3889.75
threads | 256kB | 64kB | 60292 | 3768.25
The results clearly show that the multithreaded version performs much better when the buffer size on the server side is tuned to the message size. The reason the multithreaded example is so fast is that it uses recvmsg_into(), which writes the received data into a preallocated buffer instead of allocating a new bytes object for every receive.
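A minimal sketch of that technique in isolation, using a socketpair instead of a real network connection (the buffer size here is arbitrary):

```python
import socket

# recvmsg_into() fills a caller-provided buffer instead of allocating a new
# bytes object per call; a socketpair makes this easy to demonstrate.
a, b = socket.socketpair()
buf = bytearray(4096)   # allocated once, reused for every receive
buffers = [buf]

a.sendall(b"hello")
nbytes, ancdata, flags, addr = b.recvmsg_into(buffers)
print(bytes(buf[:nbytes]))  # b'hello' -- the data landed in our own buffer
a.close()
b.close()
```

In the echo server above, the same bytearray is reused for every message on a connection, so the hot path does no per-message allocation at all.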