Existing implementations of thread pools have a relatively high overhead in certain situations. Especially apply_async in multiprocessing.pool.ThreadPool and submit in concurrent.futures. ThreadPoolExecutor at all (see the benchmarks in the docs directory).

In case of ThreadPoolExecutor don’t use wait. It can be extremely slow! If you’ve only a small number of jobs and the jobs have a relatively long processing time, then these overheads don’t count. But in case of high number of jobs with short processing time the overhead of the above implementations will noticeably slow down the processing speed. The fastthreadpool module solves this issue, because it has a very small overhead in all situations.

Although fastthreadpool is lightweight it has some additional cool features like methods for later scheduling, repeating events and generator functions for worker callback functions.

In addition to get the best performance I’ve also written a fast and lightweight semaphore which is more than 20 times faster than the one which comes with the Python installation.

Some reasons why fastthreadpool is so fast:

  • Avoid locks as much as possible
  • Use deque instead of Queue
  • Do not create a class instance for every work item

The first test in benchmarks.py has a minimum worker callback function which just returns the given parameter. The main thread calculates the sum of the returned values. This is the most extreme case where the overhead of the thread pool implementation counts as much as possible. This is not the typical use case but it shows the overhead of the different thread pool implementations very well.

The results show that map in ThreadPool performs good, but map in fastthreadpool is a bit more efficient.

But apply_async has a very bad performance, even worse than submit in ThreadPoolExecutor. submit in fastthreadpool performs very well.

Thread Pool Function Time
single threaded for loop 0.378
fastthreadpool map 0.166
ThreadPool map_async 0.280
ThreadPoolExecutor map 53.072
fastthreadpool submit 2.679
ThreadPool apply_async 76.350
ThreadPoolExecutor submit 59.161

A more typical case shows the last example where the worker threads serialize and compress data.

Thread Pool Function Time
single threaded for loop 0.628
fastthreadpool map 0.598
ThreadPool map_async 0.609
ThreadPoolExecutor map 1.192
fastthreadpool submit 0.659
ThreadPool apply_async 1.317
ThreadPoolExecutor submit 1.169

As you can see the worker threads are still 2 times faster with fastthreadpool when submitting single jobs to the pool than the other 2 thread pool implementations.

Again this example shows clearly if speed matter then avoid concurrent.futures. Although it has a nice interface it is really slow.

For examples how to use fastthreadpool please have a look at the examples directory.

Check out the fastthreadpool module on github, licensed under the MIT license.