Threading in Python: Beating Moore’s Law Share your comment!

Threading in Python Example

Threading in Python

Herb Sutter of C++ fame wrote in 2005 that the end was in sight for single core CPUs keeping up with Moore’s Law. The way forward was multiple cores and concurrency, i.e. doing multiple things at the same time. If you have multiple systems or even multiple CPUs in a system then you write code to run across different systems perhaps using message passing as in MPI.

It’s only in the last ten years or so that multi-core CPUs have become the norm. Before we had dual cores, a single CPU could only do concurrent programming by using threads. A thread is a thread of execution, where the processor executes some code.
The CPU can only execute one block of code at a time but it can switch to executing a different block of code and save out the contents of the registers then load the registers for the next set of code. That single CPU couldn’t run the code any faster but if say one thread was reading data from a slow device then in between reading a number of bytes from the device it could switch to another thread do some computation then switch to the data thread and read more data. That way the CPU didn’t just sit spinning its wheels while waiting for data,

With dual processors, each processor can run programs in parallel if the programming language supports it. So then we come to Python. It was devised 25 years ago before multi-core CPUs existed. Python has had a threads module for a long time (named to _threads in Python 3). Later on a higher level module threading was introduced. But first a word about GIL – the Global Interpreter Lock.

GIL

Cpython, the standard interpreter for Python has a problem with threads. It’s not thread safe so only one thread can access it at a time to avoid problems. A thread holds the GIL (Global Interpreter Lock) when running but if it’s blocked the GIL is released and another thread gets a chance to run. Worth repeating; only one thread can be executed at once thanks to the GIL. If however non-Python code is run then the benefits of threading can be achieved. As we’ll see later there is a way with Python code as well.

So one way to get the best out of Python threading is to use computation and I/O together. Here’s a simple example of old school threading. It fetches four URLs serially then the same again threaded. There are two modules for threading in Python, the older thread (deprecated and called _thread in Python 3) and the newer threading which I’ll use.

The output is:

So the parallel version runs over three times faster. This was run on Ubuntu 16.04 LTS on a windows PC inside Virtual box so you may get a different ratio on real hardware.

The Modern Way

Rather than worry about individual threads and the impact of GIL, the multiprocessing package spawns processes with a similar API to threading. If you want to start another process use multiprocessing.

This is the above example using the multiprocessing thread pool.

The execution time in this case was slightly longer ( 2.2 seconds) than the original, but subsequent runs gave varying times down to 1.6 seconds. The map() function in pool calls the specified function (urlopen) for each of the urls using a thread from the pool. As before in get_responses() pool.join() waits for all threads to complete.

Concurrent Futures

Yet another way and possibly the best is concurrent.futures which has been in Python since 3.2.

I’ve chosen to use ProcessPoolExecutor.submit() to call a function, passing in the now familiar urls. I’ve used the ProcessPoolExecutor which uses a pool of processes to execute the function calls asynchronously. This avoids the GIL but only objects that are picklable (i.e. serialized) can be executed and returned.

In this example the main code is in a main() function and I’ve added the import guard (the last two lines). The for future loop uses the as_completed() function to wait until each task finishes and then prints the result which is just the url. In a real program, it would be the data i.e. web page. If you use map instead of submit then the results are returned in the order of the urls. With submit though whichever you get them in the order they are returned, so fastest fetch first.

This returned the results in 1.3998 … seconds, closer to the first example time and better than the pool.map() example.

Conclusions

If you are struggling with GIL and multi-processing because your code is pure Python, try concurrent_futures.

Posted on January 4, 2017 by David Bolton, Slashdot Media Contributing Editor