How do pthreads and TBB fit together?

 

The short answer: these two threading libraries differ considerably in both implementation and purpose.

But let’s take a step back and consider the definition of a thread. A program can launch separate threads, even on a single-core machine. When running multiple threads on a single core, the operating system (sometimes with help from the hardware) switches between them: it saves the CPU registers of the running thread, including the instruction pointer and the stack pointer that locates the thread’s local variables, loads the saved registers of another thread, and continues execution there. Later it can swap back to the first thread, and so on. This is preemptive scheduling. The alternative is cooperative scheduling, where a thread runs until it voluntarily yields to other threads.

POSIX threads, often simply called “pthreads,” is a standard for multithreaded programming and part of the POSIX standard published by IEEE. The pthreads API was originally defined in 1995, before multiple cores were common in desktop computers. The POSIX standard as a whole received a major revision in 2008.

And they differ how?

The fundamental difference in implementation is that pthreads is exposed as a set of C-callable functions, while TBB is a set of C++ template classes and functions.

The pthreads library includes functions for creating a new thread, used like so:

int p = pthread_create(&threadinfo, NULL, threadfunc, (void *)args);

I don’t have space for a full tutorial on pthreads, but essentially that’s what most of the library looks like. You manually call into the API to spawn a new thread. There are almost a hundred functions for managing the creation and deletion of threads, and managing the communication between and synchronization of the threads. In other words, with pthreads, you’re in charge of starting all the threads and writing your own higher-level algorithms and data structures.
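To make the call above a little more concrete, here is a minimal sketch of spawning one thread and waiting for it to finish. The names SquareTask, square_worker, and square_on_thread are hypothetical and chosen only for illustration; pthread_create and pthread_join are the actual library calls.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical argument bundle handed to the thread through its void* parameter.
struct SquareTask {
    int input;   // value to square
    int output;  // written by the worker thread
};

// Thread entry point. pthreads requires the signature void* (*)(void*).
extern "C" void* square_worker(void* raw) {
    SquareTask* task = static_cast<SquareTask*>(raw);
    task->output = task->input * task->input;
    return nullptr;
}

// Spawns one thread, waits for it, and returns the computed square.
// Returns -1 if the thread could not be created or joined.
int square_on_thread(int value) {
    SquareTask task{value, 0};
    pthread_t thread;
    if (pthread_create(&thread, nullptr, square_worker, &task) != 0) return -1;
    // task lives on this stack frame, so we must join before returning.
    if (pthread_join(thread, nullptr) != 0) return -1;
    return task.output;
}
```

Calling square_on_thread(7) should hand back 49 once the worker has been joined. Note how much of the code is plumbing: packing arguments into a struct, casting from void*, and joining by hand.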

TBB, on the other hand, is a C++ library that provides parallelism by operating at a much higher level. It provides classes and algorithms for you, such as thread-safe container classes and everything else I’ve been discussing in this column.

For example, if you want to divide up an array and run a parallel for-loop over chunks of it, so that separate threads operate on different parts of the array simultaneously, you can easily do so using TBB. There’s a blocked_range template class that assists in splitting the array, and several loop templates, including parallel_for, that take care of distributing the chunks across threads.

You could certainly do something similar with pthreads. But pthreads doesn’t provide a range class, so you would have to code your own. And there isn’t a parallel for loop, so you’ll have to spawn the threads separately, and have each thread perform a separate loop.
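As a rough sketch of what that hand-rolled version might look like, the code below splits an array into one slice per thread and sums the slices in parallel. Chunk, sum_chunk, and parallel_sum are hypothetical names invented for this illustration, and the even split is just one simple choice, not how TBB itself partitions work.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a range class: the half-open slice [begin, end).
struct Chunk {
    const int* data;
    std::size_t begin;
    std::size_t end;
    long long partial_sum;  // result written by the worker
};

// Each thread runs its own ordinary loop over its own slice.
extern "C" void* sum_chunk(void* raw) {
    Chunk* c = static_cast<Chunk*>(raw);
    long long s = 0;
    for (std::size_t i = c->begin; i < c->end; ++i) s += c->data[i];
    c->partial_sum = s;
    return nullptr;
}

// Splits values into num_threads nearly equal slices, sums each on its own
// thread, then combines the partial sums after joining.
long long parallel_sum(const std::vector<int>& values, std::size_t num_threads) {
    assert(num_threads > 0);
    const std::size_t n = values.size();
    std::vector<Chunk> chunks(num_threads);
    std::vector<pthread_t> threads(num_threads);
    std::vector<bool> started(num_threads, false);

    for (std::size_t t = 0; t < num_threads; ++t) {
        chunks[t] = Chunk{values.data(), n * t / num_threads,
                          n * (t + 1) / num_threads, 0};
        started[t] =
            pthread_create(&threads[t], nullptr, sum_chunk, &chunks[t]) == 0;
        // If the thread could not be created, do that slice on this thread.
        if (!started[t]) sum_chunk(&chunks[t]);
    }

    long long total = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        if (started[t]) pthread_join(threads[t], nullptr);
        total += chunks[t].partial_sum;
    }
    return total;
}
```

The range bookkeeping, the thread spawning, and the final combination step are all your responsibility here, which is exactly the work that blocked_range and parallel_for take off your hands.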

Now the good news

You can actually use both together. In fact, if you download and install the full Threading Building Blocks, you’ll see that TBB itself uses pthreads internally for some of its work, and a couple of the examples use both TBB and pthreads. They coexist quite nicely. So if you are using pthreads and need a blocked_range that splits up arrays for you, you can use the one in TBB rather than rolling your own, and still spawn your threads yourself using pthreads. (However, quite frankly, I find it easier to just go with TBB throughout.)

Generally speaking, then, if you need a thread-safe data structure such as a collection, or a thread-safe function, unless you’re an expert on writing thread-safe code, your best bet is to go with a proven and tested library such as Threading Building Blocks.

But if you want to write lower-level code where you spawn your own threads, you can use pthreads, or even combine the two.

Posted on by Jeff Cogswell, Geeknet Contributing Editor
2 comments
swaroopcool21

Can you also elaborate on OpenMP vs. TBB? Maybe a three-way comparison or something like that?

JoeS5263

@swaroopcool21 OpenMP vs TBB: OpenMP is not as flexible as TBB, but on the other hand that means you don't have to type as much stuff into your code to use it. In fact, OpenMP is driven by #pragma directives, and the whole idea is that if the compiler ignores the pragmas you still have a correctly functioning single-threaded program. That can be very handy for testing! The usual "map" case requires only this line just before your for loop:

#pragma omp parallel for

Now, you do have the ability to specify that some variable is shared, or private, or private but initialized from the shared value at the start, and there are several more such options. If you don't specify, OpenMP does a reasonable job of deciding which variables are shared. Anything declared outside the loop is shared. The loop variable is private. Variables declared inside the loop are private. There are some more rules that I don't remember right now, but even these are enough that a typical for loop needs only the simple pragma above. By default, OpenMP decides how many threads to spawn. There are several ways to control that.
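As a small sketch of those sharing rules, the two functions below (double_all and sum_all, both hypothetical names) use the pragma on ordinary loops. If the compiler is run without its OpenMP switch, the pragmas are ignored and the code runs single-threaded with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Doubles every element in place. Each iteration touches a different
// element, so sharing the vector between threads is safe.
void double_all(std::vector<double>& values) {
    const long n = static_cast<long>(values.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {        // loop variable: private
        double scaled = values[i] * 2.0;  // declared inside the loop: private
        values[i] = scaled;               // values declared outside: shared
    }
}

// Sums the elements. total is declared outside the loop and would be
// shared, so the reduction clause gives each thread a private copy and
// adds the copies together at the end.
double sum_all(const std::vector<double>& values) {
    const long n = static_cast<long>(values.size());
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

The reduction clause is one of the "more options" mentioned above: without it, every thread would update the same shared total and the loop would contain a data race.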

TBB requires that you change the "for" to "parallel_for" at least. That means the code has been modified and cannot simply be compiled as single-threaded with a compiler switch, unlike OpenMP. In examples, TBB uses code like this:

parallel_for(blocked_range<int>(0, n), [&](const blocked_range<int>& r) {
    for (int i = r.begin(); i != r.end(); ++i) ...
});

Here r is not an iterator but a sub-range of the iteration space. TBB hands a different r to each invocation of the lambda, so each thread loops over its own slice.

Now, there is positive stuff to say about TBB. OpenMP uses system threads. Creating a system thread on Windows is time consuming. This limits the applicability of OpenMP to cases where the overhead of creating the thread pool is justified. It makes people split the combined pragma above into #pragma omp parallel (this makes it create the thread pool) followed by more than one construct that uses it, such as #pragma omp for, so that the thread pool is only set up once and torn down once over several uses. TBB, on the other hand, schedules lightweight tasks onto its own pool of worker threads, and it does load balancing and work stealing (if a thread has no more tasks in its queue, but another thread has more than the one it's working on, the idle thread will steal work from the other thread's queue). OpenMP does not do that by default; it assumes that when it divides a loop into equal-sized blocks of iterations, the blocks all take about the same time. That is a reasonable assumption, usually.
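A minimal sketch of that split looks like the following, with scale_then_offset being a hypothetical function made up for this example. One parallel region encloses two work-sharing loops, so the same team of threads is reused for both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies every element by factor, then adds offset to every element,
// using a single parallel region for both loops.
void scale_then_offset(std::vector<int>& values, int factor, int offset) {
    const long n = static_cast<long>(values.size());
    #pragma omp parallel
    {
        // First work-sharing loop: iterations are divided among the team.
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            values[i] *= factor;
        }
        // The implicit barrier at the end of "omp for" guarantees the
        // scaling is complete before any thread starts the second loop.
        #pragma omp for
        for (long i = 0; i < n; ++i) {
            values[i] += offset;
        }
    }
}
```

How much this saves depends on the OpenMP runtime. Many runtimes keep their worker threads alive between parallel regions, so the gain may be smaller than the thread creation cost alone would suggest; measuring is the only way to be sure.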

Note that for TBB to do work stealing, it must have divided the loop iterations into considerably more chunks than there are threads. I think OpenMP does not do that by default, although it certainly can be made to do that. I believe by default OpenMP makes as many threads as the hardware can support, and divides loop iterations into that many pieces.
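One way to make OpenMP hand out smaller chunks is the schedule clause. The sketch below uses schedule(dynamic, 64), where the chunk size of 64 is an arbitrary value picked for illustration, on a loop whose iterations take uneven time. The names is_prime_trial and count_primes_below are hypothetical.

```cpp
#include <cassert>

// Trial-division primality test. Larger n cost more, so iterations of the
// loop below are deliberately uneven.
static bool is_prime_trial(int n) {
    if (n < 2) return false;
    for (int d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// Counts the primes p with 2 <= p < limit.
long count_primes_below(int limit) {
    long count = 0;
    // dynamic: a thread that finishes its chunk of 64 iterations grabs the
    // next unclaimed chunk, so faster threads end up doing more chunks.
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:count)
    for (int n = 2; n < limit; ++n) {
        if (is_prime_trial(n)) ++count;
    }
    return count;
}
```

This is not work stealing in the TBB sense, since chunks are claimed from a shared pool rather than stolen from another thread's queue, but it addresses the same problem of uneven iteration cost.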

My decision (for my group) was that we would use OpenMP. My big gripe now is that OpenMP is up to rev 4.0, but the latest Microsoft C++ compiler only supports OpenMP 2.0. There was some very good stuff added before 4.0 that MS does not support. Boo, and shame on them. The Intel compiler and gcc both support OpenMP 4.0.