In my previous blog post, I talked about the two main ways you can program for parallel computing with the Intel compiler: using the Cilk Plus compiler extensions, and using Threading Building Blocks, or TBB. TBB is an open source library that provides a large selection of parallel-safe algorithms and data structures.
Notice I said parallel-safe, not the usual thread-safe. That’s because a function or class can claim to be thread-safe, but that doesn’t necessarily mean it’s always safe for parallel programming. And while I’m essentially making up the term parallel-safe, I’ve seen a great number of classes out there that boldly claim to be thread-safe, but will break when used in parallel code. Buyer beware! It’s easy to make a class thread-safe in the sense that its member functions don’t clash with each other or create deadlocks. But that doesn’t always mean the functions will operate correctly under true parallel code. Or they might work just fine, but ultimately get executed in serial, thus providing no performance benefit.
The main issue here is reduction. The developer of a class might include various locks in the member functions to ensure thread safety, so that no races or deadlocks occur. But if you’re hoping for maximum horsepower in your code, you might find that the class overuses critical sections, causing all the other threads to stop while one thread executes. Then when that thread is done the next one runs, and so on. The end result is essentially a serial rather than parallel run. So much for maximum horsepower. Yet the class is thread-safe, right? The problem, of course, is that the developer didn’t factor in parallel programming (such as parallel for loops, and especially reduction), and instead only worked to prevent multiple threads from clashing.
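To make the difference concrete, here is a minimal sketch in standard C++ (no TBB yet). The class LockedSum and both functions are hypothetical names made up for illustration, not code from any real library. The first function funnels every element through one shared lock, so the threads mostly wait on each other; the second gives each thread its own partial sum and combines the partial sums at the end, which is the basic idea behind a reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical "thread-safe" accumulator: the result is correct, but every
// add() takes the same lock, so concurrent callers queue up one at a time.
class LockedSum {
public:
    void add(long value) {
        std::lock_guard<std::mutex> guard(mutex_);
        total_ += value;
    }
    long total() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return total_;
    }

private:
    mutable std::mutex mutex_;
    long total_ = 0;
};

// Version 1: every thread pushes each element through the shared lock.
long sum_with_shared_lock(const std::vector<long>& data, unsigned num_threads) {
    assert(num_threads > 0);  // precondition of this sketch
    LockedSum sum;
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin =
                std::min(data.size(), static_cast<std::size_t>(t) * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i) sum.add(data[i]);
        });
    }
    for (auto& w : workers) w.join();
    return sum.total();
}

// Version 2: each thread sums its own chunk with no locking at all, and the
// partial sums are combined once the threads have finished (the reduction).
long sum_with_reduction(const std::vector<long>& data, unsigned num_threads) {
    assert(num_threads > 0);  // precondition of this sketch
    std::vector<long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin =
                std::min(data.size(), static_cast<std::size_t>(t) * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            partial[t] = std::accumulate(data.data() + begin,
                                         data.data() + end, 0L);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Both versions return the same answer. The difference is only in how much time the threads spend waiting on each other, and how big that difference is will depend on your data and hardware.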
TBB to the rescue
Take a look at the parallel reduce algorithm (parallel_reduce) in the TBB documentation. Like the other TBB algorithms, it is implemented as a template function, and it helps you break up your own algorithms so they operate simultaneously on smaller chunks of data. It isn’t for the faint of heart; you’re going to have to work a bit to put it to use. However, if you use it correctly, by writing your class to include the correct functions (including the subrange operation and the reduction operation), the end result will be a high-performance reduction class. (Remember, you’re creating a class that provides the correct functions, and then calling the template function; thus, the part you create is a class, not just a function.)
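Here is a rough sketch of what such a class looks like. To keep it self-contained, this is not TBB code: Range, SplitTag, reduce_sketch and sum_all are hypothetical stand-ins written in plain standard C++, and the driver is a deliberately naive one that launches a new thread for every split. As I understand the TBB documentation, the real pieces are tbb::blocked_range, tbb::split and tbb::parallel_reduce, and the class you write has the same three parts: a splitting constructor, the subrange operation and the reduction (join) operation.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-ins for a range type and a split tag (not TBB's types).
struct Range {
    std::size_t begin;
    std::size_t end;
};
struct SplitTag {};

// The class you write: it knows how to process a subrange and how to merge
// another body's partial result into its own.
class SumBody {
public:
    explicit SumBody(const std::vector<double>& data) : data_(data), sum_(0.0) {}

    // Splitting constructor: a fresh body for another chunk, starting from
    // the identity value of the reduction (0 for addition).
    SumBody(const SumBody& other, SplitTag) : data_(other.data_), sum_(0.0) {}

    // Subrange operation: accumulate this chunk into the running sum.
    void operator()(const Range& r) {
        for (std::size_t i = r.begin; i < r.end; ++i) sum_ += data_[i];
    }

    // Reduction operation: fold in the partial result from another body.
    void join(const SumBody& rhs) { sum_ += rhs.sum_; }

    double sum() const { return sum_; }

private:
    const std::vector<double>& data_;
    double sum_;
};

// Naive driver (an assumption for illustration): halve the range until a
// chunk has at most `grain` elements, run the right half on another thread,
// then join the two partial results. A real library would schedule tasks
// rather than start a thread per split.
template <typename Body>
void reduce_sketch(const Range& r, Body& body, std::size_t grain) {
    assert(grain > 0);
    if (r.end - r.begin <= grain) {
        body(r);
        return;
    }
    const std::size_t mid = r.begin + (r.end - r.begin) / 2;
    const Range left{r.begin, mid};
    const Range right_range{mid, r.end};
    Body right(body, SplitTag{});
    auto pending = std::async(std::launch::async, [&] {
        reduce_sketch(right_range, right, grain);
    });
    reduce_sketch(left, body, grain);
    pending.get();     // wait for the right half to finish
    body.join(right);  // the reduction step
}

// Convenience wrapper showing how the pieces fit together.
double sum_all(const std::vector<double>& data, std::size_t grain) {
    SumBody body(data);
    reduce_sketch(Range{0, data.size()}, body, grain);
    return body.sum();
}
```

Notice that SumBody contains no locks. Each body only ever touches its own partial sum, and the only place two results meet is join, which runs after both halves are done.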
So what does this mean for your coding? In this case, instead of just relying on a third-party class, you write your own class and then hand it to the reduction template function. That will probably mean a bit more work compared to dropping in some free class you found out on the Internet. But the payoff is increased performance.
There are also classes in TBB that you don’t need to write yourself, including, for example, concurrent_vector, which is a parallel-friendly counterpart to the standard vector class. Take a look through the rest of the documentation and you’ll see how much is there.
What about portability?
One question I’ve been asked is what this will do for portability. The answer: Not a problem! That’s where TBB shines. Unlike Cilk Plus, where you’re using compiler extensions that won’t port to all compilers, TBB is written as an open-source, compiler-independent library, and as such will work with most modern compilers.
But before ending, I need to point out that Intel has ported its Cilk Plus extensions to the GNU C++ compiler. That means that although you still can’t take your Cilk Plus code to every C++ compiler, you can use it with more than just the Intel compiler, so you’re not totally locked in. Still, TBB goes even further and should work with most modern C++ compilers. Bottom line: You can use compiler extensions or an open-source library to write your parallel code. The choice is yours.