Intel® Threading Building Blocks (Intel® TBB) is a widely used, award-winning C++ template library for creating reliable, portable, and scalable parallel applications. Use Intel® TBB for a simple and rapid way of developing robust task-based parallel applications that scale to available processor cores, are compatible with multiple environments, and are easier to maintain.

I don't think "parallelizing would be just a compiler switch and the source code would be analyzed by the compiler for opportunities to parallelize" is something you'll see in at least the first half of this century. People have been working on this for decades and there is little or nothing to show for it. The best that has been achieved is compiler directives inserted by the programmer as hints. Parallel programming is not a change of programming style like OOP; it's an algorithmic shift. First and foremost, doing something in parallel often requires a different algorithm than doing it serially, at least if you want it to run efficiently. Even when code looks parallelizable to the programmer at first glance, will parallelizing it be sufficiently faster to exceed the overhead involved? Data flow? Welcome to the wonderful world of non-deterministic programming. You sound like you've been doing this for a while and yet you think it's easy. You must be incredibly good at it. Nobody has been able to write a program that can do it. I think it's hard. And heterogeneous parallel processing is very hard. I'm not going to debate the merits of the various methodologies because I haven't tried them all. We all like to see the code we're running, but we all use libraries, have for decades, and always will.
I have read a book about this, after which my group chose not to use it; we use OpenMP.
I was in agreement with that decision. We wanted to achieve most of the gain with a minimum of intrusiveness, and we felt OpenMP met that goal. It is supported inside our software development tools rather than being an external library. It is used by wrapping existing code in #pragma directives, not by adding new calls to library functions. I know that seems like a minor point, but I feel that readability of source code is important. Using it did not significantly increase the footprint of our software. By default OpenMP uses all available threads to run code in parallel, but it can be constrained to a specified number. It automatically divides the work and in most cases automatically "joins" the results (combines them into one result); for more interesting cases it supports several ways to control what happens at the "join". So far we are happy with our ability to benefit from it where needed, with little effort.
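To make the "wrap existing code in a #pragma" point concrete, here is a minimal sketch of my own, not code from our project. The function name `sum_of_squares` and its parameters are hypothetical. It assumes a compiler with OpenMP support switched on (for example `-fopenmp` with GCC or Clang); with OpenMP off, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration only: sums the squares of `values`.
// The loop body is ordinary serial code; the single #pragma line is the
// only OpenMP-specific addition.
long long sum_of_squares(const std::vector<int>& values, int num_threads) {
    assert(num_threads >= 1);  // the thread count must be positive

    long long total = 0;
    // A signed int loop index keeps the loop in the simple form that
    // even older OpenMP implementations accept.
    const int n = static_cast<int>(values.size());

    // num_threads(...) constrains how many threads are used (omit the clause
    // to get the default). reduction(+:total) is the "join": each thread
    // accumulates a private copy of `total`, and the copies are added
    // together when the loop ends.
    #pragma omp parallel for num_threads(num_threads) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(values[i]) * values[i];
    }
    return total;
}
```

Calling `sum_of_squares({1, 2, 3, 4}, 2)` returns 30 whether or not OpenMP is enabled, which is roughly what I mean by low intrusiveness: delete the pragma line and the function is still the original serial loop.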
My opinion is that the ideal parallel code builder would be even more automatic. In an ideal world, since people think in one stream, parallelizing would be just a compiler switch and the source code would be analyzed by the compiler for opportunities to parallelize, and then that would be automatically implemented in the compiled code - without even a #pragma in the source to mark each section to run in parallel.