Compiling TBB programs and examples on Linux Ubuntu


This week I’m going to talk about something I haven’t covered before: Linux. Specifically, running Threading Building Blocks on Ubuntu. We’ll look at using TBB with the GNU C++ compiler; soon we’ll also cover topics such as the Cilk Plus extensions for the GNU C++ compiler, as well as the actual Intel Parallel Studio on Linux.

I’m using version 12.04 of Ubuntu here. The steps for installing everything are pretty simple. I’ll assume you already have the GNU C++ compiler. (If you don’t, it’s easy to find instructions on that, elsewhere.)

But you do need to install TBB. The great thing is it’s available through the apt-get command-line package manager. You can also get it through the Software Center, by searching for “tbb” and installing the one called libtbb-dev. In either case, the version I’m seeing right now in the package manager is 4.0, release 223.1. The version on the Threading Building Blocks website is 4.1, but that’s not a big difference.

To use the command line, just type:

sudo apt-get install libtbb-dev

(I’m assuming you’re logged in as a non-root user who has sudo access.)

Amazingly, this set up everything and I didn’t need to do any additional configuration—no setting environment variables or anything. To try it out, I created a quick C++ file like this, which tests out the parallel_for:


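#include <iostream>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

using namespace std;
using namespace tbb;

long len = 5000;
// The trailing () zero-initializes the arrays so the loop reads defined values.
float *set1 = new float[len]();
float *set2 = new float[len]();
float *set3 = new float[len]();

class GrainTest {
public:
    // parallel_for calls this once for each chunk of the index range.
    void operator()( const blocked_range<long>& r ) const {
        //std::cout << r.begin() << std::endl;
        for (long i=r.begin(); i!=r.end(); ++i ) {
            set3[i] = (set1[i] * set2[i]) / 2.0 + set2[i];
        }
    }
};

int main() {
    // Split [0, len) into chunks using a grain size of 100.
    parallel_for(blocked_range<long>(0, len, 100), GrainTest() );
    return 0;
}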
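
To get a feel for what the grain size of 100 in that blocked_range means, here is a toy sketch in plain C++ that keeps halving a range until each piece is no bigger than the grain. It is only an illustration, not TBB's actual code: the names ToyRange, split_range and toy_chunks are made up for this example, and as far as I know TBB's default partitioner can stop splitting earlier and hand out larger chunks.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a half-open index range [first, second).
typedef std::pair<long, long> ToyRange;

// Halve [begin, end) recursively until a piece holds at most `grain`
// elements, then record that piece as one chunk of work.
void split_range(long begin, long end, long grain, std::vector<ToyRange>& chunks) {
    if (end - begin > grain) {
        long middle = begin + (end - begin) / 2;
        split_range(begin, middle, grain, chunks);
        split_range(middle, end, grain, chunks);
    } else {
        chunks.push_back(ToyRange(begin, end));
    }
}

// Return every chunk the toy splitter produces for [begin, end).
std::vector<ToyRange> toy_chunks(long begin, long end, long grain) {
    std::vector<ToyRange> chunks;
    if (grain < 1) grain = 1;  // guard against endless splitting
    if (begin < end) split_range(begin, end, grain, chunks);
    return chunks;
}
```

Calling toy_chunks(0, 5000, 100) from a main function gives 64 chunks of 78 or 79 elements each. In the real program, each such chunk would be one call to GrainTest's operator().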

You can use your favorite text editor; I’m using scite, which you can install through apt-get with the name scite.

Also, I’m running my Linux in a virtual machine; ideally you wouldn’t want to do this, especially if the virtual machine is limited in which cores it can see. You want to be able to use all the cores! So I kept my arrays really small, primarily to see if the program would compile and run.
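
If you want to check how many cores your (possibly virtual) machine can actually see, a small sketch like the following should do on Linux. It relies on sysconf with _SC_NPROCESSORS_ONLN, which glibc supports; the function name online_cores is just something I made up for this example, and it is not part of TBB.

```cpp
#include <unistd.h>

// Number of processors currently online as reported by the OS.
// Falls back to 1 if the query fails or reports nothing useful.
long online_cores() {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}
```

Print the result of online_cores() from a main function; if it shows 1 inside a virtual machine, parallel code will still run there, just without any real speedup.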

To compile it, you can execute the g++ command from the command-line like so:

g++ test1.cpp -ltbb

where test1.cpp is what I called the C++ file. I’m not specifying a name for the final executable, so by default it’s a.out. You can change it to something better with the -o option.

There’s also a nice set of examples available. These are the standard TBB examples, and are again available through apt-get. Here’s how you install them:

sudo apt-get install tbb-examples

When you do this, the examples will be put in a strange place, /usr/share/doc/tbb-examples. I recommend moving that directory to your own home directory (using sudo mv) and then changing both the owner and group to yourself to make it easy to try the examples. To do that, make sure you’re in the directory containing tbb-examples, and run:

sudo chown -R jeff tbb-examples
sudo chgrp -R jeff tbb-examples

but replacing “jeff” with your own username. 

After that, take a look at the README.Debian file in the root example directory, and follow the instructions for building the examples, including the line showing how to unzip all the .gz files. All the source files and Makefiles start out compressed for some reason. Note, though, that running make doesn’t just compile each example; it actually runs it afterwards. You can go poking through the Makefiles if you want to change this behavior. (Hint: the “all” target runs both release and test targets.)

Take a close look at these examples. They’re very good, and they all compile and run beautifully on Linux, and they should give you what you need to get started.

Posted on January 4, 2013 by Jeff Cogswell, Geeknet Contributing Editor

nice info. I ran most of the examples from tbb. By the way where can I get the right tutorial for that?

get soon in touch



Hi Jeff,

I'm a C programmer and I only have pthread theoretical knowledge. But I really liked your comments about the reduction of work using TBB rather than pthread. Could you provide some example, please? I'll appreciate that.

Best regards,

Anderson Bestteti

Sandro Alves de Souza

Hello Jeff.
Could you tell us what the difference is between using the TBB library and using other thread libraries like "pthread"?
In my modest opinion, both libraries provide ways to create threads (parallel tasks).
What are the real benefits using TBB library?
Thanks in advance.


@Sandro Alves de Souza 

Hi Sandro, thanks for commenting! POSIX Threads (or pthread) is an older library that has been modernized a bit, but requires a different approach to programming -- you call functions in the pthread library to spawn new threads, passing in a pointer to a function. Essentially the difference is that TBB is a higher-level library where you create instances of classes that are thread-ready. Assuming pthread can spawn threads on multiple cores (I've heard there are newer versions that can, but I haven't tried it), you would still have to write your own classes from scratch that perform such functions as splitting an array into chunks and executing loops in parallel, and then combining ("reducing") the results into one.

So my experience is that TBB is a much bigger, feature-rich template-based C++ library for targeting multiple cores on today's processors, whereas pthread is for lower-level work and primarily consists of C functions. I imagine you could do most of the same things in pthread, but you would have to write a lot more code.
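
To give a rough idea of the "write it yourself" work described above, here is a minimal sketch that sums an array with raw pthread calls: it splits the array into slices, runs one thread per slice, then combines the partial sums. pthread_create and pthread_join are the real library calls; SumTask, sum_slice and parallel_sum are names invented for this example, and the error handling is deliberately simple. Compile it with g++ and the -pthread option.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// One worker's slice of the input plus a slot for its partial result.
struct SumTask {
    const float* data;
    std::size_t begin;
    std::size_t end;
    double partial;
};

// Thread entry point: pthread hands the task over as a void pointer.
extern "C" void* sum_slice(void* arg) {
    SumTask* task = static_cast<SumTask*>(arg);
    double total = 0.0;
    for (std::size_t i = task->begin; i != task->end; ++i) {
        total += task->data[i];
    }
    task->partial = total;
    return NULL;
}

// Split data[0..n) into num_threads slices, sum each slice in its own
// thread, then combine ("reduce") the partial sums into one result.
double parallel_sum(const float* data, std::size_t n, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<SumTask> tasks(num_threads);
    std::vector<pthread_t> threads(num_threads);
    std::vector<char> started(num_threads, 0);

    for (std::size_t t = 0; t < num_threads; ++t) {
        tasks[t].data = data;
        tasks[t].begin = n * t / num_threads;
        tasks[t].end = n * (t + 1) / num_threads;
        tasks[t].partial = 0.0;
        if (pthread_create(&threads[t], NULL, sum_slice, &tasks[t]) == 0) {
            started[t] = 1;
        } else {
            sum_slice(&tasks[t]);  // could not start a thread: do the slice here
        }
    }

    double total = 0.0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        if (started[t]) pthread_join(threads[t], NULL);
        total += tasks[t].partial;
    }
    return total;
}
```

All of the slicing, thread bookkeeping and combining here is the kind of code that TBB's parallel_for and parallel_reduce templates are meant to take off your hands.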

If there's interest, perhaps this would be a good topic for a future blog.




@jeffcogswell @Sandro Alves de Souza I'm a physicist, not a CStist, but are you saying that pthreads spawned by a particular process cannot be assigned to different cores? Do you mean sockets? I'm not sure about other OSes, but I don't believe that Linux treats threads and processes differently.  So in my (possibly flawed) understanding, they can be executed on whatever core the scheduler places them. I understand that there may be problems with cache sharing that are beyond my current knowledge, but I thought this was a problem of overhead and not preclusion.