In my recent post, “Optimizing Your Loops with Vectorization,” I introduced the basics of vectorization. In this post, I’m covering the next step: moving the vectorized calculation into its own function.
Normally, calling a function from inside a loop keeps the compiler from vectorizing that loop. Try this: take the code from my earlier post and move the calculation into its own function, as in the following listing. Don’t run the code yet; we’re not ready. (If you do run it, you’ll need to press Ctrl+C to end it, because printing 110 million numbers takes a very long time.)
    #include <cmath>        // sqrt
    #include <cstdlib>      // rand, srand, RAND_MAX
    #include <ctime>        // time
    #include <iostream>
    #include <windows.h>    // SYSTEMTIME, GetSystemTime
    #include <cilk/cilk.h>  // cilk_for (Intel Cilk Plus)

    float domath(float op1, float op2) {
        std::cout << op1 << std::endl;  // prints every value; see the warning above
        return sqrt(op1) + sqrt(op2);
    }

    void vectorize1() {
        std::cout << sizeof(float) << std::endl;
        long len = 110000000;
        float *set1 = new float[len];
        float *set2 = new float[len];
        float *set3 = new float[len];

        srand( time(0) ); // seed the generator
        cilk_for (long i=0; i<len; i++) {
            set1[i] = 100.0 * rand() / RAND_MAX;
            set2[i] = 100.0 * rand() / RAND_MAX;
        }

        SYSTEMTIME starttime;
        SYSTEMTIME endtime;
        GetSystemTime(&starttime);
        // This is the loop we want the compiler to vectorize.
        for (long i=0; i<len; i++) {
            set3[i] = domath(set1[i], set2[i]);
        }
        GetSystemTime(&endtime);
        // Elapsed milliseconds (only valid when both readings fall in the same minute).
        std::cout << (endtime.wSecond * 1000 + endtime.wMilliseconds) - (starttime.wSecond * 1000 + starttime.wMilliseconds) << std::endl;

        delete[] set1;
        delete[] set2;
        delete[] set3;
    }
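One caveat about the timing code above: it subtracts only the seconds and milliseconds fields, so the result is wrong (possibly negative) if the loop happens to straddle a minute boundary. If your compiler supports C++11, a monotonic clock avoids that. What follows is only a minimal sketch and not part of the original listing; the helper name elapsed_ms is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper (not in the original listing): runs `work` once and
// returns the elapsed time in milliseconds. steady_clock is monotonic, so
// minute or hour rollovers cannot skew the measurement.
template <typename Work>
long long elapsed_ms(Work work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

To use it, you would wrap the loop in a lambda, pass the lambda to elapsed_ms, and print the value it returns in place of the GetSystemTime arithmetic.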
Checking for Vectorization
Now we want to see if the loop gets vectorized. The compiler has an option whereby it will give us details on what’s vectorized and what isn’t. Open up your project properties, and go down to the C/C++ section and find Diagnostics [Intel C++]. This page controls what diagnostics information the compiler provides us. Set Vectorization Diagnostic Level to “Loops Successfully and Unsuccessfully Vectorized (2) (/Qvec-report2).”
Rebuild your project. You should see some messages in the Output window with the line number of your loop, like so:
    C:\dev\tests\ParallelStudio1\ParallelStudio1\ParallelStudio1.cpp(504,2): warning : loop was not vectorized: existence of vector dependence.
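The “dependence” it’s referring to comes from the extra function call that can’t be vectorized: in this case cout’s insertion operator, as well as the existence of the cout object itself. Now remove the cout line altogether, and add __declspec(vector) to the function like so:

    __declspec(vector) float domath(float *op1, float *op2) {
        return sqrt(*op1) + sqrt(*op2);
    }

Note that this version takes pointers, so the call inside the loop has to pass addresses: domath(&set1[i], &set2[i]).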
Compile the code again. This time when you look at the output you should see messages like these:
    1>C:\dev\tests\ParallelStudio1\ParallelStudio1\ParallelStudio1.cpp(491,55): warning : FUNCTION WAS VECTORIZED.
    1>C:\dev\tests\ParallelStudio1\ParallelStudio1\ParallelStudio1.cpp(511,2): warning : LOOP WAS VECTORIZED.
Now the loop is vectorized, and the function was compiled in a form that allows for vectorization. The code should now run noticeably faster; compare the elapsed time printed by the two builds on your own machine to see by how much.
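The declspec annotation is specific to Intel's compiler (it is spelled __declspec(vector) on Windows and __attribute__((vector)) on Linux), so code that uses it directly won't build elsewhere. Here is a hedged sketch of one way to keep the source portable. The macro VECTOR_FN and the names domath_by_value and apply_domath are hypothetical, the function takes its arguments by value rather than by pointer, and the compiler check assumes the classic Intel compiler, which defines __INTEL_COMPILER.

```cpp
#include <cmath>
#include <cstddef>

// VECTOR_FN is a hypothetical macro, not something any compiler provides.
// Under the classic Intel compiler it expands to the vector-function
// annotation; everywhere else it expands to nothing, so the code still
// builds but the function is compiled as an ordinary scalar function.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
  #define VECTOR_FN __declspec(vector)
#elif defined(__INTEL_COMPILER)
  #define VECTOR_FN __attribute__((vector))
#else
  #define VECTOR_FN
#endif

// Pure math, no I/O: the kind of function that can get a vector version.
VECTOR_FN float domath_by_value(float op1, float op2) {
    return std::sqrt(op1) + std::sqrt(op2);
}

// The loop that calls the annotated function once per element.
void apply_domath(const float* set1, const float* set2, float* set3, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        set3[i] = domath_by_value(set1[i], set2[i]);
    }
}
```

Whether the loop actually gets vectorized still depends on the compiler and its settings, so check the vectorization report rather than assuming it did.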
Why This Works
At a fundamental level, the processor can perform the same floating-point operation on several values at once when those values are packed into a single wide register. But in order to make this work, the compiler must be able to generate a vectorized form of the function being called (in our case, “domath”): a version that handles a whole register’s worth of values per call. Functions like the one we wrote, which contain only math and no calls the compiler can’t vectorize, can be compiled like that (sqrt is fine because the processor has a vector square-root instruction). Functions that call out to something opaque, such as cout’s insertion operator, can’t.
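To picture what a vectorized form of the function looks like, here is a conceptual sketch in plain C++. It is not what the compiler emits; the Lanes4 struct merely stands in for a 128-bit register holding four floats, and the names Lanes4, domath_lanes, and apply_in_chunks are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Conceptual stand-in for a 128-bit SIMD register holding four floats.
struct Lanes4 {
    float v[4];
};

// The same arithmetic as domath, applied to every lane. On real hardware
// the per-lane loop is replaced by a few SIMD instructions, each of which
// operates on all four lanes at once.
Lanes4 domath_lanes(const Lanes4& op1, const Lanes4& op2) {
    Lanes4 result;
    for (int lane = 0; lane < 4; ++lane) {
        result.v[lane] = std::sqrt(op1.v[lane]) + std::sqrt(op2.v[lane]);
    }
    return result;
}

// Steps through the arrays four elements at a time, then finishes any
// leftover elements one by one with the scalar formula.
void apply_in_chunks(const float* set1, const float* set2, float* set3, std::size_t len) {
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        Lanes4 a, b;
        for (int lane = 0; lane < 4; ++lane) {
            a.v[lane] = set1[i + lane];
            b.v[lane] = set2[i + lane];
        }
        Lanes4 r = domath_lanes(a, b);
        for (int lane = 0; lane < 4; ++lane) {
            set3[i + lane] = r.v[lane];
        }
    }
    for (; i < len; ++i) {
        set3[i] = std::sqrt(set1[i]) + std::sqrt(set2[i]);
    }
}
```

A call to cout has no per-lane equivalent like this, which is why its presence in the function body rules out a vector version.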
In Case You’re Curious
Incidentally, here’s how I determined if the cout object itself was part of the problem. I knew the insertion operator was causing a problem since that’s a function call. A lot of beginner C++ programmers are under the assumption that cout is a keyword, when in fact it’s an object. An object has an address, and that address can be cast to an integer, and subsequently used in a calculation. The value is meaningless, but that allows us to use cout in our function without the insertion operator present, and without cout getting optimized out. Here’s what I did:
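    // &std::cout is the object's address; going through size_t avoids a
    // pointer-to-int truncation error on 64-bit builds.
    int x = (int)(size_t)(void *)(&std::cout);
    return sqrt(*op1) + sqrt(*op2) + x;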
That’s not particularly useful code, but it let me force cout into the optimized code.
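If you want to convince yourself that cout really is an ordinary object with an address, a small standalone sketch like the following will do. The helper name address_of_cout is hypothetical, and the sketch uses std::uintptr_t (from C++11's cstdint header) rather than int so the pointer value isn't truncated on a 64-bit build.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical helper: returns the address of the std::cout object as an
// unsigned integer wide enough to hold a pointer. The number itself is
// meaningless; the point is only that cout has an address at all.
std::uintptr_t address_of_cout() {
    return reinterpret_cast<std::uintptr_t>(&std::cout);
}
```

Calling the function twice in the same run returns the same nonzero value, because cout is a single global object.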
Coming Up
Next time we’ll look at what floating point operations are available to us, and how we can get the most out of them.
Meanwhile, here’s a question for you: Have you been able to re-compile any of your mathematically intensive code using these techniques and see a performance increase?
Let me know in the Comments section below.
_______________________________________________________________
Jeff Cogswell is a Geeknet contributing editor, and is the author of several tech books including C++ All-In-One Desk Reference For Dummies, C++ Cookbook, and Designing Highly Useable Software. A software engineer for over 20 years, Jeff has written extensively on many different development topics. An expert in C++ and JavaScript, he has experience starting from low-level C development on Linux, up through modern web development in JavaScript and jQuery, PHP, and ASP.NET MVC.







