Generating Advanced Vector Extensions (AVX) Assembly Code with the Intel Compiler

 In my last blog, entitled “Digging Deeper Into Vectorized Parallel Assembly Code”, we looked at the assembly code generated by the Intel compiler. The assembly code supports vectorization of our loops. But we ended up with what were probably more questions than answers. So in this blog post, let’s tackle a few of the questions.

Before diving in, however, we need to consider the options that the Intel C/C++ Compiler gives us. Many of the options are available through the Project Options dialog box, but let’s look at the command-line help for clarification. The Intel compiler is also available as a command-line tool. We won’t be compiling with it from the command line, but we will use it to look at our options. If you’re using the default installation location, the compiler is the icl tool, found at:

c:\Program Files (x86)\Intel\Parallel Studio 2011\Composer\bin\ia32\icl.exe

When you run this program with the /help option, you can see the different options. Of importance are those found in the Code Generation section of the help output. The /Qx options generate code that runs exclusively on processors that support a certain architecture, such as SSE4.1, or AVX. (AVX is the most recent architecture.) The /Qax options, on the other hand, generate code for the specific architecture, as well as additional code for those processors that don’t support the architecture. There are many more options, but these are the two of interest to us.

When you open the Project Options dialog, you can choose between these sets of options. However, not all options are available. The /Qx options are in the Code Generation [Intel C++] section under the “Intel Processor-Specific Optimization” dropdown, but the newest option, AVX, isn’t present. The /Qax options are in that same section but under the “Add Processor-Optimized Code Path” dropdown. And again, the AVX option is not there. That’s the architecture I want to target here, since it’s the newest. Here’s how you enable that option. Set both of the dropdowns that I just mentioned to “None.” That stops the IDE from automatically adding any /Qx or /Qax options. Then, add the option manually by going to the Command Line section of the project options. Go down to “Additional Options” and type in /QaxAVX. Then click OK. This tells the compiler to generate the AVX-level code, but also separate code for processors that don’t have the capability. (Typing /QxAVX instead would generate only the AVX code, with no fallback path.)

Here’s the C++ code I’m working with. First, the vectorized function:

__declspec(vector) float domath(float op1, float op2) {
   return sqrt(op1) + sqrt(op2);
}

And the code that calls it:

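#pragma simd
for (long i = 0; i < count; i++) {   // the loop bound was cut off in the original listing; "count" is a placeholder for the array length
   set3[i] = domath(set1[i], set2[i]);
}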
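For readers who want to experiment, here is a minimal, self-contained sketch of the function and loop above. The VECTOR_FUNCTION macro, the apply_domath helper, and the count variable are names assumed for this sketch and are not taken from the original project. The Intel-specific keywords are switched on only when the classic Intel compiler on Windows is detected, so the same file should also build as ordinary scalar C++ with other compilers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// __declspec(vector) and #pragma simd are Intel compiler extensions.
// Assumption: enable them only for the classic Intel compiler on Windows;
// every other compiler sees plain C++.
#if defined(__INTEL_COMPILER) && defined(_WIN32)
#define VECTOR_FUNCTION __declspec(vector)
#define USE_PRAGMA_SIMD 1
#else
#define VECTOR_FUNCTION
#define USE_PRAGMA_SIMD 0
#endif

// The function from the article: the sum of two square roots.
VECTOR_FUNCTION float domath(float op1, float op2) {
    return std::sqrt(op1) + std::sqrt(op2);
}

// Hypothetical helper that wraps the article's loop so it can be called
// with any pair of equally sized inputs.
inline std::vector<float> apply_domath(const std::vector<float>& set1,
                                       const std::vector<float>& set2) {
    assert(set1.size() == set2.size());
    std::vector<float> set3(set1.size());

    const float* a = set1.data();
    const float* b = set2.data();
    float* out = set3.data();
    const long count = static_cast<long>(set1.size());

#if USE_PRAGMA_SIMD
#pragma simd
#endif
    for (long i = 0; i < count; i++) {
        out[i] = domath(a[i], b[i]);
    }
    return set3;
}
```

Calling apply_domath with two vectors of the same length returns a third vector holding domath of each pair of elements. Whether the loop is actually vectorized still depends on the compiler and the options discussed earlier.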

At this point, I recommend revisiting my last blog post and recompiling. Now let’s look at the assembly and see what we have. The assembly code ends up with both an actual function version of the domath function (probably in case a non-vectorized function wants to call it), as well as an inline version that gets inserted right inside the loop. I’m only interested in the inline version. Here’s what I ended up with. First, there’s a version that looks like this:

vmovss    xmm0, DWORD PTR [edx+ebx*4]
vmovss    xmm1, DWORD PTR [ecx+ebx*4]
vsqrtss   xmm0, xmm0, xmm0
vsqrtss   xmm1, xmm1, xmm1
vaddss    xmm2, xmm0, xmm1
vmovss    DWORD PTR [esi+ebx*4], xmm2

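These are AVX instructions (note the v prefix), and this is what gets executed on my machine. Strictly speaking, the ss suffix marks them as scalar single-precision operations that handle one float at a time; the packed forms, such as vsqrtps and vaddps, are the ones that process several floats per instruction. But there are some other versions that can run instead. Early in the program, the code tests the processor capabilities and stores them in a variable. Prior to running this version, the code tests which bits are set in that capabilities variable.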
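The capability test described above can be sketched in C++ roughly as follows. This is not the code the Intel compiler emits; it only illustrates the pattern of detecting features once, keeping them in a variable, and branching on a bit. The CpuFeature enum and the detect_features, add_sqrt_baseline, add_sqrt_avx, and add_sqrt functions are hypothetical names. Detection relies on the GCC/Clang builtin __builtin_cpu_supports on x86; with any other compiler or processor, the sketch simply assumes AVX is absent. The "AVX" path is a stand-in with the same source as the baseline; in a real build it would be the version compiled with AVX instructions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical capability bits, loosely modeled on the capabilities
// variable described in the article.
enum CpuFeature : unsigned { kNone = 0u, kAvx = 1u << 0 };

// Query the processor. Assumption: only GCC/Clang on x86 are probed;
// everywhere else this conservatively reports no AVX support.
inline unsigned detect_features() {
    unsigned features = kNone;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx")) {
        features |= kAvx;
    }
#endif
    return features;
}

// Baseline path: safe on any processor.
inline void add_sqrt_baseline(const float* a, const float* b, float* out,
                              std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(a[i]) + std::sqrt(b[i]);
    }
}

// Stand-in for the AVX path. In this sketch it is the same source as the
// baseline; a real build would compile this version with AVX enabled.
inline void add_sqrt_avx(const float* a, const float* b, float* out,
                         std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(a[i]) + std::sqrt(b[i]);
    }
}

// Dispatcher: the features are detected once, stored, and tested on each
// call, so the AVX path is reached only on processors that report AVX.
inline void add_sqrt(const float* a, const float* b, float* out,
                     std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    static const unsigned features = detect_features();
    if (features & kAvx) {
        add_sqrt_avx(a, b, out, n);
    } else {
        add_sqrt_baseline(a, b, out, n);
    }
}
```

Both paths produce the same results, so callers use add_sqrt without knowing which one ran. On a processor without AVX the branch never enters add_sqrt_avx, and its instructions sit in the executable untouched.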

But that brings up an interesting question: This high-end vectorized code lives inside the one executable, yet a lot of computers can’t run that particular code. Fortunately, on those machines the capability test branches around it, so it never runs. But what does the computer do with that part of the code? It just ignores it. A processor only decodes the bytes it is actually told to execute, so instructions that are never reached are nothing more than data sitting in memory. If an older processor did try to execute them, it would raise an invalid-opcode exception, which is exactly why the test comes first.

We’re one step closer to a fuller understanding, but we still have more questions from last time as well as probably some new ones. So please share your thoughts below, especially if you have new questions. Enjoy!

________________________________________________________________

Jeff Cogswell is a Geeknet contributing editor, and is the author of several tech books including C++ All-In-One Desk Reference For Dummies, C++ Cookbook, and Designing Highly Useable Software. A software engineer for over 20 years, Jeff has written extensively on many different development topics. An expert in C++ and JavaScript, he has experience starting from low-level C development on Linux, up through modern web development in JavaScript and jQuery, PHP, and ASP.NET MVC.

Posted by Jeff Cogswell, Geeknet Contributing Editor
2 comments
Anders Borg

"But what does the computer do with that part of the code?"

As you say it's just ignored (jumped over), as the executable is the same independent of the processor. There's nothing more to say really. Assembly 101 :).

Stephen Sharp

How do you get the complete C++ keyword command list to work with parallel processing?