Tackling the complex topic of parallel programming for a high-performance machine such as the Intel Xeon Phi Coprocessor might seem like a terrifying ordeal. But authors Jim Jeffers and James Reinders provide a logical, manageable, and real-world approach to the task.
Parallel programming isn’t for the faint of heart. Sure, many of us have studied serial algorithms and have mastered them. We may also have studied a little bit of parallel programming. But putting the concepts into practice requires a lot more expertise than just understanding the algorithms. For starters, it requires a strong grasp of the underlying hardware architecture, right down to the assembly level.
One architecture that’s especially complex is the Xeon Phi Coprocessor. Featuring dozens of cores (typically 61, but different models are available), it works alongside a main Xeon processor. How do you learn to program such a beast?
I started by reading the book Intel Xeon Phi Coprocessor High-Performance Programming, by Intel’s Jim Jeffers and James Reinders. (ISBN: 978-0124104143. Retail: $51). Published this spring by Morgan Kaufmann, the book spans more than a dozen chapters.
I especially enjoyed how accessible the book is. While Jeffers and Reinders are known experts in the field of parallel programming, you don’t need a doctorate in engineering to understand what they’re saying. Right at the beginning, the authors explain the architecture behind the coprocessor and how it fits together with the main Xeon processor. They take you through examples of logging into the Linux OS that runs on the coprocessor and inspecting the individual cores.
Though the book is easy to read, the information packed into it is by no means basic. In Chapter 2, using the metaphor of a race car, the authors start out with a simple program that doesn’t use multiple threads but does at least use vectorization. They then show how to scale the code up to two threads, and from there to all the cores. Examples are given in different programming languages, including C and Fortran.
As the chapters progress, the code and examples shift from simple, contrived demonstrations to more realistic scenarios. I found this approach especially helpful. It’s like when first-semester physics students are told to “neglect friction,” even though the real world requires accounting for friction. In similar fashion, Jeffers and Reinders go on to provide real-world scenarios you can use as a basis for production code.
The book includes two especially important chapters: one on vectorization and one on coding for parallel threads. Regular readers of this blog will recognize these as the two main approaches to parallel programming, and this book explains them both well. For example, the vectorization chapter provides a full rundown of the SIMD directives, as well as how to align your data. The authors even take you down to the assembly level of the compiled examples, letting you see exactly what’s going on at the processor level.
As both authors are employees of Intel, they naturally prefer Intel Parallel Studio as a development tool. But they also provide plenty of instruction on standards such as OpenMP, which work with more compilers, and the book is by no means an advertisement for Parallel Studio. (Personally, I think if you’re programming for the Xeon Phi Coprocessor, you’ll at least want to use the Intel compiler; the full Parallel Studio would certainly help.)
Later in the book, the authors present much more information about the coprocessor’s architecture, including management and administration. Chapter 9, for example, examines the system software running the coprocessor, including the different software platform stacks. Chapter 10 offers an in-depth discussion of the Linux system that runs on the coprocessor and covers several management utilities (for rebooting the coprocessor, shutting it down, mounting file systems, and so on).
One interesting aspect of the Xeon Phi coprocessor is its additional support for high-performance mathematical operations. As such, the authors devote Chapter 11 to the Intel Math Kernel Library (MKL). The chapter explains the three programming models available for using MKL on the coprocessor. There’s a short overview of the library, followed by extensive information on compiling and building so that the code runs properly on the coprocessor. For example, some MKL functions can run distributed across multiple cores; the book explains how to make this happen.
Chapter 12 is devoted to the next step up: multi-node, distributed systems, with each node running a Xeon and one or more Xeon Phi coprocessors. The chapter primarily focuses on how to coordinate the nodes using cluster tools such as the Message Passing Interface (MPI). This is a very advanced chapter that not everyone will need, but it’s worth a read.
The next chapter, 13, focuses on profiling and timing. Even if you skip Chapter 12, you’ll definitely want to read this one. While it might seem surprising that this information comes at the end of the book, it makes sense: after you’ve mastered all the concepts, it’s time to take the next step and profile your application so that you can fine-tune it.
Even if you don’t work your way completely through the book, I strongly recommend reading this chapter. It offers a nice overview of profiling, and provides a lot of definitions and concepts that can be used both in Xeon Phi programming, and in programming overall.
Finally, there’s a short summary (numbered as Chapter 14) that offers some thoughtful parting advice. Remember, the two authors are programming gurus with many years of experience. The times I have met these two gentlemen, I have listened closely and quietly to whatever they had to tell me. That’s what this chapter is for.
I didn’t expect this book to be so accessible. The writing is top-notch, and the examples have real-world application. And while the text contains plenty of code, the code doesn’t overwhelm the narrative; there’s far more text than code, and plenty of valuable information. So if you’re going to learn parallel programming, especially for the Xeon Phi, this is the book to turn to. I’m ready to read it a second time.