How Xeon Phi Stacks Up to GPUs


Xeon Phi lead architect George Chrysos presented comparisons between Intel's Xeon Phi coprocessors and graphics processing units (GPUs) at the recent Hot Chips conference. According to the Top500 Supercomputer Sites ranking, Intel's many-integrated-core (MIC) architecture not only outperformed the two top GPU-based supercomputers on the most recent Top500 list, but was also "greener" by virtue of delivering more performance per watt.

“We optimized Intel’s Xeon Phi coprocessor to deliver leading performance per watt for highly parallel technical computing workloads,” Chrysos explained. “Power efficiency or performance-per-watt on the target workloads was our key metric of goodness.”

The strategy worked, according to the most recent Top500 list of parallel-processing supercomputers. The list compared the performance and energy consumption of Intel's Xeon Phi-based "Discovery" supercomputer to two GPU-based supercomputers, one using Nvidia GPUs and one using ATI GPUs.

Intel's Discovery Cluster, ranked 150th on the Top500, used Xeon E5-2670 8C 2.6-GHz main processors communicating with Xeon Phi coprocessors over a fourteen-data-rate (FDR) Infiniband interconnect. The Xeon Phi-based Discovery Cluster was eight percent more power efficient than the Nvidia Tesla GPU-based supercomputer at the Barcelona Supercomputing Center, ranked 177th on the Top500. That system, Bull SA's B505, used Xeon E5649 6C 2.5-GHz main processors with Nvidia 2090 GPUs connected by quad-data-rate (QDR) Infiniband. The Discovery Cluster also edged out the Nagasaki Degima Cluster, ranked 456th on the Top500, in power efficiency. Degima is based on Intel Core i5 main processors and ATI Radeon GPUs communicating over QDR Infiniband (see figure).
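The article does not quote the underlying Rmax and power figures, but the Top500 efficiency metric itself is simple: sustained floating-point throughput divided by total power draw. A minimal sketch with purely hypothetical numbers (not the real Discovery, B505, or Degima measurements):

```python
def gflops_per_watt(rmax_tflops, power_kw):
    """Top500-style efficiency: sustained TFLOPS converted to GFLOPS,
    divided by total system power converted to watts."""
    return (rmax_tflops * 1000.0) / (power_kw * 1000.0)

# Hypothetical clusters, for illustration only: (Rmax in TFLOPS, power in kW)
systems = {
    "cluster A": (100.0, 70.0),
    "cluster B": (90.0, 68.0),
}

# Rank by efficiency, most power-efficient first
ranked = sorted(systems, key=lambda s: gflops_per_watt(*systems[s]), reverse=True)
```

An eight percent efficiency edge, as claimed for Discovery, would show up directly in this ratio rather than in raw Rmax.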

The main reason for the better performance-per-watt profile of Xeon Phi coprocessors over GPUs was the extension of Intel's sophisticated power-management technology to the 50+ cores on a Xeon Phi die. As a consequence, only the cores currently running parallel threads consume significant amounts of power.
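A toy model of that behavior, with invented wattage figures purely for illustration (the article gives no per-core numbers): total die power scales with the number of cores actively running threads, while idle cores fall back to a small residual draw.

```python
def die_power(active_cores, total_cores=60,
              core_active_w=3.0, core_idle_w=0.3, uncore_w=25.0):
    """Toy linear power model: active cores draw full power, idle cores a
    small residual, plus a fixed uncore/memory budget. All numbers invented."""
    assert 0 <= active_cores <= total_cores
    idle_cores = total_cores - active_cores
    return uncore_w + active_cores * core_active_w + idle_cores * core_idle_w
```

Under this model, a die running threads on only half its cores draws well under half the power of a fully loaded die above the fixed uncore floor, which is the proportional-reduction behavior Chrysos describes.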

"We put Intel's world-class power management technology into the Xeon Phi," Chrysos concluded. "When individual cores are idle, or the Xeon Phi is not processing anything, we reduce its power consumption proportionately."

Chrysos noted that Intel made three major architectural improvements to the Xeon Phi to achieve its higher performance-per-watt rating over GPUs.

First, Intel boosted execution efficiency to achieve an 80 percent improvement in core performance, as measured by the CPU-intensive SPEC CPU 2006 floating-point benchmark. The speedup came from switching among four parallel hardware threads per core, rather than potentially wasting time and power on the speculative instruction processing common in deeply pipelined architectures. In addition, Intel added a hardware instruction prefetcher to the Xeon Phi, a 512-bit-wide L1 cache, a large 512-Kbyte L2 cache and a large translation look-aside buffer (TLB).
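The multithreading trade-off described above can be sketched in miniature: instead of speculating past a stalled instruction, the core simply issues from the next ready hardware thread. The four-thread round-robin below is an illustrative model, not a cycle-accurate description of the Xeon Phi pipeline.

```python
from collections import deque

def issue_schedule(stalls, cycles):
    """Each cycle, issue from the next ready hardware thread in round-robin
    order, skipping stalled threads. `stalls` maps thread id -> set of
    cycles in which that thread cannot issue (e.g. waiting on memory)."""
    order = deque(range(4))   # four hardware threads per core
    issued = []
    for cycle in range(cycles):
        for _ in range(4):
            t = order[0]
            order.rotate(-1)  # advance round-robin pointer
            if cycle not in stalls.get(t, set()):
                issued.append(t)
                break
        else:
            issued.append(None)  # all four threads stalled: pipeline bubble
    return issued
```

With no stalls the core cycles through threads 0, 1, 2, 3; when one thread blocks, another fills its slot, so the pipeline stays busy without any speculation hardware.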

The second major improvement to performance per watt came from widening the single-instruction-multiple-data (SIMD) instructions to 512 bits. New SIMD instruction-set features were also added, including register masking for vectorizing conditional branches with better pipelining, and gather/scatter operations for faster loads and stores from irregular addresses. Extended math-unit operations were also added to allow vectorization of many common transcendental, square-root, reciprocal, logarithm and power functions.
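Those two features map onto a simple software analogy. A mask register lets an `if` inside a loop become a per-lane predicate instead of a branch, and a gather fetches lanes from arbitrary indices. A pure-Python emulation of the idea (not Intel intrinsics, just an illustration of the semantics):

```python
def masked_add(dst, a, b, mask):
    """Emulate a masked SIMD add: lanes where the mask is True get a + b,
    lanes where it is False keep the destination's prior value."""
    return [x + y if m else d for d, x, y, m in zip(dst, a, b, mask)]

def gather(memory, indices):
    """Emulate a gather load: collect lanes from irregular addresses."""
    return [memory[i] for i in indices]
```

In scalar code the equivalent `if` would force a branch per element; with masking, all lanes execute the same instruction stream and the predicate simply suppresses unwanted writes.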

The third major power-efficiency improvement came from adding a 512-bit-wide bidirectional ring that connects the cores to each other and to memory, along with a new streaming vector store instruction that conserves bandwidth when writing output-only arrays.
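The bandwidth saving of a streaming store comes from skipping the read-for-ownership that a conventional write-allocate cache performs before overwriting a line; when an array is output-only, that read is wasted. A back-of-the-envelope accounting, assuming 64-byte cache lines:

```python
LINE = 64  # bytes per cache line (assumed)

def traffic_bytes(n_bytes, streaming):
    """Approximate memory traffic for writing an output-only array.
    A conventional write-allocate store reads each line before writing it;
    a streaming store writes whole lines directly, halving the traffic."""
    lines = -(-n_bytes // LINE)          # ceiling division
    writes = lines * LINE
    reads = 0 if streaming else lines * LINE
    return reads + writes
```

For a pure output array the streaming variant moves half the bytes of the conventional one, which is the bandwidth conservation the instruction is meant to provide.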

________________________________________________________________

Colin Johnson is a Geeknet contributing editor and veteran electronics journalist, writing for publications from McGraw-Hill’s Electronics to UBM’s EETimes. Colin has written thousands of technology articles covered by a diverse range of major media outlets, from the ultra-liberal National Public Radio (NPR) to the ultra-conservative Rush Limbaugh Show. A graduate of the University of Michigan’s Computer, Control and Information Engineering (CICE) program, his master’s project was to “solve” the parallel processing problem 20 years ago when engineers thought it would only take a few years. Since then, he has written extensively about the challenges of parallel processors, including emulating those in the human brain in his John Wiley & Sons book Cognizers – Neural Networks and Machines that Think.

Posted by R. Colin Johnson, Geeknet Contributing Editor