Independent Test: Xeon Phi Shocks Tesla GPU

Intel’s Xeon Phi coprocessor outperforms Nvidia’s Tesla graphics processing unit (GPU) on the operations used by “solver” applications in science and engineering, according to independent tests at Ohio State University.

When comparing Intel’s Xeon Phi to Nvidia’s Tesla, most reviewers dwell on how much easier it is to rewrite parallel programs for the Intel coprocessor, since it runs the same x86 instruction set as a 64-bit Pentium. 

The “Cuda” cores on Nvidia’s Tesla coprocessor, on the other hand, do not even try to emulate the x86 instruction set. They use more economical instructions instead, which lets Nvidia cram many more cores onto a chip.

As a result, Nvidia’s Tesla has roughly 40 times as many cores (2,496) as Intel’s Xeon Phi (61). The question then becomes: is it worth rewriting x86 parallel software for Nvidia’s Cuda to gain access to the thousands more cores available with Tesla?

Intel’s Xeon Phi SE10P (red) beat Nvidia’s Tesla C2050 and K20 GPUs (light and dark green, respectively) in 18 out of 22 tests. The Xeon Phi also beat dual Xeon X5680s (each with six cores for 12 cores total, light blue) and dual Xeon E5-2670s (each with eight cores for 16 total, dark blue) in 15 out of 22 tests. Source: Ohio State

To find the answer, Ohio State narrowed the question to the types of parallel programs that scientific researchers run regularly. For the test, researchers chose the parallel processing operations routinely performed on large sparse matrices. Variously called eigensolvers, linear solvers and graph-mining algorithms, these applications express their vast parallelism as multiplications of large sparse matrices by wide, dense vectors.
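For readers unfamiliar with the operation being benchmarked, here is a minimal sketch of a sparse matrix times dense vector multiplication. It assumes the matrix is stored in compressed sparse row (CSR) form, a common layout for this kind of kernel. The function name `csr_spmv` and the tiny 3x3 matrix are illustrative only and are not taken from the Ohio State code.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    values  : nonzero entries of A, listed row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    x       : dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Each row's dot product is independent of the others, which is
    # where the parallelism comes from on a many-core chip.
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative matrix (an assumption, not benchmark data):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)  # [7.0, 4.0, 18.0]
```

Because only the stored nonzeros are touched, the work is proportional to the number of nonzeros rather than to the full matrix size. The indirect reads of `x` through `col_idx` are irregular, which is one reason memory behavior tends to matter so much for this kernel.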

The results? Xeon Phi outperformed even the fastest Tesla coprocessor (the K20, with 2,496 cores each running at 0.7 GHz) while using only 61 cores each running at 1.1 GHz.

The coprocessors were tested on two batteries of 22 matrix operations (44 total). On the first battery, the Tesla's speeds ranged from 4.9 to 13.2 GFLOPS.

The Xeon Phi, on the other hand, achieved up to 15 GFLOPS on the first battery, beating the Tesla on 12 of the first 22 tests.

For the second battery, the Xeon Phi outperformed the Tesla on 18 of the 22 tests, achieving a peak of 120 GFLOPS and exceeding 60 GFLOPS on eight of the 22, whereas the Tesla never quite reached 60 GFLOPS on any of the 44 tests.
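To put GFLOPS figures like these in context, the sketch below shows how a rate is commonly derived for a sparse matrix times vector kernel: each stored nonzero costs one multiply and one add, so the operation count is twice the number of nonzeros. Whether the Ohio State measurements used exactly this convention is an assumption here, and the function name and sample numbers are hypothetical, not benchmark data.

```python
def spmv_gflops(nnz, seconds):
    """Estimate GFLOPS for one sparse matrix-vector multiply.

    Assumes 2 floating-point operations (multiply + add) per stored
    nonzero, which is a common convention and not necessarily the
    one used in the study.
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    flops = 2.0 * nnz
    return flops / seconds / 1e9


# Hypothetical run: 5 million nonzeros processed in 1 millisecond.
rate = spmv_gflops(nnz=5_000_000, seconds=1e-3)
print(f"{rate:.1f} GFLOPS")  # about 10 GFLOPS
```

Under this convention, a rate in the range reported for the first battery corresponds to processing several billion nonzeros per second.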

The Ohio State researchers also compared the Xeon Phi to several other configurations, including a different Tesla model (the C2050) and conventional multi-core “Westmere” and “Sandy Bridge” Xeon processors.

The analysis also includes some interesting findings on the bandwidth and latency of the Xeon Phi memory space, compared with those of both Tesla GPUs and conventional multi-core Xeon processors. Read the details in “Performance Evaluation of Sparse Matrix Multiplication Kernels on Intel Xeon Phi.”

Posted on February 20, 2013 by R. Colin Johnson, Geeknet Contributing Editor