NCSA Scientist Backs MICs over GPUs


While the massively parallel Xeon Phi coprocessor faces competition from supercomputers leveraging Nvidia’s graphics processing units (GPUs), Intel’s many-integrated-core (MIC) architecture will prevail, according to a senior research scientist at the National Center for Supercomputing Applications (NCSA) Innovative Systems Laboratory at the University of Illinois at Urbana-Champaign.

In a presentation at the Fifth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), held in Pittsburgh Sept. 10-12, Volodymyr Kindratenko asserted that GPU accelerators will eventually lose out to Intel’s MIC because its architecture requires only a fine-tuning of the parallel x86 code already running on supercomputers today.

“The software development effort for a MIC is more comparable to a performance tuning effort, rather than the code reimplementation that is needed to use GPUs,” he told attendees at his session, entitled “Hardware/Software Divergence in Accelerator Computing.” “MIC architecture will eventually win the battle.”

In support, Kindratenko cited the availability of conventional tools and programming languages, such as the ICC compiler, the IDB debugger, the VTune profiler, and C/C++ with pragmas and libraries, as well as the ease of performance tuning.

“You can just recompile with MIC and start optimizing, but with GPUs you cannot simply recompile; you have to rewrite your algorithm just to get it to run before you can even start optimizing it.”
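To illustrate the distinction Kindratenko is drawing, the sketch below shows the kind of change involved on the MIC side: an existing OpenMP loop is offloaded to the coprocessor by adding a pragma and recompiling with Intel’s compiler. This is a minimal, hypothetical example assuming Intel’s Language Extensions for Offload in icc; the function and array names are illustrative, not drawn from NCSA’s codes.

```c
#include <stdio.h>

#define N 4096

/* An ordinary OpenMP loop. On a MIC system, adding the offload
   pragma and recompiling with icc sends the loop to the Xeon Phi;
   a compiler without MIC support simply ignores the unknown pragma
   and runs the loop on the host. */
void scale(float *a, float *b, int n)
{
    /* Intel Language Extensions for Offload: the in/out clauses
       describe the data transferred to and from the coprocessor. */
    #pragma offload target(mic) in(a:length(n)) out(b:length(n))
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        b[i] = 2.0f * a[i];
}

int main(void)
{
    static float a[N], b[N];
    for (int i = 0; i < N; i++)
        a[i] = (float)i;
    scale(a, b, N);
    printf("b[2] = %f\n", b[2]);  /* expect 4.0 */
    return 0;
}
```

By contrast, targeting a GPU would mean rewriting the loop body as a separate kernel and adding explicit launch and data-transfer calls before any tuning could begin, which is the reimplementation effort Kindratenko describes.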

NCSA’s Lincoln cluster used Intel Xeon main processors and Nvidia Tesla graphics processing units (GPUs).

NCSA has built several large-scale GPU-based supercomputers, including “Lincoln,” which houses 1,536 cores (384 quad-core Xeon “Harpertown” processors) connected over PCIe to 96 Nvidia Tesla S1070 coprocessors, each containing four 512-bit-wide GPUs with 240 thread processors apiece. More recently, NCSA received advance access to Intel’s first MIC processor, the Xeon Phi, which it has been testing by running the same parallel molecular dynamics algorithms and other benchmarks already rewritten to run on the GPU-based Lincoln.

Kindratenko emphasized, however, that his opinions come from practical considerations, not from any tests performed at NCSA. According to Kindratenko, the MIC architecture simply fits better into existing supercomputer architectures than GPUs, which are vector-based, do.

“To get any advantage with a GPU, your code has to be vectorizable,” said Kindratenko. “The MIC architecture is broader, with many cores, wide vector units and high bandwidth to memory.”
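As a rough illustration of that constraint, consider the hedged sketch below: the first loop performs independent, regular arithmetic that a GPU (or a wide vector unit) can execute across many lanes at once, while the second chases data-dependent pointers and resists vectorization. The function and type names are hypothetical.

```c
#include <stddef.h>

/* Vectorizable: every iteration is independent and does the same
   regular arithmetic, so each one can map to a SIMD lane or GPU
   thread. */
void saxpy(float *restrict y, const float *restrict x,
           float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hard to vectorize: each step depends on the previous one and
   follows data-dependent pointers, so the lanes cannot run in
   lockstep. Code like this gains little from a GPU. */
struct node {
    float val;
    struct node *next;
};

float chain_sum(const struct node *p)
{
    float s = 0.0f;
    while (p) {
        s += p->val;
        p = p->next;
    }
    return s;
}
```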

Kindratenko concluded his presentation, made at the P2S2 panel session entitled “Battle of the Accelerator Stars,” by reflecting on how the MIC architecture, considered only an accelerator today, could go mainstream.

“When the ‘war’ is over, what we consider today to be an accelerator will be in our mainstream processor,” he predicted.

The “Battle of the Accelerator Stars” panel discussion took hardware/software divergence in accelerator computing as its theme, according to panel moderator Yong Chen, a professor at Texas Tech University.

“The panel discussion generally concluded that accelerators will play a critical role in future computing systems, from high-performance systems and high-end servers to desktops,” said Chen. “Programmability remains a critical issue for the wide adoption and success of accelerator computing. The hardware and software platforms that ease programmability, while not sacrificing performance, are most likely to win the battle.”

P2S2 was held in conjunction with the 41st International Conference on Parallel Processing (ICPP).

________________________________________________________________

Colin Johnson is a Geeknet contributing editor and veteran electronics journalist, writing for publications from McGraw-Hill’s Electronics to UBM’s EETimes. Colin has written thousands of technology articles covered by a diverse range of major media outlets, from the ultra-liberal National Public Radio (NPR) to the ultra-conservative Rush Limbaugh Show. A graduate of the University of Michigan’s Computer, Control and Information Engineering (CICE) program, his master’s project was to “solve” the parallel processing problem 20 years ago, when engineers thought it would only take a few years. Since then, he has written extensively about the challenges of parallel processors, including emulating those in the human brain in his John Wiley & Sons book Cognizers: Neural Networks and Machines That Think.

Posted by R. Colin Johnson, Geeknet Contributing Editor
5 comments
Gaetane

@S_T_E_V_E_H  Well spotted about the sponsor there, sir! Okay, in everyone's defense, I did not scrutinize the article thoroughly for every nuance, so I could be wrong, but it seems to me the opinion expressed here is based pretty much on the fact that you have to rewrite and recompile for GPU but not for MIC, so therefore MIC is the... "winner".

It says "Kindratenko, however, emphasized his opinions come from practical considerations, not from any tests performed at NCSA". Now, I'm no expert, but I reckon a fair amount of rewriting and recompiling had to be done when we transitioned from Babbage's Analytical Engine. Rewriting is not the problem. Time is always the problem and time is money. Perhaps a more balanced prediction might be that the faster system will ultimately be the "winner" because of the universal law of computing: More Transactions = More Money. This is why Linux runs the internet. And the supercomputers. ^_^ 

What might happen in the meantime is that the easier-to-adopt system (winks at you, Intel) will be cost-effectively leveraged (not fully adopted) while all of us coders are gleefully vectorizing our code.

Now if anyone needs me, I'm off to commune with The Google and see if I can find a nice comparative table of these mysterious test results...



PifPaf

Well, we can find past senior scientists' opinions about new trends and technologies... So we'll see in twenty years :-D

swaroopcool21

I second the opinion of "S_T_E_V_E_H". Although Xeon Phi has been shown to perform better, the bottom line is "On which application?". My personal opinion is that the superiority of Xeon Phi or GPU depends on the application. For applications which involve massive, simple SIMD operations, GPU is to lead no matter what. But for complicated operations there is no match for Intel...

S_T_E_V_E_H

Hmmm... with a page banner of "Sponsored by Intel", I say follow the dollar and you'll find the underlying basis for the opinion.  If MIC does prevail, perhaps it might be for reasons other than whiz-bang technology status ... say, customer laziness, a.k.a. easy adoption.

Salman Marvasti

Interesting conclusion, though I would think there is room for both unless MIC is faster than GPU.