Intel’s Xeon Phi – its first commercial Many Integrated Core (MIC) processor officially due out this fall – promises to bring massive multiprocessors down from the lofty heights of world-class supercomputers to the domain of enterprise servers and workstations.
The Knights Ferry MIC architecture board housed this 32-core Aubrey Isle processor, the forerunner of the 50-core Xeon Phi, to be available on Knights Corner boards this fall. Source: Intel
By installing 50-core Xeon Phi processors on Knights Corner PCIe 3.0 boards, any Xeon-based server or workstation will be able to reach the teraFLOPS performance levels previously available only to government labs and well-endowed corporate researchers.
If we look inside the Xeon Phi, however, we do not find the exotic, untested technologies that have drained the R&D budgets of rival multiprocessor startups, but leading-edge semiconductor processes and architectural features that have already been proven in existing Intel multi-core processors.
Intel’s latest 22-nanometer CMOS process – used for the Ivy Bridge die shrink of its proven Sandy Bridge microarchitecture – features the pioneering 3-D FinFET transistors that have already put Intel years ahead of its semiconductor rivals worldwide.
High-speed ring topology
But just as important to the Xeon Phi’s performance is its use of the high-speed ring architecture, which had already been perfected for Intel’s second-generation Core processors and serves as the backbone of its latest multi-core Xeon processors.
Ring topologies are considered ideal for on-chip communications among a handful of cores – up to about ten – but have traditionally been considered too prone to congestion for linking more than a dozen or so. For the 50-core Xeon Phi, however, analysts claim that widened, bi-directional rings remain viable.
“In a regular processor, the system is at the mercy of whatever code the user wants to run,” says Gartner Inc. Vice President Martin Reynolds. “But the Xeon Phi will generally be handling carefully structured workloads, where the paths around the ring can all be managed, and the code can be set up to optimize the use of the ring.”
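The congestion concern is easy to see with a back-of-envelope calculation: on a bi-directional ring, a message between two cores travels the shorter way around, so the average trip grows roughly as N/4 hops for N cores – modest for ten cores, but several times longer at fifty. A minimal sketch (illustrative only, not Intel's routing model):

```python
# Mean hop distance on a bi-directional ring of n cores:
# a message from one core to another travels the shorter
# direction, i.e. min(d, n - d) hops for core distance d.
def mean_hops(n):
    dists = [min(d, n - d) for d in range(1, n)]
    return sum(dists) / len(dists)

for n in (8, 10, 32, 50):
    # mean hop count grows roughly as n/4, so average
    # ring traffic per link rises with core count
    print(n, round(mean_hops(n), 2))
```

This is why carefully structured workloads matter: when the code controls which cores talk to which, long trips around the ring can largely be designed out.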
Intel’s use of high-speed ring topologies for interprocessor communications has been proven out in the latest incarnation of its popular Xeon E5 family, which uses twin 256-bit-wide rings encircling eight cores for bi-directional interprocessor communications. For the 32-core prototype chips on its Knights Ferry board – the predecessor to the 50-core Knights Corner boards due out this fall – Intel boosted the ring to 1,024 bits in total: a pair of bi-directional 512-bit-wide rings matched to the cores’ 512-bit SIMD units.
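Those 512-bit SIMD units are also what puts teraFLOPS within reach: each 512-bit vector holds eight double-precision values, and with a fused multiply-add counting as two operations per lane, 50 cores approach a teraFLOPS at roughly gigahertz clock rates. A rough sketch of the arithmetic (the clock speed here is an illustrative assumption, not a figure from Intel):

```python
# Back-of-envelope peak double-precision throughput
# for a 50-core chip with 512-bit SIMD units.
cores = 50
dp_lanes = 512 // 64            # 8 double-precision lanes per 512-bit vector
flops_per_cycle = dp_lanes * 2  # assuming fused multiply-add: 2 flops per lane
clock_ghz = 1.0                 # assumed clock rate for illustration
gflops = cores * flops_per_cycle * clock_ghz
print(gflops)                   # 800 GFLOPS with these assumptions
```

The exact figure depends on the shipping clock rate, but the scaling shows how wide SIMD plus many cores, rather than exotic circuitry, delivers the headline performance.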
Real-world use cases
The tactic worked, according to the Leibniz Supercomputing Centre (Germany) and the Korea Institute of Science and Technology Information (KISTI), both of which were beta sites. CERN (Switzerland), which announced evidence of the Higgs boson earlier this month, was also a very active tester of Knights Ferry boards. MIC servers and workstations using Knights Ferry boards also have been demonstrated by Colfax, Dell, Hewlett-Packard, IBM, SGI, and Supermicro.
On-chip interprocessor communications are well handled by rings, but communications between Xeon E5 supervisor processors and the forthcoming Knights Corner coprocessor boards will run over the PCIe bus. Its multiple-gigabit-per-second serial lanes will also handle interprocessor communications among Xeon Phi chips on separate Knights Corner boards, a topology that the Texas Advanced Computing Center (TACC) promises to assemble into a multi-petaFLOPS supercomputer configuration called Stampede. Likewise, Cray has announced it will offer Xeon Phi-based coprocessors for its next-generation Cascade supercomputers.
Boosts for the future
For the future, Intel is aiming to boost the petaFLOPS performance of supercomputers based on its MIC architecture into the exaFLOPS range, which may necessitate a move to high-speed mesh interconnection topologies. Intel has already demonstrated experimental on-chip mesh interconnects for its single-chip cloud computer (SCC), as well as experimental silicon-chip-based lasers for implementing high-speed optical chip-to-chip links. For now, however, Intel’s massively wide on-chip rings still have plenty of headroom for putting tera- to petaFLOPS of processing power inside MIC-enabled Xeon Phi-based supercomputers, servers, and workstations.
Colin Johnson is a Geeknet contributing editor and veteran electronics journalist, writing for publications from McGraw-Hill’s Electronics to UBM’s EETimes. Colin has written thousands of technology articles covered by a diverse range of major media outlets, from the ultra-liberal National Public Radio (NPR) to the ultra-conservative Rush Limbaugh Show. A graduate of the University of Michigan’s Computer, Control and Information Engineering (CICE) program, his master’s project was to “solve” the parallel processing problem 20 years ago when engineers thought it would only take a few years. Since then, he has written extensively about the challenges of parallel processors, including emulating those in the human brain in his John Wiley & Sons book Cognizers – Neural Networks and Machines that Think.