Haswell Architecture Ups Parallelism Ante


Intel has revealed new architectural details about the Haswell micro-architecture and its support for parallel processing. The faster, lower-power Haswell will drastically cut power consumption across the board, offer almost double the graphics processing speed, and include new Advanced Vector Extensions (AVX) instructions and other parallel enhancements, the company says. Two- and four-core Haswell processors for PCs, tablets and Ultrabooks will be available early in 2013, with workstation and server models to come later.

No need to wait, though; Haswell can already be taken for a test drive, since it is supported by the Intel Parallel Studio XE 2013 and Cluster Studio XE 2013 suites (which also let programmers kick the tires of the 50+ core Xeon Phi coprocessor). Both Haswell and the Xeon Phi are fabricated in Intel’s advanced 22-nanometer process technology using its next-generation 3-D tri-gate transistors.

As Intel’s 4th-generation Core processor family, the Haswell micro-architecture is designed to be faster, more energy efficient and more secure than any previous Intel processor. High-definition (HD) graphics support, integrated input-output (I/O) processing, faster encryption instructions and hardware-based security features have been added, along with a doubling of bandwidth to the level-one (L1) and level-two (L2) caches.

According to Intel, existing software will see an immediate 10 percent performance gain and double the performance of the built-in graphics; graphics-heavy apps will be accelerated even more. Intel expects a new wave of innovative mobile designs for Core processors, citing over 140 recent design wins for PCs, tablets and Ultrabooks to be introduced in 2013.

“You’ll see our customers delivering sleek and cool convertible designs, as well as radical breakthrough experiences across a growing spectrum of mobile devices,” David Perlmutter, Intel’s executive vice president and chief product officer, said at the recent Intel Developers Forum (IDF).

Executive vice president and chief product officer David Perlmutter provides updates about Haswell at the recent Intel Developers Forum.

To support next-generation applications, Intel also recently announced a new “perceptual computing” effort that uses human-like 3-D vision sensors to perceive a user’s gestures and actions. A new software development kit, the Intel Perceptual Computing SDK, includes algorithms for gesture interaction, facial and voice recognition, and augmented reality applications. The SDK works with Creative’s 3-D vision sensor, the Interactive Gesture Camera Developer Kit. See an exclusive video of the technology in action here.

Power to Parallelism

Intel has reduced the idle power of its 4th-generation Core family by more than 20 times compared with the second generation, allowing Haswell micro-architecture processors to operate on less than 10 watts and enabling thinner, lighter mobile devices with much longer battery life. Intel also recently revealed plans to add a new line of even lower-power processors based on the Haswell micro-architecture for intelligent embedded designs in 2013.

The Haswell micro-architecture also adds significant new support for parallel programming through Transactional Synchronization Extensions (TSX). TSX lets programmers specify regions of code for transactional synchronization, which is useful in shared-memory multithreaded applications that employ lock-based synchronization. By avoiding unnecessary serialization and exploiting concurrency that would otherwise be hidden, the extensions provide fine-grained locking performance while requiring the programmer to use only coarse-grained locks. A new Restricted Transactional Memory (RTM) interface allows programmers to define transactional regions more flexibly, as well as provide an alternative code path for when transactional execution is not successful.

Haswell also features Advanced Vector Extensions (AVX) for floating-point data and new AVX2 instructions for integer data types, including a new fused multiply-add (FMA) that the chip executes twice per cycle, thus doubling its peak floating-point performance over Sandy Bridge.


Colin Johnson is a Geeknet contributing editor and veteran electronics journalist, writing for publications from McGraw-Hill’s Electronics to UBM’s EETimes. Colin has written thousands of technology articles covered by a diverse range of major media outlets, from the ultra-liberal National Public Radio (NPR) to the ultra-conservative Rush Limbaugh Show. A graduate of the University of Michigan’s Computer, Control and Information Engineering (CICE) program, his master’s project was to “solve” the parallel processing problem 20 years ago when engineers thought it would only take a few years. Since then, he has written extensively about the challenges of parallel processors, including emulating those in the human brain in his John Wiley & Sons book Cognizers – Neural Networks and Machines that Think.

Posted on October 2, 2012 by R. Colin Johnson, Geeknet Contributing Editor
  • Richard Rankin

    There is one small aspect of Intel’s direction mentioned above that I’m not sure I understand. I’m not sure if this is the right forum but I’m going to fire away. It seems as though Intel is making improvements to the vector processing units of their chips and emphasizing this functionality while moving less rapidly in the direction of massive core counts and emphasizing more powerful cores. Ignoring hyperthreading, the choice seems to be between more cores (pipelines) or more complex cores. I don’t want to underestimate the complexity of managing a bzillion cores but I’m getting mixed signals from designers in terms of pros and cons. I’d like to see a side-by-side comparison, a debate or perhaps a barroom brawl regarding the relative merits of these two approaches regarding several aspects: engineering complexity, performance per watt, difficulty of implementing systems (perhaps some applications are better for one than the other), and approaches to heterogeneous processing, which I do now, but of which there are undoubtedly many, and the degree of heterogeneity may increase given certain developments in tools. For example, FPGAs have been getting easier to program and at some point, having a large transistor-count chip that can be converted from one high-performance, special-purpose processor to another in a nanosecond with a stack of a dozen designs standing ready may become an asset. Especially when I can defend a patent on a piece of logic as a circuit (in an FPGA) far more easily than a patent on a piece of C++ code even when they are functionally identical. Can someone comment on this or start another post covering these topics?

  • R. Colin Johnson

    Since the Xeon Phi will not be available until later this fall, you’ll have to wait to test it out for yourself. The only comparison of metrics between using x86 cores and the simpler vector-oriented cores of GPUs is the Hot Chips presentation here:
    But this link is mainly for others, since I see you have already commented on that story. Thanks!