Most Recent Tune Posts

Mere Mortals: Compile Fortran, C, C++ like a Ninja

Highly optimized computing routines are often associated with low-level programming. Assembly code, intrinsic functions, OS-level multithreading interfaces and other sharp weapons used by “ninja programmers” are believed necessary for penetrating deep into hardware and snatching every FLOP, especially when optimization is performed for a computing accelerator. In this case, …
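
As a minimal sketch of the style the post argues for (our illustration, not code from the post), the loop below uses no intrinsics or threading calls; the function name `saxpy` is our choice. It assumes an optimizing build, for example `gcc -O3`, where the compiler can vectorize the loop on its own; GCC's `-fopt-info-vec` option reports which loops were vectorized.

```c
#include <stddef.h>

/* Plain C, no intrinsics: `restrict` promises the compiler that x and y
   do not overlap, removing an aliasing question that can otherwise
   complicate or block auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Simple, unit-stride loop with no cross-iteration dependences. */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

`restrict` is the design choice that matters here: without it, the compiler must allow for `x` and `y` overlapping, and it may add runtime overlap checks or decline to vectorize.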

Read Full Post Posted in Tune | Leave a comment

Improving Your Coding AND Professional Craft

Becoming a better parallel coder can also make you smarter in your professional field, according to Michael D’Mello, Program Manager, Tools Immersion Program at Intel. Increasing mastery of programming tools not only ups your development game, but over time can become “an integral part of your workflow and thought …

Meeting New Challenges of Scaling Parallelism Across and Within Cores

With today’s powerful new multi-core processors, if you’re not vectorizing (processing several data elements with a single SIMD instruction), you risk leaving 8x or 16x of Intel Xeon Phi’s theoretical peak performance on the table and failing to scale your code efficiently. The solution, according to Intel’s Ron Green, is for programmers to start …
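
The 8x and 16x figures line up with the width of Xeon Phi's 512-bit vector registers, which hold 8 double-precision or 16 single-precision values. Below is a minimal sketch of that arithmetic plus a loop marked for vectorization; `VECTOR_BITS`, `lanes_per_vector`, and `dot` are hypothetical names, and the `omp simd` pragma takes effect only when compiled with `-fopenmp` or `-fopenmp-simd` (GCC/Clang).

```c
#include <stddef.h>

/* Hypothetical constant: vector register width in bits (512 on Xeon Phi). */
enum { VECTOR_BITS = 512 };

/* How many elements of the given size fit in one vector register:
   8 for double, 16 for float at 512 bits. */
size_t lanes_per_vector(size_t element_bytes)
{
    return VECTOR_BITS / (8 * element_bytes);
}

/* Dot product written so the compiler can vectorize it; the reduction
   clause lets each SIMD lane keep its own partial sum. */
float dot(size_t n, const float *a, const float *b)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Because a SIMD reduction reorders the additions, floating-point results can differ slightly from a strictly sequential sum.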

Tuning OpenMP Applications

High-performance computing (HPC) has a long history and today is critical to business, research, and science. Clusters consisting of thousands of machines help enable many advances in modern science with both theoretical and practical implications, working 24/7 to enrich the lives of every person on earth. This article …
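
As a hedged illustration of two common OpenMP tuning knobs (the excerpt does not say which ones the article covers), the sketch below combines a `reduction` clause with a `schedule` clause; `work` and `total_work` are hypothetical names. Without `-fopenmp` the pragma is ignored and the loop still computes the sum, serially.

```c
/* Hypothetical work function whose cost varies with i, so iterations
   are unevenly sized; used only to motivate the schedule clause. */
static double work(long i)
{
    double s = 0.0;
    for (long k = 0; k <= i % 1000; ++k)
        s += 1.0 / (double)(k + 1);
    return s;
}

double total_work(long n)
{
    double total = 0.0;
    /* reduction(+:total): each thread keeps a private partial sum, which
       avoids a data race on the shared variable.
       schedule(dynamic, 64): threads grab 64-iteration chunks on demand,
       which can balance uneven iteration costs. */
    #pragma omp parallel for reduction(+:total) schedule(dynamic, 64)
    for (long i = 0; i < n; ++i)
        total += work(i);
    return total;
}
```

The chunk size (64 here) and the choice between `static` and `dynamic` scheduling are worth measuring rather than guessing; the thread count can be set at run time with the `OMP_NUM_THREADS` environment variable.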

The Chess Puzzle: Learning to Love Fast Rejection

There is much to be said for fast rejection. It saves time and effort that can be better spent searching elsewhere. This article discusses a parallel algorithm for solving a chess puzzle that exploits fast rejection. It is a good demonstration of basic Intel® Cilk™ Plus programming to solve an interesting puzzle. …
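
The excerpt doesn't name the puzzle, so as a stand-in we use the classic N-queens problem to show fast rejection; we also use OpenMP instead of Cilk Plus, which Intel has since deprecated. The names `place` and `count_queens` are hypothetical. A square attacked by an earlier queen is discarded with a single bitmask operation, so an entire subtree of placements is rejected without being explored.

```c
/* Count completions from `row` onward. cols, d1 and d2 mark columns and
   the two diagonal directions already attacked in the current row. */
static long place(int n, int row, unsigned cols, unsigned d1, unsigned d2)
{
    if (row == n)
        return 1;
    long count = 0;
    unsigned all = (1u << n) - 1u;
    /* Fast rejection: every attacked square is removed in one step. */
    unsigned free_sq = all & ~(cols | d1 | d2);
    while (free_sq) {
        unsigned bit = free_sq & -free_sq;   /* lowest free square */
        free_sq -= bit;
        count += place(n, row + 1, cols | bit,
                       ((d1 | bit) << 1) & all, (d2 | bit) >> 1);
    }
    return count;
}

/* Count N-queens solutions for 1 <= n <= 31. */
long count_queens(int n)
{
    long total = 0;
    unsigned all = (1u << n) - 1u;
    /* Each first-row placement is an independent subtree, so subtrees
       are searched in parallel and their counts summed. */
    #pragma omp parallel for reduction(+:total) schedule(dynamic)
    for (int c = 0; c < n; ++c) {
        unsigned bit = 1u << c;
        total += place(n, 1, bit, (bit << 1) & all, bit >> 1);
    }
    return total;
}
```

`count_queens(8)` returns 92, the well-known number of eight-queens solutions. Dynamic scheduling suits this search because subtrees pruned early finish much faster than others.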

How Developers Can Handle the New Hardware Complexity

More processors, more cores, more threads, wider registers. The latest generation of processors introduces new complexity. Today’s “hardware explosion” requires a new way of thinking about architecting, building and tuning parallel programs to take advantage of powerful new capabilities. Join Intel Senior Engineer Gary Carleton and Go Parallel Editor …

Identifying, Modeling, Designing, Optimizing Parallelism: Intel’s Ronald Green previews topics and tips from the 2013 Intel Software Conference Road Show

New processors with up to 61 cores and bigger vector units open up whole new realms for parallel developers. Join compiler and optimization whiz Ronald Green at the upcoming Intel Software Conference road show and learn how you can get the most from these exciting new capabilities. Ronald is …

Get a 2.5X Performance Improvement Using Full Vectors Versus Scalar

To generate better vector code, the compiler sometimes needs help: decomposing complex data structures lets it see the available parallelism and vectorize the code. Decomposing data accesses may also allow the compiler to use more advanced features such as vector gather and scatter. Though adjacent …
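
The excerpt doesn't show the decomposition itself; one common instance is converting an array of structures (AoS) into a structure of arrays (SoA). The sketch below assumes that transformation, with hypothetical names (`struct point`, `struct points_soa`, `NPTS`), and may differ from what the post actually does.

```c
#include <stddef.h>

#define NPTS 4   /* hypothetical fixed size, for illustration only */

/* Array of structures (AoS): x, y, z of one point are adjacent, so a
   loop over all x values reads memory with a stride of 3 floats. */
struct point { float x, y, z; };

/* Structure of arrays (SoA): each coordinate is contiguous, so the same
   loop becomes unit-stride and maps directly onto vector loads. */
struct points_soa { float x[NPTS], y[NPTS], z[NPTS]; };

/* Copy AoS data into the SoA layout. */
void aos_to_soa(const struct point *in, struct points_soa *out)
{
    for (size_t i = 0; i < NPTS; ++i) {
        out->x[i] = in[i].x;
        out->y[i] = in[i].y;
        out->z[i] = in[i].z;
    }
}

/* Scale every x coordinate: contiguous accesses the compiler can vectorize. */
void scale_x(struct points_soa *p, float s)
{
    for (size_t i = 0; i < NPTS; ++i)
        p->x[i] *= s;
}
```

With the AoS layout, a loop over `x` alone touches every third float, the kind of access for which the compiler may fall back on gather instructions or strided loads; with SoA the same loop reads consecutive floats.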

Achieving Better Parallel Performance of Fortran Programs

Learn how to identify hotspots (the most time-consuming program units), how to use available cores effectively, how to discover the causes of poor utilization, and much more. This informational webinar presentation shows how you can leverage parallelization technology to achieve better performance on multicore systems. Includes a high-level overview of …

Offload Runtime for the Intel® Xeon Phi™ Coprocessor

The Intel® Xeon Phi™ coprocessor platform has a software stack that enables new programming models, including offload of computation from the host processor to the Intel® Xeon Phi™ coprocessor to improve response time and/or throughput. A new paper draws on insights from a multi-year, intensive development effort to answer common …
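
The excerpt doesn't show the Intel offload syntax the paper covers, so rather than guess at it, the sketch below expresses the same host-to-coprocessor pattern with the standard OpenMP `target` construct (OpenMP 4.0 and later); `scale_on_device` is a hypothetical name. If no offload device is available, or OpenMP is not enabled, the loop typically runs on the host.

```c
#include <stddef.h>

/* Double every element on the target device.
   map(to: ...) copies `in` to the device before the region;
   map(from: ...) copies `out` back to the host afterwards. */
void scale_on_device(size_t n, const float *in, float *out)
{
    #pragma omp target teams distribute parallel for \
        map(to: in[0:n]) map(from: out[0:n])
    for (size_t i = 0; i < n; ++i)
        out[i] = 2.0f * in[i];
}
```

The `map` clauses make the data movement explicit: `in` crosses to the device before the loop and `out` comes back after it, so the computation has to outweigh the transfer cost for offload to pay off.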
