Most Recent Build Posts



Making the Most of the MKL Eigensolver

It always amuses me how naming a procedure after someone can make it sound exotic or fancy. Words that start with “eigen,” however, are surprisingly not named after anyone called Eigen; “eigen” is simply a German word meaning “own” or “characteristic.” An eigensolver is …
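To make “characteristic values” concrete, here is a minimal sketch of what an eigensolver computes, using the closed form for a symmetric 2×2 matrix. This is illustrative pure Python, not the MKL API; MKL’s LAPACK routines (such as `dsyev`) handle arbitrary sizes with far more attention to accuracy.

```python
import math

def eig_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].

    Illustrative only: a real eigensolver (e.g. MKL's LAPACK dsyev)
    handles general sizes and conditioning.
    """
    mean = (a + c) / 2.0                   # average of the diagonal
    radius = math.hypot((a - c) / 2.0, b)  # half-spread of the spectrum
    return mean - radius, mean + radius    # sorted ascending

lo, hi = eig_sym_2x2(2.0, 1.0, 2.0)  # the matrix [[2, 1], [1, 2]]
print(lo, hi)  # characteristic values: 1.0 and 3.0
```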

Read Full Post | Posted in Build

Maximizing the Message Passing Interface

You walk into the Monday morning staff meeting, and your boss tells you that your next assignment is an application that communicates with company servers across the country. He says he has read about the Intel MPI Library, Intel’s implementation of the Message Passing Interface (MPI) standard, and suggests using it. Based on Intel’s reputation, you know it …

Read Full Post | Posted in Build

New Intel HPC apps keep spaceships, datacenters safe and cool

A supercomputer app that models stability and hot spots in hypersonic reentry could also model chemical spills or the flow of air conditioning in datacenters.

Tools developed to build supercomputer-driven 3-D models that simulate the lift and re-entry of next-generation space vehicles could also help solve complex thermodynamic problems in places …

Read Full Post | Posted in Build

Examining Explicit Vectorization

Is your code getting the most out of the latest CPU features and cores? In this blog, I’m going to talk about explicit vectorization and how it can deliver the highest performance possible. First, I want to say a few words about what vectorization means for …
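As a conceptual model of what a vectorized loop does, the hypothetical sketch below contrasts a scalar loop with one that processes a fixed-width “vector” of lanes per iteration, including the remainder pass compilers generate for leftover elements. Real explicit vectorization is done in C/C++ or Fortran (e.g. with `#pragma omp simd` or intrinsics); this pure-Python version only mimics the shape of the transformation.

```python
# Conceptual model of a SIMD-vectorized loop: each iteration operates on
# VLEN elements ("lanes") at once instead of one element at a time.
VLEN = 4  # pretend vector width: 4 doubles, as in a 256-bit AVX register

def saxpy_scalar(a, x, y):
    # one element per iteration
    return [a * xi + yi for xi, yi in zip(x, y)]

def saxpy_vectorized(a, x, y):
    out = []
    for i in range(0, len(x), VLEN):       # one "vector" iteration
        xv, yv = x[i:i + VLEN], y[i:i + VLEN]   # load up to VLEN lanes;
        # the final, shorter slice plays the role of the remainder loop
        out.extend(a * xi + yi for xi, yi in zip(xv, yv))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # 5 elements: one full vector plus remainder
y = [10.0, 20.0, 30.0, 40.0, 50.0]
assert saxpy_vectorized(2.0, x, y) == saxpy_scalar(2.0, x, y)
print(saxpy_vectorized(2.0, x, y))
```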

Read Full Post | Posted in Build

Avoiding Multicore Pitfalls: Get the Performance You Code For

As the use of multicore architectures continues to grow, it becomes increasingly important to know how to maximize the performance of every core. However, some coding techniques can lead to race conditions that skew results, or to false sharing that can drastically reduce, or even eliminate, the performance gains …
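The race-condition pitfall is easy to sketch: an unsynchronized read-modify-write on shared state can lose updates when threads interleave. The hedged stdlib example below shows the lock that makes the update atomic; Python threads share one interpreter, but the hazard it models is the same one that bites C/C++ code on real cores.

```python
import threading

# A classic multicore pitfall: `count = count + 1` is a read-modify-write.
# If two threads interleave between the read and the write, increments
# are lost. Holding a lock around the update makes it atomic.
count = 0
lock = threading.Lock()

def add(n):
    global count
    for _ in range(n):
        with lock:       # remove this and updates could be lost
            count += 1

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(count)             # deterministic only because of the lock
assert count == 400_000
```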

Read Full Post | Posted in Build

Optimizing for Intel MIC Part 2: Vectorizing for Added Performance Gains

There’s more than one way to get the most out of the latest Intel multicore and many-core architectures embodied in the Xeon and Xeon Phi, and this series shows several ways to do just that. In “Optimization Techniques for the Intel MIC Architecture. Part 2 of 3: Strip-Mining for Vectorization,” …
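Strip-mining itself is a simple loop transformation: one long loop becomes an outer loop over fixed-size strips plus an inner loop within each strip, so the inner loop can be vectorized while the outer loop is shared across cores. The sketch below shows only the shape of the transformation in pure Python; the actual technique is applied to C or Fortran loops.

```python
# Strip-mining: split `for i in range(n)` into an outer loop over strips
# and an inner loop within each strip. The inner loop's short, fixed trip
# count is what the compiler vectorizes; the outer loop is what gets
# distributed across cores.
STRIP = 8  # strip length, typically a multiple of the SIMD width

def scale_strip_mined(a, x):
    out = [0.0] * len(x)
    for s in range(0, len(x), STRIP):                 # outer: one strip each
        for i in range(s, min(s + STRIP, len(x))):    # inner: vectorizable
            out[i] = a * x[i]
    return out

data = [float(i) for i in range(20)]   # 20 is not a multiple of 8:
result = scale_strip_mined(3.0, data)  # min() handles the remainder strip
assert result == [3.0 * v for v in data]
```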

Read Full Post | Posted in Build

Parallel Algorithm Boot Camp

I have spent a good bit of time writing about parallelization using Parallel Studio. My blogs have covered OpenMP, Cilk, and Threading Building Blocks. While these are important because they provide a way to parallelize code that is normally sequential, it is now time to look into algorithms …
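The divide-and-combine shape shared by parallel algorithms in OpenMP, Cilk, and Threading Building Blocks can be sketched with a stdlib thread pool: split the input into independent chunks, compute partial results concurrently, then combine them. This is only an illustration of the pattern, not any of those libraries’ APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Parallel reduction: map independent partial sums, then combine."""
    size = (len(data) + workers - 1) // workers       # ceil-divide into chunks
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunks)              # map: run concurrently
    return sum(partials)                              # reduce: combine

print(parallel_sum(list(range(1001))))  # 500500, same as the serial sum
```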

Read Full Post | Posted in Build

Get Best Memory-Consumption Scalability of Intel MPI Library

High-performance computing applications tend to use most of the available memory on a node, making it difficult to estimate the memory consumption of MPI libraries. But there are ways to estimate memory consumption and ways to fine-tune the Intel MPI Library settings to reduce the memory footprint. For example, users …

Read Full Post | Posted in Build

Multicore Optimization Realized: Tuning for Intel MICs

As the number of cores available to programmers has grown, so have the opportunities to exceed the performance gains that Moore’s Law alone would predict. Servers routinely offer dozens of Xeon cores and the ability to run hundreds of simultaneous threads on Xeon Phi coprocessors. The question is how best to parallelize and optimize …

Read Full Post | Posted in Build

Reproducing Results With Intel MPI Library

In high-performance computing, the order of floating-point operations in numerical codes can introduce small differences that grow with each iteration. The Intel MPI Library offers algorithms that produce conditionally reproducible results, even when the MPI rank distribution environment differs from run to run. Learn more about how you can achieve conditionally reproducible outcomes without …
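The root cause is that floating-point addition is not associative: summing the same values in a different order, as a reduction might when ranks are distributed differently, can change the result. A minimal demonstration with illustrative values:

```python
# Floating-point addition is not associative, so a reduction that combines
# the same per-rank values in a different order can yield a different sum.
vals = [0.1, 0.2, 0.3]

left_to_right = (vals[0] + vals[1]) + vals[2]  # one combining order
right_to_left = vals[0] + (vals[1] + vals[2])  # another combining order

print(left_to_right)   # 0.6000000000000001
print(right_to_left)   # 0.6
assert left_to_right != right_to_left  # same data, different answers
```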

Read Full Post | Posted in Build