Most Recent Tune Posts



Fast Matrix Multiply Fortran Program Using OpenMP

David Bolton demonstrates how to speed up an intensive Fortran program, making it three times as fast by using OpenMP. First, he runs an unoptimized version that takes about 18 seconds to do a matrix multiplication of two 650 x 650 arrays. Then he runs it optimized in just six …

Posted in Tune

Parallel Power to the Programmer: Coding Course Leads the Way

As systems with multiple CPUs, each carrying multiple cores, become increasingly popular for a wide variety of workloads, organizations unsurprisingly want to take full advantage of every CPU and coprocessor in the execution environment. Those who include Xeon Phi coprocessors as part of their infrastructure will be …

Posted in Tune

Go Lock-Free to Keep Your Code Up to Speed

To get the best performance in parallel programming, you want to try to avoid locks and critical sections, which can slow down your code. In this blog, Jeff Cogswell investigates lock-free programming and explains briefly how it works. Then he’ll show you where to learn more from a master with …

Posted in Tune

4 Steps to Tune Up MPI Apps for Boosted Performance

Scientific and engineering programmers want to get every bit of performance possible from clustered systems. The growing popularity of MPI applications calls for new tools to analyze and improve overall system performance. This white paper discusses a methodical four-step approach to profiling and analyzing MPI performance using Intel Trace Analyzer …

Posted in Tune

Multicore vs. Vectorized: Programming Techniques Compared

Parallel programming includes two separate technologies: multicore and vectorized programming. But what is the difference, and how can the two work together? Jeff Cogswell tackles this question. Here at Go Parallel, we’ve talked about two primary ways you can use parallel programming: multicore and vectorized. I’ve received a few emails …

Posted in Verify

Calculating Pi with Monte Carlo and MKL

The Math Kernel Library (MKL) provides a great way to generate huge arrays of random numbers. Once you have these random numbers, creating a Monte Carlo simulation is easy. Jeff Cogswell shows how you can use both MKL and a Monte Carlo algorithm to estimate pi, thus learning the mechanics …

Posted in Build
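For readers who want a feel for the Monte Carlo idea before reading the full post, here is a minimal sketch. It is not the post's code: it swaps MKL for Python's standard `random` module, and the function name `estimate_pi` and its parameters are assumptions made for illustration. The idea is to scatter random points in the unit square and count the fraction that falls inside the quarter circle of radius 1, which approaches pi/4.

```python
import random


def estimate_pi(num_samples: int, seed: int = 0) -> float:
    """Estimate pi by sampling points uniformly in the unit square.

    Hypothetical helper for illustration; uses Python's random module,
    not MKL.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    inside = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle when x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # (quarter-circle area) / (square area) = pi / 4
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
```

The estimate tightens slowly, with the error shrinking roughly in proportion to one over the square root of the sample count, which is why generating large arrays of random numbers quickly matters.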

Tackling Concurrent Kernel Offloading in Xeon Phi

The Xeon Phi coprocessor includes 61 cores, allowing for great scalability in programs. But some programs don’t scale well, requiring different approaches to maximizing performance. In this blog, Jeff Cogswell explores a chapter from the book “High Performance Parallelism Pearls,” which covers this topic in detail. Here at Go Parallel, …

Posted in Build

Tapping into Random Number Generators in MKL

The Math Kernel Library (MKL) includes a whole set of random number generators that are parallel-friendly and thread-safe. These generators can quickly fill entire arrays with random numbers, even when the arrays contain millions of elements, all with a single function call. Jeff Cogswell puts them to use and looks …

Posted in Build

Solving the N-Body Problem in Parallelism

A common problem in physics and other sciences, the n-body problem requires a huge number of calculations to solve, making it an excellent application for parallel programming. Jeff Cogswell discusses how a new book that tackles big problems with parallel coding helps solve n-body models as well. If you’re interested in taking your …

Posted in Verify
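To see where all those calculations come from, consider a rough sketch of direct summation, one common way to compute n-body forces. This is not taken from the post or the book; the function name `pairwise_accelerations`, the 2D setup, the gravitational constant of 1, and the small softening term are all assumptions for illustration. Every body interacts with every other body, so one time step touches n(n-1)/2 pairs.

```python
import math


def pairwise_accelerations(positions, masses, g=1.0, softening=1e-9):
    """Return (accelerations, pair_count) for bodies in 2D.

    Hypothetical helper: direct summation with an assumed constant g
    and a small softening term to avoid division by zero.
    """
    n = len(positions)
    acc = [[0.0, 0.0] for _ in range(n)]
    pair_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            dist_sq = dx * dx + dy * dy + softening
            inv_dist_cubed = 1.0 / (dist_sq * math.sqrt(dist_sq))
            # Equal and opposite pull on the two bodies (Newton's third law).
            acc[i][0] += g * masses[j] * dx * inv_dist_cubed
            acc[i][1] += g * masses[j] * dy * inv_dist_cubed
            acc[j][0] -= g * masses[i] * dx * inv_dist_cubed
            acc[j][1] -= g * masses[i] * dy * inv_dist_cubed
            pair_count += 1
    return acc, pair_count


if __name__ == "__main__":
    _, pairs = pairwise_accelerations(
        [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [1.0, 1.0, 1.0]
    )
    print(pairs)
```

Because the pair count grows with the square of the number of bodies, the work piles up quickly, and the pair interactions can be divided among cores.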

Manage Your Threads with Task Arenas in TBB

Threading Building Blocks (TBB) includes a default task scheduler that works well but is somewhat limited. If you need more control over your thread scheduling, you can use a task arena. Jeff Cogswell shows you how it works. One of the C++ and parallel programming experts at Intel named Anton …

Posted in Build