While it’s important to master the tools used in parallel development such as Parallel Composer, Parallel Amplifier and Parallel Inspector, it’s good now and then to step back and let the work of others inspire your work. Today we’ll see how a company called Flow Science used the full Intel® Cluster Studio XE tool suite to enhance the serial and parallel performance of its FLOW-3D app.
Beyond Theory and Technique
In parallel development, verification is important. Developing parallel algorithms might seem easy at first, but it's a skill that must be honed. Intel Parallel Studio helps out by making suggestions and verifying that your parallel code is correct.
Here on Go Parallel, we've provided lots of information about using various tools to perform verification at different levels. For example, in one blog post I showed how to use Parallel Amplifier for verification, even though most people think of Parallel Inspector for that job. The two tools cover different aspects of the development process: Inspector is usually used to find threading errors and memory errors such as leaks, while Amplifier handles other kinds of verification, such as confirming that your code actually runs in parallel.
It's easy to get wrapped up in theory and technique. But how well do these tools perform in the real world? And how can we get new ideas for making the most of our development efforts?
When a big company like Intel publishes case studies, they're usually targeted at management. As a part-time writer, I find myself reading a lot of case studies to gather facts and figures for articles. But as a full-time software engineer, I've found that reading case studies really helps my own development.
So let’s look at a real case focusing on parallel verification. We’ll start with a look at the basic problems. (Impatient types can skip to the next section.)
The Challenge: Fluid Dynamics and Parallel Programming
If you've taken any physics classes or, better yet, are fortunate enough to work in a physics laboratory, you've learned about fluid dynamics, the study of how fluids flow. More specifically, it's about how fluids flow through and around things, which makes it relevant to many fields, including hydraulics, aerodynamics, and even climatology and meteorology (since the Earth's atmosphere is a gaseous fluid). The mathematics is pretty amazing, even though it relies on the partial derivatives we suffered through in calculus class.
Fluid dynamics problems are typically three-dimensional, and modeling them takes advanced computing power. Consider, for example, a weather system. Because Earth's atmosphere doesn't end abruptly, its effective height depends on the method of calculation; one figure is 8.2 kilometers. For the U.S. alone, that means the atmosphere takes up about 80 million cubic kilometers. Suppose you want to model not just the U.S. atmosphere as a whole, but the motion at a point in every cubic meter. A cubic kilometer holds 10^9 cubic meters, so you're dealing with 8.0 × 10^16 data positions, and that's just the positions. Add to that complete calculations at each point, and it's pretty obvious you'll need more than a simple quad-core processor for this kind of work.
Similarly, suppose you're modeling fluid flowing through a small enclosure, say 1 meter wide, deep, and tall. At the millimeter level, that's 1,000 points along each edge, or 10^9 (one billion) data points. Switch to the micron, which is one millionth of a meter, and you're dealing with 10^18 data points.
The calculations in fluid dynamics are hard enough. Now take the established algorithms and split them up to run in parallel across 61 cores or more, with full vectorization on each core, and things get messy very quickly. Persist, and you'll get code that runs. But then we must ask: Does it perform its reductions correctly? Does it accurately pull in all the data for the final results?
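To see where the 8.0 × 10^16 figure comes from, here's a quick back-of-the-envelope calculation in Python. It takes the roughly 80 million cubic kilometers quoted above as given and assumes one grid point per cubic meter. The memory estimate at the end assumes a single 8-byte double per point for one variable; that's an illustration of scale, not a figure from any real weather code.

```python
# Back-of-the-envelope grid size for the U.S. atmosphere example.
M3_PER_KM3 = 1000 ** 3  # 1 km^3 = 10^9 m^3

def grid_points(volume_km3, cell_m3=1.0):
    """Grid points needed if every cell of `cell_m3` cubic meters gets one point."""
    return volume_km3 * M3_PER_KM3 / cell_m3

points = grid_points(80e6)  # ~80 million km^3, the figure quoted in the text
print(f"{points:.1e} grid points")  # 8.0e+16 grid points

# Assumption for illustration: one 8-byte double per point for a single variable.
bytes_needed = points * 8
print(f"{bytes_needed / 1e15:.0f} PB per variable")  # 640 PB per variable
```

Real codes store several variables per point (velocity components, pressure, temperature, and so on), so the true footprint only goes up from there.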
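The enclosure works the same way: count the cells along one edge and cube that number. This small sketch (the function name is just for illustration) makes the scaling explicit; shrinking the spacing by a factor of 1,000 multiplies the point count by 10^9.

```python
def points_in_cube(side_m, spacing_m):
    """Grid points in a cube with edge `side_m` meters and one point per `spacing_m` step."""
    per_edge = round(side_m / spacing_m)  # round() guards against floating-point noise
    return per_edge ** 3                  # three dimensions: edge count cubed

print(points_in_cube(1.0, 1e-3))  # 1000000000          (10^9 at millimeter spacing)
print(points_in_cube(1.0, 1e-6))  # 1000000000000000000 (10^18 at micron spacing)
```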
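Both questions can be checked mechanically. The Python sketch below is a hypothetical illustration, not code from any real solver: each worker sums a private partial over its own chunk, and the partials are combined once at the end, which is the same pattern an OpenMP `reduction(+:...)` clause sets up. It also shows a classic way to lose data, a chunking scheme that silently drops the last few elements, and verifies the parallel result against a serial one. Python threads are used only to show the structure; because of the interpreter lock they won't make this faster.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def chunk_bounds(n, workers):
    """Split range(n) into `workers` contiguous [start, stop) chunks with no gaps or overlaps."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        stop = start + base + (1 if w < extra else 0)  # the first `extra` chunks take one extra item
        bounds.append((start, stop))
        start = stop
    return bounds

def naive_bounds(n, workers):
    """Buggy split: n // workers items per chunk silently drops the last n % workers items."""
    size = n // workers
    return [(w * size, (w + 1) * size) for w in range(workers)]

def parallel_sum(values, workers=4):
    """Reduction: each worker sums a private partial; partials are combined once at the end."""
    def partial(bounds):
        lo, hi = bounds
        return sum(values[lo:hi])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial, chunk_bounds(len(values), workers)))
    return sum(partials)

data = [0.1 * i for i in range(100_003)]  # length deliberately not divisible by 4
serial = sum(data)

# Does the reduction pull in all the data?
print(sum(hi - lo for lo, hi in chunk_bounds(len(data), 4)))  # 100003
print(sum(hi - lo for lo, hi in naive_bounds(len(data), 4)))  # 100000  <- 3 items lost

# Does the parallel result agree with the serial one? Addition order differs,
# so compare with a tolerance instead of demanding bit-for-bit equality.
print(math.isclose(parallel_sum(data, 4), serial, rel_tol=1e-9))  # True
```

The tolerance matters: a parallel reduction adds floating-point numbers in a different order than the serial loop, so the last few bits can legitimately differ. A verification test that insists on exact equality would flag correct code as broken.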
Flow Science: Masterful Use of Verification Tools
Now let's look at a real-world answer. A company called Flow Science, which supports a worldwide customer base of commercial, academic, and government users, has created a fluid dynamics application, FLOW-3D, using Intel's development tools (see the full case study). To enhance the application's serial and parallel performance, the company used the full Intel® Cluster Studio XE tool suite.
Although at Go Parallel we usually focus on the C++ compiler, Flow Science used the Intel Fortran Compiler. Parallel Studio's verification tools were key. Building such a product is no easy task, no matter how large the team of programmers and engineers. The code has to be right, and there are only so many tests you can perform. That's where Parallel Inspector and Parallel Amplifier come to the rescue.
Challenge: The company's customers face ever-larger data sets, and they continue to demand accurate solutions in less time. Moreover, parallelizing for multicore architectures is difficult because the computational load keeps changing throughout the simulation.
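One common response to a shifting load is to periodically re-partition the domain by measured work rather than by cell count. The case study doesn't say how FLOW-3D handles this, so treat the following as a generic sketch: `cost` is a hypothetical per-cell work estimate (for example, timings from the previous step), and `balanced_bounds` cuts the cells into contiguous ranges of roughly equal total cost.

```python
from bisect import bisect_right
from itertools import accumulate

def balanced_bounds(cost, parts):
    """Split cells into `parts` contiguous [start, stop) ranges of roughly equal total cost.

    `cost[i]` is a non-negative work estimate for cell i. Calling this again
    whenever the costs change gives a simple dynamic rebalancing scheme."""
    prefix = list(accumulate(cost))  # prefix[i] = total cost of cells 0..i
    total = prefix[-1]
    bounds, start = [], 0
    for p in range(1, parts):
        target = total * p / parts  # ideal cumulative cost at the p-th cut
        # First cell whose cumulative cost exceeds the target; never move backwards.
        stop = max(start, bisect_right(prefix, target))
        bounds.append((start, stop))
        start = stop
    bounds.append((start, len(cost)))  # the last part takes whatever remains
    return bounds

cost = [1] * 8 + [4] * 4  # a quiet region, then a busy one (hypothetical numbers)
bounds = balanced_bounds(cost, 3)
print(bounds)                               # [(0, 8), (8, 10), (10, 12)]
print([sum(cost[a:b]) for a, b in bounds])  # [8, 8, 8]
```

With eight cheap cells and four expensive ones, an equal-count split into three parts of four cells each would give costs of 4, 4, and 16, leaving two workers idle most of the time; the cost-weighted split evens them out to 8, 8, and 8.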
So the initial challenge was to maintain the accuracy and consistency of results, while greatly reducing simulation time. The team decided to extend the shared memory parallelism of FLOW-3D to a hybrid MPI*-OpenMP* version.
They soon discovered that introducing a distributed-memory approach made debugging difficult. Furthermore, even once errors were corrected, obtaining good speedups and scaling to higher core counts was hard. So the next challenge was addressing scalability and parallel performance.
Developers used the Intel® MPI Library to enable distributed-memory performance. Simply by switching runtime environment variables, Flow Science engineers and customers have been able to get maximum performance from whichever cluster interconnect they use. This feature enabled Flow Science to ship the Intel MPI runtime toolkit as part of the user installation for a seamless user experience.
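Part of what makes distributed memory harder to debug is that each process sees only its own slice of the grid, plus copies of its neighbors' edge values (often called ghost or halo cells) that must be refreshed by message passing every step; in an MPI code that's typically done with calls such as `MPI_Sendrecv`. The Python sketch below simulates that serially, with no MPI at all, on a hypothetical 1-D diffusion update. It isn't FLOW-3D's scheme, but it shows the kind of verification that catches these bugs: run the decomposed version and the serial version on the same input and demand identical results.

```python
def split(n, parts):
    """Contiguous [start, stop) ranges, one per simulated rank."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def global_step(u, alpha=0.25):
    """Serial reference: one explicit diffusion step; the two end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def decomposed_step(u, parts, alpha=0.25, exchange=True):
    """Same step, computed rank by rank on owned cells plus one ghost cell per side."""
    n = len(u)
    result = []
    for lo, hi in split(n, parts):
        if lo == hi:
            continue  # a rank with no cells contributes nothing
        owned = u[lo:hi]
        # Halo exchange: ghosts hold the neighbors' edge values. With exchange=False
        # they are stale copies of this rank's own edge values, mimicking a missed message.
        left = u[lo - 1] if (exchange and lo > 0) else owned[0]
        right = u[hi] if (exchange and hi < n) else owned[-1]
        padded = [left] + owned + [right]
        for k in range(1, len(padded) - 1):
            i = lo + k - 1  # global index of this cell
            if i == 0 or i == n - 1:
                result.append(padded[k])  # fixed global boundary value
            else:
                result.append(padded[k] + alpha * (padded[k - 1] - 2 * padded[k] + padded[k + 1]))
    return result

u0 = [0.0] * 10
u0[5] = 1.0  # initial spike right at the boundary between the two subdomains
print(decomposed_step(u0, parts=2) == global_step(u0))                  # True
print(decomposed_step(u0, parts=2, exchange=False) == global_step(u0))  # False
```

Because each cell's update uses exactly the same operands in the same order, the decomposed and serial results should match bit for bit when the halo exchange is correct; skipping the exchange produces a mismatch right at the subdomain boundary, exactly where distributed-memory bugs like to hide.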
For Flow Science, the primary benefit of using the Intel Cluster Studio XE suite was improved customer satisfaction due to better speedups for larger, more complex problems. Other benefits included reduced development time and costs. Bottom line: Flow Science enabled faster, more accurate simulation with Intel® software development tools, delivering improved results even as customer data sets grow larger.
I doubt every programmer on the fluid dynamics project had a Xeon Phi coprocessor in their desktop development machine, but that's okay. With Parallel Studio, you can develop and debug on a simple quad-core machine, and then let the runtime scale your program up to a Many Integrated Core (MIC) architecture. And with the help of the right verification tests on your machine, you can be confident your code is correct. For real.