This article highlights the features of Intel® Cluster Studio XE by using them to build and analyze LAMMPS, a benchmark used in SPEC MPI. We describe build settings for the Intel® C++ Compiler that optimize performance, and show how to use the Intel® MPI Library to deliver best-in-class performance for LAMMPS on Intel® architecture-based clusters. We use Intel® Trace Analyzer and Intel® Trace Collector to identify uses of MPI APIs that cause performance problems in LAMMPS, and show how to compare trace files in the Intel Trace Analyzer GUI to get a detailed analysis of message passing with aligned timelines. We also show how to use the Intel® MPI correctness checking library to look for MPI coding errors, and how to use Intel® VTune™ Amplifier XE to visualize application scaling on individual nodes. The techniques described in this article can be applied to similar complex cluster applications that combine technologies such as MPI and OpenMP* across multiple machines.
Zooming further into a single atom information exchange shows the individual MPI API calls over time. The black lines connect the ranks (processes) that exchanged messages, and within each time step there are two distinct periods in which messages are exchanged.
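To make the structure of such a timeline concrete, the following toy model logs which ranks exchange messages during one time step. It is only a sketch and is not LAMMPS code or Intel Trace Collector output: the 1-D periodic ring of ranks, the phase names "forward" and "reverse", and the helpers `neighbor_exchange` and `trace_time_step` are all assumptions introduced for illustration.

```python
# Toy model (not LAMMPS code): log point-to-point messages in one time step.
# Assumptions: ranks form a 1-D periodic ring, and the two exchange periods
# are a "forward" and a "reverse" neighbor exchange. Both are illustrative.
from collections import namedtuple

Message = namedtuple("Message", ["phase", "src", "dst"])


def neighbor_exchange(num_ranks, phase, direction):
    """Return one message per rank, sent to its ring neighbor."""
    return [
        Message(phase, rank, (rank + direction) % num_ranks)
        for rank in range(num_ranks)
    ]


def trace_time_step(num_ranks):
    """Collect the messages of both exchange periods for one time step."""
    messages = []
    messages += neighbor_exchange(num_ranks, "forward", +1)
    messages += neighbor_exchange(num_ranks, "reverse", -1)
    return messages


if __name__ == "__main__":
    for msg in trace_time_step(4):
        # Each printed line plays the role of one black line in the timeline.
        print(f"{msg.phase:8s} rank {msg.src} -> rank {msg.dst}")
```

Each `Message` record plays the role of one black line in the trace view, and grouping the records by `phase` reproduces the two separate bursts of communication within the time step.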