A few weeks back we spoke with Intel’s Walter Shands about using Intel VTune Amplifier 2013 to optimize HMMER, a molecular biology package written in C. The main point of Shands’ October 16 Intel webinar: VTune Amplifier can significantly improve the multicore performance of algorithmically complex, purpose-specific code while guarding against the introduction of errors, even where the science (and the algorithm’s fine points) might be obscure to the person doing the optimizing.
This ‘black box’ notion of performance optimization is pragmatic and economically efficient in a global scientific and engineering computing milieu where good code has legs and gets broadly shared, while access to the original authors and maintainers may be limited. Recent news (see below) hammers home the point: much of the juice in HPC and multicore, whether we’re optimizing in code or in hardware, needs to be squeezed out at levels remote from code semantics.
Disk I/O Slows Performance
In the process of optimizing HMMER, though, Shands hit a wall about as remote from code semantics as it gets; he determined that HMMER’s main loop was I/O bound. No matter how much he optimized the object code with VTune, performance was gated by how fast test data could be loaded from disk. In our pre-webinar interview, he predicted that developers and implementers will increasingly encounter this limitation as products like Xeon Phi make it easier to bring ‘big’ scientific and engineering applications to more affordable platforms.
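When a loop is gated by disk reads rather than compute, one common mitigation is to overlap the two. The sketch below (illustrative only, not HMMER’s actual code; the chunk size and queue depth are arbitrary assumptions) uses a background producer thread to keep a small queue of file chunks full while the consumer processes them:

```python
import threading
import queue

def prefetching_reader(path, chunk_size=1 << 20, depth=4):
    """Yield chunks of a file, read on a background thread.

    The producer thread keeps up to `depth` chunks queued, so disk I/O
    proceeds while the consumer is busy computing on an earlier chunk.
    """
    chunks = queue.Queue(maxsize=depth)

    def producer():
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                chunks.put(chunk)  # blocks when the queue is full
        chunks.put(None)  # sentinel: end of file

    threading.Thread(target=producer, daemon=True).start()
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        yield chunk
```

Double-buffering of this kind hides I/O latency behind computation, but it cannot raise the ceiling set by raw disk bandwidth, which is why faster storage (below) matters.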
That’s yet another reason to like Intel’s recent announcement of a new generation of Solid State Disk (SSD) drives aimed at HPC (high-performance computing) and ‘big data’ applications. The new Intel SSD DC S3700 series is a 6Gb/s SATA-compatible drive, available in 2.5” form factors up to 800GB and 1.8” up to 400GB. It reads sequentially at 500MB/s and delivers 75,000 IOPS on random 4K reads and 36,000 IOPS on random 4K writes. Intel has engineered the drive to overcome the cost-versus-performance issues historically associated with solid-state storage, using 25nm-process Multi-Level Cell (MLC) technology for high storage density, faster writes, and reduced error rates, and extending endurance to an impressive ten full drive-writes per day over a projected five-year service life, well suited to enterprise and scientific applications.
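An IOPS rating at a fixed block size converts directly to effective bandwidth (IOPS × block size). A quick sanity check on the spec-sheet figures quoted above:

```python
def iops_to_mbps(iops, block_bytes=4096):
    """Convert an IOPS rating at a given block size to MB/s (1 MB = 10**6 bytes)."""
    return iops * block_bytes / 1e6

# Quoted S3700 figures at 4K blocks:
random_read_mbps = iops_to_mbps(75_000)   # 307.2 MB/s
random_write_mbps = iops_to_mbps(36_000)  # 147.456 MB/s
```

As expected, random-read bandwidth (~307MB/s) sits below the 500MB/s sequential figure; random access costs throughput even on solid-state media.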
NAG Update for Multicore
The idea of long code provenance is well reflected in two other news items. First, Numerical Algorithms Group (NAG) has just released a version of the NAG Library for SMP and Multicore for the 60-core Xeon Phi. NAG, founded in 1970 to consolidate inter-university work on numerical algorithms, has built a library that incorporates contributions from several generations of mathematicians. The intertwined history of NAG and its contributors is well worth reading, and it reminds us that the term ‘legacy’, often used in computing as a swipe at outmoded code or hardware, can also carry very positive associations.
SC12 Kickoff, Student Cluster Competition
The second item: this week kicks off the Supercomputing (SC12) show in Salt Lake City, one highlight of which (both for participants and their sponsoring vendors) is the Student Cluster Competition. Over several days, six-person teams from colleges around the world compete to assemble and configure their best designs (submitted for approval in April) for a high-performance, low-power-drain cluster, install software, and run benchmark datasets against four well-known scientific computing packages, shooting for optimal throughput within the contest’s 26-amp power budget.
Though the student teams had some pre-contest support from HPC specialists at the institutions (e.g., Oak Ridge National Laboratory) backing the packages, once the contest begins they are on their own. So this is a nice portrayal, over several long and no doubt harrowing days, of how platforms can and must be engineered and optimized around black-box codes. Good luck to all participants; we’ll talk about the winners when they’re announced.