Intel Cluster Studio XE 2012 (see http://goparallel.sourceforge.net/intel-cluster-studio-xe-2012-overview-from-intel-software-conference-2012/ and http://software.intel.com/en-us/articles/intel-cluster-studio-xe/) is the latest iteration of Intel’s development suite for HPC clusters. Its wizard-based code-commenting and code-generating workflows make surprisingly simple the otherwise daunting task of parallelizing applications efficiently on large-scale cluster and supercomputing platforms, using Cilk Plus and Threading Building Blocks to parallelize within each node, and highly configurable MPI (Message Passing Interface) patterns to distribute work across the nodes of a cluster. Huge clusters are supported – presently up to a theoretical maximum of 90,000 nodes.
Applications employing several kinds of parallelism at once are intrinsically complex and can be hard to debug. So Intel offers tools like Intel Inspector XE – an upgrade to Thread Checker – designed for use early in the development cycle, to analyze source code and resolve threading and memory errors, both predictively and in early, small-scale test runs. At the node level, tuning can be improved and issues quickly resolved with Intel VTune Amplifier XE, which performs profiling and hotspot identification, and lets you drill down graphically and numerically into locks and waits, thread balance, and transitions, resolving problems to individual functions and single lines of code. For similar graphically enhanced analysis of applications on the target cluster, Intel offers Intel Trace Analyzer & Collector, which instruments and traces execution across clustered machines, monitors all MPI traffic, and provides a wide range of drill-down tools to help fix bugs, identify bottlenecks, and optimize the choice of MPI patterns (e.g., blocking vs. non-blocking) for best performance.
For some mission-critical applications, though – especially those aimed at healthcare, scientific, aerospace, military, and other markets where software faults can be catastrophic – further assurance is required, both that programs have been written according to specification and that they will perform reliably under all combinations of input data and conditions. This is the domain of “V&V” – Verification and Validation. While some V&V process exists around any coding project, ramping up criticality and scale demands more deliberate, formal, and reliable treatment.
Eventually – typically sooner in the domain of hardware than in software, at least today – you enter the zone of “formal verification”: a mathematical and logical discipline for abstracting and describing systems (hardware, software, etc.) in ways that can be verified by theorem proving. In effect, the program (or other system under test) becomes a theorem predicting its end states, and proving that theorem complete and correct means the program works as advertised. A reasonable place to start exploring this area is a presentation by Intel’s John Harrison (http://www.cl.cam.ac.uk/~jrh13/slides/lics-22jun03.pdf), delivered in 2003 at the LICS conference in Ottawa, which shows clearly why programs can be hard to test, why assumptions about them may be very wrong, why testing with normal inputs provides no assurance that some other input won’t break the software, and why formal verification is thus useful. The presentation goes remarkably deep (and with virtually no math) into why formal verification is hard, and how it can be made easier.
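Harrison’s point about testing is easy to demonstrate. Here is a minimal sketch – my own illustration, not taken from the talk, and written in Rust only for brevity (the point is language-independent) – of an absolute-value function that passes every test you would plausibly write by hand, yet violates its specification on exactly one of the roughly four billion possible 32-bit inputs:

```rust
// A "tested" absolute-value function: correct on every input you are
// likely to try by hand, wrong on exactly one of the 2^32 possible inputs.
fn my_abs(x: i32) -> i32 {
    if x < 0 { x.wrapping_neg() } else { x }
}

fn main() {
    // Ordinary test cases all pass...
    assert_eq!(my_abs(5), 5);
    assert_eq!(my_abs(-5), 5);
    assert_eq!(my_abs(0), 0);

    // ...but the specification "my_abs(x) >= 0 for all x" is false:
    // in two's complement, i32::MIN has no positive counterpart, so
    // negating it wraps back around to itself.
    assert_eq!(my_abs(i32::MIN), i32::MIN); // still negative!
    println!("all assertions passed");
}
```

A formal proof attempt would be forced to confront the `i32::MIN` case immediately, because the theorem simply cannot be proved for all inputs; a test suite confronts it only if someone happens to think of it.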
One way Intel and others make it easier is by using automated and interactive theorem proving via tools like HOL Light (HOL stands for Higher Order Logic), a proof checker written in OCaml, a descendant of Caml (originally “Categorical Abstract Machine Language”) and thus a member of the ML (Meta Language) family that also includes Standard ML. OCaml and its sibling MLs are functional languages that help you map formal logic into code and treat programs as proofs – the underlying notion that programs actually are proofs is articulated as the famous Curry-Howard-Lambek correspondence, the three-way isomorphism among intuitionistic logic, typed lambda calculus, and Cartesian closed categories. The Curry-Howard correspondence has also recently been recommended as a method for partitioning the search spaces explored by genetic algorithms, where each node (called a “species”) is indexed by its Curry-Howard isomorphic proof. Now hold that thought.
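The “programs are proofs” reading can be made concrete in any language with strong parametric types: a total function of type A → B can be read as a proof that proposition A implies proposition B. A rough sketch of the idea – in Rust rather than an ML, purely for illustration, since the correspondence is language-agnostic:

```rust
// Under Curry-Howard, a type is a proposition and a (total) program of
// that type is a proof of it.

// Modus ponens: from a proof of A and a proof of (A implies B), conclude B.
fn modus_ponens<A, B>(a: A, ab: impl Fn(A) -> B) -> B {
    ab(a)
}

// Transitivity of implication: (A -> B) and (B -> C) yield (A -> C).
// The "proof" is nothing more than function composition.
fn trans<A, B, C>(ab: impl Fn(A) -> B, bc: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |a| bc(ab(a))
}

// Conjunction is a pair: from a proof of (A and B), conclude A.
fn proj_left<A, B>(pair: (A, B)) -> A {
    pair.0
}

fn main() {
    // Instantiating the "theorems" at concrete types runs them as programs.
    let double = |x: i32| x * 2;
    let show = |x: i32| x.to_string();
    assert_eq!(modus_ponens(21, double), 42);
    assert_eq!(trans(double, show)(21), "42");
    assert_eq!(proj_left((1, "unused")), 1);
    println!("ok");
}
```

Because these functions are written generically – they cannot inspect their arguments, only plumb them together – the mere fact that they type-check is what constitutes the proof.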
The trouble is that – as software and the parallel components and systems it runs on get individually denser and more numerous – all this austere and beautiful math gets bigger and harder for machines to model and exhaustively verify, given finite resources and time, and with the understanding that most formal verification is carried out interactively, with highly trained mathematicians tweaking the theorem-proving software every step of the way. This is a huge problem for organizations like the military, who want to develop and run perfectly reliable systems at epic scales.
So, late last year, DARPA initiated a program and created a community around research into CSFV – Crowd Sourced Formal Verification. Basically, the DARPA CSFV program is looking for ways to take units of software, turn them into collections of theorem templates in a logical calculus, programmatically create games based on those templates, have people play the games to explore the search space and statistically establish the individual theorems, and then compile the results of many plays back into progress toward verifying the program property under evaluation. If crowdsourced gameplay works for protein folding, why not for software?
If you’re interested in joining the effort, check out the program homepage at http://www.darpa.mil/Our_Work/I2O/Programs/Crowd_Sourced_Formal_Verification_(CSFV).aspx. The Proposer’s Day Briefing PDF by program manager Dr. Drew Dean of DARPA provides a full explanation of the program, its goals, and the range of skills and capabilities required to participate (plenty of room in this program, by the way, for people who aren’t Turing Fellows – they also need webdevs and cool kids to make these games fun, after all). Anyone qualified can join the program’s online community.
Meanwhile, folks with more of a hunger to grasp the mathematical and logical connections between game theory and problems in formal verification should check out the elegant and erudite review modules (http://pub.ist.ac.at/gametheory/) for the course Game Theory in Formal Verification, taught in 2010 by Krishnendu Chatterjee at the Institute of Science and Technology Austria (IST Austria) in Klosterneuburg.
Do you have an interesting verification anecdote? I’d love to hear about it, so please let me know in the Comments section below.
John Jainschigg is a Geeknet contributing editor and CEO of World2Worlds, Inc., a digital agency focused on immersive technology and gaming. John’s first exposure to concurrency came via interrupt and re-entrancy programming at the assembler level on Z80- and 68000-based systems. He wrote concurrent, time-critical packet-switching applications on HP-UX RISC machines in the late 1980s, and has since worked up and down the client-server stack in Java, C++, PHP, and other conventional and scripting languages – and, more recently, in task-specific, state-based, radically concurrent languages like LSL.