I’ve known this day was coming – but when I saw Knights Corner clearly sustaining one TeraFLOP/s (DGEMM, across a wide range of block sizes), I was surprised by my own emotional reaction. It is hard to describe; it was a good feeling.
On Tuesday, November 15, 2011, we showed a Knights Corner co-processor for the first time outside Intel. It is fresh silicon – first silicon – which is always exciting (if it works at all). Not only does it work, we were able to boot Linux on it and demonstrate a very real, sustained TeraFLOP/s. We ran DGEMM with many block sizes and saw consistent results across them, which is something not all hardware and software can deliver. Our Math Kernel Library product will include DGEMM for Knights Corner when it comes out as a product, so this will be reproducible by all.
To our knowledge, we demonstrated the world’s fastest DGEMM, and the first to go above one TeraFLOP/s on DGEMM. And it is a conservative measure: real, sustained TeraFLOP/s (not “raw” or other theoretical measures). And it is doing it now, not just on “paper.” That part really hit home as I looked at it.
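For readers who want to see what separates a sustained number from a theoretical one, here is a minimal sketch in Python. It assumes the conventional operation count for DGEMM (2·m·n·k floating-point operations for an m×k by k×n multiply) and divides by measured wall-clock time. The function names and the timing value are hypothetical and purely illustrative; this is not the harness used in the demonstration.

```python
def dgemm_flop_count(m: int, n: int, k: int) -> int:
    """Conventional FLOP count for C = A*B with A (m x k) and B (k x n).

    Each of the m*n output elements needs k multiplies and k adds,
    which is the usual 2*m*n*k convention (an assumption here, not
    something stated in the post).
    """
    return 2 * m * n * k


def sustained_tflops(m: int, n: int, k: int, elapsed_seconds: float) -> float:
    """Sustained rate in TeraFLOP/s, based on measured elapsed time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return dgemm_flop_count(m, n, k) / elapsed_seconds / 1e12


# Hypothetical example: a 10000 x 10000 x 10000 DGEMM that took 2.0 seconds.
rate = sustained_tflops(10_000, 10_000, 10_000, 2.0)
print(f"{rate:.2f} TeraFLOP/s sustained")
```

A peak or “raw” figure, by contrast, is computed from clock rate and hardware width without running anything, which is why a measured rate like this is the more conservative of the two.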
I knew what I would see; Knights Corner was not a surprise to me. But when I saw it, and could reach out and touch it – I was struck by the power. I was part of the ASCI Red project between Intel and Sandia National Labs, which built the world’s first TeraFLOP/s computer. We got to the same point (one TeraFLOP/s) in December 1996, before we had finished building the machine. Now, we’ve done it again… this time with a single processor. Both used x86 processors from Intel – ASCI Red used over nine thousand Pentium Pro processors (later upgraded to Pentium II Xeon processors to be the first past 2 TeraFLOP/s), and now a single pre-production Knights Corner does the same.
Obviously, both projects involve a lot of people both inside and outside Intel. We have a great team inside Intel, and great partners, that can all feel good about both accomplishments. I’m happy to be one of a handful of people involved in both “firsts” to a TeraFLOP/s.
And the trend continues. By the end of this decade, we should see a TeraFLOP/s from a 20W part (simple math: if an ExaFLOP/s system runs at 20MW, then a TeraFLOP/s, one millionth of that, will take 20W). A DGEMM sustained TeraFLOP/s in a notebook… it’s coming. For now, we have Knights Corner, which is plenty amazing.
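The “simple math” can be written out as a short Python sketch. It assumes power scales linearly with performance, which is the same simplification the estimate above makes; the function name is mine and only illustrative.

```python
EXA = 10**18   # FLOP/s in one ExaFLOP/s
TERA = 10**12  # FLOP/s in one TeraFLOP/s


def watts_for_one_teraflops(system_flops: int, system_watts: int) -> float:
    """Power share of one TeraFLOP/s, assuming power scales linearly."""
    return system_watts * TERA / system_flops


# An ExaFLOP/s machine drawing 20 MW (20,000,000 W).
print(watts_for_one_teraflops(EXA, 20_000_000))  # 20.0
```

An ExaFLOP/s is a million TeraFLOP/s, so the power budget simply divides by a million: 20MW becomes 20W.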