The best-optimized code on multi-core and many-core chip architectures can deliver 10-100X the performance of naive, “parallelism-unaware” C/C++ code. Unfortunately, programmers can no longer rely on ever-increasing clock frequencies to close this growing “Ninja” performance gap. What’s a developer to do? A fascinating new technical paper by Intel shows how well-known algorithmic techniques and modern compiler technology can narrow the gap on performance benchmarks to an average of just 1.3X, with low programming effort.
Find out how here. (pdf)






