Many-core processors achieve improved computational performance by exposing various forms of architectural parallelism, including but not limited to multiple cores and vector instructions. However, programming approaches that target these low-level parallel mechanisms directly tend to produce code that is overly complex, non-portable, and often unscalable and unreliable.
A more structured approach to designing and implementing parallel algorithms reduces the complexity of developing software for such processors, and is particularly relevant for many-core processors with large amounts of parallelism and multiple parallelism mechanisms. In particular, efficient and reliable parallel programs can be designed around the composition of deterministic algorithmic skeletons, or patterns. Besides improving the productivity of experts, specific patterns and fused combinations of patterns can also guide relatively inexperienced users toward efficient algorithm implementations with good scalability.
The approach to parallelism described in this document includes both collective “data-parallel” patterns, such as map and reduce, and structured “task-parallel” patterns, such as pipelining and superscalar task graphs.
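As a minimal illustrative sketch of how the two data-parallel patterns above compose, the following Python fragment expresses a sum of squares as a map (an elemental function applied independently to every element) followed by a reduce (combination with an associative operation). The patterns themselves are language-agnostic; Python is used here purely for exposition, and the thread pool illustrates the structure rather than a performance claim (CPython's global interpreter lock limits actual speedup for CPU-bound work).

```python
from functools import reduce
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Elemental function of the map pattern: applied to each element
    # independently, with no side effects, so iterations may run in parallel.
    return x * x

def parallel_sum_of_squares(data):
    # Map: apply the elemental function across the collection. Because the
    # iterations are independent, an implementation is free to distribute
    # them over workers.
    with ThreadPoolExecutor() as pool:
        squares = list(pool.map(square, data))
    # Reduce: combine partial results with an associative operation (+).
    # Associativity is what permits a parallel, tree-shaped reduction
    # instead of a strictly sequential fold.
    return reduce(lambda a, b: a + b, squares, 0)

print(parallel_sum_of_squares(range(10)))  # → 285
```

Because map and reduce are deterministic given a pure elemental function and an associative combiner, the fused composition (often called map-reduce) yields the same result regardless of how the implementation schedules the underlying threads or vector lanes.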