%0 Journal Article
%T Autotuning Strategies For Reducing Synchronization Costs In Multithreaded Kernels
%A Apan Qasem
%J ARPN Journal of Systems and Software
%D 2012
%I ARPN Publishers
%X The emergence of multicore architectures has opened up new opportunities for thread-level parallelism and dramatically increased the theoretical peak performance of current systems. However, achieving a high fraction of peak performance requires careful orchestration of many architecture-sensitive parameters, both on-chip and across the interconnect. In particular, the presence of shared caches on multicore architectures makes it necessary to consider thread synchronization and data locality in concert. This paper studies the complex interaction among several compiler-level code transformations that affect data locality, achieved parallelism, and synchronization and communication costs. We characterize this interaction using static analysis and generate a search space suitable for efficient automatic performance tuning. We also develop a heuristic, based on the number of threads, data reuse patterns, and the size and configuration of the shared cache, to estimate the optimal synchronization interval for pipeline-parallelized code. We validate our choice of tuning parameters and evaluate our heuristic with experiments on a set of scientific kernels on four different multicore platforms. The results show that the proposed heuristic estimates the optimal synchronization window with reasonable accuracy and achieves significant performance improvement.
%K Compiler Design
%K Parallelism
%K Software Infrastructure
%U http://scientific-journals.org/journalofsystemsandsoftware/archive/vol2no4/vol2no4_4.pdf