Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based

Mei WEN, Da-fei HUANG, Chang-qing XUN, Dong CHEN

Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 899-916. doi: 10.1631/FITEE.1500032

Abstract: OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and has therefore been used extensively. However, the locality optimizations written into GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. In particular, using OpenCL's local memory on multi-core/many-core CPUs can reduce performance rather than improve it, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To resolve this dilemma, we analyze the memory access patterns using array-access descriptors derived from the GPU-specific kernels. The kernels can then be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation
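
The idea of removing local-memory arrays and their barriers can be illustrated with a small sketch in plain C. It is not the authors' tool chain or their descriptor analysis. It is a hand-written 3-point stencil, chosen only for illustration, and the names WG_SIZE, tile, clamp_idx, stencil_with_local and stencil_direct are all hypothetical. The first version mimics a GPU-style work-group serialized for a CPU: the tile stands in for a __local array, and the barrier becomes a split between two work-item loops. The second version shows what the kernel might look like once the tile and the barrier are gone.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical work-group size, chosen for illustration */

/* Clamp a signed index into the valid range [0, n-1]. */
static size_t clamp_idx(long i, size_t n) {
    if (i < 0) return 0;
    if ((size_t)i >= n) return n - 1;
    return (size_t)i;
}

/* GPU-style kernel, coarsened so one CPU thread runs a whole work-group.
 * The local-memory tile is kept, so every input element is first copied
 * into the tile and the barrier splits the work-item loop in two. */
static void stencil_with_local(const float *in, float *out, size_t n) {
    assert(n > 0);
    size_t groups = (n + WG_SIZE - 1) / WG_SIZE;
    for (size_t group = 0; group < groups; ++group) {
        float tile[WG_SIZE + 2]; /* stands in for a __local array with a halo */
        size_t base = group * WG_SIZE;

        /* Phase 1: each work-item stages one element; halo cells are added. */
        for (size_t lid = 0; lid < WG_SIZE; ++lid) {
            tile[lid + 1] = in[clamp_idx((long)(base + lid), n)];
        }
        tile[0] = in[clamp_idx((long)base - 1, n)];
        tile[WG_SIZE + 1] = in[clamp_idx((long)(base + WG_SIZE), n)];

        /* A barrier(CLK_LOCAL_MEM_FENCE) would sit here in the OpenCL kernel. */

        /* Phase 2: each work-item computes its output from the tile. */
        for (size_t lid = 0; lid < WG_SIZE; ++lid) {
            size_t gid = base + lid;
            if (gid < n) {
                out[gid] = tile[lid] + tile[lid + 1] + tile[lid + 2];
            }
        }
    }
}

/* CPU-friendly form: the tile and the barrier are removed, and each output
 * reads the input array directly. The single loop has unit stride, which
 * leaves room for the compiler to vectorize it. */
static void stencil_direct(const float *in, float *out, size_t n) {
    assert(n > 0);
    for (size_t gid = 0; gid < n; ++gid) {
        out[gid] = in[clamp_idx((long)gid - 1, n)]
                 + in[gid]
                 + in[clamp_idx((long)gid + 1, n)];
    }
}
```

Both functions are meant to produce the same output for the same input. The difference is that the second avoids the staging copy and the loop split that the barrier forces, which is the kind of overhead the abstract attributes to local memory on CPUs.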
