Create optimized GPU applications in any mainstream GPU programming language (CUDA, HIP, OpenCL, OpenACC).
What Kernel Tuner does:
Install Kernel Tuner with support for the backend you need: pip install kernel_tuner[cuda], pip install kernel_tuner[opencl], or pip install kernel_tuner[hip].
To install all three backends at once: pip install kernel_tuner[cuda,opencl,hip]
More information on installation, including for other languages, is available in the installation guide.
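After installing, it can be handy to check which backend packages are importable in the current environment. The sketch below is not part of Kernel Tuner: the helper `available_backends` is hypothetical, and the mapping from extras to module names (`pycuda`, `pyopencl`, `hip`) is an assumption that may differ between Kernel Tuner versions.

```python
import importlib.util

# Assumed mapping from install extras to top-level module names.
# These names are an assumption for illustration; adjust them to
# whatever your installed Kernel Tuner version actually depends on.
BACKEND_MODULES = {
    "cuda": "pycuda",
    "opencl": "pyopencl",
    "hip": "hip",
}


def available_backends(modules=BACKEND_MODULES):
    """Return a dict mapping each backend name to True if its module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


if __name__ == "__main__":
    for backend, found in available_backends().items():
        status = "found" if found else "not found"
        print(f"{backend}: {status}")
```

A backend reported as "not found" only means the Python package is missing; the check says nothing about whether the underlying compiler or driver is installed.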
import numpy as np
from kernel_tuner import tune_kernel
kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""
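# Problem size: the number of elements in each vector
n = np.int32(10000000)
# Input vectors and the output buffer, all in single precision
a = np.random.randn(n).astype(np.float32)
b = np.random.randn(n).astype(np.float32)
c = np.zeros_like(a)
# Kernel arguments, in the same order as the kernel signature
args = [c, a, b, n]
# Tunable parameters: the kernel is compiled and benchmarked once per value,
# with block_size_x inserted into the kernel source as a compile-time constant
tune_params = {"block_size_x": [32, 64, 128, 256, 512]}
# Benchmark every configuration of vector_add for a problem of size n
tune_kernel("vector_add", kernel_string, n, args, tune_params)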
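Once tuning finishes, the usual next step is to pick the fastest configuration. The sketch below assumes that `tune_kernel` returns a `(results, env)` tuple in which `results` is a list of dictionaries, each holding the tunable parameter values and a measured `"time"`. The helper `best_configuration` is hypothetical and the numbers in `example_results` are made up for illustration; they are not real measurements.

```python
import numbers


def best_configuration(results, objective="time"):
    """Return the result dict with the lowest numeric value for `objective`.

    Entries whose objective is not a number (for example, a configuration
    that failed to compile or run) are skipped.
    """
    valid = [r for r in results if isinstance(r.get(objective), numbers.Real)]
    if not valid:
        raise ValueError("no configuration produced a numeric measurement")
    return min(valid, key=lambda r: r[objective])


# Made-up data in the assumed shape of tune_kernel's results list.
example_results = [
    {"block_size_x": 32, "time": 0.512},
    {"block_size_x": 64, "time": 0.431},
    {"block_size_x": 128, "time": 0.398},
    {"block_size_x": 256, "time": 0.402},
    {"block_size_x": 512, "time": 0.455},
]

best = best_configuration(example_results)
print(best)
```

With a real run, the same helper would be applied to the first element of the tuple returned by `tune_kernel`, for example `results, env = tune_kernel(...)` followed by `best_configuration(results)`.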
More examples here.
C++ magic to integrate auto-tuned kernels into C++ applications
C++ data types for mixed-precision CUDA kernel programming
Monitor, analyze, and visualize auto-tuning runs
Contributions are welcome! For feature requests, bug reports, or usage problems, please feel free to create an issue. For more extensive contributions, check the contribution guide.
If you use Kernel Tuner in research or research software, please cite the most relevant among the publications on Kernel Tuner. To refer to the project as a whole, please cite:
@article{kerneltuner,
  author  = {Ben van Werkhoven},
  title   = {Kernel Tuner: A search-optimizing GPU code auto-tuner},
  journal = {Future Generation Computer Systems},
  year    = {2019},
  volume  = {90},
  pages   = {347-358},
  url     = {https://www.sciencedirect.com/science/article/pii/S0167739X18313359},
  doi     = {https://doi.org/10.1016/j.future.2018.08.004}
}