Exploring using stdpar and Cython
See the accompanying post on the NVIDIA Developer Blog.
These notebooks demonstrate how to accelerate Python code on the GPU using Cython and the nvc++ compiler with stdpar (C++ standard parallel algorithms).
First, you'll need the NVIDIA HPC SDK, which provides the nvc++ compiler. Version 20.9 or newer is required to run these examples.
Note that if your NVIDIA driver does not support CUDA 11.0, you will want to download the HPC SDK package that also bundles the two previous CUDA versions (10.1 and 10.2).
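To see which CUDA version your driver supports, you can read the header printed by nvidia-smi. The Python sketch below assumes that header contains a field of the form "CUDA Version: 11.0", as recent drivers print; the function names are illustrative and not part of the notebooks.

```python
import re
import shutil
import subprocess


def parse_driver_cuda_version(smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output, or None."""
    # Assumption: the driver prints 'CUDA Version: X.Y' in the nvidia-smi header.
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_cuda_11():
    """Return True/False if the driver's CUDA version can be read, None if it cannot."""
    if shutil.which("nvidia-smi") is None:
        return None  # nvidia-smi is not on PATH, so we cannot tell
    output = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
    version = parse_driver_cuda_version(output)
    return None if version is None else version >= (11, 0)


if __name__ == "__main__":
    print("Driver supports CUDA 11.0:", driver_supports_cuda_11())
```

If this reports False, the package bundling CUDA 10.1 and 10.2 is the one to download.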
Once the HPC SDK is installed, make sure the nvc++ executable is in your PATH.
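A quick way to confirm both the PATH setup and the 20.9 minimum is to locate nvc++ and parse its --version output. This is a minimal sketch that assumes the version banner contains text like "nvc++ 20.9-0 ..."; if your release formats it differently, the parser will simply report that it could not read the version. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_NVHPC = (20, 9)  # minimum HPC SDK release stated above


def parse_nvcpp_version(version_output):
    """Extract (major, minor) from text such as 'nvc++ 20.9-0 LLVM 64-bit target ...'."""
    # Assumption about the banner format: 'nvc++ <major>.<minor>-<patch> ...'
    match = re.search(r"nvc\+\+\s+(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_nvcpp():
    """Describe whether nvc++ is on PATH and whether it meets the minimum version."""
    path = shutil.which("nvc++")
    if path is None:
        return "nvc++ not found on PATH"
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    version = parse_nvcpp_version(out)
    if version is None:
        return f"found {path}, but could not parse its version"
    if version < MIN_NVHPC:
        return f"found {path}, but version {version[0]}.{version[1]} is older than 20.9"
    return f"found {path}, version {version[0]}.{version[1]}"


if __name__ == "__main__":
    print(check_nvcpp())
```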
Your GPU must also have CUDA compute capability 6.0 or higher (Pascal or newer) to use the -stdpar feature.
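One way to check the compute capability is to ask nvidia-smi for it. The sketch below assumes a driver recent enough for nvidia-smi to support the compute_cap query field; older drivers reject that field, in which case the function returns an empty list and you will need another source (such as the GPU's published specifications). The names are illustrative only.

```python
import shutil
import subprocess

MIN_CAPABILITY = (6, 0)  # -stdpar needs compute capability 6.0 (Pascal) or newer


def parse_capability(text):
    """Turn a string like '7.5' into (7, 5); return None if it is not of that form."""
    try:
        major, minor = text.strip().split(".")
        return int(major), int(minor)
    except ValueError:
        return None


def gpu_capabilities():
    """List (major, minor) for each GPU; assumes nvidia-smi supports --query-gpu=compute_cap."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # e.g. an older driver that does not know the compute_cap field
    parsed = (parse_capability(line) for line in result.stdout.splitlines())
    return [cap for cap in parsed if cap is not None]


if __name__ == "__main__":
    for index, cap in enumerate(gpu_capabilities()):
        status = "OK" if cap >= MIN_CAPABILITY else "too old for -stdpar"
        print(f"GPU {index}: compute capability {cap[0]}.{cap[1]} ({status})")
```

Comparing tuples keeps the check correct across major versions, for example (7, 0) >= (6, 0) even though 0 < 0 would fail a minor-only comparison.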
You will also need the development version of Cython. The simplest way to get the minimum required version is to use pip:
python -m pip install git+https://github.com/cython/cython@90684ac416f0349761074e242be4d981de40ce0f
Install Python dependencies:
python -m pip install numpy pandas matplotlib
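After installing, you can confirm that the interpreter you will run the notebooks with actually sees Cython and the other dependencies. This is a small sketch using importlib.util.find_spec from the standard library; the list of module names is taken from the install commands above (note that Cython's import name is capitalized), and the helper name is hypothetical.

```python
import importlib.util

# Import names for the packages installed above.
REQUIRED = ["Cython", "numpy", "pandas", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        import Cython

        print("all dependencies found; Cython version", Cython.__version__)
```

Running it with the same python that pip installed into avoids the common mistake of installing into one environment and launching Jupyter from another.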
This step is optional. To run the CPU-parallel benchmarks, you will need gcc 9.1 or newer as well as the TBB library. On Ubuntu 20.04, gcc-9 should already be the default, and I installed TBB with apt install libtbb-dev.
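To verify the optional CPU-parallel prerequisites, the sketch below reads gcc's full version with -dumpfullversion (available since GCC 7) and looks for the TBB headers. The header location /usr/include/tbb is an assumption based on where Ubuntu's libtbb-dev normally installs them; adjust it for other distributions. The function names are illustrative.

```python
import os
import shutil
import subprocess

MIN_GCC = (9, 1)  # minimum gcc version for the CPU-parallel benchmarks


def parse_version(text):
    """Turn '9.3.0' into (9, 3, 0); return None if the text is not dotted integers."""
    try:
        return tuple(int(part) for part in text.strip().split("."))
    except ValueError:
        return None


def gcc_version():
    """Return gcc's version tuple via -dumpfullversion, or None if gcc is missing or unreadable."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(["gcc", "-dumpfullversion"], capture_output=True, text=True).stdout
    return parse_version(out) if out.strip() else None


def tbb_headers_present(include_dir="/usr/include/tbb"):
    """Assumption: libtbb-dev on Ubuntu puts the TBB headers in /usr/include/tbb."""
    return os.path.isdir(include_dir)


if __name__ == "__main__":
    version = gcc_version()
    ok = version is not None and version >= MIN_GCC
    print("gcc version:", version, "(OK)" if ok else "(need 9.1 or newer)")
    print("TBB headers found:", tbb_headers_present())
```

With g++, programs that use the parallel standard algorithms generally also need to be linked against TBB (for example with -ltbb), since libstdc++ uses it as the parallel backend.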