Tutorial on building a GPU compiler backend in LLVM
The goal of this tutorial is to show a simple example of how to generate PTX from LLVM IR, and how to write the IR itself so that it can access CUDA features.
For the sake of demonstration, a language frontend is also provided. The main idea of the language is to support pointwise (also known as elementwise) operations with GPU acceleration.
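To make "pointwise" concrete, here is a minimal CPU-side C++ sketch of the semantics, assuming two inputs of equal length. The function name `pointwise` is hypothetical and not part of this project. Each output element depends only on the input elements at the same index, which is what makes these operations easy to parallelize: on a GPU, each loop iteration can be handed to its own thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from this project): applies a binary operation
// element by element to two equal-length vectors.
template <typename T, typename Op>
std::vector<T> pointwise(const std::vector<T>& a, const std::vector<T>& b, Op op) {
    if (a.size() != b.size()) {
        throw std::invalid_argument("pointwise: operands must have the same length");
    }
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        // The result at index i depends only on a[i] and b[i],
        // so every iteration is independent of the others.
        out[i] = op(a[i], b[i]);
    }
    return out;
}
```

For example, `pointwise(x, y, [](float p, float q) { return p + q; })` computes the elementwise sum of `x` and `y`.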
If you are only interested in the code generation backend, you can jump directly to the "The code generator for NVPTX backend" part.
See the "How to build the project?" documentation for build instructions.