custom_matmul_kernels

Customized matrix multiplication kernels

GPL-3.0 License

Custom Matmul Kernels

This repository contains source code for this blog post.

Dependencies

  • Python 3.7.10 or higher
  • CuPy 7.4.0 or higher
  • PyTorch 1.8.1 or higher
  • CUDA 11.2 (the only version tested)
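
The version requirements above can be checked before running anything from the repository. The following is a minimal sketch, not part of the repository itself: the names `MINIMUMS`, `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helpers introduced for illustration, and the sketch assumes that CuPy and PyTorch expose their versions as `cupy.__version__` and `torch.__version__`. It does not check the CUDA version.

```python
import re
import sys

# Minimum versions copied from the dependency list above.
MINIMUMS = {"python": "3.7.10", "cupy": "7.4.0", "torch": "1.8.1"}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Suffixes such as "+cu111" or "rc1" are ignored, so a pre-release is
    treated like the corresponding release (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "8" compares like "8.0.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Map each dependency to True, False, or None (not installed)."""
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    report = {"python": meets_minimum(python_version, MINIMUMS["python"])}
    for name in ("cupy", "torch"):
        try:
            module = __import__(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = meets_minimum(module.__version__, MINIMUMS[name])
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        status = "not installed" if ok is None else ("ok" if ok else "too old")
        print(f"{name}: {status} (needs >= {MINIMUMS[name]})")
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.8.1", which would wrongly reject a newer PyTorch.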