Decoding Attention is specially optimized for multi head attention (MHA) using CUDA core for the decoding stage of LLM inference
fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques
A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis
Rust bindings to the NVIDIA NVBIT binary instrumentation API
This is a project, where I give you a way to use SOLIDWORKS on Linux!
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being properly picked up
A CLI tool which lets you install proprietary NVIDIA drivers and much more easily on Fedora Linux (32 or above and Rawhide)
A git repository containing an NLP example using DL4J (cuda) in Java
SDK for GPU accelerated genome assembly and analysis
Matrix multiplication example performed with OpenMP, OpenACC, BLAS, cuBLABS, and CUDA