Rust bindings to the NVIDIA NVBit binary instrumentation API
Matrix multiplication example implemented with OpenMP, OpenACC, BLAS, cuBLAS, and CUDA
A Git repository containing an NLP example using DL4J (CUDA) in Java
Compares the performance of matrix multiplication using GPU shared memory, GPU global memory, and the CPU