Compare the performance of matrix multiplication using GPU shared memory, GPU global memory, and the CPU
MIT License
bash run.sh
SDK for GPU accelerated genome assembly and analysis
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofl...
Implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mi...
Best practices & guides on how to write distributed pytorch training code
The fastest Tropical number matrix multiplication on GPU
An introduction to learning CUDA programming
Python library for fast time-series analysis on CUDA GPUs
Some CUDA design patterns and a bit of template magic for CUDA
Code for learning CUDA. Implementations of multiple kernels.
Playing with CUDA and GPUs in Google Colab
GitHub Action to install CUDA
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
My experiments with MPI and OpenMP
The fastest way to compute matrix profiles on CPU and GPU!
(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster ...