My experiments with MPI and OpenMP
Abstraction Library for Parallel Kernel Acceleration
Simplex mesh adaptivity for HPC
A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofl...
HPC solver for nonlinear optimization problems
(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster ...
Python library for fast time-series analysis on CUDA GPUs
An introduction to learning CUDA programming
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
Playing with CUDA and GPUs in Google Colab
Some CUDA design patterns and a bit of template magic for CUDA
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
Arch Linux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
Compare the performance of matrix multiplication using GPU shared memory, GPU global memory, and the CPU
Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran seque...
CUDA C++ Core Libraries