Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
Simple tests for JAX, PyTorch, and TensorFlow that check whether the installed NVIDIA drivers are properly picked up
This tutorial shows how to install and set up the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis
Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions
Decoding Attention is specially optimized for multi-head attention (MHA) in the decoding stage of LLM inference, using CUDA cores
Dockerfiles and a manual for easily building a Docker image with CUDA 10
💣 SMH – a computer vision project for automatic, precision mortar strike calculations in Squad
Julia client for the OmniSci GPU-accelerated SQL engine and analytics platform
A general cubic equation solver and quartic equation minimisation solver written for CPUs and NVIDIA GPUs. For more details and results, see: https://arxiv
Gradient Descent Optimizers and Genetic Algorithms using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
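
For readers unfamiliar with the gradient descent optimizers named in the last entry, the core update rule can be sketched on the CPU in a few lines of plain Python. This is only an illustration under stated assumptions, not code from that project: the function names, the quadratic test objective, the learning rate, and the step count are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence


def gradient_descent(
    grad: Callable[[Sequence[float]], List[float]],
    x0: Sequence[float],
    learning_rate: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Plain gradient descent: repeat x <- x - learning_rate * grad(x)."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Move every coordinate a small step against its partial derivative.
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test objective (an assumption for this sketch):
# f(x, y) = (x - 3)^2 + (y + 1)^2, whose minimum lies at (3, -1).
def quadratic_grad(p: Sequence[float]) -> List[float]:
    return [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]


minimum = gradient_descent(quadratic_grad, [0.0, 0.0])
print(minimum)  # the iterate approaches the minimizer (3, -1)
```

A GPU or FPGA version would typically parallelize the gradient evaluation and the per-coordinate update; the loop structure above stays the same.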