Simple tests for JAX, PyTorch, and TensorFlow that check whether the installed NVIDIA drivers are being picked up correctly
This tutorial shows how to install and set up the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch
This repository lists some awesome public Rust projects, videos, blogs, and jobs
A simple profiler that counts NVIDIA PTX assembly instructions in OpenCL/SYCL/CUDA kernels for roofline model analysis
Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions
Par4All is an automatic parallelizing and optimizing compiler (workbench) for C and Fortran sequential programs
Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference
CUDA implementation of autoregressive linear attention, with all the latest research findings
Eden converts your Python function into a hosted endpoint with minimal changes to your existing code