Best practices & guides on how to write distributed PyTorch training code
An architecture for continual learning and long-term memory in LLMs
A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, bolstered by an optimized graph
A Rust library integrated with ONNX Runtime, providing a collection of Computer Vision and Vision-Language models
ezlocalai is an easy-to-set-up local artificial intelligence server with OpenAI-style endpoints
The fastest Tropical number matrix multiplication on GPU
A simple yet sufficiently fast (attenuated) Radon transform and backprojection implementation using KernelAbstractions
Glasses detection, classification and segmentation
CUDA implementation of autoregressive linear attention, with all the latest research findings
Rust bindings to the NVIDIA NVBit binary instrumentation API