A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models
Gradient Descent Optimizers and Genetic Algorithms using GPUs, CPUs, and FPGAs via CUDA, OpenCL, and oneAPI
nvImageCodec, a library of GPU- and CPU-accelerated codecs featuring a unified interface
ezlocalai is an easy-to-set-up local artificial intelligence server with OpenAI-style endpoints
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
A high-performance inference system for large language models, designed for production environments
The fastest Tropical number matrix multiplication on GPU
A simple yet sufficiently fast (attenuated) Radon transform and backprojection implementation using KernelAbstractions
Glasses detection, classification and segmentation
CUDA implementation of autoregressive linear attention, with all the latest research findings
Open-source, local, self-hosted, and highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm
Rust bindings to the NVIDIA NVBIT binary instrumentation API