Canny edge detector implemented in CUDA C/C++
(2024/2025) A library and environment for parallel processing in a power-limited CPU+GPU cluster environment
Decoding Attention is specially optimized for multi-head attention (MHA), using CUDA cores for the decoding stage of LLM inference
Best practices and guides on how to write distributed PyTorch training code
An architecture for continual learning and long-term memory in LLMs
Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
fat_llama is a Python package for upscaling audio files to FLAC or WAV formats using advanced audio processing techniques
A low-footprint, GPU-accelerated speech-to-text Python package for the JetPack 5 era, bolstered by an optimized graph
A Discord bot running on llama.cpp with the Llama 3 model and image generation