Goal: Low-power cluster capable of serving 24+ streams of 4K HDR 60 fps source transcodes while consuming no more than 100 W at peak and idling at less than 10 W
MIT License
An architecture for continual learning and long-term memory in LLMs
Computer vision library with a focus on heterogeneous systems
Abstraction Library for Parallel Kernel Acceleration
A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources
Some CUDA design patterns and a bit of template magic
Real-time large-scale dense visual SLAM system
Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machi...
A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofl...
Arch Linux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
Real-time dense visual SLAM system
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA core for the ...
Achieve peak performance on x86 CPUs and NVIDIA GPUs
CLTune: An automatic OpenCL & CUDA kernel tuner
Cross-platform, customizable multimedia/video processing framework with strong GPU acceleration...
A highly optimised C++ library for mathematical applications and neural networks.