Optimized primitives for collective multi-GPU communication
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cu...
Monte Carlo Numerical Linear Algebra Package
Tuned OpenCL BLAS
The rewritten engine, originally for TensorFlow. All other backends have now been ported here.
libcluon is a small and efficient, single-file and header-only library written in modern C++ to p...
CUDA C++ Core Libraries
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer...
CUDA Templates for Linear Algebra Subroutines
A portable high-level API with CUDA or OpenCL back-end
Some CUDA design patterns and a bit of template magic for CUDA
Open source drivers for the Kinect for Windows v2 device
Collective communications library with various primitives for multi-machine training.
Benchmarking Deep Learning operations on different hardware
Fast Deep Learning Library (DLL) for C++ (ANNs, CNNs, RBMs, DBNs...)
GPU implementation of a Full Search Block Matching Motion Estimation Algorithm