int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991
OTHER License
OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cu...
Some CUDA design patterns and a bit of template magic for CUDA
Performance-portable, length-agnostic SIMD with runtime dispatch
A translator from Intel SSE intrinsics to Arm/AArch64 NEON implementations
A C++ port of karpathy/makemore: an autoregressive character-level language model for making more...
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for...
tensor4 - PyTorch to C++ converter using a lightweight templated tensor library
Monte Carlo Numerical Linear Algebra Package
A C++ header-only library of statistical distribution functions.
HIP: C++ Heterogeneous-Compute Interface for Portability