Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
APACHE-2.0 License
Examples of using Perl to augment NASM and vice versa
SIMD macro assembler unified for ARM, MIPS, PPC and x86
Achieve peak performance on x86 CPUs and NVIDIA GPUs
LLVM (Low Level Virtual Machine) Guide. Learn all about the compiler infrastructure, which is des...
MIPS Simulator npm package
Open Source Architecture Code Analyzer
Accelerate aggregated MD5 hashing performance up to 8x for AVX-512 and 4x for AVX2. Useful for ser...
JIT compiler in Go
Go library providing algorithms optimized to leverage the characteristics of modern CPUs
NanoJIT is a small, cross-platform C++ library that emits machine code.
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitive...
Package with auto-vectorized math functions for Go
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Firth: A Forth for the Z80 CPU
TinyFive is a lightweight RISC-V emulator and assembler written in Python with neural network exa...