🎉 Modern CUDA Learning Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm
nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface
Best practices & guides on how to write distributed PyTorch training code
Instant-ngp in PyTorch+CUDA, trained with pytorch-lightning (high quality at high speed, with only a few lines of legible code)
Install PyTorch distributions with computation backend auto-detection
Arch Linux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision
Simple tests for JAX, PyTorch, and TensorFlow that check whether the installed NVIDIA drivers are being picked up properly
YOLOv5 object detection in OSRS using Python: detecting cows for botting
Dockerfiles and a manual for easily building a Docker image with CUDA 10
Provides an environment for compiling TensorFlow or PyTorch with CUDA for aarch64 on an x86 machine