CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

GPL-3.0 License

Stars: 1.3K
CUDA-Learn-Notes - HGEMM Up to 115 TFLOPS:L20 (Latest Release)

Published by DefTruth 2 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.13...v2.4.15

CUDA-Learn-Notes - HGEMM Up to 113 TFLOPS:L20

Published by DefTruth 2 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.12...v2.4.13

CUDA-Learn-Notes - v2.4.12 SGEMM TF32 Swizzle

Published by DefTruth 6 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.11...v2.4.12

CUDA-Learn-Notes - v2.4.11 HGEMM Block Swizzle

Published by DefTruth 7 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.10...v2.4.11

CUDA-Learn-Notes - v2.4.10 SGEMM TF32 Stage 2/3

Published by DefTruth 8 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.9...v2.4.10

CUDA-Learn-Notes - v2.4.9 HGEMM WMMA Stage

Published by DefTruth 10 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.8...v2.4.9

CUDA-Learn-Notes - v2.4.8 HGEMM WMMA Part-1

Published by DefTruth 12 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.7...v2.4.8

CUDA-Learn-Notes - v2.4.7 SGEMM Copy Async

Published by DefTruth 13 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.6...v2.4.7

CUDA-Learn-Notes - v2.4.6 HGEMM Copy Async

Published by DefTruth 15 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.5...v2.4.6

CUDA-Learn-Notes - v2.4.5 HGEMM Double Buffers

Published by DefTruth 23 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.4...v2.4.5

CUDA-Learn-Notes - v2.4.4 Pack HGEMM

Published by DefTruth 24 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.3...v2.4.4

CUDA-Learn-Notes - v2.4.3 Pack Softmax

Published by DefTruth 26 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.2...v2.4.3

CUDA-Learn-Notes - v2.4.2 Pack RMSNorm

Published by DefTruth 27 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4.1...v2.4.2

CUDA-Learn-Notes - v2.4.1 Pack LayerNorm

Published by DefTruth 28 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.4...v2.4.1

CUDA-Learn-Notes - v2.4 Pack Reduce LDST

Published by DefTruth 29 days ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.3.1...v2.4

CUDA-Learn-Notes - v2.3.1 f16x8 Pack Elementwise

Published by DefTruth about 1 month ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.3...v2.3.1

CUDA-Learn-Notes - v2.3 Refactor 6/N

Published by DefTruth about 1 month ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.2...v2.3

CUDA-Learn-Notes - v2.2 Refactor 5/N

Published by DefTruth about 1 month ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/2.1...v2.2

CUDA-Learn-Notes - v2.1 Refactor 4/N Part-4

Published by DefTruth about 2 months ago

What's Changed

Full Changelog: https://github.com/DefTruth/CUDA-Learn-Notes/compare/v2.0...2.1

CUDA-Learn-Notes - v2.0 Refactor 4/N

Published by DefTruth about 2 months ago