🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
GPL-3.0 License
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reprod...
ncnn is a high-performance neural network inference framework optimized for the mobile platform
A Python module for compiling PyTorch graphs to C
Tensors and Dynamic neural networks in Python with strong GPU acceleration
PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
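A great project for campus recruiting (autumn and spring hiring) and internships! Walks you through implementing a high-performance deep learning inference library from scratch, supporting inference for models such as llama2, Unet, Yolov5, and Resnet. Implement a high-perfor...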
tensor4 - PyTorch to C++ converter using a lightweight templated tensor library
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating poin...
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relat...
High quality, fast, modular reference implementation of SSD in PyTorch
This is the code for the book "Learn Deep Learning with PyTorch"
An introductory PyTorch tutorial; read online at https://datawhalechina.github.io/thorough-pytorch/
Some tricks of pytorch...