Best practices & guides on how to write distributed PyTorch training code
MIT License
Make your own AI easily!
A KMeans implementation in C++ with Python bindings and GPU acceleration
SDK for GPU accelerated genome assembly and analysis
An architecture for LLMs' continual-learning and long-term memories
LLaMA 7B with CUDA acceleration implemented in Rust. Minimal GPU memory needed!
Efficient Deep Learning Systems course materials (HSE, YSDA)
Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)
Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)
Object detection for video surveillance
3D Gaussian Splatting, reimagined: Unleashing unmatched speed with C++ and CUDA from the ground up!
A PyTorch-like dynamic computation graph and neural network framework implemented in NumPy (MLP, CNN, RNN, Transformer)
The fastest way to compute matrix profiles on CPU and GPU!
Dockerfiles and manual for easy build of docker image with CUDA10.X and cuDNN7.6 to run TensorFlo...
Simple tests for JAX, PyTorch, and TensorFlow to test if the installed NVIDIA drivers are being p...
YOLOv5 object detection in OSRS using Python: detecting cows for botting