https://github.com/Bruce-Lee-LY/decoding_attention

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores during the decoding stage of LLM inference.
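The library itself is implemented in CUDA; as a plain reference for what decoding-stage attention computes, here is a NumPy sketch (not the library's API). At decode time the query is a single new token, attended against the cached keys and values of all previous tokens; the function name and shapes below are illustrative assumptions.

```python
import numpy as np

def decode_attention(q, k_cache, v_cache):
    """Single-token (decoding-stage) multi-head attention.

    q:       (num_heads, head_dim)          -- query for the new token
    k_cache: (num_heads, seq_len, head_dim) -- cached keys
    v_cache: (num_heads, seq_len, head_dim) -- cached values
    returns: (num_heads, head_dim)          -- attention output per head
    """
    head_dim = q.shape[-1]
    # Scaled dot-product scores per head: (num_heads, seq_len)
    scores = np.einsum("hd,hsd->hs", q, k_cache) / np.sqrt(head_dim)
    # Numerically stable softmax over the sequence axis
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of cached values: (num_heads, head_dim)
    return np.einsum("hs,hsd->hd", weights, v_cache)
```

Because the query length is 1, the matrix multiplies degenerate into matrix-vector products, which is why a CUDA-core implementation (rather than Tensor-Core GEMMs) can be a good fit for this stage.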

BSD-3-Clause License

Stars: 14

