A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
MIT License
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.5.0...0.5.1
Published by fkodom 12 months ago
Significant efficiency improvements to the chunkwise formulation, thanks to @leor-c 🎉
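The chunkwise formulation splits a sequence into fixed-size chunks. Retention within each chunk is computed in parallel, and a recurrent state carries the decayed contributions of all earlier chunks. The following is a minimal NumPy sketch of that idea, not the library's implementation. It assumes a single head, real-valued queries/keys/values, a scalar decay `gamma`, and a sequence length divisible by the chunk size, and it omits the rotation, scaling and normalization that RetNet also applies. The names `parallel_reference` and `chunkwise_retention` are hypothetical. The last line checks the chunkwise output against a direct parallel computation.

```python
import numpy as np


def parallel_reference(q, k, v, gamma):
    """Hypothetical reference: (Q K^T * D) V with D[n, m] = gamma**(n - m) for n >= m, else 0."""
    idx = np.arange(q.shape[0])
    diff = idx[:, None] - idx[None, :]
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return (q @ k.T * decay) @ v


def chunkwise_retention(q, k, v, gamma, chunk_size):
    """Hypothetical single-head chunkwise retention (no rotation, scaling or normalization)."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    assert seq_len % chunk_size == 0, "sketch assumes seq_len is a multiple of chunk_size"
    j = np.arange(chunk_size)
    diff = j[:, None] - j[None, :]
    inner_decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)  # causal mask inside a chunk
    cross_decay = gamma ** (j + 1)  # decay from the carried state to local position j
    state_decay = gamma ** (chunk_size - 1 - j)  # decay from local position j to the chunk's end
    state = np.zeros((d_k, d_v))  # decayed sum of outer(k_m, v_m) over all earlier chunks
    outputs = []
    for start in range(0, seq_len, chunk_size):
        qc, kc, vc = (x[start:start + chunk_size] for x in (q, k, v))
        inner = (qc @ kc.T * inner_decay) @ vc  # contributions from the same chunk
        cross = (qc @ state) * cross_decay[:, None]  # contributions from earlier chunks
        outputs.append(inner + cross)
        # Fold this chunk into the state; older contributions decay by gamma**chunk_size
        state = gamma ** chunk_size * state + kc.T @ (vc * state_decay[:, None])
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
print(np.allclose(chunkwise_retention(q, k, v, 0.9, 4), parallel_reference(q, k, v, 0.9)))  # True
```

For a fixed chunk size, the per-chunk work is constant, so the total cost grows linearly with sequence length instead of quadratically.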
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.4.2...0.5.0
Published by fkodom 12 months ago
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.4.1...0.4.2
Published by fkodom 12 months ago
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.4.0...0.4.1
Published by fkodom about 1 year ago
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.3.1...0.4.0
Published by fkodom about 1 year ago
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.3.0...0.3.1
Published by fkodom about 1 year ago
More streamlined support for training
`RetNet.forward` is no longer just a wrapper for `RetNet.forward_parallel`. It accepts `inputs` and `labels` tensors and returns a loss value.
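```python
from einops import rearrange
from torch import Tensor, nn

class RetNet(nn.Module):
    ...  # other methods, including forward_parallel, omitted
    def forward(self, inputs: Tensor, labels: Tensor) -> Tensor:
        # Logits for every position: (batch, seq_len, num_classes)
        pred = self.forward_parallel(inputs)
        criterion = nn.CrossEntropyLoss()
        # Merge batch and sequence dims so each token is one classification example
        return criterion(rearrange(pred, "b n c -> (b n) c"), labels.flatten())
```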
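The `rearrange` call merges the batch and sequence dimensions, so every token position becomes one row of an ordinary classification problem, and `nn.CrossEntropyLoss` (with its default mean reduction) averages over all of them. As a framework-free sketch of that reduction, assuming `pred` holds unnormalized logits of shape `(batch, seq_len, num_classes)` and `labels` holds integer class ids of shape `(batch, seq_len)`, the same computation in NumPy looks like this. The function name is hypothetical, and the sketch ignores PyTorch's `ignore_index` handling.

```python
import numpy as np


def flattened_cross_entropy(pred: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical helper: mean token-level cross entropy, mirroring "b n c -> (b n) c"."""
    b, n, c = pred.shape
    logits = pred.reshape(b * n, c)  # one row per (batch, position) pair
    targets = labels.reshape(b * n)  # same row order as labels.flatten()
    # Numerically stable log-softmax over the class dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the correct class, averaged over all tokens
    return float(-log_probs[np.arange(b * n), targets].mean())


# Uniform logits give a loss of log(num_classes), whatever the labels are
pred = np.zeros((2, 3, 5))
labels = np.array([[0, 1, 2], [3, 4, 0]])
print(flattened_cross_entropy(pred, labels))  # approximately 1.6094 (log 5)
```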
Published by fkodom about 1 year ago
Full Changelog: https://github.com/fkodom/yet-another-retnet/compare/0.1.3...0.2.0
Published by fkodom about 1 year ago
Set default `layer_norm_eps=1e-6`, as updated in the official implementation:
https://github.com/microsoft/torchscale/commit/2c29de0fb3e5e559181f0fb4854330c5b35961cd
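`layer_norm_eps` is the small constant added to the variance inside layer normalization so the division never hits zero. A minimal NumPy sketch of where it enters, assuming the standard definition used by `torch.nn.LayerNorm` (biased variance over the last dimension) and omitting the learnable scale and shift; `layer_norm` here is a hypothetical helper, not the library's code:

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: (x - mean) / sqrt(var + eps) over the last dimension."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)  # biased variance, as in torch.nn.LayerNorm
    return (x - mean) / np.sqrt(var + eps)


# eps only matters when the variance is tiny, e.g. a constant input row
print(layer_norm(np.ones((1, 4))))  # [[0. 0. 0. 0.]], not NaN
```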
Published by fkodom about 1 year ago
Remove extra complex conjugation from the relative position embedding.
Reference: https://github.com/microsoft/torchscale/issues/49
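The relative position embedding rotates queries and keys by position-dependent phases, and the key side should be conjugated exactly once so that the score depends only on the offset n - m. The sketch below illustrates why an extra conjugation breaks that property: applying it twice turns the phase e^{i(n-m)θ} into e^{i(n+m)θ}. It uses plain Python complex scalars, a single frequency `theta`, and hypothetical helpers `rotate` and `score`; it illustrates the general point and does not reproduce the code changed in the fix.

```python
import cmath


def rotate(x: complex, pos: int, theta: float) -> complex:
    """Hypothetical helper: rotate a complex feature by the phase e^{i * pos * theta}."""
    return x * cmath.exp(1j * pos * theta)


def score(q: complex, k: complex, n: int, m: int, theta: float, extra_conj: bool = False) -> complex:
    """Hypothetical helper: rotated query times conjugated rotated key."""
    q_rot = rotate(q, n, theta)
    # Buggy variant: conjugate the key's phase here, then conjugate again below
    k_rot = k * cmath.exp(-1j * m * theta) if extra_conj else rotate(k, m, theta)
    # Default case yields q * conj(k) * e^{i(n - m) theta}; buggy case yields e^{i(n + m) theta}
    return q_rot * k_rot.conjugate()


q, k, theta = 0.7 + 0.2j, -0.3 + 1.1j, 0.3
# Same offset n - m = 2 at different absolute positions
print(abs(score(q, k, 5, 3, theta) - score(q, k, 9, 7, theta)) < 1e-12)  # True
print(abs(score(q, k, 5, 3, theta, True) - score(q, k, 9, 7, theta, True)) < 1e-12)  # False
```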
Published by fkodom about 1 year ago
Bug fix for automatic PyPI version resolver 😞
Published by fkodom about 1 year ago
First stable public release
Includes the `RetNet` module, the `RetNetDecoderLayer` and `MultiScaleRetention` modules, and the `retention_parallel` and `retention_recurrent` functions.
Published by fkodom about 1 year ago
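The first stable release exposes both `retention_parallel` and `retention_recurrent`. The two forms compute the same outputs: the parallel form applies a causal decay mask to Q K^T, while the recurrent form keeps a running state S_n = γ S_{n-1} + K_n^T V_n and reads it out with o_n = Q_n S_n. Below is a single-head NumPy sketch of that equivalence. It omits the rotation, scaling and group normalization used in RetNet, and `parallel_form` and `recurrent_form` are hypothetical names, not the library's signatures.

```python
import numpy as np


def parallel_form(q, k, v, gamma):
    """Hypothetical: all positions at once, (Q K^T * D) V with D[n, m] = gamma**(n - m) if n >= m."""
    idx = np.arange(q.shape[0])
    diff = idx[:, None] - idx[None, :]
    decay = np.where(diff >= 0, gamma ** np.maximum(diff, 0), 0.0)
    return (q @ k.T * decay) @ v


def recurrent_form(q, k, v, gamma):
    """Hypothetical: one position at a time with a fixed-size (d_k, d_v) state."""
    state = np.zeros((q.shape[1], v.shape[1]))
    outputs = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = gamma * state + np.outer(k_n, v_n)  # S_n = gamma * S_{n-1} + k_n^T v_n
        outputs.append(q_n @ state)  # o_n = q_n S_n
    return np.stack(outputs)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
print(np.allclose(parallel_form(q, k, v, 0.9), recurrent_form(q, k, v, 0.9)))  # True
```

The parallel form suits training on full sequences, while the recurrent form needs only constant memory per generated token.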
First release candidate for PyPI