Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
OTHER License
Published by atiorh about 2 years ago
torch.nn.LayerNorm and ane_transformers.reference.layer_norm.LayerNormANE apply the scale and bias terms in opposite orders. To accurately restore a state_dict trained with the former into the latter, we adjust the bias term. This change slightly improves the parity between the outputs of the Hugging Face PyTorch model and those of the ane_transformers CoreML model.
Published by atiorh over 2 years ago
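The bias adjustment described in the release note above can be illustrated with a small sketch. It assumes the two layers differ only in the order of the affine step: scale then bias, `x_hat * w + b`, versus bias then scale, `(x_hat + b) * w`. Under that assumption, dividing the trained bias by the weight makes the two outputs match, provided no weight is zero. The helper names below (`normalize`, `scale_then_bias`, `bias_then_scale`, `adjust_bias`) are hypothetical and written in plain Python for illustration; they are not the library's API.

```python
import math

def normalize(x, eps=1e-5):
    # Zero-mean, unit-variance normalization over the feature axis.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def scale_then_bias(x, weight, bias):
    # Assumed order of the first layer: x_hat * w + b
    return [h * w + b for h, w, b in zip(normalize(x), weight, bias)]

def bias_then_scale(x, weight, bias):
    # Assumed order of the second layer: (x_hat + b) * w
    return [(h + b) * w for h, w, b in zip(normalize(x), weight, bias)]

def adjust_bias(weight, bias):
    # (x_hat + b / w) * w == x_hat * w + b, so divide the bias by the weight.
    # Assumes every weight is nonzero.
    return [b / w for w, b in zip(weight, bias)]

x = [0.5, -1.2, 3.0, 0.7]
weight = [1.5, 0.8, 2.0, 0.5]
bias = [0.1, -0.3, 0.25, 0.0]

reference = scale_then_bias(x, weight, bias)

# Loading the bias unchanged gives a different result.
naive = bias_then_scale(x, weight, bias)
naive_diff = max(abs(a - b) for a, b in zip(reference, naive))

# Loading the adjusted bias restores parity up to floating-point error.
restored = bias_then_scale(x, weight, adjust_bias(weight, bias))
max_diff = max(abs(a - b) for a, b in zip(reference, restored))
```

Here `naive_diff` is clearly nonzero while `max_diff` is at the level of rounding error, which is consistent with the small parity improvement the note reports.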