Easy to use, state-of-the-art Neural Machine Translation for 100+ languages
APACHE-2.0 License
Published by nreimers over 3 years ago
In version 1, the mbart50 and m2m models required the fairseq library. This caused several issues: fairseq cannot be used on Windows, multi-processing did not work with fairseq models, and loading and using the models was quite complicated.
With this release, the fairseq dependency is removed and the mbart50 / m2m models are loaded with Hugging Face transformers version >= 4.4.0.
From a user perspective, no changes should be visible. From a developer perspective, this simplifies the architecture of EasyNMT and allows new features to be integrated more easily.
Models can now be saved to disk by calling:
model.save(output_path)
Models can be loaded from disk by calling:
model = EasyNMT(output_path)
Loading any Hugging Face translation model is now simple. Pass the model name or the model path to models.AutoModel, as in the following code:
from easynmt import EasyNMT, models
article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate
sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro'))
print(model.translate(article, source_lang='en_XX', target_lang='ro_RO'))
This loads the facebook/mbart-large-en-ro model from the model hub.
Note: Models might use different language codes, e.g. the mbart model uses 'en_XX' instead of 'en' and 'ro_RO' instead of 'ro'. To make the language codes consistent, you can pass a lang_map:
from easynmt import EasyNMT, models
article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate
sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""
output_path = 'output/mbart-large-en-ro'
model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro', lang_map={'en': 'en_XX', 'ro': 'ro_RO'}))
# Save the model to disk
model.save(output_path)
# Load the model from disk
model = EasyNMT(output_path)
print(model.translate(article, target_lang='ro'))
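To illustrate what a lang_map does conceptually, here is a minimal sketch in plain Python. The helper resolve_lang_code is hypothetical and is not part of EasyNMT; it also assumes that a code without an entry in the map is passed through unchanged, which may differ from the library's actual behavior.

```python
def resolve_lang_code(lang, lang_map=None):
    """Map a user-facing code such as 'en' to a model-specific code.

    Hypothetical helper for illustration only. Codes without an entry
    in lang_map are returned unchanged (an assumption).
    """
    if lang_map is None:
        return lang
    return lang_map.get(lang, lang)


lang_map = {'en': 'en_XX', 'ro': 'ro_RO'}
print(resolve_lang_code('en', lang_map))  # en_XX
print(resolve_lang_code('ro', lang_map))  # ro_RO
print(resolve_lang_code('de', lang_map))  # de (no mapping configured)
```

With such a mapping in place, callers can keep using short codes like 'ro' even though the underlying model expects 'ro_RO'.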
Published by nreimers over 3 years ago
This release brings several improvements and is the first step towards the release of a Docker Image + REST API.
Improvements:
- The translate method now accepts document_language_detection=False. Also, text is now lowercased before the language is detected (the language detection scripts had issues with all-uppercase text).
- If you load a model with model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces are truncated to at most 100 word pieces. This can prevent out-of-memory (OOM) errors with very long sentences.
- You can load a model with model = EasyNMT(model_name, load_translator=False). This prevents the translation engine from being loaded.
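The max_length truncation described above can be pictured with a small sketch. The function truncate_word_pieces is hypothetical and not EasyNMT's code, and whitespace-separated tokens stand in for real word pieces, which would normally come from the model's tokenizer.

```python
def truncate_word_pieces(word_pieces, max_length=None):
    """Keep at most max_length word pieces.

    Hypothetical helper for illustration; max_length=None disables
    truncation (an assumption about the default).
    """
    if max_length is None:
        return list(word_pieces)
    return list(word_pieces[:max_length])


# Whitespace tokens are used here as a stand-in for real word pieces.
long_sentence = " ".join(f"tok{i}" for i in range(150))
pieces = long_sentence.split()
truncated = truncate_word_pieces(pieces, max_length=100)
print(len(pieces), len(truncated))  # 150 100
```

Capping the input length bounds the memory needed per sentence, at the cost of dropping everything after the cut-off.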
Published by nreimers over 3 years ago
fastText is used for automatic language detection, as it provides the highest speed and best accuracy.
However, it can be complicated to install it on Windows as it requires a C/C++ compiler.
This release adds two alternative language identifiers:
pip install langid
pip install langdetect
If fastText is not available, langid / langdetect will be used as alternative language detection methods.
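A fallback of this kind could be sketched as follows. This is not EasyNMT's implementation: the helper names are hypothetical, the preference order between langid and langdetect is an assumption, and fastText is left out of the detection function because it additionally needs a language identification model file.

```python
import importlib.util

# Preference order: fastText first, then the pure-Python alternatives.
# The relative order of langid and langdetect is an assumption.
BACKENDS = ("fasttext", "langid", "langdetect")


def available_backends(candidates=BACKENDS):
    """Return the candidate packages that are importable, in preference order."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


def detect_with_fallback(text):
    """Detect the language with langid or, failing that, langdetect.

    Hypothetical helper; raises RuntimeError if neither package is installed.
    """
    backends = available_backends(("langid", "langdetect"))
    if "langid" in backends:
        import langid
        return langid.classify(text)[0]  # classify returns (language, score)
    if "langdetect" in backends:
        from langdetect import detect
        return detect(text)
    raise RuntimeError("No language detection package is installed")


print(available_backends())
```

Checking availability with find_spec avoids importing a package just to find out whether it is installed.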
For installation on Windows, you can run the following commands:
pip install --no-deps easynmt
pip install tqdm transformers numpy nltk sentencepiece langid
Further, you have to install PyTorch as described here:
https://pytorch.org/get-started/locally/
If you want to install fastText on Windows, I can recommend this link:
https://anaconda.org/conda-forge/fasttext
Published by nreimers over 3 years ago
fastText language detection did not work well if the text was in UPPERCASE.
Adding lower() to the string before the language identification step significantly improved the performance.
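The fix amounts to normalizing the text before it reaches the detector. A minimal sketch, using a hypothetical wrapper and a stub detector in place of fastText:

```python
def detect_language_lowercased(text, detector):
    """Lowercase the text, then pass it to the given detector callable.

    Hypothetical wrapper for illustration; `detector` is any function
    that takes a string and returns a language code.
    """
    return detector(text.lower())


def stub_detector(text):
    # Stand-in for a real detector: it only recognizes lowercase input,
    # mimicking the weakness described above.
    return "en" if text == "hello world" else "unknown"


print(stub_detector("HELLO WORLD"))                              # unknown
print(detect_language_lowercased("HELLO WORLD", stub_detector))  # en
```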
Published by nreimers over 3 years ago
First release of EasyNMT - Easy-to-use, state-of-the-art machine translation using transformers architecture.