EasyNMT | Python Ecosystem Directory

EasyNMT - v2.0.0 - Removal of Fairseq dependency, Usage of HF Model Hub

Published by nreimers over 3 years ago

mbart50 & m2m models now use huggingface transformers

In version 1, the mbart50 & m2m models required the fairseq library. This caused several issues: fairseq cannot be used on Windows, multi-processing did not work with fairseq models, and loading and using the models was quite complicated.

With this release, the fairseq dependency is removed and the mbart50 / m2m models are loaded with huggingface transformers version >= 4.4.0.

From a user perspective, no changes should be visible. From a developer perspective, however, this simplifies the architecture of EasyNMT and allows new features to be integrated more easily.

Saving models

Models can now be saved to disk by calling:

model.save(output_path)

Models can be loaded from disk by calling:

model = EasyNMT(output_path)

Loading models from the Hugging Face model hub

Loading any Hugging Face translation model is now simple: pass the model name or the model path as shown in the following code:

from easynmt import EasyNMT, models
article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""

model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro')) 
print(model.translate(article, source_lang='en_XX', target_lang='ro_RO'))

This loads the facebook/mbart-large-en-ro model from the model hub.

Note: Models might use different language codes, e.g. the mbart model uses 'en_XX' instead of 'en' and 'ro_RO' instead of 'ro'. To make the language codes consistent, you can pass a lang_map:

from easynmt import EasyNMT, models

article = """EasyNMT is an open source library for state-of-the-art neural machine translation. Installation is simple using
pip or pre-build docker images. EasyNMT provides access to various neural machine translation models. It can translate 
sentences and documents of any length. Further, it includes code to automatically detect the language of a text."""

output_path = 'output/mbart-large-en-ro'
model = EasyNMT(translator=models.AutoModel('facebook/mbart-large-en-ro', lang_map={'en': 'en_XX', 'ro': 'ro_RO'}))

# Save the model to disk
model.save(output_path)

# Load the model from disk
model = EasyNMT(output_path)
print(model.translate(article, target_lang='ro'))
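
Conceptually, a lang_map acts as a lookup table from the codes you pass in to the codes the underlying model expects. The following is a minimal sketch of that idea, not EasyNMT's actual implementation; the function name resolve_lang_code and the pass-through behavior for unmapped codes are assumptions made for illustration.

```python
def resolve_lang_code(code, lang_map=None):
    """Return the model-specific code for `code`.

    Hypothetical helper: codes without an entry in `lang_map` are
    passed through unchanged (an assumption, not documented behavior).
    """
    if lang_map is None:
        return code
    return lang_map.get(code, code)


# Same mapping as in the example above.
mbart_lang_map = {'en': 'en_XX', 'ro': 'ro_RO'}

print(resolve_lang_code('ro', mbart_lang_map))  # mapped to the mbart code
print(resolve_lang_code('de', mbart_lang_map))  # no entry, returned as-is
```

With a mapping like this in place, callers can keep using short codes such as 'ro' regardless of which model is loaded.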

EasyNMT - v1.1.0 - Improvements, Docker Image & REST API

Published by nreimers over 3 years ago

This release brings several improvements and is the first step towards the release of a Docker Image + REST API.

Improvements:

Docker REST API: We have published Docker images for a REST API that allows easy usage of EasyNMT. Just run the Docker image and start translating using REST API calls.
Google Colab REST API Hosting: We have published a Colab notebook that shows how to wrap EasyNMT in a REST API and host it on Google Colab with a free GPU. Useful if you need to translate large amounts of text.
Long sentences are translated first: Sentences are sorted by length before they are translated in order to minimize the time wasted on padding tokens. In the previous version, the shortest sentences were translated first and the longest ones last. Now the order is reversed. This has two advantages: If an OOM error occurs, it occurs at the start of the translation process and not at the end. Also, the estimate from the progress bar is more accurate, as the longest and slowest sentences are now translated first.
Improved language detection: Automatic language detection is still an issue, especially for mixed-language text. Language detection is now performed at the document level and not at the sentence level. If you need sentence-level language detection, you can set document_language_detection=False for the translate method. Also, text is now lowercased before the language is detected (the language detection scripts had issues with all-uppercase text).
Max length parameter: When you create your model like this: model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces are truncated to at most 100 word pieces. This can prevent OOM errors caused by overly long sentences.
Load model without translator: If you just want to use the language detection methods, you can now load your model like model = EasyNMT(model_name, load_translator=False). This will prevent the loading of the translation engine.
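
The longest-first ordering described above can be sketched in a few lines. This is an illustration of the general technique only, not EasyNMT's code: the function translate_longest_first, the translate_batch callback, and the use of character length (a real system would more likely count tokens) are all assumptions.

```python
def translate_longest_first(sentences, translate_batch, batch_size=16):
    """Translate `sentences` in batches, longest sentences first.

    `translate_batch` is a hypothetical callback that takes a list of
    strings and returns a list of translations in the same order.
    Results are returned in the original input order.
    """
    # Indices sorted by descending sentence length (characters, as a stand-in
    # for token counts).
    order = sorted(range(len(sentences)),
                   key=lambda i: len(sentences[i]),
                   reverse=True)

    results = [None] * len(sentences)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        outputs = translate_batch([sentences[i] for i in batch_idx])
        # Write each output back to the position of its source sentence.
        for i, out in zip(batch_idx, outputs):
            results[i] = out
    return results


def fake_translate_batch(batch):
    # Stand-in for a real model: just upper-cases the input.
    return [s.upper() for s in batch]


sentences = ["a b c", "a", "a b c d e", "a b"]
print(translate_longest_first(sentences, fake_translate_batch, batch_size=2))
```

Because sentences of similar length end up in the same batch, little padding is needed, and the most memory-hungry batch runs first, so an out-of-memory failure would show up immediately.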

Roadmap

As soon as Huggingface transformers v4.4.0 is released, the dependency on fairseq can be removed, as the mBART50 and m2m models will be available in HF transformers. This will make installation on Windows machines possible.

EasyNMT - v1.0.2 - Alternative Language Detection Methods

Published by nreimers over 3 years ago

fastText is used for automatic language detection, as it provides the highest speed and best accuracy.

However, it can be complicated to install it on Windows as it requires a C/C++ compiler.

This release adds two alternative language identifiers:

langid (https://github.com/saffsd/langid.py) - Can be installed via pip install langid
langdetect - Can be installed via pip install langdetect

If fastText is not available, langid / langdetect will be used as alternative language detection methods.
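
A fallback of this kind can be sketched as a simple availability check over the candidate packages. The snippet below is an assumption-laden illustration rather than EasyNMT's actual logic: the function name pick_language_detector is hypothetical, and the relative order of langid and langdetect is a guess, since the note only states that fastText is preferred.

```python
import importlib.util

# fastText first, as stated above; the order of the two alternatives
# is an assumption made for this sketch.
BACKENDS = ("fasttext", "langid", "langdetect")


def pick_language_detector(is_installed=None):
    """Return the name of the first installed backend, or None.

    `is_installed` is an optional hook (name -> bool), mainly useful
    for testing; by default the import system is queried.
    """
    if is_installed is None:
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    for name in BACKENDS:
        if is_installed(name):
            return name
    return None


print(pick_language_detector())
```

On a machine without any of the three packages the function returns None, which a caller could turn into a clear installation hint.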

For installation on Windows, you can run the following commands:

pip install --no-deps easynmt
pip install tqdm transformers numpy nltk sentencepiece langid

Further, you have to install pytorch as described here:
https://pytorch.org/get-started/locally/

If you want to install fastText on Windows, I can recommend this link:
https://anaconda.org/conda-forge/fasttext

EasyNMT - v1.0.1 - Bugfix

Published by nreimers over 3 years ago

fastText language detection did not work well if the text was in UPPERCASE.

Adding lower() to the string before the language identification step significantly improved the performance.
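
The fix amounts to normalizing the text before it reaches the detector. A minimal sketch follows, using a hypothetical toy detector in place of fastText; the names detect_language_normalized and toy_detect are made up for illustration.

```python
def detect_language_normalized(text, detect):
    """Lowercase `text` before passing it to the `detect` callback."""
    return detect(text.lower())


def toy_detect(text):
    # Hypothetical detector that, like the behavior described above,
    # fails on upper-case input.
    return "en" if "hello" in text else "unknown"


print(toy_detect("HELLO WORLD"))                              # raw upper-case input
print(detect_language_normalized("HELLO WORLD", toy_detect))  # lowercased first
```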

EasyNMT - v1.0.0 - First Release

Published by nreimers over 3 years ago

First release of EasyNMT - Easy-to-use, state-of-the-art machine translation using transformers architecture.
