mediawiki-services-machinetranslation

GitHub mirror of the mediawiki/services/machinetranslation repository. Development happens at https://gerrit.wikimedia.org. Please see https://www.mediawiki.org/wiki/Developer_account if you wish to contribute.

MIT License

MinT machine translation system

MinT is a machine translation system hosted by the Wikimedia Foundation. It uses multiple neural machine translation models to translate between a large number of languages.

Currently used models: NLLB-200, OpusMT, IndicTrans2, Softcatala and MADLAD-400.

The models are optimized for performance using OpenNMT CTranslate2.

Usage

Installation

Clone the repository. Install the system dependencies:

sudo apt install wget unzip build-essential cmake

Create a Python virtual environment and install the dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Then run the service:

./entrypoint.sh

By default, the service listens on http://0.0.0.0:8989.

Using Docker

Clone the repository, build the Docker image, and run it:

docker build -t wikipedia-mt .
docker run -dp 8989:8989 wikipedia-mt:latest

Open http://0.0.0.0:8989/ in a browser.
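
Once the service is running, by either method, a quick reachability check can be scripted instead of opening a browser. The following is a minimal sketch that only requests the root page on port 8989; the helper names `base_url` and `is_up` are hypothetical and not part of MinT, and the default host 127.0.0.1 assumes the service runs on the same machine.

```python
import urllib.error
import urllib.request


def base_url(host: str = "127.0.0.1", port: int = 8989) -> str:
    """Build the root URL of a locally running service (hypothetical helper)."""
    return f"http://{host}:{port}/"


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = base_url()
    print(f"{url} reachable: {is_up(url)}")
```

Model loading may take a while after startup, so a failed check shortly after launching the service does not necessarily indicate a problem.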

Environment variables

For the above configurations, use a value less than or equal to the number of available CPU cores.
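
The rule above can be checked before starting the service. The following is a minimal sketch that limits a requested value to the number of cores reported by the operating system; `clamp_to_cores` is a hypothetical helper, and the variable name `MT_THREADS` is a placeholder for illustration only, not one of MinT's settings.

```python
import os
from typing import Optional


def clamp_to_cores(requested: int, available: Optional[int] = None) -> int:
    """Limit a requested worker or thread count to the range 1..available."""
    if available is None:
        # os.cpu_count() can return None when the count cannot be determined.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


if __name__ == "__main__":
    # MT_THREADS is a placeholder name used only in this sketch.
    requested = int(os.environ.get("MT_THREADS", "4"))
    print(clamp_to_cores(requested))
```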

Monitoring

The application can be monitored using Graphite. Run the graphite-statsd Docker container and point the StatsD host to it:

docker run -d \
 --name graphite \
 --restart=always \
 -p 80:80 \
 -p 2003-2004:2003-2004 \
 -p 2023-2024:2023-2024 \
 -p 8125:8125/udp \
 -p 8126:8126 \
 graphiteapp/graphite-statsd

Now set the environment variable STATSD_HOST to localhost and STATSD_PORT to 8125. The STATSD_PREFIX environment variable can be used to override the default "machinetranslation" prefix.

Example:

STATSD_HOST=127.0.0.1 gunicorn
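
To confirm that the Graphite container receives data before involving the service itself, a test counter can be sent by hand. The following is a minimal sketch using the plain-text StatsD counter format (`name:value|c`) over UDP. It reads the same three environment variables; the fallback values and the metric name `smoke_test` are assumptions of this sketch and say nothing about which metrics MinT actually emits.

```python
import os
import socket


def format_counter(prefix: str, name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram of the form '<prefix>.<name>:<value>|c'."""
    return f"{prefix}.{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1) -> None:
    """Send one counter increment to the StatsD host taken from the environment."""
    host = os.environ.get("STATSD_HOST", "127.0.0.1")
    port = int(os.environ.get("STATSD_PORT", "8125"))
    prefix = os.environ.get("STATSD_PREFIX", "machinetranslation")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(prefix, name, value), (host, port))


if __name__ == "__main__":
    # "smoke_test" is a hypothetical metric name used only for this check.
    send_counter("smoke_test")
```

Because UDP is connectionless, the script gives no confirmation of delivery; the counter should appear in the Graphite web interface after the next flush interval if the container is reachable.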

License

MinT is licensed under the MIT license. See License.txt.

MinT uses multiple machine translation models from various projects internally. Please refer to the following table for their respective license details.

| Project | License for Code/Library | Documentation/Public Models License/Data Set |
|---|---|---|
| NLLB-200 | MIT | CC-BY-SA-NC 4.0 |
| OpusMT | MIT | CC-BY 4.0 |
| IndicTrans2 | MIT | CC-0 (No Rights Reserved) |
| Softcatala | MIT | MIT |
| MADLAD-400 | Apache 2.0 | Apache 2.0 |

Related Projects