Seq2seq model with attention for automatic orthographic simplification
MIT License
Welcome to ortografix, a seq2seq model for automatic orthographic simplification, coded with PyTorch 1.4.
Install via pip:
pip3 install ortografix
or, after a git clone:
python3 setup.py install
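Training expects parallel source–target data. A minimal sketch of preparing such a file, assuming one tab-separated sentence pair per line (this layout is an assumption for illustration, not ortografix's documented format; check the project's own data loader before relying on it):

```python
# Sketch: write a tiny parallel corpus of (original, simplified) spelling pairs.
# The one-tab-separated-pair-per-line layout is an assumption, not
# ortografix's documented format.
pairs = [
    ("though", "tho"),    # hypothetical orthographic simplification pairs
    ("enough", "enuf"),
]

with open("train.txt", "w", encoding="utf-8") as f:
    for src, tgt in pairs:
        f.write(f"{src}\t{tgt}\n")

# Read the file back to verify it round-trips cleanly.
with open("train.txt", encoding="utf-8") as f:
    loaded = [tuple(line.rstrip("\n").split("\t")) for line in f]
```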
To train a model, run:
ortografix train \
--data /abs/path/to/training/data \
--model-type gru \
--shuffle \
--hidden-size 256 \
--num-layers 1 \
--bias \
--dropout 0 \
--learning-rate 0.01 \
--epochs 10 \
--print-every 100 \
--use-teacher-forcing \
--teacher-forcing-ratio 0.5 \
--output-dirpath /abs/path/to/output/directory/whereto/save/model \
--with-attention \
--character-based
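The --use-teacher-forcing and --teacher-forcing-ratio options follow the standard teacher-forcing technique: at each decoding step during training, the ground-truth token is fed as the next decoder input with probability equal to the ratio, and the model's own last prediction otherwise. A minimal pure-Python sketch of that decision (illustrative only, independent of the actual ortografix implementation):

```python
import random

def next_decoder_input(gold_token, predicted_token, ratio, rng=random):
    """Teacher forcing: feed the gold token with probability `ratio`,
    otherwise feed the model's own last prediction."""
    return gold_token if rng.random() < ratio else predicted_token

random.seed(0)
# With ratio=0.5, roughly half the decoding steps use the gold token.
choices = [next_decoder_input("gold", "pred", 0.5) for _ in range(1000)]
gold_share = choices.count("gold") / len(choices)
```

A ratio of 1.0 always feeds the gold token; 0.0 always feeds the model's own prediction.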
To qualitatively evaluate the output of the model on a set of 10 randomly selected sentences from a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory/ \
--random 10
To quantitatively evaluate the output of the model on a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory
Quantitative evaluation will return a set of string-similarity measures between the model output and the gold standard. All measures are computed via textdistance.
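For instance, the Levenshtein measure counts the minimum number of single-character edits between the system output and the reference. An equivalent pure-Python sketch (the function names and normalization here are illustrative, not ortografix's or textdistance's API):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```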