Seq2seq model with attention for automatic orthographic simplification
MIT License
Welcome to ortografix, a seq2seq model for automatic orthographic simplification, coded with PyTorch 1.4.
Install via pip:
pip3 install ortografix
or, after a git clone:
python3 setup.py install
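Training expects parallel source–target data. A minimal sketch of preparing such a file, assuming one tab-separated sentence pair per line (this layout is an assumption for illustration, not ortografix's documented format; check the project's own data loader before relying on it):

```python
# Sketch: write a tiny parallel corpus of (original, simplified) spelling pairs.
# The one-tab-separated-pair-per-line layout is an assumption, not
# ortografix's documented format.
pairs = [
    ("though", "tho"),    # hypothetical orthographic simplification pairs
    ("enough", "enuf"),
]

with open("train.txt", "w", encoding="utf-8") as f:
    for src, tgt in pairs:
        f.write(f"{src}\t{tgt}\n")

# Read the file back to verify it round-trips cleanly.
with open("train.txt", encoding="utf-8") as f:
    loaded = [tuple(line.rstrip("\n").split("\t")) for line in f]
```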
To train a model, run:
ortografix train \
--data /abs/path/to/training/data \
--model-type gru \
--shuffle \
--hidden-size 256 \
--num-layers 1 \
--bias \
--dropout 0 \
--learning-rate 0.01 \
--epochs 10 \
--print-every 100 \
--use-teacher-forcing \
--teacher-forcing-ratio 0.5 \
--output-dirpath /abs/path/to/output/directory/whereto/save/model \
--with-attention \
--character-based
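The --use-teacher-forcing and --teacher-forcing-ratio options follow the standard teacher-forcing technique: at each decoding step during training, the ground-truth token is fed as the next decoder input with probability equal to the ratio, and the model's own last prediction otherwise. A minimal pure-Python sketch of that decision (illustrative only, independent of the actual ortografix implementation):

```python
import random

def next_decoder_input(gold_token, predicted_token, ratio, rng=random):
    """Teacher forcing: feed the gold token with probability `ratio`,
    otherwise feed the model's own last prediction."""
    return gold_token if rng.random() < ratio else predicted_token

random.seed(0)
# With ratio=0.5, roughly half the decoding steps use the gold token.
choices = [next_decoder_input("gold", "pred", 0.5) for _ in range(1000)]
gold_share = choices.count("gold") / len(choices)
```

A ratio of 1.0 always feeds the gold token; 0.0 always feeds the model's own prediction.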
To qualitatively evaluate the output of the model on a set of 10 randomly selected sentences from a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory/ \
--random 10
To quantitatively evaluate the output of the model on a given dev/test set, run:
ortografix evaluate \
--data /abs/path/to/test/data.txt \
--model /abs/path/to/model/directory
Quantitative evaluation will return a set of string-similarity measures between the model output and the gold standard. All measures are computed via textdistance.
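For instance, the Levenshtein measure counts the minimum number of single-character edits between the system output and the reference. An equivalent pure-Python sketch (the function names and normalization here are illustrative, not ortografix's or textdistance's API):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for maximally different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))
```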