APACHE-2.0 License
Applying declension and conjugation rules to lemmata.
I'm looking for a collaborator with knowledge of (or interest in) German linguistics, in particular syntax and morphology.
People with limited programming experience are welcome.
Drop me a message: hamster [ät] "bbaw" (dot) 'de'.
The software is not production-ready and requires more unit testing.
Known Problems:
The software should work for VERB and ADJ.
It will fail for the genus of relative pronouns, and for coreference resolution, if the substitute has a different genus (grammatical gender) than the replaced NOUN.
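A concrete case: Gerücht is neuter and Spekulation is feminine, so in a fragment like "das Gerücht, das kursiert" the relative pronoun das would have to become die. Until this is handled, one workaround is to set such sentences aside before augmentation. The sketch below is not part of flexion; the helper names are hypothetical, and it assumes token dicts keyed by CoNLL-U field names (as produced by the conllu package) and a substitute gender supplied by the caller in UD notation (Masc, Fem, Neut). The check is deliberately coarse: it flags any relative pronoun in the sentence, not only those that refer to the replaced noun, and it does not address coreference.

```python
from typing import Any, Dict, List, Optional

# Hypothetical helpers, not part of flexion. A token is a dict with the
# CoNLL-U keys "id", "form", "lemma", "upos", "feats", ...
Token = Dict[str, Any]


def noun_gender(sentence: List[Token], lemma: str) -> Optional[str]:
    """Return the UD Gender feature of the first NOUN with this lemma, if any."""
    for token in sentence:
        if token["lemma"] == lemma and token["upos"] == "NOUN":
            return (token.get("feats") or {}).get("Gender")
    return None


def has_relative_pronoun(sentence: List[Token]) -> bool:
    """True if any token carries the feature PronType=Rel."""
    return any((token.get("feats") or {}).get("PronType") == "Rel" for token in sentence)


def may_need_manual_check(sentence: List[Token], lemma: str, substitute_gender: str) -> bool:
    """Flag sentences where the gender changes and a relative pronoun is present."""
    gender = noun_gender(sentence, lemma)
    return gender is not None and gender != substitute_gender and has_relative_pronoun(sentence)


# Illustrative, hand-written tokens for the fragment "Das Gerücht , das kursiert".
sentence = [
    {"id": 1, "form": "Das", "lemma": "der", "upos": "DET", "feats": {"Gender": "Neut", "PronType": "Art"}},
    {"id": 2, "form": "Gerücht", "lemma": "Gerücht", "upos": "NOUN", "feats": {"Gender": "Neut"}},
    {"id": 3, "form": ",", "lemma": ",", "upos": "PUNCT", "feats": None},
    {"id": 4, "form": "das", "lemma": "der", "upos": "PRON", "feats": {"Gender": "Neut", "PronType": "Rel"}},
    {"id": 5, "form": "kursiert", "lemma": "kursieren", "upos": "VERB", "feats": None},
]
print(may_need_manual_check(sentence, "Gerücht", "Fem"))   # True: Spekulation is feminine
print(may_need_manual_check(sentence, "Gerücht", "Neut"))  # False: genus unchanged
```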
The software was developed for processing German-language texts (lang: de).
The flexion git repo is available as a PyPI package:
pip install flexion
Download a transducer model:
python scripts/download_transducer.py --model=smor
Download demo data for unit tests
mkdir tmp
wget -O tmp/de_hdt-ud-dev.conllu https://raw.githubusercontent.com/UniversalDependencies/UD_German-HDT/master/de_hdt-ud-dev.conllu
The flexion.replace function expects a sentence formatted as a CoNLL-U dictionary, e.g. a sentence parsed with the conllu package:
import flexion
import io
import conllu
# read CoNLL-U data (the with-block closes the file again)
with io.open("tmp/de_hdt-ud-dev.conllu", "r", encoding="utf-8") as iowrapper:
    dat = list(conllu.parse_incr(iowrapper))
# select an example sentence
print(dat[5].metadata.get('text'))
# '" Diesen Gerüchten liegt eine unseriöse Recherche zugrunde .'
# Generate augmentations
lemma = "Gerücht"
substitute = "Spekulation"
augmentations = flexion.replace(lemma, substitute, dat[5])
print(augmentations)
# ['" Diesen Spekulationen liegt eine unseriöse Recherche zugrunde.']
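To make the expected input format concrete, here is a minimal, stdlib-only sketch that parses the token lines of one CoNLL-U sentence into dicts keyed by the field names used in recent versions of the conllu package. It is an illustration, not a replacement for conllu: parse_sentence and parse_feats are hypothetical helpers, deps and misc are left as raw strings, and the annotation is hand-written rather than copied from the HDT treebank.

```python
from typing import Any, Dict, List, Optional

# Column order of a CoNLL-U token line (key names as in recent conllu versions).
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_feats(value: str) -> Optional[Dict[str, str]]:
    """'Case=Dat|Number=Plur' -> {'Case': 'Dat', 'Number': 'Plur'}; '_' -> None."""
    if value == "_":
        return None
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_sentence(block: str) -> List[Dict[str, Any]]:
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens: List[Dict[str, Any]] = []
    for line in block.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        values = line.split("\t")
        if "-" in values[0] or "." in values[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        token: Dict[str, Any] = dict(zip(FIELDS, values))
        token["id"] = int(token["id"])
        token["head"] = None if token["head"] == "_" else int(token["head"])
        token["feats"] = parse_feats(token["feats"])
        tokens.append(token)
    return tokens


# Hand-written, illustrative annotation (not taken from de_hdt-ud-dev.conllu).
rows = [
    ["1", "Diesen", "dieser", "DET", "PDAT", "Case=Dat|Number=Plur|PronType=Dem", "2", "det", "_", "_"],
    ["2", "Gerüchten", "Gerücht", "NOUN", "NN", "Case=Dat|Gender=Neut|Number=Plur", "3", "iobj", "_", "_"],
    ["3", "liegt", "liegen", "VERB", "VVFIN", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
]
block = "# text = Diesen Gerüchten liegt\n" + "\n".join("\t".join(row) for row in rows)
sentence = parse_sentence(block)
print(sentence[1]["form"], sentence[1]["lemma"], sentence[1]["feats"])
# Gerüchten Gerücht {'Case': 'Dat', 'Gender': 'Neut', 'Number': 'Plur'}
```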
You can use packages like spaCy, Trankit, or Stanza to parse sentences and transform them into Python dictionaries with CoNLL-U fields as keys. See this notebook for examples.
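The transformation is mostly renaming attributes and shifting indices: CoNLL-U token ids are 1-based and the root has head 0. The sketch below assumes a parser that exposes, per token, its text, lemma, UPOS and language-specific tags, a morphology string, a head index and a dependency label; ParsedToken and to_conllu_dicts are hypothetical names, not part of flexion, spaCy, Trankit or Stanza. With spaCy you could fill the fields from token.text, token.lemma_, token.pos_, token.tag_, str(token.morph), token.head.i and token.dep_, subtracting sent.start from the indices. Note that dependency label sets differ between parsers; the labels below are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ParsedToken:
    """Hypothetical, parser-neutral token record (indices relative to the sentence)."""
    i: int        # 0-based position of the token in its sentence
    text: str
    lemma: str
    upos: str     # Universal POS tag
    xpos: str     # language-specific tag, e.g. STTS for German
    morph: str    # e.g. "Case=Dat|Number=Plur", or "" if there are no features
    head_i: int   # 0-based position of the head; equals i for the root
    dep: str


def morph_to_feats(morph: str) -> Optional[Dict[str, str]]:
    """Turn a UD feature string into a dict; an empty string becomes None."""
    if not morph:
        return None
    return dict(item.split("=", 1) for item in morph.split("|"))


def to_conllu_dicts(tokens: List[ParsedToken]) -> List[Dict[str, Any]]:
    """Convert parser tokens into dicts keyed by CoNLL-U field names."""
    rows: List[Dict[str, Any]] = []
    for tok in tokens:
        is_root = tok.head_i == tok.i
        rows.append({
            "id": tok.i + 1,                            # CoNLL-U ids start at 1
            "form": tok.text,
            "lemma": tok.lemma,
            "upos": tok.upos,
            "xpos": tok.xpos,
            "feats": morph_to_feats(tok.morph),
            "head": 0 if is_root else tok.head_i + 1,  # 0 marks the root
            "deprel": "root" if is_root else tok.dep,  # normalise e.g. "ROOT"
            "deps": None,
            "misc": None,
        })
    return rows


tokens = [
    ParsedToken(0, "Diesen", "dieser", "DET", "PDAT", "Case=Dat|Number=Plur", 1, "nk"),
    ParsedToken(1, "Gerüchten", "Gerücht", "NOUN", "NN", "Case=Dat|Number=Plur", 2, "da"),
    ParsedToken(2, "liegt", "liegen", "VERB", "VVFIN", "", 2, "ROOT"),
]
for row in to_conllu_dicts(tokens):
    print(row["id"], row["form"], row["head"], row["deprel"])
# 1 Diesen 2 nk
# 2 Gerüchten 3 da
# 3 liegt 0 root
```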
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir
pip install -r requirements-dev.txt --no-cache-dir
# jupyter notebooks
pip install -r requirements-demo.txt --no-cache-dir
python -m spacy download de_core_news_lg
(If your git repo is stored in a folder whose path contains whitespace, don't create the virtual environment in the subfolder .venv. Use an absolute path without whitespace instead.)
jupyter lab
flake8 --ignore=F401 --exclude=$(grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g')
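The $(...) part of the flake8 call turns .gitignore into a comma-separated --exclude list: grep drops comment lines, xargs joins the remaining entries on one line, and sed replaces the spaces with commas. For readers without grep and sed, a hypothetical Python helper (not part of the repo) that approximates the same pipeline; unlike xargs it does not interpret quotes or backslashes.

```python
def flake8_exclude_from_gitignore(text: str) -> str:
    """Approximate `grep -v '^#' .gitignore | xargs | sed -e 's/ /,/g'`."""
    patterns = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # grep -v '^#': drop comment lines
        patterns.extend(line.split())  # xargs: split on whitespace, skip blanks
    return ",".join(patterns)  # sed: separate entries with commas


print(flake8_exclude_from_gitignore("# Byte-compiled\n__pycache__/\n*.pyc\n\n.venv\ntmp/\n"))
# __pycache__/,*.pyc,.venv,tmp/
```

In the repo you would pass it the file contents, e.g. Path(".gitignore").read_text() from pathlib.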
PYTHONPATH=. pytest
Publish
pandoc README.md --from markdown --to rst -s -o README.rst
python setup.py sdist
twine upload -r pypi dist/*
find . -type f -name "*.pyc" | xargs rm
find . -type d -name "__pycache__" | xargs rm -r
rm -r .pytest_cache
rm -r .venv
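The cleanup commands above assume a POSIX shell. A cross-platform sketch of the same steps with pathlib and shutil, as a hypothetical helper that is not part of the repo: it removes *.pyc files and __pycache__ folders below the given root (including inside .venv, as find does), removes .pytest_cache if it exists, and only deletes .venv when asked to.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(root: Path, remove_venv: bool = False) -> None:
    """Delete Python bytecode, pytest caches and, optionally, the .venv folder."""
    for pyc in root.rglob("*.pyc"):
        if pyc.is_file():
            pyc.unlink()
    # Collect first, then delete, so the directory walk is not modified mid-iteration.
    for cache in list(root.rglob("__pycache__")):
        if cache.is_dir():
            shutil.rmtree(cache)
    targets = [root / ".pytest_cache"] + ([root / ".venv"] if remove_venv else [])
    for target in targets:
        if target.is_dir():  # unlike `rm -r`, missing folders are not an error
            shutil.rmtree(target)


# Usage from the repository root:
# clean_build_artifacts(Path("."), remove_venv=True)
```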
Please open an issue for support.
Please contribute using GitHub Flow: create a branch, add commits, and open a pull request.