latin-corpus

Citable corpus of texts in Latin

Stars
3

latin-corpus

A JVM library for working with a citable corpus of morphologically parsed texts in Latin.

latin-corpus reads output from a morphological parser built with tabulae, and applies it to a citable text corpus. latin-corpus supports higher-level manipulation of the corpus than tabulae's token-level paring. It can profile usage of arbitrary combinations of morphological features or vocabulary in a corpus, and can filter the corpus to include or exclude passages containing a specified set of features or vocabulary.

Current versions: 6.0.0 / 7.0.0-pr6

The latincorpus library is undergoing active development as part of a three-year project beginning in the academic year 2020-2021 in the Classics Department at the College of the Holy Cross. For more information about the project, see https://lingualatina.github.io/.

The current published release is 6.0.0. The 7.x series represents an API-breaking, substantial reworking of the library. Binaries of prerelease versions as well as published releases are available from bintray with the version designation 7.0.0.-prN: for maven, ivy or gradle coordinates, see this page.

See release notes.

Documentation