A collection of tools for applying word senses to large corpora
GPL-2.0 License
The C-Cat library provides tools for large-scale text processing using the Hadoop framework. Its ultimate goal is to provide tools and libraries for automatically customizing a WordNet ontology based on the contents of a particular corpus.
It is structured into three sub-modules:
This project uses Maven as its build system. Most of the library dependencies are handled by Maven, but a few jars come from libraries that are not yet published as Maven artifacts.
To install these jars into your local Maven repository, run
./add_non_maven_jars.sh
Then, build the entire project with
mvn package
This creates two jars in the target directory: extendOntology-1.0.jar and extendOntology-1.0-jar-with-dependencies.jar. To run any of the provided main classes without Maven, put both of these jars on the classpath.
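A minimal sketch of putting both jars on the classpath. It assumes the jars sit in ./target (override with TARGET_DIR) and a Unix-like shell, where ':' separates classpath entries (Windows uses ';'). The main class passed to run_main is a hypothetical placeholder. Replace it with a real main class from the project.

```shell
#!/bin/sh
# Sketch: run a main class with both jars produced by `mvn package` on the classpath.
# TARGET_DIR defaults to ./target; the class name given to run_main is a placeholder.

TARGET_DIR="${TARGET_DIR:-target}"

# Print the classpath string (':' separator on Linux/macOS; Windows uses ';').
build_classpath() {
  printf '%s:%s\n' \
    "$TARGET_DIR/extendOntology-1.0.jar" \
    "$TARGET_DIR/extendOntology-1.0-jar-with-dependencies.jar"
}

# Launch the given main class, forwarding any remaining arguments to it.
run_main() {
  main_class="$1"
  shift
  java -cp "$(build_classpath)" "$main_class" "$@"
}
```

For example, `run_main some.package.SomeMain input.txt` would run the (hypothetical) class some.package.SomeMain with one argument.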