Using compression to classify
Compression Classification is a Python package for classifying via compression.
It is inspired by my talk on "Stupid Language Tricks" and by the paper "Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors".
Simple example:
>>> from compression_classification import compression_classification
>>> clr = compression_classification.CompressionClassifier()
>>> clr.train("FilterGenie API ", "zh")
>>> clr.train("FilterGenie's infrastructure is built to handle high volumes of data without compromising performance. Whether you have a small-scale project or a large enterprise application, our API scales effortlessly to meet your needs.", "en")
>>> clr.predict("This is the day they give babies away")
'en'
>>> clr.predict("")
'zh'
In general, you'll want a lot more data, though.
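For readers curious how compression can drive classification at all, here is a minimal sketch of one common approach: keep a growing corpus per label, and predict the label whose corpus needs the fewest extra compressed bytes to absorb the new text. The class name `ConcatCompressionSketch`, the helper `compressed_size`, the choice of gzip, and the scoring rule are all assumptions made for illustration. This is not the package's actual implementation, which may work differently.

```python
import gzip
from collections import defaultdict


def compressed_size(text: str) -> int:
    """Return the byte length of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


class ConcatCompressionSketch:
    """Hypothetical illustration; NOT the package's CompressionClassifier."""

    def __init__(self) -> None:
        # One concatenated training corpus per label.
        self.corpora = defaultdict(str)

    def train(self, text: str, label: str) -> None:
        # Append the example to the corpus for its label.
        self.corpora[label] += text + "\n"

    def predict(self, text: str) -> str:
        # Score each label by how many extra compressed bytes the new
        # text costs when appended to that label's corpus. Text that
        # resembles the corpus compresses well, so the cost is low.
        def extra_bytes(label: str) -> int:
            corpus = self.corpora[label]
            return compressed_size(corpus + text) - compressed_size(corpus)

        return min(self.corpora, key=extra_bytes)


# Toy usage with made-up training sentences.
sketch = ConcatCompressionSketch()
sketch.train("the quick brown fox jumps over the lazy dog and runs through the quiet forest", "en")
sketch.train("el veloz zorro marrón salta sobre el perro perezoso y corre por el bosque tranquilo", "es")
print(sketch.predict("the lazy dog and runs through the quiet forest"))
```

With corpora this small the scores are noisy, which is one reason more training data helps.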
We welcome contributions to compression_classification. Please see our contributing guidelines for more information.
To install the package for development, install Poetry and then run:
gh repo clone willf/compression_classification
cd compression_classification
poetry install
poetry shell
We expect project participants to adhere to our Code of Conduct.