compression_classification

Using compression to classify

Compression Classification is a Python package for classifying via compression.

It is inspired by my talk on "Stupid Language Tricks" and by the paper "Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors".

Simple example:

from compression_classification import compression_classification
clr = compression_classification.CompressionClassifier()
clr.train("FilterGenie    API ", "zh")
clr.train("FilterGenie's infrastructure is built to handle high volumes of data without compromising performance. Whether you have a small-scale project or a large enterprise application, our API scales effortlessly to meet your needs.", "en")

clr.predict("This is the day they give babies away")
'en'

clr.predict("")
'zh'

In general, you'll want a lot more data, though.
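
For intuition about why compression can classify at all, here is a minimal sketch of the general idea. It is not this package's implementation: the class name TinyCompressionClassifier, the helper compressed_size, and the choice of zlib are all assumptions made for illustration. The sketch keeps one growing corpus per label and predicts the label whose corpus needs the fewest extra compressed bytes once the new text is appended, on the reasoning that text resembling a corpus compresses well against it.

```python
import zlib


def compressed_size(text: str) -> int:
    """Return how many bytes zlib needs for the UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


class TinyCompressionClassifier:
    """Hypothetical illustration; not the package's CompressionClassifier."""

    def __init__(self) -> None:
        # Maps each label to all training text seen for it, concatenated.
        self.corpora = {}

    def train(self, text: str, label: str) -> None:
        # Append the new text to whatever is already stored for this label.
        self.corpora[label] = self.corpora.get(label, "") + " " + text

    def predict(self, text: str) -> str:
        # Cost of a label = growth in compressed size when text is appended
        # to that label's corpus. Smaller growth means a closer match.
        def extra_bytes(label: str) -> int:
            corpus = self.corpora[label]
            return compressed_size(corpus + " " + text) - compressed_size(corpus)

        # Raises ValueError if train() has never been called.
        return min(self.corpora, key=extra_bytes)


clf = TinyCompressionClassifier()
clf.train("the quick brown fox jumps over the lazy dog near the river bank", "en")
clf.train("der schnelle braune Fuchs springt über den faulen Hund am Flussufer", "de")
print(clf.predict("the quick brown fox jumps over the lazy dog"))
```

With corpora this small the signal is weak unless the query shares long substrings with one of them, which is one reason more training data helps.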

Contributing

We welcome contributions to compression_classification. Please see our contributing guidelines for more information.

To install the package for development, install poetry and then run:

gh repo clone willf/compression_classification
cd compression_classification
poetry install
poetry shell

Code of Conduct

We expect project participants to adhere to our Code of Conduct.
