compression_classification

Compression Classification is a Python package for classifying text using data compression.

It is inspired by my talk "Stupid Language Tricks" and by the paper “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors.
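For readers new to the idea: the paper's method pairs gzip with the normalized compression distance (NCD) and a k-nearest-neighbour vote. NCD compares how well two texts compress together versus separately, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length. The sketch below is a minimal illustration of that general technique, assuming gzip as the compressor and a single space as the concatenation separator. The helper names (clen, ncd, knn_predict) are hypothetical and are not part of this package's API, and this is not necessarily how CompressionClassifier is implemented.

```python
import gzip
from collections import Counter


def clen(text: str) -> int:
    """Compressed length C(x): bytes of gzip output for the UTF-8 text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance; smaller means more similar."""
    cx, cy = clen(x), clen(y)
    cxy = clen(x + " " + y)  # assumption: join the two texts with a space
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_predict(train: list[tuple[str, str]], text: str, k: int = 3) -> str:
    """Label text by majority vote among its k nearest (text, label) pairs."""
    # Sort training examples by NCD to the query and keep the k closest.
    neighbors = sorted(train, key=lambda pair: ncd(text, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common(1) keeps the label seen first, i.e. the nearest one.
    return votes.most_common(1)[0][0]
```

Nothing here is trained: the "model" is simply the stored examples plus the compressor, which is why the approach is described as parameter-free.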

Simple example:

>>> from compression_classification import compression_classification
>>> clr = compression_classification.CompressionClassifier()
>>> clr.train("FilterGenie    API ", "zh")
>>> clr.train("FilterGenie's infrastructure is built to handle high volumes of data without compromising performance. Whether you have a small-scale project or a large enterprise application, our API scales effortlessly to meet your needs.", "en")

>>> clr.predict("This is the day they give babies away")
'en'

>>> clr.predict("")
'zh'

In general, you'll want a lot more data, though.
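The example above only shows the train/predict interface. One common way a classifier with that shape can work is to keep one text corpus per label and assign new text to the label whose corpus grows least in compressed size when the text is appended. The sketch below assumes that approach with zlib; the class name DeltaCompressionClassifier is hypothetical, and this is an illustration of the general idea, not this package's code. It also suggests why more data helps: a larger, more representative corpus per label gives the compressor more material to match against.

```python
import zlib


class DeltaCompressionClassifier:
    """Hypothetical sketch: classify by the extra compressed bytes a text
    adds to each label's accumulated training corpus."""

    def __init__(self) -> None:
        self.corpora: dict[str, str] = {}

    def train(self, text: str, label: str) -> None:
        # Accumulate all training text for a label into a single corpus.
        existing = self.corpora.get(label, "")
        self.corpora[label] = existing + " " + text if existing else text

    @staticmethod
    def _size(text: str) -> int:
        # Compressed size in bytes at the highest zlib level.
        return len(zlib.compress(text.encode("utf-8"), 9))

    def predict(self, text: str) -> str:
        if not self.corpora:
            raise ValueError("train() must be called before predict()")

        def cost(label: str) -> int:
            # Extra bytes needed to encode text after this label's corpus.
            corpus = self.corpora[label]
            return self._size(corpus + " " + text) - self._size(corpus)

        # The label whose corpus "explains" the text best wins.
        return min(self.corpora, key=cost)
```

DEFLATE (used by zlib and gzip) only looks back 32 KiB, so corpus text beyond that window cannot be matched; for large training sets a compressor with a bigger dictionary, such as the standard library's lzma, may work better.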

Contributing

We welcome contributions to compression_classification. Please see our contributing guidelines for more information.

To install the package for development, install Poetry and the GitHub CLI (gh), then run:

gh repo clone willf/compression_classification
cd compression_classification
poetry install
poetry shell

Code of Conduct

We expect project participants to adhere to our Code of Conduct.
