Early Text Classification in Python
MIT License
This repository contains the implementation of the Early Text Classification framework.
Classification is a widely studied problem in supervised learning. Nonetheless, some scenarios have received little attention despite their applicability. One such scenario is early text classification, where the category of a document must be determined as soon as possible. The importance of this variant of the classification problem is evident in tasks like sexual predator detection, where one wants to identify an offender as early as possible. This framework highlights the two main pieces involved in this problem: classification with partial information and deciding the moment of classification.
Based on the paper:
Loyola J.M., Errecalde M.L., Escalante H.J., Montes y Gomez M. (2018) Learning When to Classify for Early Text Classification. In: De Giusti A. (eds) Computer Science CACIC 2017. CACIC 2017. Communications in Computer and Information Science, vol 790. Springer, Cham. [Springer Link] [SEDICI Link]
The Jupyter notebook example.ipynb shows how to use this framework. You need to specify the following parameters:
dataset: There should be two files named {dataset_name}_train.txt and {dataset_name}_test.txt. Each file must have the following structure for each document i: {label_i}[TAB]{document_i}. Both corpora must end with an empty line.
'all'
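The file layout described above can be sketched as follows. This is only an illustration and not part of the framework: the helper names `write_corpus` and `read_corpus`, the dataset name `toy`, and the sample documents are all hypothetical. It also assumes that "end with an empty line" means the last document is terminated by a newline.

```python
import os
import tempfile


def write_corpus(path, labeled_docs):
    """Write (label, document) pairs, one {label}[TAB]{document} line each."""
    with open(path, "w", encoding="utf-8") as f:
        for label, document in labeled_docs:
            # Tabs or newlines inside a document would break the format,
            # so collapse every run of whitespace into a single space.
            clean = " ".join(document.split())
            f.write(f"{label}\t{clean}\n")  # final "\n" leaves an empty last line


def read_corpus(path):
    """Read the file back into a list of (label, document) pairs."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:  # skip empty lines, including a trailing one
                continue
            label, document = line.split("\t", 1)
            pairs.append((label, document))
    return pairs


dataset_name = "toy"  # hypothetical dataset name
train_docs = [
    ("pos", "great movie,\tloved it"),
    ("neg", "boring\nand too long"),
]

out_dir = tempfile.mkdtemp()
train_path = os.path.join(out_dir, f"{dataset_name}_train.txt")
write_corpus(train_path, train_docs)
loaded = read_corpus(train_path)
print(loaded)
```

The same helper would be called a second time to produce {dataset_name}_test.txt.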
term_frec
A model with fit(X, y), predict(X) and get_params() methods similar to those in the scikit-learn API. The method fit should accept a sparse matrix as the parameter X.
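As a rough sketch of what such an interface looks like, the toy class below exposes fit(X, y), predict(X) and get_params() and takes a SciPy sparse matrix as X. The class name `CentroidClassifier`, its `normalize` option, and the nearest-centroid rule are assumptions made for illustration; they are not taken from the framework or the paper.

```python
import numpy as np
from scipy.sparse import csr_matrix


class CentroidClassifier:
    """Toy nearest-centroid model exposing fit, predict and get_params."""

    def __init__(self, normalize=True):
        self.normalize = normalize

    def get_params(self, deep=True):
        # Follows the scikit-learn convention: constructor arguments as a dict.
        return {"normalize": self.normalize}

    def fit(self, X, y):
        X = csr_matrix(X, dtype=float)  # accepts sparse (or dense) input
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean row per class; the sparse mean is converted to a flat array.
        centroids = [
            np.asarray(X[y == c].mean(axis=0)).ravel() for c in self.classes_
        ]
        self.centroids_ = np.vstack(centroids)
        if self.normalize:
            norms = np.linalg.norm(self.centroids_, axis=1, keepdims=True)
            self.centroids_ = self.centroids_ / np.where(norms == 0, 1.0, norms)
        return self

    def predict(self, X):
        X = csr_matrix(X, dtype=float)
        # Dot product of every document with every class centroid.
        scores = np.asarray(X @ self.centroids_.T)
        return self.classes_[np.argmax(scores, axis=1)]


# Tiny term-frequency matrix: 4 documents, 3 terms.
X_train = csr_matrix([[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 3, 0]])
y_train = ["a", "a", "b", "b"]

clf = CentroidClassifier().fit(X_train, y_train)
predictions = clf.predict(csr_matrix([[1, 0, 0], [0, 1, 0]]))
print(predictions, clf.get_params())
```

Any object with these three methods should work in the same position, provided its fit can handle sparse input.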
A model with fit(X, y), predict(X) and get_params() methods similar to those in the scikit-learn API.
This code was developed and tested on Python 3.6 and depends on: