Sume is an implementation of the concept-based ILP model for summarization.
GPL-3.0 License
The sume module is an automatic summarization library written in Python.
sume contains the following extraction algorithms:
A typical usage of this module is:
import sume
# directory from which text documents to be summarized are loaded. Input
# files are expected to be in one tokenized sentence per line format.
dir_path = "/tmp/"
# create a summarizer, here a concept-based ILP model
s = sume.models.ConceptBasedILPSummarizer(dir_path)
# load documents with extension 'txt'
s.read_documents(file_extension="txt")
# compute the parameters needed by the model
# extract bigrams as concepts
s.extract_ngrams()
# compute document frequency as concept weights
s.compute_document_frequency()
# prune sentences that are shorter than 10 words, identical sentences and
# those that begin and end with a quotation mark
s.prune_sentences(mininum_sentence_length=10,
                  remove_citations=True,
                  remove_redundancy=True)
# solve the ilp model
value, subset = s.solve_ilp_problem()
# outputs the summary
print('\n'.join([s.sentences[j].untokenized_form for j in subset]))
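To give a feel for what the steps above compute, here is a minimal, self-contained sketch of the concept-based objective: bigrams act as concepts, document frequency acts as the concept weight, and the summary is the set of sentences that covers the most concept weight within a length budget. This is not sume's implementation. It replaces the ILP solver with a brute-force search that only works on tiny inputs, and the names `bigrams`, `toy_concept_summary`, and the token-based `budget` parameter are hypothetical, chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def bigrams(tokens):
    """Return the set of adjacent token pairs in a tokenized sentence."""
    return set(zip(tokens, tokens[1:]))


def toy_concept_summary(documents, budget):
    """Pick the sentences that maximize covered concept weight.

    documents: list of documents, each a list of tokenized sentences.
    budget: maximum summary length in tokens (an assumption of this sketch).
    Returns (objective value, tuple of selected sentence indices).
    """
    # Concept weight = number of documents in which the bigram occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_concepts = set()
        for sentence in doc:
            doc_concepts |= bigrams(sentence)
        doc_freq.update(doc_concepts)

    sentences = [sentence for doc in documents for sentence in doc]
    concepts = [bigrams(sentence) for sentence in sentences]

    # Brute force over all subsets; an ILP solver would replace this loop.
    best_value, best_subset = 0, ()
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            if sum(len(sentences[j]) for j in subset) > budget:
                continue
            # Each concept counts once, however many sentences contain it.
            covered = set().union(*(concepts[j] for j in subset))
            value = sum(doc_freq[c] for c in covered)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


docs = [
    [["the", "cat", "sat", "on", "the", "mat"], ["dogs", "bark"]],
    [["the", "cat", "sat", "quietly"], ["birds", "sing", "loudly"]],
]
value, subset = toy_concept_summary(docs, budget=10)
sentences = [sentence for doc in docs for sentence in doc]
print("\n".join(" ".join(sentences[j]) for j in subset))
```

With this toy input, sentences 0 and 2 would both fit the budget, but they share the bigrams "the cat" and "cat sat", which are counted only once. The search therefore prefers sentences 0 and 3, which together cover more distinct concept weight. That is the redundancy-avoiding behavior the concept-based model is designed for.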
If you use sume, please cite the following paper: