sume

Sume is an implementation of the concept-based ILP model for summarization.

GPL-3.0 License


sume

The sume module is an automatic summarization library written in Python.

Description

sume contains the following extraction algorithms:

Concept-based ILP model for summarization (Gillick & Favre, 2009)
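
For orientation, the concept-based model is commonly written as the integer linear program below. This is a sketch of the standard formulation, not something taken from the sume code: the symbols are chosen here for illustration, with c_i indicating that concept i is covered, s_j that sentence j is selected, w_i the concept weight, l_j the sentence length, Occ_ij equal to 1 when concept i occurs in sentence j, and L the summary length limit.

```latex
\begin{aligned}
\max \quad & \sum_i w_i \, c_i \\
\text{s.t.} \quad & \sum_j l_j \, s_j \le L \\
& s_j \, \mathrm{Occ}_{ij} \le c_i \quad \forall i, j \\
& \sum_j s_j \, \mathrm{Occ}_{ij} \ge c_i \quad \forall i \\
& c_i \in \{0, 1\}, \quad s_j \in \{0, 1\}
\end{aligned}
```

The two consistency constraints tie the variables together: selecting a sentence forces all of its concepts to count as covered, and a concept can only count as covered if at least one selected sentence contains it. Because each concept is counted once, redundant sentences add nothing to the objective.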

A typical usage of this module is:

import sume

# directory from which text documents to be summarized are loaded. Input
# files are expected to be in one tokenized sentence per line format.
dir_path = "/tmp/"

# create a summarizer, here a concept-based ILP model
s = sume.models.ConceptBasedILPSummarizer(dir_path)

# load documents with extension 'txt'
s.read_documents(file_extension="txt")

# compute the parameters needed by the model
# extract bigrams as concepts
s.extract_ngrams()

# compute document frequency as concept weights
s.compute_document_frequency()

# prune sentences that are shorter than 10 words, identical sentences and
# those that begin and end with a quotation mark
s.prune_sentences(mininum_sentence_length=10,
                  remove_citations=True,
                  remove_redundancy=True)

# solve the ILP model; returns the objective value and the indices of the
# selected sentences
value, subset = s.solve_ilp_problem()

# output the summary, one sentence per line
print('\n'.join(s.sentences[j].untokenized_form for j in subset))
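
To make the steps above more concrete, here is a minimal, self-contained sketch of the same idea on toy data. It is not sume's implementation: the function names (`bigrams`, `document_frequency`, `best_summary`) are hypothetical, sentences are assumed to be already tokenized with spaces, and an exhaustive search stands in for the ILP solver, so it is only practical for a handful of sentences.

```python
from itertools import combinations


def bigrams(tokens):
    """Return the set of adjacent word pairs (the concepts) in a sentence."""
    return set(zip(tokens, tokens[1:]))


def document_frequency(documents):
    """Count, for each bigram, the number of documents it occurs in."""
    df = {}
    for sentences in documents:
        seen = set()
        for sentence in sentences:
            seen |= bigrams(sentence.lower().split())
        for concept in seen:
            df[concept] = df.get(concept, 0) + 1
    return df


def best_summary(documents, max_words):
    """Brute-force stand-in for the ILP: try every sentence subset that fits
    the length budget and keep the one covering the most concept weight."""
    weights = document_frequency(documents)
    sentences = [s for doc in documents for s in doc]
    concepts = [bigrams(s.lower().split()) for s in sentences]
    lengths = [len(s.split()) for s in sentences]

    best_value, best_subset = 0, ()
    for size in range(1, len(sentences) + 1):
        for subset in combinations(range(len(sentences)), size):
            if sum(lengths[j] for j in subset) > max_words:
                continue
            # each covered concept is counted once, however often it occurs
            covered = set().union(*(concepts[j] for j in subset))
            value = sum(weights[c] for c in covered)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, best_subset


# toy input: two documents, one tokenized sentence per entry
docs = [
    ["the cat sat on the mat", "dogs bark loudly"],
    ["the cat sat quietly", "birds sing"],
]
value, subset = best_summary(docs, max_words=10)
```

With a budget of 10 words the search selects sentences 0 and 1. Sentence 2 ("the cat sat quietly") scores well on its own, but most of its bigrams are already covered by sentence 0, so it adds less than the unrelated sentence 1.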

Citing the sume module

If you use sume, please cite the following paper:

Florian Boudin, Hugo Mougard and Benoît Favre, Concept-based Summarization
using Integer Linear Programming: From Concept Pruning to Multiple Optimal
Solutions, Proceedings of the 2015 Conference on Empirical Methods in
Natural Language Processing (EMNLP).

Contributors

Florian Boudin
Hugo Mougard
