collatepdf

MIT License

A simple Python script to collate multiple PDFs into a single PDF with:

  • an optional cover page,
  • automatic TOC generation with global page numbering,
  • automatic page resizing to ensure all pages in the collated PDF have the same dimensions,
  • an overlay bar on each page with the current file name and global page number.
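
The README does not describe how pages are resized. As a rough illustration only, the sketch below assumes each page is scaled uniformly (preserving its aspect ratio) and centered on a common target size. The function name `fit_page` and the choice of A4 as the target are hypothetical and not taken from the script.

```python
def fit_page(width, height, target_width, target_height):
    """Return (scale, dx, dy) that fits a page inside the target size.

    The scale is uniform, so the aspect ratio is preserved, and the
    offsets center the scaled page on the target page.
    All dimensions are in PDF points (1/72 inch).
    """
    scale = min(target_width / width, target_height / height)
    dx = (target_width - width * scale) / 2
    dy = (target_height - height * scale) / 2
    return scale, dx, dy


# Hypothetical example: a US Letter page (612 x 792 pt)
# placed on an A4 target page (595 x 842 pt).
scale, dx, dy = fit_page(612, 792, 595, 842)
```

Here the width is the limiting dimension, so the page is shrunk slightly and the leftover vertical space is split evenly above and below it.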

Note: this is a quick-and-dirty, alpha-quality script I wrote for my own needs. Use it at your own risk, and feel free to improve it if you find it useful.

Structure of the collated PDF

  • Optional cover PDF
  • Auto-generated table of contents
  • Optional divider pages between documents
  • PDF documents
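
Given this structure, the global page number of each document in the table of contents can be derived from the page counts of everything that precedes it. The sketch below is a guess at that bookkeeping, not the script's actual code: `toc_entries` and its parameters are hypothetical, and it assumes that the cover and table of contents count toward the global numbering and that each TOC entry points at the first page of the document rather than at its divider.

```python
def toc_entries(documents, front_matter_pages=0, divider_pages=0):
    """Compute the first global page number of each document.

    documents: list of (title, page_count) tuples, in collation order.
    front_matter_pages: pages taken by the cover and table of contents.
    divider_pages: divider pages inserted before each document (0 or 1).
    """
    entries = []
    page = front_matter_pages + 1  # global numbering is 1-based here
    for title, page_count in documents:
        page += divider_pages  # skip the divider, point at the document
        entries.append((title, page))
        page += page_count  # move past this document's pages
    return entries


# Hypothetical input: a 1-page cover plus a 1-page TOC, one divider per document.
entries = toc_entries(
    [("example", 3), ("sample", 5)],
    front_matter_pages=2,
    divider_pages=1,
)
# entries == [("example", 4), ("sample", 8)]
```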

[Screenshot: cover and table of contents]

[Screenshot: optional divider page and first document page with overlay bar]

Installation

Dependencies:

  • Python
  • pypdf
  • reportlab

Installation instructions:

git clone https://github.com/rossant/collatepdf.git
cd collatepdf/
pip install -e .

Usage

There are two steps:

  • Generate an index.txt file from the list of PDF documents to collate, then edit it manually to specify the optional divider pages.
  • Generate the collated PDF.

# Create the index file.
collatepdf makeindex samples/docs/*.pdf -o samples/index.txt

# Optionally: manually edit samples/index.txt

# Generate the collated PDF with a cover PDF.
collatepdf makepdf samples/index.txt -c samples/cover.pdf -o samples/collated.pdf
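
To give an idea of what the first step produces, here is a minimal sketch that builds index text in the same shape as the sample in the Index file section: one divider line followed by the PDF path, for each input file. The function `build_index` is hypothetical. It assumes the divider title is the path without its extension, which is only inferred from the sample, and it leaves out the explanatory comment header.

```python
from pathlib import PurePosixPath


def build_index(pdf_paths):
    """Build index text with one divider line and one path line per PDF."""
    lines = []
    for path in pdf_paths:
        p = PurePosixPath(path)
        # Assumption: the divider title is the path minus its ".pdf" suffix.
        lines.append("@ " + str(p.with_suffix("")))
        lines.append(str(p))
        lines.append("")  # blank line between entries
    return "\n".join(lines)


index_text = build_index([
    "samples/docs/example.pdf",
    "samples/docs/sample.pdf",
])
```

The divider lines can then be edited by hand, for instance to replace the path with a proper title and subtitle.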

Index file

This is the sample index file generated by the first command above:

# Comments start with #.
# Empty lines are ignored.
# Index processing is stopped with `# STOP` on a line.
# PDF files are included by putting their paths on each line.
# Divider pages are included as follows: `@ Some title / Subtitle below`.

@ samples/docs/example
samples/docs/example.pdf

@ samples/docs/sample
samples/docs/sample.pdf
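
The rules listed in the comments of the index file can be sketched as a small parser. This is an illustration of the format, not the script's own parser: `parse_index` and the tuple layout it returns are hypothetical. It also assumes that the title and subtitle of a divider are separated by a slash surrounded by spaces, so that a title such as `samples/docs/example` is not split.

```python
def parse_index(text):
    """Parse index text into a list of divider and PDF items.

    Returns tuples of the form ("divider", title, subtitle) or ("pdf", path).
    """
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if line.startswith("#"):
            if line[1:].strip() == "STOP":
                break  # `# STOP` ends index processing
            continue  # any other comment is skipped
        if line.startswith("@"):
            # Assumption: " / " (with spaces) separates title and subtitle.
            title, _, subtitle = line[1:].partition(" / ")
            items.append(("divider", title.strip(), subtitle.strip()))
        else:
            items.append(("pdf", line))
    return items


sample = """# Comments start with #.

@ Part one / Background reading
samples/docs/example.pdf
@ samples/docs/sample
# STOP
samples/docs/sample.pdf
"""
items = parse_index(sample)
```

In this example the last path is never reached, because processing stops at the `# STOP` line.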