Extract structured text from PDFs quickly
Apache-2.0 License
Convert PDF to markdown quickly with high accuracy
pdfrw is a pure Python library that reads and writes PDFs
Python module to drive the awesome pdftk binary.
Community maintained fork of pdfminer - we fathom PDF
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) ...
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of ...
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipula...
Prepare documents for distribution
OCR, layout analysis, reading order, line detection in 90+ languages
Improved file parsing for LLMs
A tool to quickly create sweet PDF files from text files
A CLI toolset to generate table of contents for PDF files automatically.
Combine LaTeX docs into a single PDF
📐 Compute distance between sequences. 30+ algorithms, pure python implementation, common interfac...
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.