pdf-text-extraction

CLI for extracting text (and possibly tables) from PDF files

Apache-2.0 License

Stars
63

pdf-text-extraction - v1.1.5 Latest Release

Published by galkahana 4 months ago

Added binaries to the Releases section.
For the sake of simplicity, the executables do not provide "bidi" (bidirectional text) support. Adding it will be considered on demand.

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.1.4...v1.1.5

pdf-text-extraction - v1.1.4

Published by galkahana 9 months ago

Improved security when parsing PDFs whose form XObjects reference each other, which could potentially cause an endless loop.
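
The kind of guard this release note describes can be sketched as follows. This is not the project's actual fix, only a minimal illustration of one common approach: track which form XObjects are currently being processed and refuse to re-enter one of them. All names here (`XObjectGraph`, `CollectText`, `ExtractFrom`) are hypothetical, and the "PDF" is reduced to a map from object id to the ids it invokes.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed PDF: each form XObject id maps to the
// ids of the form XObjects it invokes.
using XObjectGraph = std::map<int, std::vector<int>>;

// Appends the text of form `id` and of every form it invokes to `out`,
// skipping any form that is already being processed higher up the call chain.
void CollectText(const XObjectGraph& graph,
                 const std::map<int, std::string>& textById,
                 int id,
                 std::set<int>& inProgress,
                 std::string& out) {
    // insert() reports whether the id was newly added; if it was already
    // present, following it again would loop forever, so stop here.
    if (!inProgress.insert(id).second) return;

    auto text = textById.find(id);
    if (text != textById.end()) out += text->second;

    auto children = graph.find(id);
    if (children != graph.end()) {
        for (int child : children->second)
            CollectText(graph, textById, child, inProgress, out);
    }

    // Done with this form: it may legitimately be invoked again elsewhere.
    inProgress.erase(id);
}

// Convenience wrapper that starts the traversal at `root`.
std::string ExtractFrom(const XObjectGraph& graph,
                        const std::map<int, std::string>& textById,
                        int root) {
    std::set<int> inProgress;
    std::string out;
    CollectText(graph, textById, root, inProgress, out);
    return out;
}
```

Erasing the id on the way out means a form that is invoked twice from the same parent is still processed twice; only re-entry along the current call chain is blocked.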

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.1.3...v1.1.4

pdf-text-extraction - v1.1.3

Published by galkahana 10 months ago

This release:

  • Updates to the most recent PDFWriter library to pick up some security fixes. It is also fetched via a direct URL to a smaller release package that doesn't include testing materials.
  • Borrowing from recent changes in PDFWriter, excludes test materials from this repo's package.

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.1.2...v1.1.3

pdf-text-extraction - v1.1.2

Published by galkahana about 1 year ago

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.1.1...v1.1.2

pdf-text-extraction - v1.1.1

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.1...v1.1.1

pdf-text-extraction - v1.1

Published by galkahana over 1 year ago

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/compare/v1.0...v1.1

pdf-text-extraction - TextExtraction package 1.0

Published by galkahana over 1 year ago

Released as a CMake package. You can now use TextExtraction::TextExtraction as an imported target in your project to extract text in your own code.

What's Changed

Full Changelog: https://github.com/galkahana/pdf-text-extraction/commits/v1.0