Provides a JRuby wrapper for the Apache PDFBox library to extract plain text from PDF documents.
MIT License
This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the Apache PDFBox library.
Add this line to your application's Gemfile:
gem 'pdfbox_text_extraction'
And then execute:
$ bundle
Or install it yourself as:
$ gem install pdfbox_text_extraction
To extract all text on every page:
extracted_text = PdfboxTextExtraction.run(path_to_pdf)
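The same call can be applied to several files. The following is a minimal sketch, not part of the gem: `extract_all` is a hypothetical helper that takes the extraction step as a block, so it assumes nothing about the gem beyond the `run(path_to_pdf)` call shown above.

```ruby
# Hypothetical helper (not part of the gem): maps each PDF path to the
# text returned by the block that is given the path.
def extract_all(pdf_paths)
  pdf_paths.each_with_object({}) do |path, texts|
    texts[path] = yield(path)
  end
end

# Possible usage under JRuby with the gem loaded (the glob is illustrative):
#   texts = extract_all(Dir.glob('docs/*.pdf')) do |path|
#     PdfboxTextExtraction.run(path)
#   end
```

Passing the extraction step as a block keeps the helper independent of JRuby, so it can be exercised with a stand-in block in ordinary Ruby tests.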
To extract text inside a crop area:
extracted_text = PdfboxTextExtraction.run(
  path_to_pdf,
  {
    crop_x: 0,        # crop area top left corner x-coordinate
    crop_y: 1.0,      # crop area top left corner y-coordinate
    crop_width: 8.5,  # crop area width
    crop_height: 9.4, # crop area height
  }
)
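If the crop values come from user input, it may help to check them before calling the gem. The sketch below is an assumption-laden illustration: `crop_options` is a hypothetical helper, not part of the gem, and the rule that width and height must be greater than zero is this example's own choice rather than documented gem behavior.

```ruby
# Hypothetical helper (not part of the gem): builds the options hash used
# in the crop example and rejects values that cannot describe an area.
def crop_options(x:, y:, width:, height:)
  options = { crop_x: x, crop_y: y, crop_width: width, crop_height: height }

  options.each do |key, value|
    unless value.is_a?(Numeric) && value.real?
      raise ArgumentError, "#{key} must be a real number"
    end
  end

  # Assumption of this sketch: a crop area needs a positive size.
  unless width > 0 && height > 0
    raise ArgumentError, 'crop_width and crop_height must be greater than zero'
  end

  options
end

# Possible usage under JRuby with the gem loaded:
#   PdfboxTextExtraction.run(
#     path_to_pdf,
#     crop_options(x: 0, y: 1.0, width: 8.5, height: 9.4)
#   )
```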
git checkout -b my-new-feature
git commit -am 'Add some feature'
git push origin my-new-feature
Copyright (c) 2016 Jo Hund. See (MIT) LICENSE for details.