gImageReader

A Gtk/Qt front-end to tesseract-ocr.

GPL-3.0 License

Stars
1.6K
Committers
125

gImageReader - gImageReader-3.4.2

Published by manisandro 9 months ago

gImageReader 3.4.2 (Feb 05 2024):

  • Bugfixes:
    • [Qt] Fix crash in FileTreeModel::findFile with temporary file
    • [Gtk] Correctly notify hOCR tree updates when merging items
    • [Win32] Fix dictionary installation directory
    • Quote x_font property in hOCR documents
    • Assorted Wayland fixes
  • Enhancements:
    • Add support for PoDoFo 0.10.x
    • Apply brightness/contrast/resolution/invert to all selected images
  • Updated translations
  • See https://github.com/manisandro/gImageReader/compare/v3.4.1...v3.4.2 for details
gImageReader - gImageReader-3.4.1

Published by manisandro over 1 year ago

gImageReader 3.4.1 (Jan 29 2023):

  • Bugfixes:
    • Fix warning about text in PDF being shown even if the PDF has no text
    • Adapt for enchant2 dictionary location change
    • [Qt] Fix setting custom font for text editor
    • Fix crash in batch export dialog when selecting a folder with no hOCR HTML files
    • Assorted Wayland fixes
    • [Gtk] Fix incorrectly passing export filename to hOCR text and ODT export
    • [Gtk] Rework Utils::string_html_escape to fix possible unicode string corruption
  • Enhancements:
    • Add 2px margin to autodetected areas
    • Allow specifying custom script tessdatas by prepending script/ to the prefix and leaving the language code empty
    • Use correct file extensions in crash save files
    • Make WConf visibility persistent
  • Updated translations
  • See https://github.com/manisandro/gImageReader/compare/v3.4.0...v3.4.1 for details
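
The 2px margin added to autodetected areas can be pictured as growing each detected rectangle by a fixed amount on every side while keeping it inside the image. The following is a minimal sketch under that assumption, not gImageReader's actual code; `expand_rect`, its parameters, and the clamping behaviour are hypothetical.

```python
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def expand_rect(rect: Rect, image_size: Tuple[int, int], margin: int = 2) -> Rect:
    """Grow rect by margin pixels on every side, clamped to the image bounds."""
    x, y, w, h = rect
    img_w, img_h = image_size
    # Clamp each edge separately so a box touching the border stays inside the image.
    left = max(0, x - margin)
    top = max(0, y - margin)
    right = min(img_w, x + w + margin)
    bottom = min(img_h, y + h + margin)
    return (left, top, right - left, bottom - top)
```

For example, `expand_rect((10, 10, 20, 20), (100, 100))` yields `(8, 8, 24, 24)`.
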
gImageReader - gImageReader-3.4.0

Published by manisandro over 2 years ago

gImageReader 3.4.0 (Jan 28 2022):

  • Add support for tesseract 5.0
  • Add Qt6 support
  • Add thumbnail view for source documents
  • Add batch mode for recognizing multiple documents
  • Display sources in a tree
  • Allow opening output files directly from the source tree if they exist next to the source with the same basename
  • Allow moving image selection boxes
  • Text: Add multi-tab support
  • HOCR: Allow specifying whether new output is inserted/appended
  • HOCR: Allow opening multiple files at once, also from command line
  • HOCR: Add proof-reading widget (Qt interface only)
  • HOCR: New batch export dialog
  • HOCR: Add quick navigation for low confidence words
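
Finding output files that sit next to a source and share its basename could look roughly like the sketch below. It is an illustration only: the function name `find_sibling_outputs` and the list of extensions are assumptions, since the release notes do not say which extensions are checked.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed output extensions; the release notes do not list which ones are checked.
OUTPUT_SUFFIXES = (".txt", ".html")


def find_sibling_outputs(source: Path, suffixes: Sequence[str] = OUTPUT_SUFFIXES) -> List[Path]:
    """Return existing files next to source that share its basename."""
    # with_suffix swaps only the last extension, e.g. scan.png -> scan.txt
    candidates = (source.with_suffix(suffix) for suffix in suffixes)
    return [path for path in candidates if path.is_file()]
```
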
gImageReader - gImageReader-3.3.1

Published by manisandro about 5 years ago

gImageReader 3.3.1 (Jul 28 2019):

  • HOCR: propagate attributes to manually added elements (@foghawk)
  • HOCR: improve spelling of hyphenated words (@foghawk)
  • HOCR: improve spelling of words with special characters (@foghawk)
  • HOCR: allow specifying a DPI to assume for image sources when exporting to PDF (@foghawk)
  • HOCR: allow user to choose whether to sanitize hyphens when exporting to PDF
  • HOCR: Attempt to map ISO 639-2 language codes to ISO 639-1 to set spelling language
  • Allow specifying character whitelist / blacklist for recognition
  • Various bugfixes
  • Translation updates
  • Full details in commit log: https://github.com/manisandro/gImageReader/commits/master
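
Mapping three-letter ISO 639-2 codes (as used in tesseract language names) to two-letter ISO 639-1 codes is essentially a table lookup. The sketch below shows the idea with a small excerpt of such a table; `spelling_language` is a hypothetical helper, and stripping suffixes such as `_old` before the lookup is an assumption, not something the release notes describe.

```python
from typing import Optional

# Small excerpt of an ISO 639-2 -> ISO 639-1 table; a complete table is much larger.
ISO_639_2_TO_1 = {
    "eng": "en",
    "deu": "de",  # ISO 639-2/T code for German
    "ger": "de",  # ISO 639-2/B code for German
    "fra": "fr",  # ISO 639-2/T code for French
    "fre": "fr",  # ISO 639-2/B code for French
    "ita": "it",
    "spa": "es",
}


def spelling_language(code: str) -> Optional[str]:
    """Map a three-letter language code to a two-letter one, or None if unknown."""
    # Assumption: variants such as "ita_old" share the spelling language of "ita".
    base = code.split("_", 1)[0].lower()
    return ISO_639_2_TO_1.get(base)
```

Returning `None` for unknown codes leaves it to the caller to decide whether to disable spell-checking or to fall back to a default language.
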
gImageReader - gImageReader-3.3.0

Published by manisandro about 6 years ago

gImageReader 3.3.0 (Sep 26 2018):
This is the first stable release of the 3.3.x series. The main change compared to 3.2.99 is support for the script traineddatas which were introduced with tesseract 4.x.

As with previous releases, the Windows builds using tesseract 4 are still to be considered experimental.

For a full list of changes between 3.2.99 and 3.3.0, see the git commit log.

gImageReader - gImageReader-3.2.99

Published by manisandro over 6 years ago

gImageReader 3.2.99 (Feb 24 2018):
This is the beta release for gImageReader 3.3.0. The main highlights are a much-expanded hOCR editor and many bug fixes. Consult the changelog below for details. Special thanks to @ZaMaZaN4iK and @SantosSi for their valuable contributions both in code and improvement ideas.

There are a number of incomplete translations, so this would be a great moment for interested people to update their translations. gImageReader now hosts its translations on Weblate, so translating is easier than ever!

Please report any issues you might find to ensure a polished 3.3.0 release.

As with previous releases, the Windows builds using tesseract 4 are to be considered experimental.

Binary packages for Linux are available for Ubuntu in the gImageReader-devel PPA and for Fedora in this COPR repository.

Changelog

  • Add support for reading DJVU documents
  • Add support for encrypted PDF files
  • Rewrite HOCR editor and greatly expand its functionality:
    • Allow displaying confidence values in HOCR tree
    • Allow clicking in the canvas to jump to the corresponding item in the HOCR tree
    • Support mass-editing of HOCR child item attributes from parent
    • Honour font family attributes if possible
    • Honour and allow toggling bold and italic attributes
    • Correctly honour the baseline
    • Add search/replace and substitution list support
    • Add preview mode while editing
    • Allow manually adding lines, words and paragraphs
    • Allow swapping items
    • Automatically adjust parent bounding boxes when resizing and removing children
    • Add navigation toolbar to facilitate navigating through the HOCR tree
    • Use relative paths to source files in HOCR HTML document if source files are on same level or below the HOCR file
    • Add export to text
    • Add export to ODT
    • Allow choosing paper size in PDF export
    • Allow setting document metadata in PDF export
    • Allow setting encryption in PDF export
    • [Qt] Allow using QPrinter as PDF export backend, which has better support for complex scripts
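
The rule for storing relative source paths (only when the source is on the same level as or below the hOCR file) can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `source_reference` is a hypothetical name, and falling back to the absolute path in all other cases is an assumption.

```python
import os
from pathlib import Path


def source_reference(hocr_file: Path, source_file: Path) -> str:
    """Return the path to store in the hOCR document for source_file."""
    hocr_dir = hocr_file.resolve().parent
    source = source_file.resolve()
    try:
        relative = os.path.relpath(source, hocr_dir)
    except ValueError:
        # On Windows, paths on different drives have no relative form.
        return str(source)
    # A leading ".." means the source lies above the hOCR file's directory.
    if relative == os.pardir or relative.startswith(os.pardir + os.sep):
        return str(source)
    return relative
```
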
gImageReader - gimagereader-3.2.3

Published by manisandro over 7 years ago

gImageReader 3.2.3 (Jul 01 2017):

  • Fix broken hOCR export
  • Add option to prepend source filename / page to plain text output

Please note that the tesseract4.0.0.git2b854e3 builds are experimental, intended for those who want to try out the latest tesseract 4.0.0 alpha version. Make sure you update your tessdata files if you use that version!

gImageReader - gimagereader-3.2.2

Published by manisandro over 7 years ago

gImageReader 3.2.2 (Jun 30 2017):

  • Attempt to use original source image for PDF output
  • Allow collapsing/expanding branches of hOCR tree via context menu
  • Recognize guillemets as quote characters
  • Fix crash when adding zero-page sources
  • Fix possible crash when rapidly switching documents
  • [Gtk] Fix output pane orientation not properly restored
  • [Gtk] Don't crash when rendering of image fails
  • [Gtk] Fix icons not appearing with recent Gtk versions
  • [Qt] Don't display empty image if rendering of downscaled image fails

Please note that the tesseract4.0.0.git2b854e3 builds are experimental, intended for those who want to try out the latest tesseract 4.0.0 alpha version. Make sure you update your tessdata files if you use that version!

gImageReader - gimagereader-3.2.1

Published by manisandro over 7 years ago

gImageReader 3.2.1 (Feb 10 2017):

  • Add possibility to rotate individual pages of multipage documents
  • Ensure the tessdata manager downloads compatible tesseract language definitions
  • Add CCITT Group4 compression option for monochrome PDF export
  • Allow choosing between diffuse and threshold dithering for monochrome PDF export
  • Preview JPEG compression quality in PDF output preview
  • Make brightness/contrast/resolution changes affect all selected sources
  • [Qt] Support multipage images through QImageReader (Qt5.9+ will support multipage TIFFs)
  • [Gtk] Fix hang when saving selection image
  • [Qt] Fix possible deadlock when rapidly switching sources
  • Updated translations
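
The difference between threshold and diffuse dithering for monochrome output can be illustrated with a small pure-Python sketch. It assumes "diffuse" means error diffusion and uses the Floyd-Steinberg kernel as one common variant; the release notes do not say which algorithm gImageReader uses, and the function names here are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values (0-255)


def threshold_dither(image: Image, cutoff: int = 128) -> Image:
    """Map every pixel independently to black (0) or white (255)."""
    return [[255 if value >= cutoff else 0 for value in row] for row in image]


def diffuse_dither(image: Image, cutoff: int = 128) -> Image:
    """Floyd-Steinberg error diffusion: pass each pixel's rounding error to its neighbours."""
    height = len(image)
    width = len(image[0]) if height else 0
    work = [[float(value) for value in row] for row in image]
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new = 255 if work[y][x] >= cutoff else 0
            error = work[y][x] - new
            result[y][x] = new
            # Distribute the error to not-yet-visited neighbours (weights 7, 3, 5, 1 over 16).
            if x + 1 < width:
                work[y][x + 1] += error * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += error * 3 / 16
                work[y + 1][x] += error * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += error * 1 / 16
    return result
```

On a flat dark-gray area, thresholding produces solid black, whereas error diffusion scatters some white pixels so that the average brightness is roughly preserved.
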

Update Feb 13 2017
Added experimental Windows builds using tesseract-4.0.0 alpha.

gImageReader - gimagereader-3.2.0

Published by manisandro almost 8 years ago

gImageReader 3.2.0 (Nov 23 2016):

This is the first stable release of the 3.2.x series. It includes many bug fixes since 3.1.99, most of which were tracked down and patched by Daniel Plakhotich.

Starting from 3.2.0 I'll be maintaining a FAQ page.

Changelog:

gImageReader - gimagereader-3.1.99

Published by manisandro about 8 years ago

gImageReader 3.1.99 (Oct 13 2016):

This is the release candidate for gImageReader 3.2. The main highlight is a greatly enhanced hOCR editor and PDF export functionality.

Please report any issues you may find to ensure a polished 3.2.0 final release. If the translation for your language is missing or incomplete, this would be a good moment to submit an updated translation according to the instructions in the Readme.

Many thanks to all the users who provided valuable feedback and suggestions.

Changelog

  • General improvements:
    • Catch critical tesseract errors which otherwise result in the application crashing
    • Improve spelling dictionary auto-installation logic
    • Allow choosing whether to store language files (language definitions, spelling dictionaries) in system-wide or user-local directories
  • Plain text mode improvements:
    • Allow recognizing user-defined regions on multiple pages
    • Also treat \u2014 character as a hyphen
    • Make preserve paragraphs option correctly deal with trailing whitespace
  • hOCR editor improvements:
    • Add "Add to dictionary" and "Ignore word" actions to spell-checking menu in hOCR editor
    • Exclude non-word characters from spell-checking
    • Allow merging adjacent word items
    • Allow adjusting bounding boxes of document elements by resizing the selection in the canvas
    • Allow removing arbitrary items from the document tree
    • Allow defining custom graphic regions from context-menu of the respective page item
  • PDF export improvements:
    • Add previewing capability
    • Take into account baseline information to better position the words in the generated PDF
    • Add options to choose color format and compression of images written to PDF, which can greatly reduce the size of the PDF
    • Correctly handle paper size and DPI
    • Improve logic for uniformizing word and line spacing
    • Make sure the correct hyphen character is used, allowing PDF applications to correctly find hyphenated words
  • New and updated translations
  • Various bug fixes
  • Full details in commit log: https://github.com/manisandro/gImageReader/commits/master
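
Treating `\u2014` as a hyphen matters when words split across line ends are joined back together in plain text output. The sketch below illustrates that idea under the assumption that this is the context of the change; `join_hyphenated` is a hypothetical helper, and apart from `\u2014`, the characters in `HYPHENS` are chosen for illustration.

```python
import re

# Characters treated as a line-end hyphen. "\u2014" is the one named in the release
# notes; "-" and "\u2010" are assumptions added for illustration.
HYPHENS = "-\u2010\u2014"

_LINE_END_HYPHEN = re.compile(r"(\w)[" + re.escape(HYPHENS) + r"]\n(\w)")


def join_hyphenated(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    return _LINE_END_HYPHEN.sub(r"\1\2", text)
```

A hyphen that is not directly followed by a line break, as in `a - b`, is left untouched.
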
gImageReader - gimagereader-3.1.91

Published by manisandro over 8 years ago

gImageReader 3.1.91 (May 03 2016):

This is a beta release. Please report any issues you may find.

For the translation status, see https://translations.launchpad.net/gimagereader

Note: On recent Windows versions, if you want to use the Tessdata Manager, you currently need to run the program as administrator (via right-click on the application shortcut).

gImageReader - gimagereader-3.1.90

Published by manisandro over 8 years ago

gImageReader 3.1.90 (Apr 28 2016):

  • gImageReader 3.2 beta 1
  • Add an initial hOCR editor implementation, with possibility to save as hOCR HTML, PDF with invisible text overlay, or a PDF reconstructed from the extracted text and graphics
  • Allow selecting and working on multiple sources at once
  • Add a tessdata manager, to conveniently manage tesseract language definitions directly from the application
  • Show a progress bar when recognizing, add a cancel button
  • Modernized Gtk UI
  • Expose script and orientation detection support
  • Possibility to pan via middle button drag
  • Remove the need to specify the culture code in custom language definitions, and use a built-in language-culture mapping instead to search for spelling dictionaries
  • Various bug fixes
  • Full details in commit log: https://github.com/manisandro/gImageReader/commits/master

This is a beta release. Please report any issues you may find.

For the translation status, see https://translations.launchpad.net/gimagereader

gImageReader - gimagereader-3.1.2

Published by manisandro over 9 years ago

gImageReader 3.1.2 (Jun 30 2015):

  • Fix incorrect behavior of "Append to current text" with multiple recognition areas

Update Feb 19 2016
Windows installers built against tesseract 3.04.00 are available for testing. People encountering crashes when using traineddata files generated for tesseract 3.04.00 should try these.

Update Feb 27 2016
Tesseract 3.04.00 Windows Installers rebuilt to include SSL libraries (fixes dictionary autoinstall failures). Links to tessdata files in manual have also been updated.

gImageReader - gimagereader-3.1.1

Published by manisandro over 9 years ago

gImageReader 3.1.1 (Jun 11 2015):

  • Fix titlebar not shown when window maximized in Gnome 3
  • New translations: Chinese (Hong Kong), Chinese (Taiwan)
  • Updated translations: Russian, Portuguese
gImageReader - gimagereader-3.1

Published by manisandro over 9 years ago

gImageReader 3.1 (May 1 2015):

  • Add option to draw whitespace
  • Allow searching and replacing only in selected portion of output text
  • Add "preserve paragraphs" postprocessing option
  • Allow opening files via drag and drop
  • Improve rendering of certain PDF files with the Qt interface
  • Fix scanning broken with certain scanners under Windows
  • Support automatic spelling dictionary installation under Windows
  • Allow saving scans in formats other than PNG
  • Handful of bugs fixed
  • Full details in commit log: https://github.com/manisandro/gImageReader/commits/master
gImageReader - gimagereader-3.0.1

Published by manisandro almost 10 years ago

gImageReader 3.0.1 (Jan 4 2015):

Windows users:
gImageReader 3.0.1 is compiled against a patched Qt5 version which should fix occasional crashes when cutting/pasting text in the output pane.

gImageReader - gimagereader-3.0

Published by manisandro almost 10 years ago

gImageReader 3.0 (Dec 12 2014):

  • gImageReader 3.0 stable
  • New Qt4/5 interface, as alternative to the Gtk interface
  • Fixed scanning on Windows
  • Memorize image settings (brightness, contrast, etc) when switching images
  • Search forward and backward, replace all, case sensitive search
  • Many bug fixes
  • Translation updates
  • Full details in commit log: https://github.com/manisandro/gImageReader/commits/master

Linux packages:

Attention Windows Users
Some anti-virus scanners might report that the setup exe contains a trojan. Such reports are most likely false positives, see http://nsis.sourceforge.net/NSIS_False_Positives. (NSIS is the installation system used to build the Windows installer.) Unfortunately there is little I can do on my side to fix this.

Update for Windows Users
The previously uploaded version, which used the Qt4 libraries, appeared to crash on 64-bit Windows systems, apparently due to a bug in Qt4. I've now switched to using Qt5 on Windows, which should hopefully fix the issues.