PDF to XML ALTO file converter
GPL-2.0 License
New in version 0.4 (apart from various bug fixes):
support for the xpdf language support packages, which provide language-specific fonts for Arabic, Simplified Chinese, Japanese, etc.; the packages are pre-installed locally and portable
refined line number detection and fixed a bug that could result in randomly missing numbers in the ALTO output
update to xpdf-4.03
fix character spacing issue caused by an invalid rotation condition
update dependencies and dependency install script
Published by kermitt2 about 4 years ago
New in version 0.3:
line number detection: line numbers (typically added to manuscripts/preprints for review) are now specifically identified and no longer mixed with the rest of the text content; they are grouped in a separate block or, optionally, omitted from the ALTO file (noLineNumbers option)
removal of the -blocks option: block information is now always returned to ensure ALTO validation (<TextBlock> element)
bug fixes in reading order
fix XMax and YMax values that could be incorrectly set to 0 in the coordinates of blocks containing only one line
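
Since block information is always present in the output from this version on, a consumer can rely on <TextBlock> elements when reading an ALTO file. The following is a minimal sketch in Python, not part of pdfalto itself. The sample XML is a hand-written, hypothetical fragment rather than actual pdfalto output, and the helper names (local_name, text_blocks) are illustrative assumptions. Tags are matched by local name because the ALTO namespace URI depends on the schema version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment for illustration only;
# real pdfalto output contains more elements and attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="Page1" PHYSICAL_IMG_NR="1" WIDTH="595" HEIGHT="842">
      <PrintSpace>
        <TextBlock ID="b1" HPOS="72.0" VPOS="90.0" WIDTH="200.0" HEIGHT="12.0">
          <TextLine ID="l1" HPOS="72.0" VPOS="90.0" WIDTH="200.0" HEIGHT="12.0">
            <String ID="s1" CONTENT="Hello" HPOS="72.0" VPOS="90.0" WIDTH="30.0" HEIGHT="12.0"/>
            <SP WIDTH="3.0" HPOS="102.0" VPOS="90.0"/>
            <String ID="s2" CONTENT="world" HPOS="105.0" VPOS="90.0" WIDTH="30.0" HEIGHT="12.0"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def text_blocks(alto_xml):
    """Return a list of (block ID, text) pairs, one per TextBlock element."""
    root = ET.fromstring(alto_xml)
    blocks = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        # Collect the CONTENT attribute of every String inside this block.
        words = [
            s.get("CONTENT", "")
            for s in elem.iter()
            if local_name(s.tag) == "String"
        ]
        blocks.append((elem.get("ID"), " ".join(words)))
    return blocks


if __name__ == "__main__":
    for block_id, text in text_blocks(SAMPLE_ALTO):
        print(block_id, text)
```

Joining words with a single space is a simplification; a more faithful reader would use the SP elements and the String coordinates to reconstruct spacing.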
Published by kermitt2 about 5 years ago
New in version 0.2:
Note: this released version was used for Grobid release 0.5.6