img2table

img2table is a Python library for identifying and extracting tables from PDFs and images, based on OpenCV image processing.

MIT License

Downloads: 30K
Stars: 527
Committers: 2


img2table - img2table 1.3.0

Published by xavctn about 2 months ago

Features

  • Complete overhaul of the line detection algorithm to improve detection of lines defined by background color changes
  • Improvement in detection of semi-bordered cells
  • Update detection of rows in borderless tables
  • Add support for Surya OCR
  • Add detection of implicit columns via the implicit_columns parameter
  • Optimization of code performance via numba refactoring
  • Update of example notebooks
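
To make the idea of implicit columns concrete, here is a minimal sketch that infers column separators from the whitespace gaps between word bounding boxes. It is not img2table's implementation; the function name `find_column_gaps`, the `min_gap` threshold, and the sample coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_column_gaps(word_spans: List[Tuple[int, int]], min_gap: int = 10) -> List[int]:
    """Return x positions of implicit column separators (hypothetical helper).

    word_spans: (x_start, x_end) horizontal extents of words, pooled over all rows.
    min_gap: smallest whitespace width, in pixels, treated as a separator (assumed value).
    """
    if not word_spans:
        return []
    # Merge overlapping or nearly touching spans into occupied horizontal bands.
    spans = sorted(word_spans)
    bands = [list(spans[0])]
    for start, end in spans[1:]:
        if start - bands[-1][1] < min_gap:
            bands[-1][1] = max(bands[-1][1], end)
        else:
            bands.append([start, end])
    # Place one separator in the middle of each gap between consecutive bands.
    return [(left[1] + right[0]) // 2 for left, right in zip(bands, bands[1:])]


# Illustrative word extents forming three visual columns.
words = [(10, 60), (12, 55), (100, 150), (105, 140), (200, 260)]
separators = find_column_gaps(words)
print(separators)  # [80, 175]
```

Pooling the spans of every row before merging means a gap only counts as a separator when it stays empty across the whole table, which is what distinguishes a column boundary from ordinary spacing between words.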

Bug fixes

  • Fix bug with text position when extracting text from rotated PDFs
img2table - img2table 1.2.11 Latest Release

Published by xavctn 8 months ago

  • Simpler and more consistent line detection
  • Detection of discontinuous columns in borderless tables
img2table - 1.2.10

Published by xavctn 8 months ago

  • Fix miscellaneous code left from legacy processing
  • Add margin to top/bottom of borderless tables
img2table - 1.2.9

Published by xavctn 8 months ago

What's Changed

  • Update metrics computation and borderless table detection
  • Add compatibility with Python 3.12
  • Add support for documents with black backgrounds
img2table - img2table 1.2.8

Published by xavctn 10 months ago

  • Fix division by zero bug introduced in previous release
img2table - img2table 1.2.7

Published by xavctn 10 months ago

  • Fix bugs
  • Improve computation of image metrics on noisy documents
  • Modify row detection for borderless tables in order to account for merged cells
  • Implement Adaptive Run Length Smoothing Algorithm in order to isolate text areas in images
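
As a rough illustration of run length smoothing, the sketch below applies the plain (non-adaptive) horizontal variant to a small binary image: short runs of background pixels between two foreground pixels are filled so that characters merge into text blocks. The adaptive variant named in the release note adjusts the threshold locally and is not reproduced here; the names `rlsa_row`, `rlsa_horizontal`, and `max_gap` are assumptions.

```python
from typing import List


def rlsa_row(row: List[int], max_gap: int) -> List[int]:
    """Fill runs of 0s no longer than max_gap that lie between two 1s."""
    out = list(row)
    last_one = None  # index of the most recent foreground pixel
    for i, value in enumerate(row):
        if value == 1:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def rlsa_horizontal(image: List[List[int]], max_gap: int) -> List[List[int]]:
    """Apply horizontal run length smoothing to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]


# 1 = foreground (ink), 0 = background.
image = [
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0, 0, 0],
]
smoothed = rlsa_horizontal(image, max_gap=2)
for row in smoothed:
    print(row)
```

With `max_gap=2`, the two-pixel gap in the first row is filled while the four-pixel gap is left open, so distant text areas stay separate.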
img2table - img2table 1.2.6

Published by xavctn 10 months ago

  • Fix bugs related to OCR / table content extraction
img2table - img2table 1.2.5

Published by xavctn 11 months ago

  • Fix bug in line detection
  • Fix bug in cell creation
  • Optimization of algorithm performance
img2table - img2table 1.2.4

Published by xavctn 11 months ago

  • Improved processing of tables with dotted lines
  • Add detection of semi-bordered cells in tables
  • Update borderless table algorithm
  • Speed improvements and code optimization (2 to 4x faster depending on inputs)
img2table - img2table 1.2.3

Published by xavctn 12 months ago

  • Add HTML representation to extracted tables
  • Call OCR only on pages/images containing tables
  • Bump Pillow requirements for vulnerabilities
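
For readers unfamiliar with what an HTML representation of an extracted table looks like, the following sketch renders a list of rows as an HTML `<table>` using only the standard library. It does not use img2table's own classes; `table_to_html` and the sample rows are hypothetical.

```python
from html import escape
from typing import List, Optional


def table_to_html(rows: List[List[Optional[str]]]) -> str:
    """Render rows of cell strings as a minimal HTML table (hypothetical helper)."""
    lines = ["<table>"]
    for row in rows:
        # Empty (None) cells become empty <td> elements; text is HTML-escaped.
        cells = "".join(f"<td>{escape(cell or '')}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_table = table_to_html([["Name", "Qty"], ["A & B", None]])
print(html_table)
```

Escaping the cell text matters because OCR output can contain characters such as `&` or `<` that would otherwise break the markup.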
img2table - img2table 1.2.2

Published by xavctn about 1 year ago

  • Add option to pass keyword arguments for PaddleOCR/EasyOCR/docTR constructors
  • Fix bug with PaddleOCR on blank images
  • Line filtering for borderless table recognition
  • Update deprecated polars code
img2table - img2table 1.2.1

Published by xavctn about 1 year ago

  • Fix issues related to latest polars release
  • Improve detection of columns in document layout
  • Fix rare bug leading to no detected lines
  • Add coherency checks on borderless tables
  • Wrap text when exporting to xlsx
img2table - img2table 1.2.0

Published by xavctn about 1 year ago

  • Improvement on document layout analysis in order to detect borderless table areas
  • Modification of handling of optional dependencies
img2table - img2table 1.0.11

Published by xavctn about 1 year ago

  • Add support for docTR
  • Fixes on line detection
img2table - img2table 1.0.10

Published by xavctn about 1 year ago

  • Drop Python 3.7 support
  • Allow PaddleOCR on Python 3.11
  • Improve detection of intersection between lines and words
img2table - img2table 1.0.9

Published by xavctn about 1 year ago

  • Deprecate Python 3.7
  • Fix PanicExceptions in polars code
  • Replace deprecated polars functions
  • Fix requirements for EasyOCR
img2table - img2table 1.0.8

Published by xavctn about 1 year ago

  • Fix paddlepaddle version to avoid errors on certain platforms
  • Better handling of exceptions in polars cross joins
img2table - img2table 1.0.7

Published by xavctn over 1 year ago

  • Fix error with merged cells when creating xlsx files
img2table - img2table 1.0.6

Published by xavctn over 1 year ago

  • Improvement on line detection accuracy
  • Adapt detection of implicit rows to image characteristics
  • Handle column-based documents for borderless table detection
img2table - img2table 1.0.5

Published by xavctn over 1 year ago

  • Fix bug in kernel size when using medianBlur
  • Fix bug for GCP Vision missing vertex coordinates
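
The release note does not say what the kernel size bug was. One common pitfall with OpenCV's `medianBlur` is that its kernel size must be odd and greater than 1, so a size derived from image measurements needs to be normalised first. The helper below is a sketch of such a guard under that assumption; `odd_kernel_size` is a hypothetical name, not part of img2table or OpenCV.

```python
def odd_kernel_size(size: int) -> int:
    """Return the nearest valid median-blur kernel size: odd and at least 3."""
    size = max(3, int(size))
    return size if size % 2 == 1 else size + 1


# Candidate sizes, e.g. derived from an estimated character height (illustrative values).
sizes = [odd_kernel_size(k) for k in (1, 4, 5, 8)]
print(sizes)  # [3, 5, 5, 9]

# With OpenCV installed, the result could then be passed on, for example:
# blurred = cv2.medianBlur(img, odd_kernel_size(k))
```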
Package Rankings
Top 7.07% on pypi.org