Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF page as title, text, list, table, or image.
Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is ...
Document Layout Analysis resources and repos for development with PdfPig.
Read and extract text and other content from PDFs in C# (port of PDFBox)
A step-by-step C# implementation of the Docstrum algorithm
Extract tables from PDF files (port of tabula-java)
A C# library to extract tabular data from PDFs (port of the Python library Camelot, using PdfPig).