Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.NET. The objective is to classify each text block in a PDF document page as one of title, text, list, table, or image.
MIT License
Document Layout Analysis resources and repos for development with PdfPig.
A step-by-step C# implementation of the Docstrum algorithm
Read and extract text and other content from PDFs in C# (port of PDFBox)
Extract tables from PDF files (port of tabula-java)
Using a Mask R-CNN model trained on the PubLayNet dataset with ML.NET in C# / .NET for Document lay...
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The ...