Make plagiarism detection easier. This script finds similar sentences between given files and highlights them in a side-by-side comparison.
Screens legal text and extracts sentences containing user-supplied party name-predicate phrases
Yet another text augmentation Python package
Parse vision is an open source tool to visualise what OCR is parsing in a PDF document to help de...
Images of Text to Text: Call Tesseract from Python and OCR a directory of PDFs
A tool to quickly create sweet PDF files from text files
Tools for running OCR against files stored in S3
A collection of Python scripts made for fun, while exploring Python 🐍
Inspired by Google C4, here is a series of colossal clean data cleaning scripts focused on Common...
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Compares two PDF files by appearance, not by content. It can be used in the command line, in orde...
A Python project for checking documents for plagiarism based on cosine similarity
Edit Distance Based Search and Replace
Convert source code in various languages into PDF files with custom features
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) ...