marker

Convert PDF to markdown quickly with high accuracy

GPL-3.0 License

Downloads

18.7K

Stars

15.5K


Ecosystems: Python

marker - Significant speedup (Latest Release)

Published by VikParuchuri 4 months ago

This release brings a 15% speedup on GPU, 3x on CPU, and 7x on MPS. The speedup comes from new surya models for layout and text detection that are much more efficient.

This is a "best case" speedup. If you need to OCR or do equation recognition, the speedup will be lower, but it will still be a lot faster.

marker - Fix transformers bugs

Published by VikParuchuri 4 months ago

A new transformers version introduces a new kwarg in donut models. Handle this case by ignoring it.
A new transformers version breaks MPS compatibility by using torch.isin to do a comparison. Handle this by enabling the PyTorch MPS fallback setting.
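The release note does not show how marker applies the fallback setting. As a minimal sketch, assuming the setting in question is PyTorch's `PYTORCH_ENABLE_MPS_FALLBACK` environment variable, it is usually set before `torch` is imported so that operations unsupported on MPS run on the CPU instead. The helper name `mps_fallback_enabled` is hypothetical.

```python
import os

# Assumption: the "MPS fallback setting" is PyTorch's PYTORCH_ENABLE_MPS_FALLBACK
# environment variable. It is commonly set before torch is imported.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"


def mps_fallback_enabled() -> bool:
    """Hypothetical helper: report whether the fallback variable is set to 1."""
    return os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1"


if __name__ == "__main__":
    print("MPS fallback enabled:", mps_fallback_enabled())
```

The same effect can be had by exporting the variable in the shell before launching the script.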

marker - Pagination, bug fixes

Published by VikParuchuri 4 months ago

Add a setting to enable output pagination
Enable convert.py to use MPS (but it is less memory efficient than CPU/CUDA)
Fix bug with inference RAM setting
Fix bug with PDF names with dots in them
Fix bug with images at the end of blocks
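The note does not say what went wrong with dotted PDF names. One common way such a bug arises, shown here purely as an illustration and not as marker's actual code, is deriving the output name by cutting at the first dot instead of stripping only the final extension. Both function names are hypothetical.

```python
from pathlib import Path


def output_stem_naive(pdf_path: str) -> str:
    """Cuts at the first dot, so 'paper.v2.final.pdf' loses part of its name."""
    return Path(pdf_path).name.split(".")[0]


def output_stem(pdf_path: str) -> str:
    """Strips only the last extension, keeping any inner dots."""
    return Path(pdf_path).stem


if __name__ == "__main__":
    path = "reports/paper.v2.final.pdf"
    print(output_stem_naive(path))
    print(output_stem(path))
```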

marker - Fix convert.py bug

Published by VikParuchuri 5 months ago

Fix model device check.

marker - Specify page range

Published by VikParuchuri 5 months ago

Make it clearer that MPS can't be used with convert.py
Allow specifying a page range in convert with start_page and max_pages
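How `start_page` and `max_pages` combine is not spelled out in the note. The sketch below assumes a zero-based `start_page` and a `max_pages` that counts pages from that start; both assumptions, and the helper name `select_pages`, are illustrative rather than marker's documented behavior.

```python
from typing import Optional


def select_pages(
    page_count: int,
    start_page: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> range:
    """Return the page indices to convert (assumed zero-based)."""
    # Clamp the start into the valid range [0, page_count].
    start = min(max(start_page or 0, 0), page_count)
    # Without max_pages, run to the end of the document.
    end = page_count if max_pages is None else min(start + max_pages, page_count)
    return range(start, end)


if __name__ == "__main__":
    print(list(select_pages(10, start_page=2, max_pages=3)))
```

Clamping keeps the range valid when the requested window runs past the end of the document.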

marker - Python 3.12 compatibility

Published by VikParuchuri 5 months ago

Remove ray to enable Python 3.12 compatibility
Removing ray frees a lot of VRAM (since we can use torch shared tensors). With convert.py, each process now takes about 3GB of VRAM on average, down from between 4.5GB and 5GB before. This enables much higher throughput.
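To see why the lower per-process footprint raises throughput, here is a rough back-of-the-envelope calculation. The 24GB card is a hypothetical example, and the estimate ignores any fixed overhead outside the per-process figures quoted above.

```python
def max_workers(total_vram_gb: float, per_worker_gb: float) -> int:
    """Number of whole worker processes that fit in the given VRAM."""
    return int(total_vram_gb // per_worker_gb)


GPU_VRAM_GB = 24.0  # hypothetical card size, not from the release notes

workers_after = max_workers(GPU_VRAM_GB, 3.0)         # 3GB per process
workers_before_best = max_workers(GPU_VRAM_GB, 4.5)   # 4.5GB per process
workers_before_worst = max_workers(GPU_VRAM_GB, 5.0)  # 5GB per process

if __name__ == "__main__":
    print(workers_before_worst, workers_before_best, workers_after)
```

On this hypothetical card, the worker count goes from 4 or 5 processes to 8.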

marker - OCR speedups

Published by VikParuchuri 5 months ago

Pull in new surya and pdftext versions for speedups in OCR and text extraction, respectively
Refine heuristics to reduce OCR false positives (and true positives, unfortunately)
Enable float batch multipliers
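A float batch multiplier lets a base batch size be scaled by fractional factors such as 0.5 or 1.5 rather than whole numbers only. The note does not say how marker rounds the result, so the sketch below simply assumes rounding down with a floor of 1; `scaled_batch_size` is a hypothetical name.

```python
import math


def scaled_batch_size(base: int, multiplier: float) -> int:
    """Scale a base batch size by a float multiplier (assumed: floor, minimum 1)."""
    return max(1, math.floor(base * multiplier))


if __name__ == "__main__":
    for m in (0.25, 0.5, 1.5):
        print(m, scaled_batch_size(8, m))
```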

marker - Speed improvements

Published by VikParuchuri 5 months ago

Enable parallel text extraction, with worker count settings
Bump surya version to pull in layout/line segmentation speed improvements and an OCR bug fix
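The general shape of parallel extraction with a configurable worker count can be sketched as below. This is not marker's implementation: `extract_page_text` is a placeholder, and threads are used only to keep the example short, whereas CPU-bound extraction would more likely use processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def extract_page_text(page: str) -> str:
    """Placeholder for real per-page text extraction."""
    return page.strip()


def extract_all(pages: List[str], workers: int = 4) -> List[str]:
    """Extract text from all pages in parallel, preserving page order."""
    # Never start more workers than pages, and always at least one.
    workers = max(1, min(workers, len(pages) or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order regardless of completion order.
        return list(pool.map(extract_page_text, pages))


if __name__ == "__main__":
    print(extract_all(["  first page ", "second page  "], workers=2))
```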

marker - Faster OCR

Published by VikParuchuri 5 months ago

OCR is now ~2.5x faster, due to improvements in surya

marker - Speed up inference

Published by VikParuchuri 5 months ago

Faster OCR, line detection, and layout inference (from surya)
Unpin transformers version after testing

Should be significantly faster now, but I haven't fully benchmarked it, since I'm running low on time this week!

marker - Fix memory leak

Published by VikParuchuri 5 months ago

Fix a memory leak (fixed in surya, bumped the version). This caused high CPU memory usage on long docs.
Improve load_all_models to take device and dtype

marker - Marker v2

Published by VikParuchuri 6 months ago

Basically a full rewrite!

Main features:

Extracts and saves images
Improved table formatting
Better markdown wrapping
Better reading order on complex docs
Improved OCR engine with more language options
Simple pip package install (no more required system dependencies), so it can be used easily on Windows
Can be used commercially (pymupdf and layoutlmv3 dependencies removed)

It takes ~2x as long to run now, but that seems like a decent tradeoff.

See the README for details.

Package Rankings

Top 6.64% on Proxy.golang.org

Top 35.67% on Pypi.org

Related Projects

qna

AI pretends to be paper/textbook author, you can ask it questions about the paper as a whole, spe...

Complex-YOLOv4-Pytorch

The PyTorch Implementation based on YOLOv4 of the paper: "Complex-YOLO: Real-time 3D Object Detec...

03 Jul 2020 1,234

textsum

CLI & Python API to easily summarize text-based files with transformers

18 Dec 2022 123

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

11 Dec 2020 12,124

pdftext

Extract structured text from pdfs quickly

24 Apr 2024 317

Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

02 Jul 2021 1,323

diffusers-torchao

End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 ...

05 Aug 2024 213

nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

07 Jun 2023 8,824

surya

OCR, layout analysis, reading order, line detection in 90+ languages

10 Jan 2024 6,739

ocrd_detectron2

OCR-D wrapper for detectron2 based segmentation models

AnyText

Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>

18 Sep 2023 4,242

open-parse

Improved file parsing for LLM’s

22 Mar 2024 2,405

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

04 Jul 2023 13,857

confectionary

a tool to quickly create sweet PDF files from text files

s3-ocr

Tools for running OCR against files stored in S3

28 Jun 2022 115