Implementation of Nougat: Neural Optical Understanding for Academic Documents
MIT License
Published by lukas-blecher about 1 year ago
nougat-small weights
nougat-base weights
a state-of-the-art-level open visual language model | multimodal pre-trained model
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Latte: Latent Diffusion Transformer for Video Generation.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The PyTorch Implementation based on YOLOv4 of the paper: "Complex-YOLO: Real-time 3D Object Detec...
OCR-D wrapper for detectron2 based segmentation models
Modeling, training, eval, and inference code for OLMo
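A small 0.2B Chinese conversational model (ChatLM-Chinese-0.2B), with all code open-sourced for the full pipeline: dataset sources, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, RLHF optimization, and more. Supports downstream task sf...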
OCR, layout analysis, reading order, line detection in 90+ languages
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
I decided to sync up this repo and self-critical.pytorch. (The old master is in the old master branch ...
GLM (General Language Model)
[CVPR 2024] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Semantic Image Synthesis with SPADE
[CVPR2020] Adversarial Latent Autoencoders