blip-caption

A CLI tool for generating captions for images using Salesforce BLIP.

Installation

Install this tool using pip or pipx:

pipx install blip-caption

The first time you use the tool it will download the model from the Hugging Face model hub and store it in ~/.cache/huggingface/hub/. The small model is 945MB and the large model is 1.8GB.
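
Under the hood this appears to use the Hugging Face transformers library. As a rough sketch (the model IDs below are the standard BLIP captioning checkpoints on the Hub, an assumption rather than something stated here), the download and caching happens the first time from_pretrained() is called:

from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed Hugging Face model IDs:
#   small: Salesforce/blip-image-captioning-base
#   large: Salesforce/blip-image-captioning-large
MODEL_ID = "Salesforce/blip-image-captioning-base"

# from_pretrained() downloads the weights on first use and caches them
# under ~/.cache/huggingface/hub/ for later runs.
processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)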

Usage

To generate captions for an image using the small model, run:

blip-caption IMG_5825.jpeg

Example output:

a lizard is sitting on a branch in the woods

To use the larger model, add --large:

blip-caption IMG_5825.jpeg --large

Example output:

there is a chamelon sitting on a branch in the woods
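
Continuing that sketch, the --large invocation above corresponds roughly to the following (same assumptions as before; IMG_5825.jpeg is just the example filename from above):

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Assumed model ID for the --large variant.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

# Open the image, run it through the model and decode the generated caption.
image = Image.open("IMG_5825.jpeg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))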

If you pass multiple files, the path to each file will be output before its caption:

blip-caption /tmp/photos/*.jpeg
/tmp/photos/IMG_2146.jpeg
a man holding a bowl of salad and laughing
/tmp/photos/IMG_0151.jpeg
a cat laying on a red blanket

JSON output

The --json flag changes the output to look like this:

blip-caption /tmp/photos/*.* --json
[{"path": "/tmp/photos/IMG_2146.jpeg", "caption": "a man holding a bowl of salad and laughing"},
 {"path": "/tmp/photos/IMG_0151.jpeg", "caption": "a cat laying on a red blanket"},
 {"path": "/tmp/photos/IMG_3099.MOV", "error": "cannot identify image file '/tmp/photos/IMG_3099.MOV'"}]

Any errors are returned as a {"path": "...", "error": "error message"} object.
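
To consume that JSON from another script, a minimal sketch (the file paths are just the examples from above) could look like this:

import json
import subprocess

# Run blip-caption with --json and parse its standard output.
result = subprocess.run(
    ["blip-caption", "/tmp/photos/IMG_2146.jpeg", "/tmp/photos/IMG_0151.jpeg", "--json"],
    capture_output=True,
    text=True,
)

for item in json.loads(result.stdout):
    if "error" in item:
        print(f"{item['path']}: ERROR {item['error']}")
    else:
        print(f"{item['path']}: {item['caption']}")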

Development

To set up this tool locally, first check out the code. Then create a new virtual environment:

cd blip-caption
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

pip install -e '.[test]'

To run the tests:

pytest