This project combines CNN and RNN knowledge to build a network that automatically produces captions for an input image, using Python and PyTorch.
This implementation of the CVND-Image-Captioning-Project is built for Udacity's Computer Vision Nanodegree.
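At a high level, a CNN encoder maps the image to a fixed-size feature vector, and an RNN decoder generates the caption word by word conditioned on that vector. The sketch below illustrates this encoder-decoder pattern with a toy convolutional stack in place of the pretrained CNN the project uses; the class names, layer sizes, and shapes are illustrative, not the notebooks' exact code.

```python
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Illustrative encoder: a tiny conv stack standing in for a
    pretrained CNN, producing a fixed-size image embedding."""
    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(8, embed_size)

    def forward(self, images):
        x = self.conv(images).flatten(1)   # (batch, 8)
        return self.fc(x)                  # (batch, embed_size)

class DecoderRNN(nn.Module):
    """Illustrative decoder: embeds caption tokens and runs an LSTM,
    with the image feature as the first input step."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature to the embedded caption tokens.
        inputs = torch.cat([features.unsqueeze(1),
                            self.embed(captions[:, :-1])], dim=1)
        out, _ = self.lstm(inputs)
        return self.fc(out)                # (batch, seq_len, vocab_size)
```

During training, the decoder receives the ground-truth caption (teacher forcing); at inference time it instead feeds its own predictions back in, one word at a time.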
Libraries
Create (and activate) a new environment, named cv-nd, with Python 3.6. If prompted to proceed with the install (Proceed [y]/n), type y.
Linux or Mac:

conda create -n cv-nd python=3.6
source activate cv-nd

Windows:

conda create --name cv-nd python=3.6
activate cv-nd
At this point your command line should look something like: (cv-nd) <User>:P1_Facial_Keypoints <user>$. The (cv-nd) prefix indicates that your environment has been activated, and you can proceed with further package installations.
Install PyTorch and torchvision; this installs the latest version of PyTorch:

conda install pytorch torchvision -c pytorch

Alternatively, for a CPU-only install:

conda install pytorch-cpu -c pytorch
pip install torchvision
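After either install path, a quick sanity check confirms that PyTorch imports and reports whether a GPU is visible:

```python
import torch

# Print the installed version and whether CUDA is available.
print("PyTorch", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```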
Install a few required pip packages, which are specified in the requirements text file (including OpenCV).
pip install -r requirements.txt
Dataset
Clone and build the COCO API:
git clone https://github.com/cocodataset/cocoapi.git
cd cocoapi/PythonAPI
make
cd ..
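The downloaded annotation files are COCO-format JSON, in which each caption is a separate annotation record keyed by image_id. As a sketch of that structure, the snippet below groups captions by image using only the standard library; the field layout matches the COCO captions format, while the file names and sample values are illustrative:

```python
import json
from collections import defaultdict

def captions_by_image(annotation_dict):
    """Group caption strings by image id from a COCO-style captions dict."""
    grouped = defaultdict(list)
    for ann in annotation_dict["annotations"]:
        grouped[ann["image_id"]].append(ann["caption"])
    return dict(grouped)

# Minimal in-memory stand-in for a downloaded annotations file
# (in practice you would json.load() the file from the annotations folder):
sample = {
    "images": [{"id": 1, "file_name": "0001.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "a dog on a beach"},
        {"id": 11, "image_id": 1, "caption": "a dog runs on sand"},
    ],
}
print(captions_by_image(sample))
```

The COCO API built above (pycocotools) provides the same indexing and more through its COCO class.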
Under Annotations, download:
Under Images, download:
The project is structured as a series of Jupyter notebooks and .py files:
The final output is one sentence per image. In 3_Inference.ipynb, results are shown for a few randomly chosen images.
Overall, the model identifies objects and describes them correctly, but its descriptions of some images are only partly correct.
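Inference of this kind is typically done by greedy sampling: feed the image feature into the decoder, take the argmax word at each step, and feed it back in until an end token appears. A minimal sketch of that loop, with a tiny random-weight decoder and token ids that are purely illustrative:

```python
import torch
import torch.nn as nn

def greedy_sample(embed, lstm, fc, features, end_idx, max_len=20):
    """Greedily decode a caption: feed the image feature first, then feed
    back the argmax token's embedding until <end> or max_len."""
    tokens, states = [], None
    inputs = features.unsqueeze(1)             # (1, 1, embed_size)
    for _ in range(max_len):
        out, states = lstm(inputs, states)     # (1, 1, hidden_size)
        word = fc(out.squeeze(1)).argmax(1)    # (1,)
        tokens.append(word.item())
        if word.item() == end_idx:
            break
        inputs = embed(word).unsqueeze(1)      # (1, 1, embed_size)
    return tokens

# Tiny random-weight example (vocabulary of 12 tokens, <end> = 1):
torch.manual_seed(0)
embed = nn.Embedding(12, 8)
lstm = nn.LSTM(8, 16, batch_first=True)
fc = nn.Linear(16, 12)
caption_ids = greedy_sample(embed, lstm, fc, torch.randn(1, 8), end_idx=1)
print(caption_ids)
```

With trained weights, the resulting token ids would be mapped back to vocabulary words to form the output sentence.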
Correctly captioned images
Almost correctly captioned images