Multi-digit prediction from Google Street View images using a deep CNN with TensorFlow, OpenCV and Python.
MIT License
This project explores how Convolutional Neural Networks (CNNs) can be used to identify a series of digits in real-world images taken from the Street View House Numbers (SVHN) dataset. CNNs have improved rapidly every year since the ImageNet Challenge began in 2010.
I am attempting to predict a series of numbers given an image of a house number from the SVHN dataset. Note that, unlike the standard single-digit identification task of the MNIST dataset, the model must correctly detect both the individual digits and the order in which they appear.
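To make the difference from single-digit classification concrete, here is a minimal sketch of one way a house number could be turned into a fixed-length label and scored. The maximum length of 5, the blank class 10, and the helper names encode_sequence and sequence_correct are assumptions made for illustration only; they are not taken from this project's code.

```python
# Hypothetical label encoding for multi-digit recognition.
# MAX_DIGITS and BLANK are illustrative assumptions, not values from this repo.
MAX_DIGITS = 5   # assumed maximum number of digits per house number
BLANK = 10       # assumed extra class meaning "no digit at this position"


def encode_sequence(number, max_digits=MAX_DIGITS, blank=BLANK):
    """Turn a house number such as "257" into a fixed-length list of classes."""
    digits = [int(ch) for ch in str(number)]
    if len(digits) > max_digits:
        raise ValueError("sequence longer than max_digits")
    # Pad the unused positions with the blank class.
    return digits + [blank] * (max_digits - len(digits))


def sequence_correct(predicted, target):
    """A prediction counts only if every position matches, in order."""
    return list(predicted) == list(target)


target = encode_sequence("257")
print(target)
print(sequence_correct([2, 5, 7, 10, 10], target))
print(sequence_correct([2, 7, 5, 10, 10], target))
```

Under this scoring, a prediction that finds the right digits in the wrong order is counted as wrong, which is what makes the task harder than MNIST-style classification.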
I used Python and TensorFlow to build the model. This implementation also uses TensorBoard extensively for visualizations.
I recommend starting a GPU instance on Amazon Web Services (AWS). I have created an image and replicated it across all regions, so you can run this code on a GPU instance within a few minutes. Simply search for TFAMI under community AMIs when you are launching your instance. More information on the specific AMI IDs can be found in the following GitHub repository.
mkdir log_trial_1
mkdir log_trial_2
python load_data.py
python model_trial_1.py
tensorboard --logdir=log_trial_1
python model_trial_2.py
tensorboard --logdir=log_trial_2
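You may see an error such as
Port is in use: 6006
if you run tensorboard twice on different trials. To find the process that is holding the port, run
lsof -i:6006
or whatever the port number is. Then stop that process with
kill -9 <PID>
where the PID is the number shown by the previous command. After that, start TensorBoard again:
tensorboard --logdir=log_trial_2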
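As an alternative to killing the first TensorBoard process, you could check from Python whether a port is already taken and pick another one. The following is only a sketch using the standard socket module; the helper names port_in_use and first_free_port, the default host, and the search range are assumptions for illustration and are not part of this repository.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=6006, attempts=20):
    """Scan upward from `start` and return the first port nobody listens on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError("no free port found in the scanned range")


if __name__ == "__main__":
    print("6006 in use:", port_in_use(6006))
    print("suggested port:", first_free_port())
```

The suggested port can then be passed to TensorBoard with its --port option, for example tensorboard --logdir=log_trial_2 --port=6007, so that both trials can be viewed side by side.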
To guide you through the project, I have written a detailed report. You can refer to the report here.
This is an open source project governed by the license in this repository.