Investigating a dataset that contains information about 10,000+ movies collected from The Movie Database (TMDb)
In this project, I had to choose one of four datasets to investigate. Click here to open a document with links and information about the datasets that I could investigate for this project.
I have chosen TMDb Movie Data for my investigation in this project.
For this project, I will conduct my own data analysis and create a file to share my findings. I will start by taking a look at the dataset and brainstorming what questions I could answer using it. Then I will use pandas and NumPy to answer the questions that I am most interested in, and create a report sharing the answers. I am not required to use inferential statistics or machine learning to complete this project, but I will make it clear in my communications that my findings are tentative. This project is open-ended: the reviewers aren't looking for one right answer.
In this project, I'll go through the data analysis process and see how everything fits together. I'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier! Not only that, these skills are sought after by employers!
This project contains 2 files and 2 folders:
data.csv: The dataset file containing 10k+ entries of movies that I have worked on.
report.ipynb: The investigation of the dataset has been done in this Jupyter notebook file.
export/: Folder containing HTML and PDF files of the notebook.
plots/: Contains images of all the plots that are displayed in the report.ipynb file.
This data set contains information about 10,000 movies collected from The Movie Database (TMDb). It contains data such as title, cast, director, runtime, budget, revenue, release year, etc.
Certain columns, such as cast and genres, contain multiple values separated by pipe (|) characters. There are some odd characters in the cast column. They are nothing to worry about, so I leave them as is. The columns ending in "_adj" show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
This project requires Python 3 with the libraries named above (NumPy, pandas, and Matplotlib) installed.
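The pipe-separated columns mentioned in the dataset description can be awkward to count or group directly. Below is a minimal sketch of one way to split them, using only the standard library so it runs without data.csv. The sample rows, the "title" key, and the helper names split_pipe and count_values are hypothetical and chosen for illustration. They are not taken from report.ipynb.

```python
from collections import Counter

# Made-up rows that mimic the pipe-separated "genres" column.
# These values are illustrative only and do not come from data.csv.
sample_rows = [
    {"title": "Movie A", "genres": "Action|Adventure"},
    {"title": "Movie B", "genres": "Drama"},
    {"title": "Movie C", "genres": "Action|Drama|Thriller"},
    {"title": "Movie D", "genres": ""},  # stands in for a missing value
]


def split_pipe(value):
    """Split a pipe-separated string into a list, dropping empty parts."""
    return [part.strip() for part in value.split("|") if part.strip()]


def count_values(rows, column):
    """Count how often each individual value appears in a pipe-separated column."""
    counts = Counter()
    for row in rows:
        # Treat a missing or None entry as an empty string.
        counts.update(split_pipe(row.get(column) or ""))
    return counts


genre_counts = count_values(sample_rows, "genres")
print(genre_counts.most_common())
```

With pandas, a similar result could likely be reached by calling Series.str.split("|") on the column and then explode() followed by value_counts(). The stdlib version is shown here only to keep the sketch free of dependencies.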
You will also need to have software installed to run a Jupyter Notebook.
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included.
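Before opening the notebook, it may help to confirm that the environment has what the project needs. The following is a small sketch, assuming the libraries are importable under their usual names numpy, pandas, and matplotlib. The REQUIRED list and the missing_packages helper are hypothetical names introduced for this example.

```python
import importlib.util
import sys

# Libraries named in this README; the import names are assumed to be the usual ones.
REQUIRED = ["numpy", "pandas", "matplotlib"]


def missing_packages(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3,):
    print("Python 3 is required, found", sys.version.split()[0])

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported missing, installing the Anaconda distribution or adding the packages with your usual package manager should resolve it.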
In a terminal or command window, navigate to the top-level project directory Investigate_TMDb_Movies/ (that contains this README) and run one of the following commands:
ipython notebook report.ipynb
or
jupyter notebook report.ipynb
or if you have 'Jupyter Lab' installed
jupyter lab
This will open the Jupyter/iPython Notebook software and project file/folder in your browser.
My project was reviewed by a Udacity reviewer against the Investigating a Dataset rubric. All criteria in the rubric must meet specifications for me to pass.
My Project Review by a Udacity Reviewer