Import and analyze Chicago public taxi and ride-hailing data
MIT License
Code to download, process, and analyze Chicago's publicly available taxi and Transportation Network Provider (Uber/Lyft) data. Raw data comes from the City of Chicago data portal.
Originally used in support of this post: https://toddwschneider.com/posts/chicago-taxi-data/. Note that when that post was written, TNP data was not yet available.
This repo is something of a companion to the nyc-taxi-data repo. The repos share some similar code and structure, but do not explicitly depend on each other.
As of Q1 2020, the Chicago taxi dataset contains nearly 200 million rows, while the TNP dataset is around 130 million rows.
Install PostgreSQL and PostGIS. Both are available via Homebrew on Mac OS X.
Note: the raw taxi data is a single uncompressed 70GB+ .csv file, so it will take a while to download!
If you prefer, you can download and process either the taxi or the TNP dataset without the other.
./initialize_database.sh
./download_raw_taxi_data.sh && ./download_raw_tnp_data.sh
./import_taxi_trip_data.sh && ./import_raw_tnp_data.sh
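To process only one dataset, run its download and import scripts after initializing the database. The wrapper below is a minimal sketch of that, not part of the repo: `setup_dataset`, `run`, and the `DRY_RUN` variable are hypothetical names, and it assumes you run it from the repo root after `./initialize_database.sh` has already completed once.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repo): download and import ONE dataset.
# Assumes the repo root is the working directory and ./initialize_database.sh
# has already been run once.
set -eu

# Print the command instead of executing it when DRY_RUN=1 (hypothetical flag).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# setup_dataset taxi|tnp: download, then import, stopping if the download fails.
setup_dataset() {
  case "${1:-}" in
    taxi) run ./download_raw_taxi_data.sh && run ./import_taxi_trip_data.sh ;;
    tnp)  run ./download_raw_tnp_data.sh && run ./import_raw_tnp_data.sh ;;
    *)    echo "usage: setup_dataset taxi|tnp" >&2; return 1 ;;
  esac
}
```

For example, `DRY_RUN=1 setup_dataset tnp` prints the two TNP scripts without running them; `setup_dataset tnp` runs them. Chaining with `&&` mirrors the commands above, so the import is skipped if the download fails.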
New taxi data is available monthly, and new TNP data quarterly. Once you've run the full setup, you can later download and process only the latest data by running:
./update_taxi_trips_data.sh
./update_tnp_trips_data.sh
This avoids re-downloading the entire datasets every time you want the latest data.
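Because the two datasets refresh on different cadences, one option is to schedule the update scripts with cron. The sketch below prints crontab entries for a given checkout path. `cron_entries`, the log file names, and the run times are hypothetical, and the quarterly months (Jan/Apr/Jul/Oct) are an assumption, not the City's actual release schedule.

```shell
#!/bin/sh
# Hypothetical helper: print crontab lines that run the repo's update scripts.
# Assumed schedule: taxi at 03:00 on the 1st of every month; TNP at 04:00 on
# the 1st of Jan, Apr, Jul, and Oct. Adjust to the actual release timing.
set -eu

cron_entries() {
  repo_dir="${1:?usage: cron_entries /path/to/repo}"
  # Fields: minute hour day-of-month month day-of-week command
  echo "0 3 1 * * cd '$repo_dir' && ./update_taxi_trips_data.sh >> update_taxi.log 2>&1"
  echo "0 4 1 1,4,7,10 * cd '$repo_dir' && ./update_tnp_trips_data.sh >> update_tnp.log 2>&1"
}
```

To append the entries to your existing crontab: `(crontab -l 2>/dev/null; cron_entries "$PWD") | crontab -`. Cron runs jobs with a minimal environment, so tools installed via Homebrew (such as `psql`) may not be on its `PATH` unless you set `PATH` at the top of the crontab.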
The analysis/ subfolder contains prepare_analysis.sql and analysis.R, scripts to do analysis in Postgres and R, respectively.
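One way to run the two analysis scripts from the command line is sketched below, preparing the Postgres side first and then running the R script. `run_analysis`, `run`, `DRY_RUN`, and `DB_NAME` are hypothetical, and the default database name `chicago-taxi-data` is an assumption; use whatever database `initialize_database.sh` created. It also assumes `analysis.R` can run non-interactively from the repo root, so adjust the working directory if the script expects otherwise.

```shell
#!/bin/sh
# Hypothetical helper: run the SQL preparation step, then the R analysis.
# DB_NAME and its default value are assumptions, not taken from the repo.
set -eu

# Print the command instead of executing it when DRY_RUN=1 (hypothetical flag).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

run_analysis() {
  db="${DB_NAME:-chicago-taxi-data}"
  # ON_ERROR_STOP=1 makes psql exit non-zero on the first SQL error,
  # so set -e stops before the R step runs against incomplete tables.
  run psql -v ON_ERROR_STOP=1 -d "$db" -f analysis/prepare_analysis.sql
  run Rscript analysis/analysis.R
}
```

Running `DRY_RUN=1 run_analysis` prints the two commands so you can check them before executing the real run.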
Questions, issues, or contact: [email protected], or open a GitHub issue.