This project implements a complete pipeline for taxi fare prediction in New York City, using an event-based data stream and a data lake for data storage and analysis.
The project was developed with:
Python, Apache Kafka, Apache Airflow, Apache Spark, FastAPI, Docker
taxi-fare/
│
├── dags/
│   └── taxi_raides_dag.py
├── data/
│   └── train.csv
├── docker/
│   ├── airflow.dockerfile
│   └── api.dockerfile
├── jars/
│   ├── aws-java-sdk-bundle-1.12.262.jar
│   └── hadoop-aws-3.3.4.jar
├── src/
│   ├── api.py
│   ├── consolidate.py
│   ├── consumer.py
│   ├── producer.py
│   └── utils.py
├── docker-compose.yml
├── requirements.txt
└── README.md
Clone the project:
$ git clone https://github.com/GesielLopes/taxi-fare.git
Access the project folder:
$ cd taxi-fare
Download the train.csv file from https://www.kaggle.com/competitions/new-york-city-taxi-fare-prediction/data and save it in the data folder.
Download aws-java-sdk-bundle-1.12.262.jar from https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.12.262 and save it in the jars folder.
Download hadoop-aws-3.3.4.jar from https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/3.3.4 and save it in the jars folder.
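Since the containers presumably expect these three downloads to be in place, it can help to check for them before starting anything. The following is only a minimal sketch and not part of the repository: the names `REQUIRED_FILES` and `missing_files` are hypothetical, and the paths are simply the ones listed in the project tree and the download steps.

```python
from pathlib import Path

# Paths taken from the project tree and the download steps;
# REQUIRED_FILES and missing_files are hypothetical helper names.
REQUIRED_FILES = (
    "data/train.csv",
    "jars/aws-java-sdk-bundle-1.12.262.jar",
    "jars/hadoop-aws-3.3.4.jar",
)


def missing_files(project_root="."):
    """Return the required files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All required files are in place.")
```

Run from the taxi-fare folder, the script lists whichever downloads are still missing.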
Execute docker compose to run the data project:
# Execute docker compose
$ docker compose up -d
Access the API from a terminal, for example with curl:
$ curl -X 'GET' 'http://localhost:8000/api/' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_date=2011-12-13' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734' -H 'accept: application/json'
$ curl -X 'GET' 'http://localhost:8000/api/?pickup_date=2011-12-13&pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734' -H 'accept: application/json'
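The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the curl calls above: the base URL and the three query parameters come from those examples, while the helper names `build_url` and `fetch` are hypothetical. It assumes the response body is JSON, since the curl calls ask for `application/json`; the shape of that JSON is not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl examples; the helper names are hypothetical.
BASE_URL = "http://localhost:8000/api/"


def build_url(pickup_date=None, pickup_longitude=None, pickup_latitude=None):
    """Build the request URL, leaving out any filter that was not given."""
    params = {
        "pickup_date": pickup_date,
        "pickup_longitude": pickup_longitude,
        "pickup_latitude": pickup_latitude,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{query}" if query else BASE_URL


def fetch(**filters):
    """Send the GET request and decode the body (assumed to be JSON)."""
    request = Request(build_url(**filters), headers={"accept": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the containers running, `fetch(pickup_date="2011-12-13")` corresponds to the second curl call, and `fetch()` with no arguments corresponds to the first.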
Access the API from a browser by opening any of these URLs:
http://localhost:8000/api
http://localhost:8000/api/?pickup_date=2011-12-13
http://localhost:8000/api/?pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734
http://localhost:8000/api/?pickup_date=2011-12-13&pickup_longitude=-73.9755630493164&pickup_latitude=40.752681732177734
Open the API's Swagger documentation in a browser:
http://localhost:8000/docs
This project is licensed under the MIT License.