case-study-accidents

Spark analysis of the accidents data


PySpark Crash Analysis

This repository contains a PySpark project for analyzing crash data. The analyses listed under Analytics run on data loaded from the CSV files in the data/ directory.

Project Structure

  • utils.py: Contains utility functions, including the read_csv function for reading CSV files into Spark DataFrames.
  • main.py: Contains the main analysis code, including data loading, transformations, and computations.
  • data/: Directory containing CSV files used for analysis.
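
The contents of utils.py are not reproduced in this README, so the following is only a minimal sketch of what a read_csv helper could look like. The parameter names (header, infer_schema) and their defaults are assumptions, not the project's actual signature. The sketch relies only on the standard DataFrameReader.csv call.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for the type hints only
    from pyspark.sql import DataFrame, SparkSession


def read_csv(spark: SparkSession, path: str, header: bool = True,
             infer_schema: bool = True) -> DataFrame:
    """Read a CSV file into a Spark DataFrame (hypothetical signature)."""
    # DataFrameReader.csv accepts header and inferSchema as keyword options.
    # Schema inference costs an extra pass over the file; an explicit schema
    # would avoid that on large inputs.
    return spark.read.csv(path, header=header, inferSchema=infer_schema)
```

With an active SparkSession named spark, a call might look like read_csv(spark, "data/some_file.csv"), where the file name is a placeholder.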

Prerequisites

  • Python
  • Apache Spark
  • PySpark
  • Required Python packages (listed in requirements.txt)

Setup

  1. Clone the Repository

    git clone https://github.com/yourusername/your-repository.git
    cd your-repository
    
    

Analytics

  • Analysis 1: Find the number of crashes (accidents) in which the number of males killed is greater than 2.
  • Analysis 2: How many two-wheelers are booked for crashes?
  • Analysis 3: Determine the Top 5 Vehicle Makes of the cars present in crashes in which the driver died and the airbags did not deploy.
  • Analysis 4: Determine the number of vehicles involved in hit-and-run crashes whose drivers hold valid licences.
  • Analysis 5: Which state has the highest number of accidents in which females are not involved?
  • Analysis 6: Which VEH_MAKE_IDs rank 3rd to 5th by the number of injuries, including deaths, that they contribute to?
  • Analysis 7: For all the body styles involved in crashes, report the top ethnic user group of each unique body style.
  • Analysis 8: Among the crashed cars, what are the Top 5 Zip Codes with the highest number of crashes with alcohol as the contributing factor? (Use the Driver Zip Code.)
  • Analysis 9: Count the distinct Crash IDs where no damaged property was observed, the damage level (VEH_DMAG_SCL~) is above 4, and the car has insurance.
  • Analysis 10: Determine the Top 5 Vehicle Makes where the drivers are charged with speeding-related offences, the drivers are licensed, the vehicle is in one of the 10 most-used vehicle colours, and the car is licensed in one of the 25 states with the highest number of offences (to be deduced from the data).
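
As an illustration of how one of these questions might map onto DataFrame operations, here is a rough sketch of Analysis 1. It is not the code in main.py. It assumes a person-level DataFrame with one row per person and the columns CRASH_ID, PRSN_GNDR_ID, and DEATH_CNT; those column names, the 'MALE' value, and the function name are assumptions that should be checked against the actual CSV headers.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # needed for the type hints only
    from pyspark.sql import DataFrame


def crashes_with_male_deaths_over(persons: DataFrame, threshold: int = 2) -> int:
    """Count crashes in which more than `threshold` males were killed.

    Assumes one row per person, with hypothetical columns CRASH_ID,
    PRSN_GNDR_ID and DEATH_CNT (1 if that person died).
    """
    # Keep only the rows for males who were killed.
    killed_males = persons.filter("PRSN_GNDR_ID = 'MALE' AND DEATH_CNT = 1")
    # Count killed males per crash; groupBy(...).count() adds a `count` column.
    per_crash = killed_males.groupBy("CRASH_ID").count()
    # Keep crashes above the threshold and return how many there are.
    return per_crash.filter(f"`count` > {threshold}").count()
```

The other analyses would follow the same filter, group, and aggregate pattern, with joins between the person-level and vehicle-level data where a question spans both.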