U.S. Auto Accident Analysis
Understanding major factors in crash fatality rates and comparing fatality characteristics by state over time.
Tools
Goals & Data Overview
The project goal was to assess whether crash fatality rates are significantly affected by some of the most widely cited auto accident factors: drinking, drugs, seatbelt use, weather, and whether a pickup truck was involved.
I used spatial analysis to understand which states have the highest crash fatality rates, which factors are most prevalent, and how the factors change over time.
The data used in the project: (1) National Highway Traffic Safety Administration (NHTSA) data on 177,410 fatal auto accidents from 2017 to 2021 and the people involved in them, and (2) U.S. Census Bureau state population data for the same years.
Methodology
The NHTSA Fatality Analysis Reporting System (FARS) data for 2017-2021 were used. The “Accidents” and “Persons” datasets were joined using VLOOKUP in Excel, producing a single dataset with information on every person involved in every fatal auto accident in this period.
Python data analysis packages were then used for additional analysis via Jupyter Notebooks.
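The Excel VLOOKUP join can also be expressed as a pandas merge. This is a minimal sketch, not the project's actual code: the toy rows are illustrative, and the key columns are assumptions. ST_CASE is the FARS case number, but since case numbers are assumed to be unique only within a single year, the sketch joins on YEAR as well. DIED is a hypothetical 0/1 flag.

```python
import pandas as pd

# Toy stand-ins for the FARS "Accidents" and "Persons" tables (illustrative values only).
# Assumed join key: YEAR plus ST_CASE, since a case number is only unique within one year.
accidents = pd.DataFrame({
    "YEAR": [2020, 2020, 2021],
    "ST_CASE": [10001, 10002, 10001],
    "STATE": [28, 45, 28],
    "WEATHER": [1, 2, 1],
})
persons = pd.DataFrame({
    "YEAR": [2020, 2020, 2020, 2021],
    "ST_CASE": [10001, 10001, 10002, 10001],
    "PER_NO": [1, 2, 1, 1],
    "DIED": [1, 0, 1, 1],  # hypothetical 0/1 fatality flag
})

# Like VLOOKUP: attach the crash-level columns to every person row.
# validate="many_to_one" raises an error if a (YEAR, ST_CASE) key repeats in accidents.
people = persons.merge(accidents, on=["YEAR", "ST_CASE"], how="left",
                       validate="many_to_one")
print(people)
```

Unlike VLOOKUP, which silently returns the first match, the `validate` argument makes a duplicated key fail loudly.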
Findings
Exploratory Data Analysis
Initial exploratory data analysis found that, of the factors studied, whether drivers wore seatbelts had the highest correlation with crash fatality rate. It also showed that in most crashes, half or all of the people involved died.
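One way to make a per-crash metric concrete is to define a crash's fatality rate as deaths divided by people involved, then correlate it with a driver-seatbelt flag. Both that definition and the column names (`DIED`, `DRIVER_NO_BELT`) are assumptions for this sketch, and the toy rows are not FARS data.

```python
import pandas as pd

# One row per person; hypothetical columns and values.
people = pd.DataFrame({
    "ST_CASE": [1, 1, 2, 2, 2, 3, 4, 4],
    "DIED":    [1, 1, 1, 0, 0, 1, 0, 1],
    # Assumed crash-level flag (driver unbelted), repeated on each person row.
    "DRIVER_NO_BELT": [1, 1, 0, 0, 0, 1, 0, 0],
})

crashes = people.groupby("ST_CASE").agg(
    deaths=("DIED", "sum"),
    involved=("DIED", "count"),  # number of people in the crash
    driver_no_belt=("DRIVER_NO_BELT", "max"),
)
# Assumed definition: share of the people involved in the crash who died.
crashes["fatality_rate"] = crashes["deaths"] / crashes["involved"]

# Pearson correlation between the seatbelt flag and the per-crash rate.
r = crashes["fatality_rate"].corr(crashes["driver_no_belt"])
# Share of crashes in which half or more of the people involved died.
share_half_or_more = (crashes["fatality_rate"] >= 0.5).mean()

print(crashes)
print(f"correlation: {r:.2f}, share with >= half killed: {share_half_or_more:.2f}")
```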
Fatalities Over Time
Traffic fatalities are cyclical: they peak in the summer months and decline during the winter. This is likely because people engage in fewer risky behaviors during colder months, when they tend to stay indoors more. Fast sports cars are also typically driven in the summer, which could be a contributing factor.
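The seasonal pattern comes from aggregating deaths by calendar month. The sketch below uses hypothetical crash dates and death counts and assumes one row per crash; it only shows the grouping step.

```python
import pandas as pd

# Hypothetical crash records: one row per crash with its date and death count.
crashes = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-15", "2021-01-20", "2021-07-04",
                            "2021-07-19", "2021-07-30", "2021-12-24"]),
    "deaths": [1, 1, 2, 1, 1, 1],
})

# Sum deaths per calendar month; months with no crashes are filled with zero.
monthly = (crashes.groupby(crashes["date"].dt.month)["deaths"].sum()
           .reindex(range(1, 13), fill_value=0))
peak_month = int(monthly.idxmax())

print(monthly)
print("peak month:", peak_month)
```

Grouping by both `dt.year` and `dt.month` instead would keep each year's cycle separate.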
Key Factors Affecting Crash Fatality
A multiple regression was performed to estimate the effect of each factor on the crash fatality rate. The largest effect was for seatbelts: a driver not wearing a seatbelt was associated with a 12.3% increase in crash fatality rate.
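A minimal ordinary-least-squares sketch of this kind of regression, using plain NumPy on synthetic data. The factor names mirror the ones studied, but the prevalences, the proportion-style outcome, and the 0.123 seatbelt effect baked into the simulation are all assumptions, chosen so the fit can be checked against a known answer. With an outcome measured as a proportion, a coefficient of 0.123 reads as 12.3 percentage points.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic 0/1 dummy variables for each factor (hypothetical prevalences).
factors = ["no_seatbelt", "alcohol", "drugs", "bad_weather", "pickup"]
X = rng.binomial(1, [0.4, 0.3, 0.2, 0.15, 0.25], size=(n, len(factors))).astype(float)

# Simulated outcome: baseline 0.5, a 0.123 seatbelt effect, smaller effects, and noise.
true_coefs = np.array([0.123, 0.05, 0.03, 0.01, 0.02])
y = 0.5 + X @ true_coefs + rng.normal(0, 0.05, n)

# Ordinary least squares with an intercept column; each slope is the effect of
# that factor holding the others fixed ("all else equal").
design = np.column_stack([np.ones(n), X])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)

for name, b in zip(["intercept"] + factors, coefs):
    print(f"{name:12s} {b:+.3f}")
```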
Areas with Highest Crash Fatalities
In 2021, the most recent year for which complete data were available, Mississippi had the highest crash fatality rate at 26 deaths per 100,000 residents. South Carolina, Arkansas, New Mexico, Louisiana, and Montana also had rates above 20 per 100,000.
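The state rates combine FARS fatality counts with Census population figures. A sketch of the per-100,000 calculation, using placeholder states and round numbers rather than real NHTSA or Census figures:

```python
import pandas as pd

# Placeholder states and counts (not real NHTSA or Census figures).
df = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "fatalities": [500, 300, 1_200],
    "population": [2_000_000, 3_000_000, 10_000_000],
})

# Deaths per 100,000 residents, so states of different sizes are comparable.
# Multiplying before dividing keeps these round inputs exact in floating point.
df["rate_per_100k"] = df["fatalities"] * 100_000 / df["population"]
ranked = df.sort_values("rate_per_100k", ascending=False)
print(ranked)
```

Note that State C has the most deaths but ranks second: normalizing by population is what makes a smaller state like Mississippi stand out.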
Key Takeaways
Seatbelt Usage has the Highest Impact on Crash Fatality
Of all the factors included in the analysis, seatbelt usage had the largest influence on crash fatality rate: all else equal, crash fatality rates were 12.3% higher when drivers did not wear seatbelts.
Mississippi, South Carolina, and Arkansas have the Highest Fatality Rates
These states consistently had the highest fatality rates. Further research is needed to understand why their crash fatality rates are so high.
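"Consistently" can be checked by ranking states within each year and counting how often each one lands near the top. A sketch with hypothetical states, years, and rates; the top-N cutoff (`TOP_N`) is an arbitrary choice for illustration.

```python
import pandas as pd

TOP_N = 2  # hypothetical cutoff for "among the highest"

# Hypothetical per-state, per-year rates (deaths per 100,000); not real data.
rates = pd.DataFrame({
    "year":  [2019] * 4 + [2020] * 4 + [2021] * 4,
    "state": ["A", "B", "C", "D"] * 3,
    "rate":  [24.0, 21.0, 12.0, 19.0,
              25.0, 20.5, 11.0, 22.0,
              26.0, 22.0, 13.0, 18.0],
})

# Rank states within each year (1 = highest rate), then count top-N finishes.
rates["rank"] = rates.groupby("year")["rate"].rank(ascending=False, method="min")
top_years = rates[rates["rank"] <= TOP_N].groupby("state")["year"].nunique()
print(top_years.sort_values(ascending=False))
```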
Fatalities Peak in Late Summer and Dip in Winter
Each year follows a cycle of high fatalities during the active summer months and low fatalities during the winter. In 2020, fatalities fell to their lowest level around the start of the COVID-19 pandemic, then reached record highs that summer.
Challenges
Creating the dummy variables was difficult and time-consuming, especially for the Weather variable, because the data record so many different types of weather. Pickup trucks were also difficult to categorize. Meticulous selection of attributes was required to define each category.
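Collapsing many raw category codes into a single 0/1 dummy can be done with a set of qualifying codes and `isin`. The code values below are placeholders, not the actual FARS weather or body-type codebook; the real sets would come from the FARS documentation.

```python
import pandas as pd

# Hypothetical code sets: which raw codes count as "adverse weather" or "pickup".
# These numbers are placeholders, not verified FARS codebook values.
ADVERSE_WEATHER_CODES = {2, 3, 4, 5, 12}
PICKUP_BODY_CODES = {30, 31, 32}

vehicles = pd.DataFrame({
    "WEATHER":  [1, 2, 4, 1, 12, 98],
    "BODY_TYP": [4, 30, 14, 31, 4, 32],
})

# Each dummy is 1 when the raw code falls in the chosen set, else 0.
vehicles["bad_weather"] = vehicles["WEATHER"].isin(ADVERSE_WEATHER_CODES).astype(int)
vehicles["pickup"] = vehicles["BODY_TYP"].isin(PICKUP_BODY_CODES).astype(int)
print(vehicles)
```

Here "not reported" or "unknown" codes (such as the placeholder 98) fall into the 0 group. Dropping those rows instead is an equally valid modeling choice.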
The correlation heatmap did not reveal any especially strong correlations, so additional exploratory analysis was required to uncover deeper insights.
Graphs generated with the Python package matplotlib often had incorrect x-axis labels, so manual labeling was required in most cases.
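One common way to label the x-axis manually in matplotlib is to fix the tick positions first and then attach one label per position. A sketch with placeholder monthly values, rendered off-screen with the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
deaths = [2, 1, 3, 3, 4, 5, 6, 6, 5, 4, 3, 2]  # placeholder values

fig, ax = plt.subplots()
ax.plot(range(12), deaths)
# Pin the tick positions first, then attach exactly one label per position.
ax.set_xticks(range(12))
ax.set_xticklabels(months, rotation=45)
ax.set_xlabel("Month")
ax.set_ylabel("Fatalities")
fig.tight_layout()

labels = [t.get_text() for t in ax.get_xticklabels()]
plt.close(fig)
print(labels)
```

Setting the positions before the labels avoids matplotlib's warning about fixed labels on automatically placed ticks, which is a frequent cause of mismatched x-axis labels.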