Project Overview
This project is part of Day 100 of the 100 Days of Code – Python course. The goal was to perform an end‑to‑end exploratory data analysis of fatal police shootings in the United States, using the Washington Post “Fatal Force” dataset together with U.S. Census indicators such as poverty rate, high school graduation rate, median household income, and racial demographics.
The analysis was conducted entirely in Google Colab, mirroring a realistic data science workflow: loading multiple CSV sources, cleaning and transforming the data, joining heterogeneous datasets, visualising patterns, and carefully interpreting results on a socially sensitive topic.
Core Technologies
- Python – primary analysis language.
- pandas – data loading, cleaning, joins, aggregation.
- NumPy – numerical operations and array handling.
- Seaborn & Matplotlib – statistical plots (KDEs, jointplots, regression lines, histograms, box plots).
- Plotly Express – interactive visualisations and choropleth maps by U.S. state.
- Google Colab – reproducible notebook environment with inline charts and narrative.
Technical Execution & Problem Solving
The project started with preliminary data exploration: inspecting shapes, column names, missing values, and duplicates across five CSV files (fatalities, household income, poverty, education, and racial composition by city). From there, I applied a series of targeted cleaning steps:
- Converted percentage columns stored as strings (e.g. poverty_rate, racial shares) to numeric values.
- Dropped rows with non‑informative missing values where appropriate (for example, rows without age in the fatalities dataset, and rows without median income in the census data).
- Standardised categorical “unknown” values by explicitly recoding missing categories (e.g. race, armed status, flee status) as "Unknown" rather than silently dropping them.
- Created derived columns for grouped weapon types (Gun, Knife, Vehicle, Toy weapon, Machete, Unknown weapon, Other) to make visualisations more interpretable.
- Normalised city and state identifiers across the fatalities and census datasets to support reliable joins when comparing city‑level race shares to race shares among people killed.
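The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (percent_to_float, recode_unknown, group_weapon), the substring rules used to group weapons, and the extra "Unarmed" group are assumptions made for the example.

```python
import pandas as pd


def percent_to_float(series: pd.Series) -> pd.Series:
    """Convert strings such as '12.5' or '12.5%' to floats.

    Placeholders that cannot be parsed (for example '-') become NaN.
    """
    cleaned = series.astype(str).str.replace("%", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


def recode_unknown(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with missing categories recoded as 'Unknown'."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna("Unknown")
    return out


def group_weapon(armed) -> str:
    """Map a raw weapon description to a coarse group (assumed rules)."""
    if pd.isna(armed):
        return "Unknown weapon"
    text = str(armed).strip().lower()
    if text in {"", "unknown", "unknown weapon", "undetermined"}:
        return "Unknown weapon"
    if text == "unarmed":
        return "Unarmed"
    # Crude substring matching; order matters, so 'toy weapon' is checked
    # before the broader keywords.
    for keyword, group in [
        ("toy", "Toy weapon"),
        ("machete", "Machete"),
        ("gun", "Gun"),
        ("knife", "Knife"),
        ("vehicle", "Vehicle"),
    ]:
        if keyword in text:
            return group
    return "Other"


# Tiny made-up frame, only to show how the helpers fit together.
fatalities = pd.DataFrame({"race": ["W", None], "armed": ["gun", "toy weapon"]})
fatalities = recode_unknown(fatalities, ["race"])
fatalities["weapon_group"] = fatalities["armed"].map(group_weapon)
```

Coercing unparseable percentages to NaN keeps the bad rows visible, so they can be counted before deciding whether to drop them.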
Throughout the notebook I followed pandas best practices: avoiding chained assignment, calling .copy() on filtered subsets so that later writes act on an independent DataFrame rather than an ambiguous view, and building small helper transformations that could be re‑used across plots.
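As a rough sketch of these practices, the example below takes an explicit copy of a filtered subset, writes to it with a single .loc call, and uses a small reusable helper to build join keys. The helper name normalise_city, the column names, the assumption that census city names carry suffixes such as " city" or " CDP", and all values in the two frames are illustrative, not taken from the notebook.

```python
import pandas as pd


def normalise_city(series: pd.Series, strip_suffix: bool = False) -> pd.Series:
    """Lower-case and trim city names; optionally drop a census-style suffix."""
    key = series.astype(str).str.strip().str.lower()
    if strip_suffix:
        # Only applied to the census side, so 'Kansas City' in the event data
        # is not shortened to 'kansas'.
        key = key.str.replace(r"\s+(city|town|village|cdp)$", "", regex=True)
    return key


# Made-up rows, not real records or real census figures.
fatalities = pd.DataFrame(
    {
        "city": ["Kansas City", "Austin", "Austin"],
        "state": ["MO", "TX", "TX"],
        "age": [34.0, None, 17.0],
        "race": ["B", "W", None],
    }
)
census = pd.DataFrame(
    {
        "City": ["Kansas City city", "Austin city"],
        "state": ["MO", "TX"],
        "share_black": [30.0, 8.0],
    }
)

# Explicit copy: later writes act on an independent DataFrame.
with_age = fatalities[fatalities["age"].notna()].copy()

# One .loc call instead of chained indexing such as with_age[mask]["race"] = ...
with_age.loc[with_age["race"].isna(), "race"] = "Unknown"

with_age["city_key"] = normalise_city(with_age["city"])
census_keys = census.assign(city_key=normalise_city(census["City"], strip_suffix=True))

# validate raises if the census side has duplicate keys, which would
# silently multiply event rows.
merged = with_age.merge(
    census_keys[["state", "city_key", "share_black"]],
    on=["state", "city_key"],
    how="left",
    validate="many_to_one",
)
```

A left join keeps every fatality record even when no census row matches, so unmatched cities show up as missing values instead of disappearing.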
Key Analytical Highlights
- Socio‑economic context: created bar charts ranking U.S. states by poverty rate and high‑school completion, then visualised how these factors relate via dual‑axis line charts, jointplots, and regression plots. This demonstrated a negative relationship between poverty and education at the state level.
- Racial composition vs. outcomes: aggregated racial shares by state and city, visualised them using stacked bar charts, and compared them with the racial breakdown of people killed by police. This included donut charts and per‑race age density plots (KDEs) to understand how the age distribution varies by race.
- Weapons and armed status: explored the distribution of armed/unarmed status and grouped weapon categories, quantifying how many fatal encounters involved guns, knives, vehicles, toy weapons, or unarmed individuals.
- City‑level rate comparison: for the top ten cities by number of police killings, I compared each race’s share of deaths to its share of the local population, producing a “rate index” (death share divided by population share) that highlights where certain groups experience disproportionately high risk.
- Temporal trends: aggregated the fatalities dataset by year to see how the number of fatal shootings evolved over the available time window.
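The “rate index” from the city‑level comparison can be sketched as below. The function name rate_index, the input layout (one Series of death counts and one of census percentages, both indexed by group), and the numbers in the example are assumptions chosen for illustration; they are not results from the notebook.

```python
import pandas as pd


def rate_index(death_counts: pd.Series, population_share_pct: pd.Series) -> pd.DataFrame:
    """Death share divided by population share, per group, for one city.

    death_counts: number of deaths per group.
    population_share_pct: each group's share of the city population, in percent.
    """
    death_share = death_counts / death_counts.sum()
    population_share = population_share_pct / 100.0
    # A zero population share would make the ratio infinite; treat it as missing.
    population_share = population_share.where(population_share > 0)
    result = pd.DataFrame(
        {"death_share": death_share, "population_share": population_share}
    )
    result["rate_index"] = result["death_share"] / result["population_share"]
    return result


# Hypothetical city with made-up counts and shares.
example = rate_index(
    pd.Series({"White": 6, "Black": 3, "Hispanic": 1}),
    pd.Series({"White": 60.0, "Black": 15.0, "Hispanic": 25.0}),
)
```

An index above 1 means a group's share of deaths exceeds its share of the local population, and an index below 1 means the opposite. With only a handful of deaths per city and group the index is noisy, so it is best read alongside the raw counts.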
Demonstrated Skill Set
- Exploratory Data Analysis (EDA) – systematic inspection of multiple related datasets, identification of data quality issues, and formulation of meaningful questions.
- Data Cleaning & Integration – handling missing values, type conversions, categorical recoding, and joining census data with event‑level records using normalised keys.
- Statistical Visualisation – using Seaborn and Plotly to reveal relationships (poverty vs education, age distributions by race and gender, city‑level rate indices) in a way that is accessible to non‑technical stakeholders.
- Analytical Thinking – distinguishing between raw counts and per‑capita rates, and being careful about how to interpret disparities in a sensitive, high‑impact domain.
- Notebook‑based Storytelling – combining narrative, code, and visuals in a structured Colab notebook that could be shared as part of a data portfolio or extended into a more formal analysis.
Context: Recorded Crime and Exposure to Risk
To put the police‑killings data in context, I also compared the notebook results to nationally published violent‑crime statistics. These show that Black Americans have higher recorded violent‑crime arrest rates per capita than White or Asian Americans, often several times higher. Arrest rates measure recorded police activity, not underlying behaviour, so at most they suggest a hypothesis: that differing rates of police contact in serious, high‑risk incidents account for part of the disparity in fatal outcomes. The datasets used in this notebook cannot test that hypothesis, so it is not a finding of the analysis.
Recorded crime rates are themselves shaped by underlying conditions, such as poverty, and by how intensively different neighbourhoods are policed. Exposure to risky encounters, structural disadvantage, and department‑level use‑of‑force patterns all interact to produce the observed disparities in deaths, and descriptive data of this kind cannot separate their contributions. I used ChatGPT to help explore the data. Nothing in these statistics supports the claim that any racial group is inherently more violent: arrest and fatality figures describe recorded outcomes, not their causes.
Reflection
This project was less about building a new application and more about demonstrating a practical data‑science workflow: working with real‑world data quality issues, joining heterogeneous sources, and communicating nuanced findings in a sensitive domain. It highlights my ability to move from raw CSV files to a structured, reproducible analysis that surfaces meaningful questions about policy and society.