FreshTrack: Case Study

Overview

The issue of air pollution is closely tied to urbanization. People use large amounts of fuel for transportation, cooking, and electricity. These activities produce emissions that degrade air quality, and concentrations of fine particulate matter (PM2.5) are a common way to track those pollution levels. I wanted to know whether machine learning could help estimate air pollution levels from landscape factors (population density, road density, and so on). That question is the basis of this study.

Preparation

I started with data at the U.S. ZIP code level. The target was annual average PM2.5, and the predictors captured geography and infrastructure: latitude and longitude, output from the CMAQ (Community Multiscale Air Quality) chemical model, ZIP code and county area and population, impervious surface measured at multiple buffer sizes, and road network measures.

I fit a linear regression model using tidymodels. The preprocessing recipe removed highly correlated predictors and near-zero-variance predictors. I split the data into 2/3 training and 1/3 testing. I then collapsed the city variable into a binary “In a city” vs. “Not in a city” indicator to avoid sparse one-hot columns, and re-split the data.

Results

Under construction...

Extension

..