Overview
Air pollution is closely tied to urbanization.
Cities burn large amounts of fuel for transportation, cooking, and
electricity. These activities produce emissions that degrade air quality, and
fine particulate matter (PM2.5) is a common measure of the resulting pollution.
I wanted to know whether machine learning could help estimate air pollution
levels from landscape factors such as population density and road density.
That question is the basis of this study.
Preparation
I worked at the U.S. ZIP code level. The target was annual average
PM2.5, and the predictors captured geography and infrastructure: latitude
and longitude, output from the Community Multiscale Air Quality (CMAQ)
chemical model, ZIP code and county area and population, impervious surface
measured within several buffer distances, and road network measures.
I fit a linear regression model using tidymodels. The preprocessing recipe
removed highly correlated predictors and near-zero-variance predictors. I split
the data into 2/3 training and 1/3 testing. Because one-hot encoding the city
variable would have produced many sparse columns, I then collapsed it into a
binary “In a city” vs. “Not in a city” indicator and re-split the data.
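The workflow described above (binary city indicator, 2/3 split, correlation and near-zero-variance filters, linear model) can be sketched outside R. This is a minimal Python/NumPy sketch under stated assumptions, not the original tidymodels code. It assumes rural rows carry the literal label "Not in a city". The near-zero-variance rule uses the default thresholds documented for recipes' `step_nzv` (`freq_cut = 95/5`, `unique_cut = 10`). The correlation filter uses `step_corr`'s default `threshold = 0.9`, but it is a simpler greedy filter, not that step's exact algorithm. The helper names (`collapse_city`, `near_zero_var`, `high_corr`, `run`) and the synthetic data are hypothetical.

```python
import numpy as np


def collapse_city(city):
    # Assumption: rural rows use the literal label "Not in a city";
    # any other label (a city name) becomes 1.0 ("In a city").
    return np.array([0.0 if c == "Not in a city" else 1.0 for c in city])


def near_zero_var(X, freq_cut=95 / 5, unique_cut=10.0):
    # Return column indices to drop: columns with a single value, or whose
    # most/second-most frequent count ratio exceeds freq_cut while the
    # percentage of distinct values is at most unique_cut.
    drop, n = [], X.shape[0]
    for j in range(X.shape[1]):
        values, counts = np.unique(X[:, j], return_counts=True)
        if len(values) == 1:
            drop.append(j)
            continue
        counts = np.sort(counts)[::-1]
        freq_ratio = counts[0] / counts[1]
        pct_unique = 100.0 * len(values) / n
        if freq_ratio > freq_cut and pct_unique <= unique_cut:
            drop.append(j)
    return drop


def high_corr(X, threshold=0.9):
    # Simplified greedy filter (hypothetical, not step_corr's exact method):
    # while some pair has |r| > threshold, drop the member of that pair with
    # the larger mean |r|. Returns the kept column positions.
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        r = np.abs(np.corrcoef(X[:, keep], rowvar=False))
        np.fill_diagonal(r, 0.0)
        i, j = np.unravel_index(np.argmax(r), r.shape)
        if r[i, j] <= threshold:
            break
        keep.pop(i if r[i].mean() >= r[j].mean() else j)
    return keep


def train_test_split(n, prop=2 / 3, seed=0):
    # Random row split: about prop of the rows for training, the rest for testing.
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(n * prop))
    return idx[:n_train], idx[n_train:]


def run(X, city, y, seed=0):
    # Append the binary city column before splitting, then estimate both
    # filters on the training rows only and fit ordinary least squares.
    X = np.column_stack([X, collapse_city(city)])
    train, test = train_test_split(len(y), seed=seed)
    nzv = set(near_zero_var(X[train]))
    cols = [j for j in range(X.shape[1]) if j not in nzv]
    cols = [cols[k] for k in high_corr(X[train][:, cols])]
    A = np.column_stack([np.ones(len(train)), X[train][:, cols]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = coef[0] + X[test][:, cols] @ coef[1:]
    rmse = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return cols, coef, rmse


# Synthetic example: column 1 nearly duplicates column 0, column 3 is constant.
rng = np.random.default_rng(42)
n = 300
a = rng.normal(size=n)
b = a + rng.normal(scale=0.01, size=n)
c = rng.normal(size=n)
X = np.column_stack([a, b, c, np.zeros(n)])
city = np.where(rng.random(n) < 0.5, "Not in a city", "Springfield")
y = 8 + 1.5 * a - 0.5 * c + 0.3 * (city != "Not in a city") + rng.normal(scale=0.2, size=n)
cols, coef, rmse = run(X, city, y)
print("kept columns:", cols, "test RMSE:", round(rmse, 3))
```

The sketch estimates both filters on the training rows only and then applies the kept columns to the test rows. This mirrors prepping a recipe on the training set, so the test RMSE is not influenced by preprocessing decisions made with test data. In the synthetic run, the constant column is dropped, one of the two near-duplicate columns is dropped, and the binary city column is kept.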