Using multi-angle imaging spectroradiometer aerosol mixture properties and meteorology for PM₂.₅ assessment in Iran
USING MULTI-ANGLE IMAGING SPECTRORADIOMETER AEROSOL MIXTURE
PROPERTIES AND METEOROLOGY FOR PM2.5 ASSESSMENT IN IRAN
By
Yifang Zhang
A Thesis Presented to the
FACULTY OF THE USC KECK SCHOOL OF MEDICINE
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
MASTER OF SCIENCE
BIOSTATISTICS
May 2020
Copyright 2020 Yifang Zhang
Table of Contents
List of Tables
List of Figures
Abstract
1 Introduction
2 Methods
    2.1 Air Pollution Monitoring and Meteorological Data
    2.2 MISR AOD
    2.3 Data Preprocessing and Integration
    2.4 Air Pollution Prediction Using Machine Learning Methods
3 Results and Conclusions
    3.1 Prediction Performance
    3.2 Discussion
References
List of Tables
Table 1. List of Data used in the Study
Table 2. Description of Machine Learning Methods Applied to the AOD Products Retrievals and
Meteorological Variables to Predict PM2.5
Table 3. Characteristics of the Study Population
Table 4. Sample Sizes of Spatiotemporally Matched MISR Mixtures and Meteorological
Variables with Test R² for each Machine Learning Method and Pollutant. Largest R² for each
Pollutant is indicated in bold
List of Figures
Figure 1. Iran Study Region Showing 33 Ground-Level Air Pollution Monitoring Stations
Figure 2. Iran Study Region Showing 23 Ground-level Air Pollution Monitoring Stations
Concentrated in Tehran
Figure 3. Variable Importance based on Gradient Boosting Models: Daily Averaged PM2.5
Model (left); MISR Overpass Time Averaged PM2.5 Model (Right)
Figure 4. Predicted PM2.5 from Gradient Boosting over Tehran City Averaged over Year 2013
for Daily-Averaged PM2.5 (top) and MISR Overpass Time Averaged PM2.5 (Bottom)
Abstract
Particulate matter air pollution with aerodynamic diameter less than 2.5 µm (PM2.5) has
been associated with numerous detrimental health effects and is therefore an important
public health concern. Research efforts to better estimate and predict PM2.5 have recently incorporated
satellite observations of aerosol optical depth (AOD) due to its spatial and temporal coverage,
particularly in areas of the world such as the Middle East where there are limited ground-based
monitoring networks. The Multiangle Imaging SpectroRadiometer (MISR) instrument onboard
NASA’s Terra satellite was launched in late 1999 and provides operational AOD as well as AOD
properties including information on particle size, shape, and absorption. Furthermore, at 4.4 km x
4.4 km, the spatial resolution of MISR’s Version 23 aerosol product is well suited for
neighborhood-level health effects assessments.
Leveraging 33 PM2.5 ground-monitoring locations across Iran, we linked coincident MISR
overpasses with gridded meteorological data including 10 m wind components (u and v),
temperature, boundary layer height, downward UV radiation, evaporation, surface pressure,
cloud cover, precipitation, humidity, vegetation cover and dust. Because all data were hourly, we
examined both daily averages and averages over the MISR overpass time (10:00-13:00).
Three machine learning algorithms were applied separately and compared: Gradient
Boosting (GB), Random Forest (RF) and Support Vector Machines (SVM). Gradient Boosting
showed the best prediction performance among the three methods, with R² of 0.619 for
daily-averaged PM2.5 and R² of 0.554 for the MISR overpass time averaged PM2.5. These results
indicate that the 4.4 km MISR AOD product and meteorological variables can provide reliable
predictions of PM2.5 over Iran.
1. Introduction
Air pollution is one of the major environmental and public health concerns in Middle
Eastern countries such as Iran, which is affected by both local sources and dust storms given its
geographic position. Iran lies in the dust belt region because of its arid and semi-arid climate and
its extensive deserts, including the Great Kavir and Lut deserts (Rezaei et al, 2019; Zahedi et
al, 2018). Particulate matter with aerodynamic diameter less than 2.5 µm (PM2.5) is one of the
major pollutants and is associated with an increased risk of developing several chronic
diseases such as cancer and respiratory disease, among other health problems (Joharestani et al,
2019; Lee et al, 2012). One study showed that Iran's annual PM2.5 concentrations between 2015
and 2018 reached 86.8 ± 33 µg m⁻³, which far exceeds the annual mean PM2.5 guideline of
10 µg m⁻³ suggested by the World Health Organization (WHO) (Joharestani et al, 2019). Long-term
levels of this magnitude have been attributed to an excess of approximately 41,000 (95% CI
35,634 - 47,014) deaths and 3,000,000 (95% CI 2,632,101 - 3,389,342) years of life lost
(Shamsipour et al., 2019). Thus, spatially and temporally resolved PM2.5 estimates in Iran can
not only facilitate health effects studies but are also important for developing appropriate
government measures to mitigate the air pollution problem.
Ground-based monitoring is essential for collecting PM2.5 concentration data, but
such measurements have limited spatial coverage, particularly in the Middle East. The
irregularly distributed ground monitoring stations in Iran and the sparse coverage of rural and
desert areas can reduce the accuracy and representativeness of the measurements (Lee et al, 2012).
Integrating remote sensing data with ground-based measurements using appropriate statistical or
machine learning methods facilitates PM2.5 predictions with better spatial and temporal coverage
(Franklin et al, 2018; Ghotbi et al, 2016; Tian et al, 2010; Lee et al, 2012). Specifically,
polar-orbiting satellite remote sensing observations of aerosol optical depth (AOD) provide an
effective and relatively low-cost means of supplementing ground-level PM2.5 monitoring
networks. Several satellite instruments, including the Moderate Resolution Imaging
Spectroradiometer (MODIS) and the Multi-angle Imaging SpectroRadiometer (MISR) onboard
NASA's Terra satellite, retrieve AOD and other aerosol properties. However, with its ability to
retrieve AOD globally and its nine camera angles, MISR can detect particle optical and
microphysical properties more reliably than other instruments (Franklin et al, 2018; Rezaei et al,
2019). MISR distinguishes AOD by size (small, medium and large), shape (spherical,
non-spherical) and absorption (absorbing, non-absorbing) (Franklin et al, 2018). MISR has been
successfully adopted to estimate PM2.5 over a variety of regions globally (Franklin et al, 2018;
Meng et al, 2018), although the accuracy of the prediction model can vary among regions
(Ghotbi et al, 2016). In addition, AOD retrievals are limited over snow-covered and desert
regions because of their bright surface reflectance (Franklin et al, 2018; Lee et al, 2012).
Previous research has predicted PM2.5 from satellite AOD using linear regression (Van
Donkelaar et al, 2019). Although this approach is useful for its simplicity, the resulting
measurement error in the exposure estimates can lead to biases and underestimated
standard errors of health effects estimates (Alexeeff et al, 2014). Incorporating additional
spatiotemporal information such as meteorology and land use, and using non-linear or machine
learning methods, reduces measurement error, resulting in improved downstream epidemiological
assessments (Ghotbi et al, 2016).
The objective of this study is to develop an effective model to predict PM2.5 concentrations
over Iran in space and time from 2009 to 2014. We combined the 4.4 km
MISR Version 23 (V23) global aerosol product with ground-level monitoring data, local
meteorological parameters and land use information. We applied and evaluated several machine
learning methods including Gradient Boosting, Random Forest, and Support Vector Machine.
With the best performing model, we used observed AODs and meteorology data over the country
to produce daily maps of predicted PM2.5 concentrations.
2. Methods
Iran is located between latitudes 24° and 40° N and longitudes 44° and 64° E (Figures 1
and 2). It has a highly mixed landscape, with desert in the eastern part, mixed forests in the
northern part close to the Caspian Sea, the Iranian Plateau in the middle, and mountains in the
western part whose highest point is 5,610 m. Due to this diverse topography, air quality can
differ among locations with heterogeneous elevation, wind, temperature and other
climatological features. Iran's climate and precipitation also vary with longitude and latitude,
ranging from arid and semi-arid in the eastern part to subtropical in the
forest regions. The data sources, which are detailed below, include ground-measured PM2.5
concentrations, satellite-derived AODs at 4.4 km spatial resolution, and meteorological data
(Table 1).
2.1 Air Pollution Monitoring and Meteorological Data
At 33 ground-level air pollution monitoring stations within Iran, both PM2.5 and PM10
concentrations were collected on an hourly basis from 2009 to 2014. In this study we are only
concerned with PM2.5. As shown in Figure 1, the spatial distribution of monitoring stations is
quite clustered in major metropolitan areas. We averaged hourly PM2.5 concentrations to produce
a 24-hour (daily) average and a three-hour average (from 10:00 to 13:00) to coincide with the
MISR overpass time.
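
The two averaging windows described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the record layout (station id, timestamp, concentration), the function name average_pm25 and the choice to treat the 10:00 to 13:00 window as the hourly values stamped 10:00, 11:00 and 12:00 are all assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def average_pm25(hourly, start_hour=10, end_hour=13):
    """Return (daily_avg, overpass_avg) dicts keyed by (station, date).

    hourly: iterable of (station_id, datetime, pm25) records (hypothetical layout).
    The overpass window keeps hours h with start_hour <= h < end_hour, i.e. the
    hourly values stamped 10:00, 11:00 and 12:00 (an assumption).
    """
    daily = defaultdict(list)
    overpass = defaultdict(list)
    for station, ts, pm25 in hourly:
        key = (station, ts.date())
        daily[key].append(pm25)
        if start_hour <= ts.hour < end_hour:
            overpass[key].append(pm25)
    daily_avg = {k: mean(v) for k, v in daily.items()}
    overpass_avg = {k: mean(v) for k, v in overpass.items()}
    return daily_avg, overpass_avg


# Hypothetical records for one station on one day: PM2.5 equals the hour number.
records = [("S1", datetime(2013, 6, 1, h), float(h)) for h in range(24)]
daily_avg, overpass_avg = average_pm25(records)
key = ("S1", datetime(2013, 6, 1).date())
print(daily_avg[key], overpass_avg[key])
```

Days on which a station reports no values inside the window simply have no overpass average, so they drop out when the two averages are later joined.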
Meteorological data were retrieved from the European Centre for Medium-Range
Weather Forecasts (ECMWF), a research institute that provides global meteorological and
weather prediction data. The ECMWF ERA5 is a reanalysis product that combines vast amounts
of historical observations from both satellites and weather stations into global estimates using
advanced modelling and data assimilation systems
(https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5). ERA5 provides
hourly data on a fixed grid with 30 km x 30 km spatial resolution for variables including 10 m
wind (u and v), temperature, boundary layer height, downward UV radiation, evaporation,
surface pressure, cloud cover, precipitation and humidity. It also provides land use data including
vegetation cover as well as estimates of dust. In the PM2.5 prediction models we considered
meteorological and land use variables as inputs that could affect the formation, physical
characteristics and distribution of the particles and pollutants (Ghotbi et al, 2016).
2.2 MISR AOD
The MISR instrument onboard NASA's Terra satellite was launched in 1999 and
provides climate information in four spectral bands from nine different viewing angles (Franklin
et al, 2018; You et al, 2015). The Terra satellite overpass time is between 10:00 and 13:00 local
time. This analysis used observations from the new Level 2 Version 23 (V23) operational MISR
aerosol product (Garay et al, 2017) for the years 2009-2014, which are available from the NASA
Langley Research Center's Atmospheric Science Data Center (ASDC) Distributed Active
Archive Center (DAAC) (https://eosweb.larc.nasa.gov/project/misr/cgas_table). The spatial
resolution of the V23 aerosol product is 4.4 km x 4.4 km, which is much finer than the 17.6 km
resolution of the MISR aerosol product Version 22 (V22), and it also provides better
accuracy (Garay et al, 2017). However, unlike many other satellite products, the MISR aerosol
data are not on a fixed grid, so they are treated as centroid points with a resolution of 4.4 km.
The MISR aerosol data used in this study include total column AOD, size-fractionated
AOD (AOD small, AOD medium and AOD large), and other component-particle optical
properties covering ranges of absorbing, non-absorbing, spherical, and non-spherical aerosols.
We spatially and temporally matched ground-measured PM2.5 with coincident overpasses of the
MISR V23 AOD and AOD properties.
2.3 Data Preprocessing and Integration
Because datasets from three sources are used in the study, they needed to be
processed and merged onto consistent temporal and spatial resolutions for further analysis.
To avoid extreme values, the PM2.5 measurements were filtered to exclude values
smaller than the 1% quantile or larger than the 99% quantile. Missing values, negative values
and extreme values were removed, and only complete cases were retained for analysis. The
longitude and latitude for all datasets were projected to the local UTM zone 40N projection
(EPSG 32640) in units of km.
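
The filtering step can be illustrated with a short sketch. It assumes that missing and negative readings are dropped before the quantiles are computed and that quantiles are interpolated with the "inclusive" method of Python's statistics module; the thesis states neither detail, and the function name filter_pm25 is hypothetical.

```python
from statistics import quantiles


def filter_pm25(values, lower_pct=1, upper_pct=99):
    """Keep readings between the lower and upper percentile cut points."""
    # Drop missing (None) and negative readings first (ordering is an assumption).
    valid = [v for v in values if v is not None and v >= 0]
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cuts = quantiles(valid, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [v for v in valid if lo <= v <= hi]


# Hypothetical readings: one missing, one negative, then 1..101 µg m⁻³.
raw = [None, -5.0] + [float(v) for v in range(1, 102)]
kept = filter_pm25(raw)
print(len(kept), min(kept), max(kept))
```

Different quantile interpolation rules give slightly different cut points, so the exact number of retained observations depends on the software used.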
Many of the 33 ground-level monitoring stations in Iran were clustered closely
together in metropolitan areas, within the MISR resolution of 4.4 km. To avoid repeated linkages
and similarity of AOD product information within this resolution, we found that matching each
PM2.5 monitoring site to its nearest MISR observation, based on the minimum distance
between the two, was more appropriate than averaging the MISR pixels
within a 4.4 km buffer centered at each monitoring site. The same strategy was used to
match the ERA5 meteorological data with the monitoring sites, taking the nearest 30 km grid cell
to each of the 33 ground monitor sites.
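
A minimal sketch of nearest-distance matching is given below. It assumes that sites and MISR pixel centroids are already expressed in the same projected coordinates in km, so that Euclidean distance is meaningful; the identifiers, the coordinates and the function name nearest_match are hypothetical.

```python
import math


def nearest_match(sites, observations):
    """Map each site id to (nearest observation id, distance in km).

    sites, observations: dicts of id -> (x_km, y_km) in one projected
    coordinate system (assumed here; the thesis uses UTM zone 40N in km).
    """
    matches = {}
    for site_id, (sx, sy) in sites.items():
        best_id, best_d = None, math.inf
        for obs_id, (ox, oy) in observations.items():
            d = math.hypot(sx - ox, sy - oy)
            if d < best_d:
                best_id, best_d = obs_id, d
        matches[site_id] = (best_id, best_d)
    return matches


# Hypothetical monitoring sites and MISR pixel centroids (km).
sites = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
pixels = {"p1": (3.0, 4.0), "p2": (9.0, 0.0), "p3": (40.0, 40.0)}
matches = nearest_match(sites, pixels)
print(matches)
```

The exhaustive search is adequate for a few dozen sites; the same function applies to the ERA5 grid by passing grid-cell centres as the observations.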
Once matched, two different averages of PM2.5 were calculated: a 24-hour
(daily) average of the hourly concentrations, and an average from 10:00 to 13:00,
the time period of the satellite overpass. Measurements of the MISR 4.4 km products and
meteorological variables were both averaged over 24 hours (daily) because hourly
observations were lacking.
In addition to meteorology and land use from ERA5, we created temporal variables
including Julian date, month, and day of the year to include in the models. We also used the
coordinates (latitude and longitude) of the monitoring sites, rather than those of the MISR
observations, as geospatial predictors to introduce additional spatial variability into the models.
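
The temporal predictors can be derived from each observation date as sketched below. The definition of "Julian date" is an assumption: the values of roughly 15,574 reported in Table 3 are consistent with a count of days since 1970-01-01, so that convention is used here. The function name temporal_features is hypothetical; the dictionary keys follow the abbreviations in Table 1.

```python
from datetime import date


def temporal_features(d):
    """Return the three temporal predictors for a calendar date."""
    return {
        "Julian": (d - date(1970, 1, 1)).days,  # days since 1970-01-01 (assumed)
        "Month": d.month,
        "Day": d.timetuple().tm_yday,           # day of the year, 1..366
    }


features = temporal_features(date(2013, 6, 1))
print(features)
```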
2.4 Air Pollution Prediction Using Machine Learning Methods
Gradient Boosting (GB), Random Forest (RF) and Support Vector Machines (SVM), all
in a regression setting, were each considered as prediction models for daily-averaged PM2.5 and
MISR overpass time averaged PM2.5. A brief description of each method is provided in Table 2.
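
To make the boosting idea concrete, the sketch below fits one-split trees (stumps) to the residuals of the current prediction and adds each one with a shrinkage factor. It illustrates the general squared-error technique under simplifying assumptions and is not the implementation used in the thesis; the toy data, the default of 50 trees and shrinkage of 0.1, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the residuals."""
    mean_r = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean_r, mean_r)  # fallback: constant
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            if not right:
                continue  # threshold at the maximum leaves nothing to split
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value


def gradient_boost(xs, ys, n_trees=50, shrinkage=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, shrinkage, stumps


def predict(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data: two predictors and a PM2.5-like response.
xs = [[i / 10.0, float((i * 7) % 5)] for i in range(20)]
ys = [20.0 + 30.0 * a - 2.0 * w for a, w in xs]
model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

The number of trees and the shrinkage are the two quantities Table 2 lists as tuned for GB; smaller shrinkage generally needs more trees to reach the same training fit.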
For validation, we first set a seed and split the data into a 70% training
set and a 30% testing set. We then used 10-fold cross validation to tune the parameters of GB and
SVM, and used 500 trees for random forest. After parameter tuning, we used the optimal
combination of tuning parameters to train each model on the training set and predict on the
testing set. The testing R² was used as the final selection criterion among the machine learning
methods.
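
The split and the selection criterion can be sketched as below. Two assumptions are made: the training size is obtained by truncating 0.7 × N, which for N = 601 gives the 420/181 split reported in Table 4, and R² is computed as 1 − SS_res/SS_tot (some software instead reports the squared correlation between observed and predicted values). The seed value and function names are hypothetical.

```python
import random


def train_test_split(rows, train_frac=0.7, seed=2020):
    """Shuffle row indices with a fixed seed and split them 70/30."""
    rng = random.Random(seed)  # hypothetical seed; fixes the shuffle
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_train = int(train_frac * len(rows))  # truncation: 601 rows -> 420 training
    train = [rows[i] for i in idx[:n_train]]
    test = [rows[i] for i in idx[n_train:]]
    return train, test


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


rows = list(range(601))  # stand-in for the 601 matched observations
train, test = train_test_split(rows)
print(len(train), len(test))
```

Under this definition a model that always predicts the test-set mean scores 0, and a perfect model scores 1.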
3. Results and Conclusions
3.1 Prediction Performance
After matching ground measurements of PM2.5 with MISR AODs and meteorological
variables, the complete dataset used for analysis comprised 601 observations and 31
relevant variables for the years 2009-2014 (see Table 3). The performance of
each machine learning method was assessed by test R² (see Table 4). Gradient Boosting performed
best among the three methods for both daily-averaged PM2.5 and MISR overpass time averaged
PM2.5, with the highest R² of 0.619 and 0.554, respectively. SVM performed poorly for both
PM2.5 averages, with R² of 0.192 and 0.198, respectively. Random forest fell between the two,
with R² = 0.516 for daily-averaged PM2.5 and 0.448 for overpass-averaged PM2.5 (Table 4).
Overall, prediction performance was higher for daily-averaged PM2.5 than for
MISR overpass time PM2.5, the exception being SVM. Figure 3 shows the variable importance
based on the gradient boosting models, which indicates that elevation, longitude, latitude, high
vegetation cover, medium AOD, instantaneous 10-meter wind gust, surface pressure and day of the
year are the most influential variables for both daily averaged PM2.5 and MISR overpass time
averaged PM2.5. The two models differ in that Julian date and evaporation are relatively more
influential in the daily averaged model, whereas small AOD and non-spherical AOD are relatively
more influential in the MISR overpass model.
Using the best performing regression models, we predicted daily-averaged
PM2.5 and MISR overpass time PM2.5 over Tehran, the capital of Iran, from AOD
products at MISR pixels and meteorological data matched to the nearest MISR pixel. Prediction
maps were generated for the two PM2.5 averaging times, averaged over the year 2013 (see Figure
4). The predicted daily-averaged PM2.5 peaked at 128.51 µg m⁻³, compared with 125.771 µg m⁻³
for the MISR overpass time average, with the peak located in the southwest area of Tehran.
The lowest predicted values occur in the north of Tehran: 11.98 µg m⁻³ on
the daily basis and 8.451 µg m⁻³ on the MISR overpass time basis.
3.2 Discussion
This analysis is an attempt to produce reliable predictions of PM2.5 concentrations using
MISR 4.4 km resolution AOD components and meteorological information. Among the three
machine learning methods tested, gradient boosting was the most effective at predicting PM2.5
using MISR AOD products matched with meteorology at 33 ground PM2.5 monitoring sites, with R²
of 0.619 and 0.554 for the 24-hour and 3-hour (overpass time) averages, respectively.
Interestingly, the 24-hour average model for PM2.5 performed better than the model that was
temporally matched with the overpass time.
The V23 4.4 km x 4.4 km AOD products have been shown to be more effective for
PM2.5 prediction than the historical MISR product with 17.6 km spatial resolution
and MODIS AOD (Franklin et al, 2017). Our results are in line with previous studies using
MISR AOD (Franklin et al, 2018; Meng et al, 2018; Chau et al, 2020). Franklin et al (2018)
observed an R² of 0.461 for PM2.5 prediction in Mongolia with AOD mixtures and ground monitor
data matched by a 10 km buffer, in the absence of other predictors
such as meteorology. Using random forest and generalized additive models, Meng et al (2018)
obtained an R² of 0.66 from a prediction model with MISR AOD, meteorology and land use
information over California. Similarly, Chau et al (2020)
found an R² of 0.68 for predicting PM2.5 in Southern California. Meteorology and land use are
clearly important predictors in these studies.
These studies suggest that the choice of spatial matching, whether by a 4.4 km x
4.4 km buffer, a 10 km x 10 km buffer, or nearest distance, may lead to different prediction
performance. In addition, the monitoring stations in Iran are mostly concentrated in
urban areas, and this sparse coverage should be considered a limitation when
predicting PM2.5 concentrations.
Future work that accounts for the complexity of terrain, land use
information and more ground measurements across Iran will help provide more information for
air pollution prediction.
References
[1] Franklin, M., Chau, K., Kalashnikova, O., Garay, M., Enebish, T., & Sorek-Hamer, M. (2018).
Using Multi-Angle Imaging SpectroRadiometer Aerosol Mixture Properties for Air Quality
Assessment in Mongolia. Remote Sensing, 10(8). https://doi.org/10.3390/rs10081317
[2] Ghotbi, S., Sotoudeheian, S., & Arhami, M. (2016). Estimating urban ground-level PM10
using MODIS 3km AOD product and meteorological parameters from WRF model. Atmospheric
Environment, 141, 333–346. https://doi.org/10.1016/j.atmosenv.2016.06.057
[3] Reports on Atmospheric Pollution Findings from Tarbiat Modares University Provide New
Insights (Analysis of Spatio-temporal Dust Aerosol Frequency Over Iran Based On Satellite
Data). (Report). (2019). Global Warming Focus.
[4] Lim, C., Thurston, G., Shamy, M., Alghamdi, M., Khoder, M., Mohorjy, A., … Costa, M.
(2018). Temporal variations of fine and coarse particulate matter sources in Jeddah, Saudi
Arabia. Journal of the Air & Waste Management Association, 68(2), 123–138.
https://doi.org/10.1080/10962247.2017.1344158
[6] Mirzaei, M., Amanollahi, J., & Tzanis, C. (2019). Evaluation of linear, nonlinear, and hybrid
models for predicting PM 2.5 based on a GTWR model and MODIS AOD data. Air Quality,
Atmosphere and Health, 12(10), 1215–1224. https://doi.org/10.1007/s11869-019-00739-z
[7] Joharestani, M., Cao, C., Ni, X., Bashir, B., & Talebiesfandarani, S. (2019). PM 2.5
prediction based on random forest, XGBoost, and deep learning using multisource remote
sensing data. Atmosphere, 10(7). https://doi.org/10.3390/atmos10070373
[8] Zahedi Asl, S., Farid, A., & Choi, Y. (n.d.). Assessment of CALIOP and MODIS aerosol
products over Iran to explore air quality. Theoretical and Applied Climatology, 137(1), 117–131.
https://doi.org/10.1007/s00704-018-2555-9
[9] Tian, J., & Chen, D. (2010). A semi-empirical model for predicting hourly ground-level fine
particulate matter (PM 2.5) concentration in southern Ontario from satellite remote sensing and
ground-based meteorological measurements. Remote Sensing of Environment, 114(2), 221–229.
https://doi.org/10.1016/j.rse.2009.09.011
[10] Lee, H., Coull, B., Bell, M., & Koutrakis, P. (2012). Use of satellite-based aerosol optical
depth and spatial clustering to predict ambient PM2.5 concentrations. Environmental Research,
118, 8–15. https://doi.org/10.1016/j.envres.2012.06.011
[11] Meng, X., Garay, M., Diner, D., Kalashnikova, O., Xu, J., & Liu, Y. (2018). Estimating
PM2.5 speciation concentrations using prototype 4.4 km-resolution MISR aerosol properties over
Southern California. Atmospheric Environment, 181, 70–81.
https://doi.org/10.1016/j.atmosenv.2018.03.019
[12] You, W., Zang, Z., Pan, X., Zhang, L., & Chen, D. (2015). Estimating PM2.5 in Xi’an,
China using aerosol optical depth: A comparison between the MODIS and MISR retrieval
models. Science of the Total Environment, 505, 1156–1165.
https://doi.org/10.1016/j.scitotenv.2014.11.024
[13] Franklin, M., Kalashnikova, O., & Garay, M. (2017). Size-resolved particulate matter
concentrations derived from 4.4km-resolution size-fractionated Multi-angle Imaging
SpectroRadiometer (MISR) aerosol optical depth over Southern California. Remote Sensing of
Environment, 196, 312–323. https://doi.org/10.1016/j.rse.2017.05.002
[14] Van Donkelaar, A., Martin, R. V., Li, C., & Burnett, R. T. (2019). Regional Estimates of
Chemical Composition of Fine Particulate Matter Using a Combined Geoscience-Statistical
Method with Information from Satellites, Models, and Monitors.
Environmental Science and Technology, 53(5), 2595–2611.
[15] Alexeeff, S. E., Schwartz, J., Kloog, I., Chudnovsky, A., Koutrakis, P., & Coull, B. A.
(2014). Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic
analyses: insights into spatial variability using high-resolution satellite data. Journal of Exposure
Science & Environmental Epidemiology, October 2013, 1–7.
[16] Garay, M. J., Kalashnikova, O. V., & Bull, M. A. (2017). Development and assessment of a
higher-spatial-resolution (4.4 km) MISR aerosol optical depth product using AERONET-DRAGON
data. Atmospheric Chemistry and Physics, 17, 5095–5106.
[17] Chau, K., Franklin, M., & Gauderman, W. J. (2020). Satellite-Derived PM2.5 Composition
and Its Differential Effect on Children’s Lung Function. Remote Sensing, 12(1028).
https://doi.org/10.3390/rs12061028
[18] Shamsipour, M., Hassanvand, M. S., Gohari, K., Yunesian, M., Fotouhi, A., Naddafi, K.,
Sheidaei, A., Faridi, S., Akhlaghi, A. A., Rabiei, K., Mehdipour, P., Mahdavi, M., Amini, H., &
Farzadfar, F. (2019). National and sub-national exposure to ambient fine particulate matter
(PM2.5) and its attributable burden of disease in Iran from 1990 to 2016. Environmental
Pollution, 255(10), 1–10. https://doi.org/10.1016/j.envpol.2019.113173
Tables
Table 1. List of Data used in the Study.
Data Type                 Parameter                             Abbreviation          Unit                   Source
Spatial                   Longitude                             Lon_utm               km
                          Latitude                              Lat_utm               km
                          Elevation                             Elev_avg              km
Time                      Julian date                           Julian
                          Month                                 Month
                          Day of the year                       Day
AOD product               AOD                                   Aod_avg               unitless               MISR 4.4 km products
                          Angstrom_Exponent_550_860nm           Angs_exp_550_860_avg  unitless
                          Absorption_Aerosol_Optical_Depth      absorp_aod_avg        unitless
                          Nonspherical_Aerosol_Optical_Depth    nonsph_aod_avg        unitless
                          Small_Mode_Aerosol_Optical_Depth      small_aod_avg         unitless
                          Medium_Mode_Aerosol_Optical_Depth     medium_aod_avg        unitless
                          Large_Mode_Aerosol_Optical_Depth      large_aod_avg         unitless
Meteorological Variables  10m uwind                             u10                   m s⁻¹                  European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 product
                          10m vwind                             v10                   m s⁻¹
                          2m temperature                        t2m                   K
                          Boundary layer height                 blh                   m
                          Downward UV radiation at the surface  uvb                   J m⁻²
                          Evaporation                           e                     m of water equivalent
                          Surface pressure                      sp                    Pa
                          Total cloud cover                     tcc                   (0-1)
                          Total precipitation                   tp                    m
                          Relative humidity at 1000 hPa         r                     %
                          2m dewpoint temperature               d2m                   K
                          Forecast albedo                       fal                   (0-1)
                          High cloud cover                      hcc                   (0-1)
                          High vegetation cover                 cvh                   (0-1)
                          Instantaneous 10-meter wind gust      i10fg                 m s⁻¹
                          Low cloud cover                       lcc                   (0-1)
                          Low vegetation cover                  cvl                   (0-1)
                          Medium cloud cover                    mcc                   (0-1)
Ground Measured           PM2.5 measurement on the daily basis  daily_avg             µg m⁻³                 Iran ground-level PM2.5 monitors
                          PM2.5 measurement over 10:00-13:00    misrpass_avg          µg m⁻³
Table 2. Description of Machine Learning Methods Applied to the AOD Products
Retrievals and Meteorological Variables to Predict PM2.5.
Method: GB
    Description: Methods to improve prediction of weak decision trees by fitting a
    model to the residuals and stopping after many iterations.
    Tuning: Tuned for the number of trees and the shrinkage by 10-fold cross validation.
Method: RF
    Description: Methods to generate uncorrelated trees by considering a random
    subset at each split.
    Tuning: Tuned for the number of trees and the node size.
Method: SVM
    Description: Regression with linear or non-linear kernels on the AOD products
    and meteorological variables with minimized prediction errors by
    tolerating soft margins of error.
    Tuning: Tuned for the parameter of the kernel (i.e. linear kernel, polynomial kernel
    of degree and radial basis function kernel), the soft margin constant C and the epsilon.
Table 3. Characteristics of the Study Population.
Variables                     Mean      Median    SD        IQR       Pearson correlation (with daily PM2.5)   Pearson correlation (with overpass PM2.5)
24-hr PM2.5                   39.67     34.63     22.01     22.07
3-hr PM2.5                    38.80     32.35     24.66     24.02
Elevation                     1254      1213      174.47    203.16    0.06                                     0.04
AOD                           0.28      0.24      0.14      0.16      0.19                                     0.17
Angstrom exponent             1.01      0.98      0.33      0.46      -0.20                                    -0.13
Absorbing AOD                 0.01      0.01      0.01      0.01      0.01                                     0.04
Non-spherical AOD             0.02      0.00      0.05      0.03      0.21                                     0.21
AOD small                     0.14      0.12      0.07      0.07      0.01                                     0.03
AOD medium                    0.03      0.02      0.03      0.04      0.24                                     0.23
AOD large                     0.11      0.09      0.09      0.09      0.19                                     0.16
Wind direction u              -0.18     -0.09     1.09      1.24      0.11                                     0.10
Wind direction v              0.14      0.17      0.90      1.18      -0.09                                    -0.07
Temperature (2m)              294.2     295.2     7.15      9.35      0.15                                     0.12
Boundary Layer Height         832.6     815.9     314.92    403.55    0.16                                     0.13
UV radiation                  110682    117443    21678     32709     0.06                                     0.03
Evapotranspiration            0.00      0.00      0.00      0.00      0.06                                     0.07
Wind speed                    83397     83351     4856      8441      0.00                                     0.00
Total cloud cover             0.12      0.06      0.14      0.18      -0.02                                    -0.02
Total precipitation           0.00      0.00      0.00      0.00      0.00                                     0.02
Relative humidity             30.85     29.18     12.23     15.77     -0.09                                    -0.05
Dew point temperature (2m)    275.2     275.2     4.65      5.96      0.04                                     0.05
Forecast albedo               0.22      0.21      0.06      0.03      -0.04                                    -0.01
High cloud cover              0.07      0.01      0.12      0.07      -0.04                                    -0.06
High vegetation cover         0.08      0.05      0.10      0.09      0.35                                     0.30
Wind gust (10m)               5.64      5.36      1.54      2.02      0.22                                     0.21
Low cloud cover               0.02      0.02      0.06      0.01      -0.01                                    -0.01
Low vegetation cover          0.86      0.92      0.17      0.21      -0.24                                    -0.20
Medium cloud cover            0.06      0.02      0.08      0.09      0.00                                     0.03
Longitude                     108.56    -1.63     270.59    10.31     -0.12                                    -0.01
Latitude                      3937      3967      119.21    9.56      -0.11                                    0.01
Julian date                   15574     15611     307.74    602       0.01                                     -0.01
Month                         7.34      8.00      2.16      3         0.03                                     0.07
Day of Year                   208.6     218.0     65.91     106       0.02                                     0.06
Table 4. Sample Sizes of Spatiotemporally Matched MISR Mixtures and Meteorological
Variables with Test R² for each Machine Learning Method and Pollutant. Largest R² for
each Pollutant is indicated in bold.
Pollutant                   Training (N)  Testing (N)  Total (N)  GB      RF      SVM
Daily-averaged PM2.5        420           181          601        0.619   0.516   0.192
Overpass-averaged PM2.5     420           181          601        0.554   0.448   0.198
* The following abbreviations are used in this table: GB: Gradient Boosting; RF: Random Forest; SVM: Support
Vector Machines
Figures
Figure 1. Iran Study Region Showing 33 Ground-Level Air Pollution Monitoring Stations.
Figure 2. Iran Study Region Showing 23 Ground-level Air Pollution Monitoring Stations
Concentrated in Tehran.
Figure 3. Variable Importance based on Gradient Boosting Models: Daily Averaged PM2.5
Model (left); MISR Overpass Time Averaged PM2.5 Model (Right).
Figure 4. Predicted PM2.5 from Gradient Boosting over Tehran City Averaged over Year
2013 for Daily-Averaged PM2.5 (top) and MISR Overpass Time Averaged PM2.5
(Bottom).
Abstract (if available)
Abstract
Particulate matter air pollution with aerodynamic diameter less than 2.5 μm (PM₂.₅) has been associated with numerous detrimental health effects and is therefore of important concern for public health. Research efforts to better estimate and predict PM₂.₅ have recently incorporated satellite observations of aerosol optical depth (AOD) due to its spatial and temporal coverage, particularly in areas of the world such as the Middle East where there are limited ground-based monitoring networks. The Multiangle Imaging SpectroRadiometer (MISR) instrument onboard NASA’s Terra satellite was launched in late 1999 and provides operational AOD as well as AOD properties including information on particle size, shape, and absorption. Furthermore, at 4.4 km × 4.4 km, the spatial resolution of MISR’s Version 23 aerosol product is well suited for neighborhood-level health effects assessments. ❧ Leveraging 33 PM₂.₅ ground-monitoring locations across Iran we linked coincident MISR overpasses as well as gridded meteorological data including 10m wind components (u and v), temperature, boundary layer height, downward UV radiation, evaporation, surface pressure, cloud cover, precipitation and humidity, vegetation cover and dust. All data were hourly, so we examined both daily averages as well as averaged during the MISR overpass time (10:00-13:00). Three machine learning algorithms were used separately for prediction and compared: Gradient Boosting (GB), Random Forest (RF) and Support Vector Machines (SVM). Gradient Boosting shows the best prediction performance among the three methods with R² of 0.619 for daily-averaged PM₂.₅ and R² of 0.554 for the MISR overpass time averaged PM₂.₅. These results indicate that the 4.4 km MISR AOD product and meteorological variables can provide reliable predictions of PM₂.₅ over Iran.
Asset Metadata
Creator
Zhang, Yifang (author)
Core Title
Using multi-angle imaging spectroradiometer aerosol mixture properties and meteorology for PM₂.₅ assessment in Iran
School
Keck School of Medicine
Degree
Master of Science
Degree Program
Biostatistics
Defense Date
05/06/2020
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
aerosol types, Air pollution, machine learning, MISR, OAI-PMH Harvest, particulate matter
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Franklin, Meredith (committee chair), Gauderman, William (committee member), Lewinger, Juan Pablo (committee member)
Creator Email
yifangz@usc.edu,yifangz29@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-299567
Unique identifier
UC11663371
Identifier
etd-ZhangYifan-8459.pdf (filename), usctheses-c89-299567 (legacy record id)
Legacy Identifier
etd-ZhangYifan-8459.pdf
Dmrecord
299567
Document Type
Thesis
Rights
Zhang, Yifang
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA