Comparison of several linear statistical models to predict tropospheric ozone concentrations. (English) Zbl 1431.62520

Summary: This study evaluates the performance of five linear statistical models in predicting next-day hourly average ozone concentrations. The selected models are: (i) multiple linear regression, (ii) principal component regression, (iii) independent component regression (ICR), (iv) quantile regression (QR) and (v) partial least squares regression (PLSR). To the authors' knowledge, no previous study has compared the performance of these five linear models for predicting tropospheric ozone concentrations. Moreover, this is the first time that ICR has been applied for this purpose. The ozone predictors considered are meteorological data (hourly averages of temperature, relative humidity and wind speed) and environmental data (hourly average concentrations of sulphur dioxide, carbon monoxide, nitrogen oxide, nitrogen dioxide and ozone) of the previous day, collected at an urban site with traffic influences. The analysed periods were May and June 2003. The QR model performs better in the training step because it tries to model the entire distribution of the O\(_3\) concentrations. However, it gives the worst predictions in the test step. This suggests that a procedure better than the one applied (the \(k\)-nearest neighbours algorithm) is needed, one that can estimate the percentiles of the output variable in the test data set more precisely. Of the five statistical models tested in this study, the PLSR model gives the best predictions of the tropospheric ozone concentrations.
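
To illustrate what "modelling the entire distribution" means for QR, the following is a minimal sketch of the check (pinball) loss that quantile regression minimises at each quantile level. It is not the authors' implementation: the function names and the ozone values are hypothetical, and the model is reduced to a constant prediction so that the minimiser can be found by direct search over the data points.

```python
def pinball_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss for a quantile level tau in (0, 1)."""
    # Under-predictions (residual >= 0) are weighted by tau,
    # over-predictions by (1 - tau).
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def total_loss(values, q, tau):
    """Total pinball loss of the constant prediction q over all values."""
    return sum(pinball_loss(v - q, tau) for v in values)


def best_constant(values, tau):
    """Constant prediction minimising the total pinball loss.

    A minimiser always lies at one of the observed values,
    so searching over the data points is sufficient.
    """
    return min(sorted(set(values)), key=lambda q: total_loss(values, q, tau))


# Hypothetical ozone concentrations (made-up values, not data from the study).
ozone = [20.0, 35.0, 40.0, 55.0, 60.0, 75.0, 80.0, 95.0, 110.0]

# One fit per quantile level: together they describe the spread of the
# distribution rather than only its centre.
quantile_fits = {tau: best_constant(ozone, tau) for tau in (0.1, 0.5, 0.9)}
```

With tau = 0.5 the minimiser is the sample median; lower and higher values of tau move the fit towards the tails. In a full QR model the constant is replaced by a linear function of the predictors, fitted separately for each tau.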

MSC:

62P12 Applications of statistics to environmental and related topics
62J05 Linear regression; mixed models
