Online monitoring of air quality using PCA-based sequential learning. (English) Zbl 07832661

Summary: Air pollution surveillance is critically important for public health. One air pollutant, ozone, is extremely challenging to analyze properly, as it is a secondary pollutant that is formed by complex chemical reactions in the air and is not emitted directly into the atmosphere. Numerous environmental studies confirm that ozone concentration levels are associated with meteorological conditions, and long-term exposure to high ozone concentration levels is associated with the incidence of many diseases, including asthma, respiratory diseases, and cardiovascular diseases. Thus, it is important to develop an air pollution surveillance system that collects both air pollution and meteorological data and monitors the data continuously over time. To this end, statistical process control (SPC) charts provide a major statistical tool. But most existing SPC charts are designed for cases in which the in-control (IC) process observations at different times are assumed to be independent and identically distributed. Air pollution and meteorological data do not satisfy these conditions because of serial data correlation, high dimensionality, seasonality, and other complex data structures. Motivated by an application to monitor the ground ozone concentration levels in the Houston-Galveston-Brazoria (HGB) area, we developed a new process monitoring method using principal component analysis and sequential learning. The new method can accommodate high dimensionality, time-varying IC process distribution, serial data correlation, and nonparametric data distribution. It is shown to be a reliable analytic tool for online monitoring of air quality.

MSC:

62Pxx Applications of statistics
