
Data adaptive functional outlier detection: analysis of the Paris bike sharing system data. (English) Zbl 07814107

Summary: Bike sharing systems (BSSs) have become an increasingly popular means of sustainable transportation and have been implemented in many cities worldwide. Our approach contributes to the identification of abnormal patterns using real-time occupancy data from Paris. In particular, we propose a novel functional outlier detection algorithm based on a two-step approach. In the first stage, a clean dataset is obtained based on the combined effect of two extreme statistics calculated from random sampling. In the second stage, a multiple testing approach based on the clean dataset is proposed, in which the false discovery rate (FDR) control procedure is used to adaptively choose the thresholds for the hypothesis tests. Extensive numerical simulations were conducted to compare the outlier detection performance with that of other state-of-the-art methods. The proposed approach is then applied to the Paris Vélib' bike sharing system dataset to identify abnormal patterns that are of particular interest to BSS operators for identifying system inefficiencies and updating policies.
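The summary does not say which FDR control procedure the second stage uses. As an illustration only, the sketch below implements the standard Benjamini-Hochberg step-up rule, which is one common way to choose a rejection threshold adaptively from a set of p-values. The function name `benjamini_hochberg`, the level `alpha`, and the example p-values are assumptions made for this sketch and are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Illustrative sketch of a generic FDR procedure; it is not claimed to be
    the exact second-stage test of the summarized paper.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values (possibly none).
    return sorted(order[:k_max])


# Hypothetical p-values, one per curve (for instance, one per daily
# occupancy profile of a station); these numbers are made up.
p_vals = [0.001, 0.30, 0.012, 0.74, 0.004, 0.51, 0.039, 0.92]
flagged = benjamini_hochberg(p_vals, alpha=0.05)
print(flagged)  # indices of curves flagged as outliers: [0, 2, 4]
```

The cutoff here is data-driven: the p-value 0.039 is below 0.05 but is not rejected, because the step-up threshold at its rank is 4 × 0.05 / 8 = 0.025.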

MSC:

62-XX Statistics
93-XX Systems theory; control
