×

Evolutionary optimisation of large-scale activity clustering with increased automation. (English) Zbl 1520.90167

Summary: In understanding travel demand, research often relies on global positioning system (GPS) data, which is becoming ubiquitous. The spatiotemporal nature of the GPS data allows for studying the behaviour at rich and detailed levels. Some of the steps in the data processing frequently rely on unsupervised machine learning techniques such as activity clustering and complex network analysis. Given the scale of the research phenomena in terms of the data volumes and geospatial extent, it is surprising that current processes still require much human intervention. The lack of automation and large scale data processing remains a challenge to routine decision support and more flexible research application. The work reported here contributes towards both these aspects. Through the use of a configurable, large-scale evolutionary algorithm, applying an expert-provided gold standard measure of expected clustering behaviour, a near-optimal clustering configuration is achieved while avoiding over-fitting. The optimisation workflow provides improved automation for repeatable and reproducible results. The workflow is even adaptable to clustering experiments with different optimisation configurations.

MSC:

90B99 Operations research and management science

Software:

TPOT; QGIS; Genocop; NSGA-II
Full Text: DOI

References:

[1] Albanna, B.H., Moawad, I.F., Moussa, S.M., Sakr, M.A., 2015. Semantic Trajectories: A Survey from Modeling to Application. In: Information Fusion and Geographic Information Systems. IF&GIS’2015, pp. 59-76.
[2] Alvarez-Garcia, J. A.; Ortega, J. A.; Gonzalez-Abril, L.; Velasco, F., Trip destination prediction based on past GPS log using a hidden Markov model, Expert Syst. Appl., 37, 8166-8171 (2010)
[3] Biju, G. S.; Anilkumar, A. K., An integrated genetic algorithm with clone operator, Int. J. Pure Appl. Math. Sci., 9, 2, 145-164 (2016)
[4] Birch, C. P.D.; Oom, S. P.; Beecham, J. A., Rectangular and hexagonal grids used for observation, experiment and simulation in ecology, Ecol. Model., 206, 206, 347-359 (2007)
[5] Cich, G.; Knapen, L.; Bellemans, T.; Janssens, D.; Wets, G., Threshold settings for TRIP/STOP detection in GPS traces, J. Ambient Intell. Humaniz. Comput., 7, 3, 395-413 (2016)
[6] Clementini, E., Cohn, A.G., 2014. RCC*-9 and CBM*. In: International Conference on Geographic Information Science, Vol. 8728. pp. 349-365.
[7] Coello, C. A., An updated survey of GA-based multiobjective optimization techniques, ACM Comput. Surv., 32, 2, 109-143 (2000)
[8] Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T., A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput., 6, 2, 182-197 (2002)
[9] Ester, M., Kriegel, H.-P., Sander, J., Xu, X., 1996. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. In: KDD’96, pp. 226-231.
[10] Feurer, M.; Hutter, F., Hyperparameter optimization, (Automated Machine Learning (2019), Springer), 3-33
[11] Fogel, D. B., An introduction to simulated evolutionary optimisation, IEEE Trans. Neural Netw., 5, 3-14 (1994)
[12] Hales, T., The honeycomb conjecture, Discrete Comput. Geom., 25, 25, 1-22 (2001) · Zbl 1007.52008
[13] Hu, M.-K., Visual pattern recognition by moment invariants, IRE Trans. Inf. Theory, 8, 2, 179-187 (1962) · Zbl 0102.13304
[14] Jaderberg, M.; Dalibard, V.; Osindero, S.; Czarnecki, W. M.; Donahue, J.; Razavi, A.; Vinyals, O.; Green, T.; Dunning, I.; Simonyan, K.; Fernando, C.; Kavukcuoglu, K., Population based training of neural networks (2017)
[15] Jiang, T.; Gradus, J. L.; Rosellini, A. J., Supervised machine learning: A brief primer, Behav. Therapy, 51, 5, 675-687 (2020)
[16] Joubert, J.; Axhausen, K., Inferring commercial vehicle activities in Gauteng, South Africa, J. Transport Geogr., 19, 1, 115-124 (2011)
[17] Joubert, J. W.; Axhausen, K. W., A complex network approach to understand commercial vehicle movement, Transportation, 40, 3, 729-750 (2013)
[18] Joubert, J. W.; Meintjes, S., Computational considerations in building inter-firm networks, Transportation, 42, 857-878 (2015)
[19] Joubert, J. W.; Meintjes, S., Repeatability & reproducibility: Implications of using GPS data for freight activity chains, Transp. Res. B, 76, 81-92 (2015)
[20] Joubert, J. W.; Meintjes, S., Freight activity chain generation using complex networks of connectivity, Transp. Res. Procedia, 12, 425-435 (2016)
[21] Klippel, A.; Worboys, M.; Duckham, M., Identifying factors of geographic event conceptualisation, Int. J. Geogr. Inf. Sci., 22, 2, 183-204 (2008)
[22] Krause, C. M.; Zhang, L., Short-term travel behavior prediction with GPS, land use, and point of interest data, Transp. Res. B, 123, 349-361 (2019)
[23] Leal, S. S.; de Almeida, P. E.M.; Chung, E., Active control for traffic lights in regions and corridors: an approach based on evolutionary computation, Transp. Res. Procedia, 25, 1769-1780 (2017)
[24] Malzer, C.; Baum, M., A hybrid approach to hierarchical density-based cluster selection, (2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems. 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems, MFI (2020), IEEE)
[25] Michalewicz, Z., Genetic Algorithms + Data Structures=Evolution Programs (2013), Springer
[26] Nurmi, P., Identifying Meaningful Places (2009), University of Helsinki, (Ph.D. thesis)
[27] Olson, R. S.; Moore, J. H., TPOT: A tree-based pipeline optimization tool for automating machine learning, (Automated Machine Learning (2019), Springer), 151-160
[28] Pidd, M., Why modelling and model use matter, J. Oper. Res. Soc. (2010), 61, 14-24 (2010) · Zbl 1193.90156
[29] QGIS Development Team, M., QGIS Geographic Information System (2019), Open Source Geospatial Foundation Project: Open Source Geospatial Foundation Project Oregon, United States
[30] Rodriguez, M. A.; Jarur, M. C., A genetic algorithm for searching spatial configurations, IEEE Trans. Evol. Comput., 9, 3, 252-270 (2005)
[31] Seo, T.; Kusakabe, T.; Gotoh, H.; Asakura, Y., Interactive online machine learning approach for activity-travel survey, Transp. Res. B, 123, 362-373 (2019)
[32] Sfyridis, A.; Agnolucci, P., Annual average daily traffic estimation in England and Wales: An application of clustering and regression modelling, J. Transp. Geogr., 83, Article 102658 pp. (2020)
[33] Sharman, B. W.; Roorda, M. J., Analysis of freight global positioning system data: Clustering approach for identifying trip destinations, Transp. Res. Rec., 2246, 1, 83-91 (2011)
[34] Shi, Z.; Pun-Cheng, L. S., Spatiotemporal data clustering: A survey of methods, Int. J. Geo-Inf., 8, 3 (2019)
[35] Sipper, M.; Fu, W.; Ahuja, K.; Moore, J. H., Investigating the parameter space of evolutionary algorithms, BioData Min., 11, 2 (2018)
[36] Slowik, A.; Kwasnicka, H., Evolutionary algorithms and their applications to engineering problems, Neural Comput. Appl., 32, 12363-12379 (2020)
[37] Such, F. P.; Madhavan, V.; Conti, E.; Lehman, J.; Stanley, K. O.; Clune, J., Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning (2018)
[38] Tani, L.; Rand, D.; Veelken, C.; Kadastik, M., Evolutionary algorithms for hyperparameter optimization in machine learning for application in high energy physics, Eur. Phys. J. C, 81, 2 (2021)
[39] Telikani, T.; Tahmassebi, A.; Banzhaf, W.; Gandomi, A. H., Evolutionary machine learning: A survey, ACM Comput. Surv., 54, 8, 1-35 (2022)
[40] Zhang, Y.; Gong, Y.; Gao, Y.; Wang, H.; Zhang, J., Parameter-free voronoi neighborhood for evolutionary multimodal optimization, IEEE Trans. Evol. Comput., 24, 2, 335-349 (2020)
[41] Zhang, Y.; Lu, C.; Hu, F.; Liu, Z., Multi-objective genetic algorithm based on cloning mechanism and its application, J. Convergence Inf. Technol., 7, 20, 535-543 (2012)
[42] Zhang, Z.; Wei, L.; Lim, A., An evolutionary local search for the capacitated vehicle routing problem minimising fuel consumption under three-dimensional loading constraints, Transp. Res. B, 82, 20-35 (2015)
[43] Zhong, J.; Shen, M.; Zhang, J.; Chung, H. S.; Shi, Y.; Li, Y., A differential evolution algorithm with dual populations for solving periodic railway timetable scheduling problem, IEEE Trans. Evol. Comput., 17, 4, 512-527 (2013)
[44] Zhou, C.; Frankowski, D.; Ludford, P.; Shekhar, S.; Terveen, L., Discovering personal gazetteers: An interactive clustering approach, (GIS ’04 Proceedings of the \(12{}^{t h}\) Annual ACM International Workshop on Geographic Information Systems, Vol. 76 (2004)), 266-273
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.