×

Improving the non-compensatory trace-clustering decision process. (English) Zbl 07744700

Summary: In flexible environments (such as healthcare or customer service), the observed behavior is expected to considerably vary, namely there is no dominant flow path. Such a high variability obstructs the process discovery task since it regularly leads to “spaghetti” process models. Trace clustering is about grouping behaviors, and discovering a distinct model per group, thus delivering more comprehensible results. In previous works, we have proposed a multiple-criteria non-compensatory approach to create a similarity metric and finally perform trace clustering. The main problem that we tried to respond to is how to summarize a process event log, when a lot of variability exists, thus facilitating knowledge discovery. The underpinnings of the non-compensatory approach are first the fact that a sufficient number of criteria must be concordant with the similarity (concordance setting) and second that there should not exist any criterion raising a veto logic, that is, among the criteria that are not concordant, none of them must be conflicting with the similarity (discordance setting). This work challenges improved support for the decision-maker (DM) and it extends the previous approach by (i) proposing an improved clustering technique based on spectral clustering; (ii) guiding the clustering process by allowing reinforced or counterveto effects and pairwise constraints; (iii) handling outliers through a trimming approach as an integer linear program. All improvements aiming at making elements of the trace-clustering process more accessible to the DMs and enhancing the understandability of the analysis.
{© 2021 The Authors. International Transactions in Operational Research © 2021 International Federation of Operational Research Societies}

MSC:

90-XX Operations research, mathematical programming

Software:

quadprog; GitHub
Full Text: DOI

References:

[1] van derAalst, W.M.P., Berti, A., 2020. Discovering object‐centric Petri nets. Fundamenta Informaticae175, 1-40. · Zbl 1497.68354
[2] Appice, A., Malerba, D., 2016. A co‐training strategy for multiple view clustering in process mining. IEEE Transactions on Services Computing9, 6, 832-845.
[3] Bickel, S., Scheffer, T., 2004. Multi‐view clustering. Fourth IEEE International Conference on Data Mining (ICDM04). IEEE, Piscataway, NJ.
[4] Chapelle, O., Scholkopf, B., A.Zien, E., 2009. Semi‐supervised learning (Chapelle, O. et al., eds. 2006) [book reviews]. IEEE Transactions on Neural Networks20, 3, 542-542.
[5] Chatain, T., Carmona, J., vanDongen, B., 2017. Alignment‐based trace clustering. In Conceptual Modeling. Springer International Publishing, Berlin, pp. 295-308.
[6] Chung, F., 1996. Spectral Graph Theory. American Mathematical Society, Providence, RI.
[7] Cohn, D., McCallum, A., Caruana, R., 2008. Semi‐supervised clustering with user feedback. In Constrained Clustering: Advances in Algorithms, Theory, and Applications. Data Mining and Knowledge Discovery Series. CRC Press, Boca Raton, FL, pp. 17-32. · Zbl 1161.68759
[8] De Weerdt, J., vandenBroucke, S., Vanthienen, J., Baesens, B., 2013. Active trace clustering for improved process discovery. IEEE Transactions on Knowledge and Data Engineering25, 12, 2708-2720.
[9] Delias, P., Doumpos, M., Grigoroudis, E., Matsatsinis, N., 2017. A non‐compensatory approach for trace clustering. International Transactions in Operational Research26, 5, 1828-1846. · Zbl 07766374
[10] Delias, P., Doumpos, M., Grigoroudis, E., Matsatsinis, N., 2020. Improving the non‐compensatory trace clustering decision process with robustness concerns. In Linden, I. (ed.), Turon, A. (ed.), Dargam, F. (ed.), Jayawickrama, U. (ed.) (eds) Cognitive Decision Support Systems & Technologies. University of Zaragoza, Zaragoza, Spain, pp. 175-180.
[11] Delias, P., Lagopoulos, A., Tsoumakas, G., Grigori, D., 2018. Using multi‐target feature evaluation to discover factors that affect business process behavior. Computers in Industry99, 253-261.
[12] Ding, S., Jia, H., Du, M., Xue, Y., 2018. A semi‐supervised approximate spectral clustering algorithm based on HMRF model. Information Sciences429, 215-228.
[13] Downey, R.G., Fellows, M.R., 1995. Fixed‐parameter tractability and completeness II: on completeness for w[1]. Theoretical Computer Science141, 1‐2, 109-131. · Zbl 0873.68059
[14] vanEngelen, J.E., Hoos, H.H., 2019. A survey on semi‐supervised learning. Machine Learning109, 2, 373-440. · Zbl 1441.68215
[15] Ester, M., Kriegel, H.P., Sander, J., Xu, X., 1996. A density‐based algorithm for discovering clusters in large spatial databases with noise. KDD’96: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, Vol. 96, pp. 226-231.
[16] Figueira, J.R., Greco, S., Roy, B., Słowiński, R., 2010. ELECTRE methods: main features and recent developments. In Zopounidis, C. (ed.), Pardalos, P.M. (ed.) (eds) Handbook of Multicriteria Analysis, Vol. 103. Springer, Berlin, pp. 51-89.
[17] Gallegos, M.T., Ritter, G., 2005. A robust method for cluster analysis. The Annals of Statistics33, 1, 347-380. · Zbl 1064.62074
[18] García‐Escudero, L.A., Gordaliza, A., Matrán, C., Mayo‐Iscar, A., 2008. A general trimming approach to robust cluster Analysis. The Annals of Statistics36, 3, 1324-1345. · Zbl 1360.62328
[19] García‐Escudero, L.A., Gordaliza, A., Matrán, C., Mayo‐Iscar, A., 2010. A review of robust clustering methods. Advances in Data Analysis and Classification4, 2-3, 89-109. · Zbl 1284.62375
[20] Greco, G., Guzzo, A., Pontieri, L., Sacca, D., 2006. Discovering expressive process models by clustering log traces. IEEE Transactions on Knowledge and Data Engineering18, 8, 1010-1027.
[21] Grubbs, F.E., 1969. Procedures for detecting outlying observations in samples. Technometrics11, 1, 1-21.
[22] Hinneburg, A., 2014. Concepts of visual and interactive clustering. In Aggarwal, C. (ed.), Reddy, C. (ed.) (eds), Data Clustering: Algorithms and Applications. Data Mining and Knowledge Discovery Series. CRC, Boca Raton, FL, pp. 483-500.
[23] Hodge, V., Austin, J., 2004. A survey of outlier detection methodologies. Artificial Intelligence Review22, 2, 85-126. · Zbl 1101.68023
[24] Janssenswillen, G., 2020. bupaR: Business Process Analysis in R. R package version 0.4.4. Available at https://www.bupar.net, https://github.com/bupaverse/bupaR (accessed July 2020).
[25] Kumar, A., Daumé, H.2011. A co‐training approach for multi‐view spectral clustering. Proceedings of the 28th International Conference on International Conference on Machine Learning. Omnipress, Madison, WI, pp. 393-400.
[26] Latva‐Koivisto, A., 2001. Finding a complexity measure for business process models. Technical Report Mat‐2.108, Helsinki University of Technology, Finland.
[27] Mendling, J., 2008. Metrics for Process Models Empirical Foundations of Verification, Error Prediction, and Guidelines for Correctness. Springer, Berlin.
[28] Moreno‐Jiménez, J.M., Vargas, L.G., 2019. Special issue on “mind and cognition in the future of decision support systems.” International Transactions in Operational Research27, 1, 703-704.
[29] Nascimento, M.C., deCarvalho, A.C., 2011. Spectral methods for graph clustering—a survey. European Journal of Operational Research211, 2, 221-231. · Zbl 1250.68228
[30] Ng, A.Y., Jordan, M.I., Weiss, Y., 2001. On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, pp. 849-856.
[31] Ren, W., Li, G., 2017. Graph based semi‐supervised learning via label fitting. International Journal of Machine Learning and Cybernetics8, 3, 877-889.
[32] Roy, B., 2005. Paradigms and challenges. In Figueira, J. (ed.), Greco, S. (ed.), Ehrogott, M. (ed.) (eds) Multiple Criteria Decision Analysis: State of the Art Surveys, International Series in Operations Research & Management Science, Vol. 78. Springer New York, pp. 3-24. · Zbl 1072.90544
[33] Roy, B., Słowiński, R., 2008. Handling effects of reinforced preference and counter‐veto in credibility of outranking. European Journal of Operational Research188, 1, 185-190. · Zbl 1135.90020
[34] Roy, B., Vincke, P., 1987. Pseudo‐orders: definition, properties and numerical representation. Mathematical Social Sciences14, 3, 263-274. · Zbl 0628.90001
[35] Słowiński, R., Vanderpooten, D., 1997. Similarity relation as a basis for rough approximations. In Wang, P.P. (ed.) (ed.) Advances in Machine Intelligence & Soft‐Computing, Vol. 4. Duke University, Durham, NC, pp. 17-33.
[36] Song, M., Günther, C.W., Aalst, W.M.P., 2009. Trace clustering in process mining. In Aalst, W.M.P. (ed.), Mylopoulos, J. (ed.), Sadeh, N.M. (ed.), Shaw, M.J. (ed.), Szyperski, C. (ed.), Ardagna, D. (ed.), Mecella, M. (ed.), Yang, J. (ed.) (eds) Business Process Management Workshops, Vol. 17. Springer, Berlin, pp. 109-120.
[37] Spielman, D.A., Teng, S.H., 1996. Spectral partitioning works: planar graphs and finite element meshes. Proceedings of 37th Conference on Foundations of Computer Science. IEEE, Piscataway, NJ, pp. 96-105.
[38] Sun, Y., Bauer, B., Weidlich, M., 2017. Compound trace clustering to generate accurate and simple sub‐process models. International Conference on Service‐Oriented Computing. Springer International Publishing, Berlin, pp. 175-190.
[39] Thaler, T., Ternis, S.F., Fettke, P., Loos, P., 2015. A comparative analysis of process instance cluster techniques. Wirtschaftsinformatik Proceedings 2015, Osnabrück, pp. 423-437.
[40] Turlach, B., Weingessel, A., Moler, C., 2019. quadprog: Functions to Solve Quadratic Programming Problems. R package version 1.5‐8. Available at https://cran.r‐project.org/web/packages/quadprog/quadprog.pdf.
[41] Van Dongen, B.F., 2011. Real‐life event logs—hospital log. Available at http://data.4tu.nl/repository/uuid:d9769f3d‐0ab0‐4fb8‐803b‐0d1120ffcf54.
[42] Wang, W., Yang, J., Muntz, R.R., 1997. Sting: a statistical information grid approach to spatial data mining. Proceedings of 23rd International Conference on Very Large Data Bases. Morgan Kaufmann, Athens, Greece, pp. 186-195.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.