Abstract
The exponential growth of healthcare data poses significant challenges for clinical researchers seeking to identify meaningful patterns and correlations. The complexity of this data arises from its high dimensionality, sparsity, inaccuracy, incompleteness, longitudinality, and heterogeneity. While conventional pattern recognition algorithms can partially address high dimensionality, sparsity, inaccuracy, and longitudinality, incompleteness and heterogeneity remain persistent challenges, particularly when analyzing electronic health records (EHRs). EHRs often encompass diverse data types, such as clinical notes (text), blood pressure readings (longitudinal numerical data), MR scans (images), and DCE-MRIs (longitudinal video data), and may include only a subset of these data for each patient at any given time interval. To tackle these challenges, we propose a kernel-based framework as the most suitable approach for handling heterogeneous data formats, since it represents each format on equal terms as a kernel matrix. Our research endeavours to develop methodologies within this framework to construct a decision support system (DSS). To achieve this, we advocate incorporating preprocessing mechanisms that address incompleteness and heterogeneity prior to integration into the kernel framework.
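As a rough illustration of representing heterogeneous modalities on equal terms, the sketch below builds one kernel matrix per modality and combines them into a single patient-by-patient matrix. It is a minimal sketch under stated assumptions, not the method developed in the chapter: the toy data, the helper names (`rbf_kernel`, `cosine_normalise`, `pad_kernel`, `combine_kernels`), the choice of kernels, and the zero-padding of patients whose modality is missing are all hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def linear_kernel(X):
    """Plain inner-product kernel between the rows of X."""
    return X @ X.T

def cosine_normalise(K):
    """Rescale a kernel matrix so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0  # guard against all-zero feature vectors
    return K / np.outer(d, d)

def pad_kernel(K_obs, observed, n):
    """Embed a kernel over the observed patients into an n x n matrix.

    Rows and columns of patients lacking this modality are left at zero
    (an illustrative assumption, not the chapter's treatment of missing data).
    """
    K = np.zeros((n, n))
    idx = np.flatnonzero(observed)
    K[np.ix_(idx, idx)] = K_obs
    return K

def combine_kernels(kernels):
    """Unweighted average of equally sized kernel matrices."""
    return sum(kernels) / len(kernels)

# Hypothetical toy cohort of 4 patients.
n = 4
# Blood pressure features (systolic, diastolic) exist for patients 0, 1, 3.
bp = np.array([[120.0, 80.0], [140.0, 90.0], [130.0, 85.0]])
bp_observed = np.array([True, True, False, True])
# Bag-of-words counts from clinical notes exist for patients 1, 2, 3.
notes = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]])
notes_observed = np.array([False, True, True, True])

# Standardise the numeric features before applying the RBF kernel.
bp_std = (bp - bp.mean(axis=0)) / bp.std(axis=0)

K_bp = pad_kernel(rbf_kernel(bp_std, gamma=0.5), bp_observed, n)
K_notes = pad_kernel(cosine_normalise(linear_kernel(notes)), notes_observed, n)

# One matrix over all patients, regardless of the original data formats.
K = combine_kernels([K_bp, K_notes])
```

Because each padded matrix stays positive semidefinite and the average of positive semidefinite matrices is positive semidefinite, `K` can be handed to any learner that accepts a precomputed kernel. Zero-padding is only the simplest option: a pair of patients with no modality in common receives a similarity of zero, which a more principled treatment of incompleteness would avoid.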
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tirunagari, S., Mohan, S., Windridge, D., Balla, Y. (2023). Addressing Challenges in Healthcare Big Data Analytics. In: Morusupalli, R., Dandibhotla, T.S., Atluri, V.V., Windridge, D., Lingras, P., Komati, V.R. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2023. Lecture Notes in Computer Science, vol 14078. Springer, Cham. https://doi.org/10.1007/978-3-031-36402-0_70
DOI: https://doi.org/10.1007/978-3-031-36402-0_70
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36401-3
Online ISBN: 978-3-031-36402-0
eBook Packages: Computer Science, Computer Science (R0)