Abstract
This paper presents a random forest-feature sensitivity and feature correlation (RF-FSFC) technique for enhanced heart disease prediction. The proposed methodology is implemented on the Cleveland heart disease dataset, which comprises a total of 120 heart disease patient records. Missing values were handled by data imputation, and the data were transformed using min–max normalization. We construct an RF-based classifier for coronary heart disease (CHD) by combining feature sensitivity analysis with feature correlation analysis. The sensitivity-based feature selection phase ranks features according to their value in assessing CHD risk, and the feature correlation analysis phase checks whether any features are correlated with one another. Using the proposed RF-FSFC technique, a heart disease prediction accuracy of 81.16% was obtained when five features (sex, hemoglobin, TD, CRF, and cirrhosis) were omitted. Compared with the Naïve Bayes, decision tree, regression analysis, and support vector machine models, the proposed model offered a higher accuracy of 86.141% without omitting any features. It also achieved sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) scores of 87.321%, 87.364%, 91.23%, and 91.02%, respectively. Experimental findings demonstrated that the proposed RF-FSFC approach significantly improves prediction accuracy compared with other approaches that do not use the integrated feature selection method.
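The preprocessing and evaluation quantities named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = disease, 0 = no disease) and uses the standard textbook definitions of min–max normalization and of sensitivity, specificity, PPV, and NPV. The function names `min_max_normalize` and `diagnostic_scores` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def diagnostic_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    # Illustrative values only; these are not records from the dataset.
    print(min_max_normalize([120, 140, 200]))
    print(diagnostic_scores([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
```

The sketch covers only normalization and scoring. The random forest classifier and the sensitivity- and correlation-based feature selection are not reproduced here, because the abstract does not specify them in enough detail.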
Availability of data and material
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Funding
Not applicable.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
About this article
Cite this article
Saranya, G., Pravin, A. A novel feature selection approach with integrated feature sensitivity and feature correlation for improved prediction of heart disease. J Ambient Intell Human Comput 14, 12005–12019 (2023). https://doi.org/10.1007/s12652-022-03750-y
DOI: https://doi.org/10.1007/s12652-022-03750-y