Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Data Fusion Using Medical Records and Clinical Data to Support TB Diagnosis

Andrés F. Romero-Gómez

Alvaro D. Orjuela-Cañón

Version 1 : Received: 18 June 2024 / Approved: 19 June 2024 / Online: 19 June 2024 (07:21:32 CEST)

How to cite: Romero-Gómez, A. F.; Orjuela-Cañón, A. D.; Jutinico, A. L.; Awad, C. E.; Vergara, E.; Palencia, M. A. Data Fusion Using Medical Records and Clinical Data to Support TB Diagnosis. Preprints 2024, 2024061316. https://doi.org/10.20944/preprints202406.1316.v1

Abstract

Tuberculosis (TB) is an infectious disease that the World Health Organization has declared a global emergency, and it remains one of the world’s top ten causes of death. Diagnosis is especially critical in developing countries, where the infrastructure demanded for detection and treatment complicates efforts against the disease. These resource limitations are most significant in areas far from the main cities, which have few mechanisms for the timely diagnosis needed to manage possible patients successfully. Artificial intelligence is becoming essential in providing additional strategies that support health professionals in the diagnostic process. This paper uses natural language processing (NLP) and machine learning (ML) techniques to create models that can support TB diagnosis when the needed infrastructure is unavailable. Two data sources were explored: text extracted from electronic medical records (EMR) and patient clinical data (CD). Four proposals using five different ML models were implemented. The first two proposals applied ML to each data source independently. The other two fused data from both sources. This strategy was analyzed with physicians, considering its pertinence to the diagnostic process and their understanding of the EMR. Finally, the data fusion results were compared with those of each individual source; the best performance was obtained using only the CD, with an area under the ROC curve of 69.9±2.3%. However, physicians’ reports have the advantage of being more readily available than specific clinical data, which can make them more useful in places far from the main cities that lack the basic infrastructure to obtain such data.

Keywords

artificial intelligence; tuberculosis diagnosis; data fusion

Subject

Engineering, Bioengineering

Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
