Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Data Fusion Using Medical Records and Clinical Data to Support TB Diagnosis

Version 1 : Received: 18 June 2024 / Approved: 19 June 2024 / Online: 19 June 2024 (07:21:32 CEST)

How to cite: Romero-Gómez, A. F.; Orjuela-Cañón, A. D.; Jutinico, A. L.; Awad, C. E.; Vergara, E.; Palencia, M. A. Data Fusion Using Medical Records and Clinical Data to Support TB Diagnosis. Preprints 2024, 2024061316. https://doi.org/10.20944/preprints202406.1316.v1

Abstract

Tuberculosis (TB) is an infectious disease that the World Health Organization has declared a global emergency, and it remains one of the world's top ten causes of death. TB diagnosis is more critical in developing countries, where the infrastructure required for detection and treatment complicates efforts against the disease. These resource limitations are especially significant in areas far from the main cities, which have few mechanisms for making the timely diagnosis needed to treat possible patients successfully. Artificial intelligence has begun to play an essential role in providing additional diagnostic strategies to support health professionals. This paper uses natural language processing (NLP) and machine learning (ML) techniques to create models that can support TB diagnosis when the needed infrastructure is unavailable. Two different sources were explored: text extracted from electronic medical records (EMR) and patient clinical data (CD). Four proposals using five different machine learning models were implemented. The first two proposals applied ML to each data source independently. Two additional approaches then fused the data from both sources. The use of this strategy was analyzed with physicians, considering its pertinence to the diagnostic process and their understanding of the EMR. Finally, the results of the data fusion were compared with those of each source alone. The best performance was obtained using only the CD, with an area under the ROC curve of 69.9±2.3%. However, the advantage of analyzing physicians' reports is that this information is more readily available than specific clinical data, which can make it more useful in places far from the main cities that lack the basic infrastructure to obtain such data.
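The abstract does not say how the two fusion approaches were built or how the ± figure was computed. As a rough illustration only, the sketch below assumes two common options, feature-level fusion (concatenating per-patient feature vectors) and decision-level fusion (averaging per-source scores), and assumes the reported spread is a standard deviation across evaluation folds. All function names, the equal weighting, and the toy values are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def early_fusion(text_features, clinical_features):
    """Feature-level fusion (assumed): concatenate both vectors per patient."""
    return [t + c for t, c in zip(text_features, clinical_features)]


def late_fusion(text_scores, clinical_scores, weight=0.5):
    """Decision-level fusion (assumed): weighted average of per-source scores."""
    return [
        weight * t + (1 - weight) * c
        for t, c in zip(text_scores, clinical_scores)
    ]


def summarize(fold_aucs):
    """Format per-fold AUCs as 'mean±std' in percent (sample std assumed)."""
    return f"{100 * mean(fold_aucs):.1f}±{100 * stdev(fold_aucs):.1f}%"


if __name__ == "__main__":
    # Toy data: 1 = TB, 0 = not TB; scores stand in for model outputs.
    labels = [0, 0, 1, 1]
    emr_scores = [0.2, 0.5, 0.4, 0.9]
    cd_scores = [0.1, 0.3, 0.6, 0.7]
    fused = late_fusion(emr_scores, cd_scores)
    print("EMR AUC:", auc(labels, emr_scores))
    print("CD AUC:", auc(labels, cd_scores))
    print("Fused AUC:", auc(labels, fused))
```

With real data, `auc` would be evaluated once per fold for each of the four proposals, and `summarize` would turn the per-fold values into a figure in the same form as the one reported in the abstract.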

Keywords

artificial intelligence; tuberculosis diagnosis; data fusion

Subject

Engineering, Bioengineering
