-
BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regions
Authors:
Spyridon Bakas,
Siddhesh P. Thakur,
Shahriar Faghani,
Mana Moassefi,
Ujjwal Baid,
Verena Chung,
Sarthak Pati,
Shubham Innani,
Bhakti Baheti,
Jake Albrecht,
Alexandros Karargyris,
Hasan Kassem,
MacLean P. Nasrallah,
Jared T. Ahrendsen,
Valeria Barresi,
Maria A. Gubbiotti,
Giselle Y. López,
Calixto-Hope G. Lucas,
Michael L. Miller,
Lee A. D. Cooper,
Jason T. Huse,
William R. Bell
Abstract:
Glioblastoma is the most common primary adult brain tumor, with a grim prognosis - median survival of 12-18 months following treatment, and 4 months otherwise. Glioblastoma is widely infiltrative in the cerebral hemispheres and well-defined by heterogeneous molecular and micro-environmental histopathologic profiles, which pose a major obstacle in treatment. Correctly diagnosing these tumors and as…
▽ More
Glioblastoma is the most common primary adult brain tumor, with a grim prognosis - median survival of 12-18 months following treatment, and 4 months otherwise. Glioblastoma is widely infiltrative in the cerebral hemispheres and well-defined by heterogeneous molecular and micro-environmental histopathologic profiles, which pose a major obstacle in treatment. Correctly diagnosing these tumors and assessing their heterogeneity is crucial for choosing the precise treatment and potentially enhancing patient survival rates. In the gold-standard histopathology-based approach to tumor diagnosis, detecting various morpho-pathological features of distinct histology throughout digitized tissue sections is crucial. Such "features" include the presence of cellular tumor, geographic necrosis, pseudopalisading necrosis, areas abundant in microvascular proliferation, infiltration into the cortex, wide extension in subcortical white matter, leptomeningeal infiltration, regions dense with macrophages, and the presence of perivascular or scattered lymphocytes. With these features in mind and building upon the main aim of the BraTS Cluster of Challenges https://www.synapse.org/brats2024, the goal of the BraTS-Path challenge is to provide a systematically prepared comprehensive dataset and a benchmarking environment to develop and fairly compare deep-learning models capable of identifying tumor sub-regions of distinct histologic profile. These models aim to further our understanding of the disease and assist in the diagnosis and grading of conditions in a consistent manner.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Focused Active Learning for Histopathological Image Classification
Authors:
Arne Schmidt,
Pablo Morales-Álvarez,
Lee A. D. Cooper,
Lee A. Newberg,
Andinet Enquobahrie,
Aggelos K. Katsaggelos,
Rafael Molina
Abstract:
Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with…
▽ More
Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with a low informative value. To address these challenges, we propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function. Specifically, the weighted epistemic uncertainty accounts for the class imbalance, aleatoric uncertainty for ambiguous images, and an OoD score for artifacts. We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer. The results confirm that other AL methods are 'distracted' by ambiguities and artifacts which harm the performance. FocAL effectively focuses on the most informative images, avoiding ambiguities and artifacts during acquisition. For both experiments, FocAL outperforms existing AL approaches, reaching a Cohen's kappa of 0.764 with only 0.69% of the labeled Panda data.
△ Less
Submitted 6 April, 2024;
originally announced April 2024.
-
MONAI: An open-source framework for deep learning in healthcare
Authors:
M. Jorge Cardoso,
Wenqi Li,
Richard Brown,
Nic Ma,
Eric Kerfoot,
Yiheng Wang,
Benjamin Murrey,
Andriy Myronenko,
Can Zhao,
Dong Yang,
Vishwesh Nath,
Yufan He,
Ziyue Xu,
Ali Hatamizadeh,
Andriy Myronenko,
Wentao Zhu,
Yun Liu,
Mingxin Zheng,
Yucheng Tang,
Isaac Yang,
Michael Zephyr,
Behrooz Hashemian,
Sachidanand Alle,
Mohammad Zalbagi Darestani,
Charlie Budd
, et al. (32 additional authors not shown)
Abstract:
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geo…
▽ More
Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provide purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
A Histopathology Study Comparing Contrastive Semi-Supervised and Fully Supervised Learning
Authors:
Lantian Zhang,
Mohamed Amgad,
Lee A. D. Cooper
Abstract:
Data labeling is often the most challenging task when developing computational pathology models. Pathologist participation is necessary to generate accurate labels, and the limitations on pathologist time and demand for large, labeled datasets has led to research in areas including weakly supervised learning using patient-level labels, machine assisted annotation and active learning. In this paper…
▽ More
Data labeling is often the most challenging task when developing computational pathology models. Pathologist participation is necessary to generate accurate labels, and the limitations on pathologist time and demand for large, labeled datasets has led to research in areas including weakly supervised learning using patient-level labels, machine assisted annotation and active learning. In this paper we explore self-supervised learning to reduce labeling burdens in computational pathology. We explore this in the context of classification of breast cancer tissue using the Barlow Twins approach, and we compare self-supervision with alternatives like pre-trained networks in low-data scenarios. For the task explored in this paper, we find that ImageNet pre-trained networks largely outperform the self-supervised representations obtained using Barlow Twins.
△ Less
Submitted 10 November, 2021;
originally announced November 2021.
-
NuCLS: A scalable crowdsourcing, deep learning approach and dataset for nucleus classification, localization and segmentation
Authors:
Mohamed Amgad,
Lamees A. Atteya,
Hagar Hussein,
Kareem Hosny Mohammed,
Ehab Hafiz,
Maha A. T. Elsebaie,
Ahmed M. Alhusseiny,
Mohamed Atef AlMoslemany,
Abdelmagid M. Elmatboly,
Philip A. Pappalardo,
Rokia Adel Sakr,
Pooya Mobadersany,
Ahmad Rachid,
Anas M. Saad,
Ahmad M. Alkashash,
Inas A. Ruhban,
Anas Alrefai,
Nada M. Elgazar,
Ali Abdulkarim,
Abo-Alela Farag,
Amira Etman,
Ahmed G. Elsaeed,
Yahya Alagha,
Yomna A. Amer,
Ahmed M. Raslan
, et al. (12 additional authors not shown)
Abstract:
High-resolution mapping of cells and tissue structures provides a foundation for developing interpretable machine-learning models for computational pathology. Deep learning algorithms can provide accurate mappings given large numbers of labeled instances for training and validation. Generating adequate volume of quality labels has emerged as a critical barrier in computational pathology given the…
▽ More
High-resolution mapping of cells and tissue structures provides a foundation for developing interpretable machine-learning models for computational pathology. Deep learning algorithms can provide accurate mappings given large numbers of labeled instances for training and validation. Generating adequate volume of quality labels has emerged as a critical barrier in computational pathology given the time and effort required from pathologists. In this paper we describe an approach for engaging crowds of medical students and pathologists that was used to produce a dataset of over 220,000 annotations of cell nuclei in breast cancers. We show how suggested annotations generated by a weak algorithm can improve the accuracy of annotations generated by non-experts and can yield useful data for training segmentation algorithms without laborious manual tracing. We systematically examine interrater agreement and describe modifications to the MaskRCNN model to improve cell mapping. We also describe a technique we call Decision Tree Approximation of Learned Embeddings (DTALE) that leverages nucleus segmentations and morphologic features to improve the transparency of nucleus classification models. The annotation data produced in this study are freely available for algorithm development and benchmarking at: https://sites.google.com/view/nucls.
△ Less
Submitted 17 February, 2021;
originally announced February 2021.
-
Learning Clinical Outcomes from Heterogeneous Genomic Data Sources
Authors:
Safoora Yousefi,
Amirreza Shaban,
Mohamed Amgad,
Ramraj Chandradevan,
Lee A. D. Cooper
Abstract:
Translating the vast data generated by genomic platforms into reliable predictions of clinical outcomes remains a critical challenge in realizing the promise of genomic medicine largely due to small number of independent samples. In this paper, we show that neural networks can be trained to predict clinical outcomes using heterogeneous genomic data sources via multi-task learning and adversarial r…
▽ More
Translating the vast data generated by genomic platforms into reliable predictions of clinical outcomes remains a critical challenge in realizing the promise of genomic medicine largely due to small number of independent samples. In this paper, we show that neural networks can be trained to predict clinical outcomes using heterogeneous genomic data sources via multi-task learning and adversarial representation learning, allowing one to combine multiple cohorts and outcomes in training. We compare our proposed method to two baselines and demonstrate that it can be used to help mitigate the data scarcity and clinical outcome censorship in cancer genomics learning problems.
△ Less
Submitted 2 April, 2019;
originally announced April 2019.
-
High-throughput Execution of Hierarchical Analysis Pipelines on Hybrid Cluster Platforms
Authors:
George Teodoro,
Tony Pan,
Tahsin M. Kurc,
Jun Kong,
Lee A. D. Cooper,
Joel H. Saltz
Abstract:
We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics processing units (GPUs). Our work targets scientific analysis applications in which datasets are processed in applica…
▽ More
We propose, implement, and experimentally evaluate a runtime middleware to support high-throughput execution on hybrid cluster machines of large-scale analysis applications. A hybrid cluster machine consists of computation nodes which have multiple CPUs and general purpose graphics processing units (GPUs). Our work targets scientific analysis applications in which datasets are processed in application-specific data chunks, and the processing of a data chunk is expressed as a hierarchical pipeline of operations. The proposed middleware system combines a bag-of-tasks style execution with coarse-grain dataflow execution. Data chunks and associated data processing pipelines are scheduled across cluster nodes using a demand driven approach, while within a node operations in a given pipeline instance are scheduled across CPUs and GPUs. The runtime system implements several optimizations, including performance aware task scheduling, architecture aware process placement, data locality conscious task assignment, and data prefetching and asynchronous data copy, to maximize utilization of the aggregate computing power of CPUs and GPUs and minimize data copy overheads. The application and performance benefits of the runtime middleware are demonstrated using an image analysis application, which is employed in a brain cancer study, on a state-of-the-art hybrid cluster in which each node has two 6-core CPUs and three GPUs. Our results show that implementing and scheduling application data processing as a set of fine-grain operations provide more opportunities for runtime optimizations and attain better performance than a coarser-grain, monolithic implementation. The proposed runtime system can achieve high-throughput processing of large datasets - we were able to process an image dataset consisting of 36,848 4Kx4K-pixel image tiles at about 150 tiles/second rate on 100 nodes.
△ Less
Submitted 14 September, 2012;
originally announced September 2012.