-
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model
Authors:
Duy M. H. Nguyen,
Nghiem T. Diep,
Trung Q. Nguyen,
Hoang-Bao Le,
Tai Nguyen,
Tien Nguyen,
TrungTin Nguyen,
Nhat Ho,
Pengtao Xie,
Roger Wattenhofer,
James Zhou,
Daniel Sonntag,
Mathias Niepert
Abstract:
State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performance while mainly relying on the autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignmen…
▽ More
State-of-the-art medical multi-modal large language models (med-MLLM), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, those models primarily focus on scaling the model size and data volume to boost performance while mainly relying on the autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignment between vision and language modalities, making these models highly reliant on extensive pre-training datasets - a significant challenge in medical domains due to the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMa 7B training. Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data. For example, on VQA-RAD, we exceed LLAVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods like BiomedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.
△ Less
Submitted 6 October, 2024; v1 submitted 3 October, 2024;
originally announced October 2024.
-
Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
Authors:
Duy M. H. Nguyen,
An T. Le,
Trung Q. Nguyen,
Nghiem T. Diep,
Tai Nguyen,
Duy Duong-Tran,
Jan Peters,
Li Shen,
Mathias Niepert,
Daniel Sonntag
Abstract:
Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we c…
▽ More
Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
Authors:
Hoang H. Le,
Duy M. H. Nguyen,
Omair Shahzad Bhatti,
Laszlo Kopacsi,
Thinh P. Ngo,
Binh T. Nguyen,
Michael Barz,
Daniel Sonntag
Abstract:
Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object r…
▽ More
Comprehending how humans process visual information in dynamic settings is crucial for psychology and designing user-centered interactions. While mobile eye-tracking systems combining egocentric video and gaze signals can offer valuable insights, manual analysis of these recordings is time-intensive. In this work, we present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings. Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations. Such mechanisms enable us to learn embedding functions capable of generalizing to new object angle views, facilitating rapid adaptation and efficient reasoning in dynamic contexts as users navigate their environment. Through experiments conducted on three distinct video sequences, our interactive-based method showcases significant performance improvements over fixed training/testing algorithms, even when trained on considerably smaller annotated samples collected through user feedback. Furthermore, we demonstrate exceptional efficiency in data annotation processes and surpass prior interactive methods that use complete object detectors, combine detectors with convolutional networks, or employ interactive video segmentation.
△ Less
Submitted 7 July, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Accelerating Transformers with Spectrum-Preserving Token Merging
Authors:
Hoai-Chau Tran,
Duy M. H. Nguyen,
Duy M. Nguyen,
Trung-Tin Nguyen,
Ngan Le,
Pengtao Xie,
Daniel Sonntag,
James Y. Zou,
Binh T. Nguyen,
Mathias Niepert
Abstract:
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Pr…
▽ More
Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Prior works have proposed algorithms based on Bipartite Soft Matching (BSM), which divides tokens into distinct sets and merges the top k similar tokens. However, these methods have significant drawbacks, such as sensitivity to token-splitting strategies and damage to informative tokens in later layers. This paper presents a novel paradigm called PiToMe, which prioritizes the preservation of informative tokens using an additional metric termed the energy score. This score identifies large clusters of similar tokens as high-energy, indicating potential candidates for merging, while smaller (unique and isolated) clusters are considered as low-energy and preserved. Experimental findings demonstrate that PiToMe saved from 40-60\% FLOPs of the base models while exhibiting superior off-the-shelf performance on image classification (0.5\% average performance drop of ViT-MAE-H compared to 2.6\% as baselines), image-text retrieval (0.3\% average performance drop of CLIP on Flickr30k compared to 4.5\% as others), and analogously in visual questions answering with LLaVa-7B. Furthermore, PiToMe is theoretically shown to preserve intrinsic spectral properties of the original token space under mild conditions
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks
Authors:
Duy M. H. Nguyen,
Nina Lukashina,
Tai Nguyen,
An T. Le,
TrungTin Nguyen,
Nhat Ho,
Jan Peters,
Daniel Sonntag,
Viktor Zaverkin,
Mathias Niepert
Abstract:
A molecule's 2D representation consists of its atoms, their attributes, and the molecule's covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely it occurs in nature. Most existing machine learning methods for molecular property p…
▽ More
A molecule's 2D representation consists of its atoms, their attributes, and the molecule's covalent bonds. A 3D (geometric) representation of a molecule is called a conformer and consists of its atom types and Cartesian coordinates. Every conformer has a potential energy, and the lower this energy, the more likely it occurs in nature. Most existing machine learning methods for molecular property prediction consider either 2D molecular graphs or 3D conformer structure representations in isolation. Inspired by recent work on using ensembles of conformers in conjunction with 2D graph representations, we propose $\mathrm{E}$(3)-invariant molecular conformer aggregation networks. The method integrates a molecule's 2D representation with that of multiple of its conformers. Contrary to prior work, we propose a novel 2D-3D aggregation mechanism based on a differentiable solver for the Fused Gromov-Wasserstein Barycenter problem and the use of an efficient conformer generation method based on distance geometry. We show that the proposed aggregation mechanism is $\mathrm{E}$(3) invariant and propose an efficient GPU implementation. Moreover, we demonstrate that the aggregation mechanism helps to significantly outperform state-of-the-art molecule property prediction methods on established datasets.
△ Less
Submitted 19 August, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Authors:
Duy Minh Ho Nguyen,
Tan Ngoc Pham,
Nghiem Tuong Diep,
Nghi Quoc Phan,
Quang Pham,
Vinh Tong,
Binh T. Nguyen,
Ngan Hoang Le,
Nhat Ho,
Pengtao Xie,
Daniel Sonntag,
Mathias Niepert
Abstract:
Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for…
▽ More
Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. It showcases impressive learning abilities across different tasks with the need for only a limited amount of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used them as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, lower uncertainty predictions usually tend to higher out-of-distribution (OOD) performance.
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching
Authors:
Duy M. H. Nguyen,
Hoang Nguyen,
Nghiem T. Diep,
Tan N. Pham,
Tri Cao,
Binh T. Nguyen,
Paul Swoboda,
Nhat Ho,
Shadi Albarqouni,
Pengtao Xie,
Daniel Sonntag,
Mathias Niepert
Abstract:
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and me…
▽ More
Obtaining large pre-trained models that can be fine-tuned to new tasks with limited annotated samples has remained an open challenge for medical imaging data. While pre-trained deep networks on ImageNet and vision-language foundation models trained on web-scale data are prevailing approaches, their effectiveness on medical tasks is limited due to the significant domain shift between natural and medical images. To bridge this gap, we introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets. We have collected approximately 1.3 million medical images from 55 publicly available datasets, covering a large number of organs and modalities such as CT, MRI, X-ray, and Ultrasound. We benchmark several state-of-the-art self-supervised algorithms on this dataset and propose a novel self-supervised contrastive learning algorithm using a graph-matching formulation. The proposed approach makes three contributions: (i) it integrates prior pair-wise image similarity metrics based on local and global information; (ii) it captures the structural constraints of feature embeddings through a loss function constructed via a combinatorial graph-matching objective; and (iii) it can be trained efficiently end-to-end using modern gradient-estimation techniques for black-box solvers. We thoroughly evaluate the proposed LVM-Med on 15 downstream medical tasks ranging from segmentation and classification to object detection, and both for the in and out-of-distribution settings. LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models. For challenging tasks such as Brain Tumor Classification or Diabetic Retinopathy Grading, LVM-Med improves previous vision-language models trained on 1 billion masks by 6-7% while using only a ResNet-50.
△ Less
Submitted 18 November, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Integrable Systems Arising from Separation of Variables on $S^{3}$
Authors:
Diana M. H. Nguyen,
Sean R. Dawson,
Holger R. Dullin
Abstract:
We show that the space of orthogonally separable coordinates on the sphere $S^3$ induces a natural family of integrable systems, which after symplectic reduction leads to a family of integrable systems on $S^2 \times S^2$. The generic member of the family corresponds to ellipsoidal coordinates. We use the theory of compatible Poisson structures to study the critical points and critical values of t…
▽ More
We show that the space of orthogonally separable coordinates on the sphere $S^3$ induces a natural family of integrable systems, which after symplectic reduction leads to a family of integrable systems on $S^2 \times S^2$. The generic member of the family corresponds to ellipsoidal coordinates. We use the theory of compatible Poisson structures to study the critical points and critical values of the momentum map. Interesting structure arises because the ellipsoidal coordinate system can degenerate in a variety of ways, and all possible orthogonally separable coordinate systems on $S^3$ (including degenerations) have the topology of the Stasheff polytope $K^4$, which is a pentagon. We describe how the generic integrable system degenerates, and how the appearance of global $SO(2)$ and $SO(3)$ symmetries is the main feature that organises the various degenerate systems. For the whole family we show that there is an action map whose image is an equilateral triangle. When higher symmetry is present, this triangle ``unfolds'' into a semi-toric polygon (when there is one global $S^1$-action) or a Delzant polygon (when there are two global $S^1$-actions). We believe that this family of integrable systems is a natural playground for theories of global symplectic classification of integrable systems.
△ Less
Submitted 26 February, 2023;
originally announced February 2023.
-
DRG-Net: Interactive Joint Learning of Multi-lesion Segmentation and Classification for Diabetic Retinopathy Grading
Authors:
Hasan Md Tusfiqur,
Duy M. H. Nguyen,
Mai T. N. Truong,
Triet A. Nguyen,
Binh T. Nguyen,
Michael Barz,
Hans-Juergen Profitlich,
Ngoc T. T. Than,
Ngan Le,
Pengtao Xie,
Daniel Sonntag
Abstract:
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DR…
▽ More
Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.
△ Less
Submitted 30 December, 2022;
originally announced December 2022.
-
Joint Self-Supervised Image-Volume Representation Learning with Intra-Inter Contrastive Clustering
Authors:
Duy M. H. Nguyen,
Hoang Nguyen,
Mai T. N. Truong,
Tri Cao,
Binh T. Nguyen,
Nhat Ho,
Paul Swoboda,
Shadi Albarqouni,
Pengtao Xie,
Daniel Sonntag
Abstract:
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have…
▽ More
Collecting large-scale medical datasets with fully annotated samples for training of deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data. Additionally, the use of these pre-trained networks is constrained to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in Transformer, allowing incorporating long-range dependencies between slices inside 3D volumes. These holistic features are further utilized to define a novel 3D clustering agreement-based SSL task and masking embedding prediction inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structures segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D Deep-ClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
△ Less
Submitted 4 December, 2022;
originally announced December 2022.
-
The Harmonic Lagrange Top and the Confluent Heun Equation
Authors:
Sean R. Dawson,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
The harmonic Lagrange top is the Lagrange top plus a quadratic (harmonic) potential term. We describe the top in the space fixed frame using a global description with a Poisson structure on $T^*S^3$. This global description naturally leads to a rational parametrisation of the set of critical values of the energy-momentum map. We show that there are 4 different topological types for generic paramet…
▽ More
The harmonic Lagrange top is the Lagrange top plus a quadratic (harmonic) potential term. We describe the top in the space fixed frame using a global description with a Poisson structure on $T^*S^3$. This global description naturally leads to a rational parametrisation of the set of critical values of the energy-momentum map. We show that there are 4 different topological types for generic parameter values. The quantum mechanics of the harmonic Lagrange top is described by the most general confluent Heun equation (also known as the generalised spheroidal wave equation). We derive formulas for an infinite pentadiagonal symmetric matrix representing the Hamiltonian from which the spectrum is computed.
△ Less
Submitted 10 December, 2021; v1 submitted 28 November, 2021;
originally announced November 2021.
-
LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object Tracking
Authors:
Duy M. H. Nguyen,
Roberto Henschel,
Bodo Rosenhahn,
Daniel Sonntag,
Paul Swoboda
Abstract:
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in wide spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-…
▽ More
Multi-Camera Multi-Object Tracking is currently drawing attention in the computer vision field due to its superior performance in real-world applications such as video surveillance in crowded scenes or in wide spaces. In this work, we propose a mathematically elegant multi-camera multiple object tracking approach based on a spatial-temporal lifted multicut formulation. Our model utilizes state-of-the-art tracklets produced by single-camera trackers as proposals. As these tracklets may contain ID-Switch errors, we refine them through a novel pre-clustering obtained from 3D geometry projections. As a result, we derive a better tracking graph without ID switches and more precise affinity costs for the data association phase. Tracklets are then matched to multi-camera trajectories by solving a global lifted multicut formulation that incorporates short and long-range temporal interactions on tracklets located in the same camera as well as inter-camera ones. Experimental results on the WildTrack dataset yield near-perfect performance, outperforming state-of-the-art trackers on Campus while being on par on the PETS-09 dataset.
△ Less
Submitted 3 May, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
-
Self-Supervised Domain Adaptation for Diabetic Retinopathy Grading using Vessel Image Reconstruction
Authors:
Duy M. H. Nguyen,
Truong T. N. Mai,
Ngoc T. T. Than,
Alexander Prange,
Daniel Sonntag
Abstract:
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that…
▽ More
This paper investigates the problem of domain adaptation for diabetic retinopathy (DR) grading. We learn invariant target-domain features by defining a novel self-supervised task based on retinal vessel image reconstructions, inspired by medical domain knowledge. Then, a benchmark of current state-of-the-art unsupervised domain adaptation methods on the DR problem is provided. It can be shown that our approach outperforms existing domain adaption strategies. Furthermore, when utilizing entire training data in the target domain, we are able to compete with several state-of-the-art approaches in final classification accuracy just by applying standard network architectures and using image-level labels.
△ Less
Submitted 20 July, 2021;
originally announced July 2021.
-
TATL: Task Agnostic Transfer Learning for Skin Attributes Detection
Authors:
Duy M. H. Nguyen,
Thu T. Nguyen,
Huong Vu,
Quang Pham,
Manh-Duy Nguyen,
Binh T. Nguyen,
Daniel Sonntag
Abstract:
Existing skin attributes detection methods usually initialize with a pre-trained Imagenet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose \emph{Task Agnostic Transfer Learning (TATL)}, a novel framework motivate…
▽ More
Existing skin attributes detection methods usually initialize with a pre-trained Imagenet network and then fine-tune on a medical target task. However, we argue that such approaches are suboptimal because medical datasets are largely different from ImageNet and often contain limited training samples. In this work, we propose \emph{Task Agnostic Transfer Learning (TATL)}, a novel framework motivated by dermatologists' behaviors in the skincare context. TATL learns an attribute-agnostic segmenter that detects lesion skin regions and then transfers this knowledge to a set of attribute-specific classifiers to detect each particular attribute. Since TATL's attribute-agnostic segmenter only detects skin attribute regions, it enjoys ample data from all attributes, allows transferring knowledge among features, and compensates for the lack of training data from rare attributes. We conduct extensive experiments to evaluate the proposed TATL transfer learning mechanism with various neural network architectures on two popular skin attributes detection benchmarks. The empirical results show that TATL not only works well with multiple architectures but also can achieve state-of-the-art performances while enjoying minimal model and computational complexities. We also provide theoretical insights and explanations for why our transfer learning framework performs well in practice.
△ Less
Submitted 27 January, 2022; v1 submitted 4 April, 2021;
originally announced April 2021.
-
An Attention Mechanism with Multiple Knowledge Sources for COVID-19 Detection from CT Images
Authors:
Duy M. H. Nguyen,
Duy M. Nguyen,
Huong Vu,
Binh T. Nguyen,
Fabrizio Nunnari,
Daniel Sonntag
Abstract:
Until now, Coronavirus SARS-CoV-2 has caused more than 850,000 deaths and infected more than 27 million individuals in over 120 countries. Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utiliz…
▽ More
Until now, Coronavirus SARS-CoV-2 has caused more than 850,000 deaths and infected more than 27 million individuals in over 120 countries. Besides principal polymerase chain reaction (PCR) tests, automatically identifying positive samples based on computed tomography (CT) scans can present a promising option in the early diagnosis of COVID-19. Recently, there have been increasing efforts to utilize deep networks for COVID-19 diagnosis based on CT scans. While these approaches mostly focus on introducing novel architectures, transfer learning techniques, or construction large scale data, we propose a novel strategy to improve the performance of several baselines by leveraging multiple useful information sources relevant to doctors' judgments. Specifically, infected regions and heat maps extracted from learned networks are integrated with the global image via an attention mechanism during the learning process. This procedure not only makes our system more robust to noise but also guides the network focusing on local lesion areas. Extensive experiments illustrate the superior performance of our approach compared to recent baselines. Furthermore, our learned network guidance presents an explainable feature to doctors as we can understand the connection between input and output in a grey-box model.
△ Less
Submitted 1 December, 2020; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Monodromy in Prolate Spheroidal Harmonics
Authors:
Sean R. Dawson,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
We show that spheroidal wave functions viewed as the essential part of the joint eigenfunction of two commuting operators of $L_2(S^2)$ has a defect in the joint spectrum that makes a global labelling of the joint eigenfunctions by quantum numbers impossible. To our knowledge this is the first explicit demonstration that quantum monodromy exists in a class of classically known special functions. U…
▽ More
We show that spheroidal wave functions viewed as the essential part of the joint eigenfunction of two commuting operators of $L_2(S^2)$ has a defect in the joint spectrum that makes a global labelling of the joint eigenfunctions by quantum numbers impossible. To our knowledge this is the first explicit demonstration that quantum monodromy exists in a class of classically known special functions. Using an analogue of the Laplace-Runge-Lenz vector we show that the corresponding classical Liouville integrable system is symplectically equivalent to the C. Neumann system. To prove the existence of this defect we construct a classical integrable system that is the semi-classical limit of the quantum integrable system of commuting operators. We show that this is a semi-toric system with a non-degenerate focus-focus point, such that there is monodromy in the classical and the quantum system.
△ Less
Submitted 30 January, 2020;
originally announced January 2020.
-
The Lie-Poisson Structure of the Symmetry Reduced Regularised n-Body Problem
Authors:
Suntharan Arunasalam,
Holger R. Dullin,
Diana M. H. Nguyen
Abstract:
This paper investigates the symmetry reduction of the regularised n-body problem. The three body problem, regularised through quaternions, is examined in detail. We show that for a suitably chosen symmetry group action the space of quadratic invariants is closed and the Hamiltonian can be written in terms of the quadratic invariants. The corresponding Lie-Poisson structure is isomorphic to the Lie…
▽ More
This paper investigates the symmetry reduction of the regularised n-body problem. The three body problem, regularised through quaternions, is examined in detail. We show that for a suitably chosen symmetry group action the space of quadratic invariants is closed and the Hamiltonian can be written in terms of the quadratic invariants. The corresponding Lie-Poisson structure is isomorphic to the Lie algebra u(3,3). Finally, we generalise this result to the n-body problem for n>3.
△ Less
Submitted 13 August, 2014;
originally announced August 2014.