Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition

Authors: Kha Nhat Le, Hoang-Tuan Nguyen, Hung Tien Tran, Thanh Duc Ngo

Abstract: Unsupervised domain adaptation (UDA) has become increasingly prevalent in scene text recognition (STR), especially where training and testing data reside in different domains. The efficacy of existing UDA approaches tends to degrade when there is a large gap between the source and target domains. To deal with this problem, gradually shifting or progressively learning to shift from domain to domain is the key issue. In this paper, we introduce the Stratified Domain Adaptation (StrDA) approach, which examines the gradual escalation of the domain gap for the learning process. The objective is to partition the training data into subsets so that the progressively self-trained model can adapt to gradual changes. We stratify the training data by evaluating the proximity of each data sample to both the source and target domains. We propose a novel method for employing domain discriminators to estimate the out-of-distribution and domain discriminative levels of data samples. Extensive experiments on benchmark scene-text datasets show that our approach significantly improves the performance of baseline (source-trained) STR models.

Submitted 17 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

Comments: 15 pages, 12 figures, 5 tables, include supplementary materials
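
The StrDA abstract describes stratifying training data by proximity to the source and target domains. A minimal sketch of that partitioning step follows. It assumes each sample already carries two scores in [0, 1] from the domain discriminators (an out-of-distribution level and a domain-discriminative level). As a hypothetical combination rule, it averages the two scores, sorts the samples from source-like to target-like, and splits them into near-equal subsets. The function name `stratify` and the equal-size split are illustrative choices, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def stratify(scores: Sequence[Tuple[float, float]], num_strata: int) -> List[List[int]]:
    """Partition sample indices into num_strata subsets, ordered from the most
    source-like to the most target-like.

    scores[i] is an (ood_level, discriminative_level) pair for sample i. Both
    values are assumed to lie in [0, 1], with larger values meaning farther from
    the source domain.
    """
    if num_strata < 1:
        raise ValueError("num_strata must be a positive integer")
    # Hypothetical combination rule: the mean of the two discriminator outputs.
    # Ties are broken by sample index because tuples compare element by element.
    combined = sorted((0.5 * (ood + dis), i) for i, (ood, dis) in enumerate(scores))
    order = [i for _, i in combined]
    n = len(order)
    # Integer arithmetic gives near-equal strata even when n % num_strata != 0.
    return [order[k * n // num_strata:(k + 1) * n // num_strata] for k in range(num_strata)]


if __name__ == "__main__":
    example = [(0.9, 0.8), (0.1, 0.2), (0.5, 0.4), (0.3, 0.3)]
    print(stratify(example, 2))  # [[1, 3], [2, 0]]
```

In a progressive self-training loop, each returned subset would feed one adaptation round, starting with the most source-like stratum. That loop is not shown here.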

arXiv:2409.14700 [pdf, other]

Adaptive and Robust Watermark for Generative Tabular Data

Authors: Dung Daniel Ngo, Daniel Scott, Saheed Obitayo, Vamsi K. Potluru, Manuela Veloso

Abstract: Recent developments in generative models have demonstrated their ability to create high-quality synthetic data. However, the pervasiveness of synthetic content online also raises growing concerns that it can be used for malicious purposes. To ensure the authenticity of the data, watermarking techniques have recently emerged as a promising solution due to their strong statistical guarantees. In this paper, we propose a flexible and robust watermarking mechanism for generative tabular data. Specifically, a data provider with knowledge of the downstream tasks can partition the feature space into pairs of $(key, value)$ columns. Within each pair, the data provider first uses elements in the $key$ column to generate a randomized set of "green" intervals, then encourages elements of the $value$ column to fall in one of these "green" intervals. We show theoretically and empirically that the watermarked datasets (i) have negligible impact on the data quality and downstream utility, (ii) can be efficiently detected, and (iii) are robust against multiple attacks commonly observed in data science.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 12 pages of main body, 2 figures, 5 tables
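
The (key, value) green-interval mechanism in the abstract above can be illustrated with a small sketch. It rests on several assumptions:

- The value column is numeric on a known range [lo, hi], split into equal-width bins.
- A PRNG seeded from a secret and the key cell marks half of the bins as green.
- Any value outside a green bin is snapped to the center of the nearest green bin.
- Detection is a one-proportion z-test against the 1/2 green rate expected without a watermark.

The function names, the SHA-256 seeding, and the snapping rule are all hypothetical and are not taken from the paper.

```python
import hashlib
import math
import random
from typing import List, Sequence, Tuple


def green_bins(key: float, secret: str, num_bins: int) -> List[int]:
    """Pseudo-randomly mark half of the value bins as green for this key cell."""
    digest = hashlib.sha256(f"{secret}:{key!r}".encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # same key and secret -> same green set
    return sorted(rng.sample(range(num_bins), num_bins // 2))


def bin_index(value: float, lo: float, hi: float, num_bins: int) -> int:
    """Map a value in [lo, hi] to one of num_bins equal-width bins (clamped)."""
    idx = int((value - lo) / (hi - lo) * num_bins)
    return min(max(idx, 0), num_bins - 1)


def embed(pairs: Sequence[Tuple[float, float]], secret: str, lo: float, hi: float,
          num_bins: int = 10) -> List[Tuple[float, float]]:
    """Return (key, value) pairs whose values all lie in a green bin of their key."""
    width = (hi - lo) / num_bins
    marked = []
    for key, value in pairs:
        greens = green_bins(key, secret, num_bins)
        b = bin_index(value, lo, hi, num_bins)
        if b not in greens:
            # Snap to the center of the nearest green bin (a simple small-perturbation heuristic).
            nearest = min(greens, key=lambda g: abs(g - b))
            value = lo + (nearest + 0.5) * width
        marked.append((key, value))
    return marked


def detect_z(pairs: Sequence[Tuple[float, float]], secret: str, lo: float, hi: float,
             num_bins: int = 10) -> float:
    """z-score of the green-hit count; large positive values suggest a watermark."""
    n = len(pairs)
    hits = sum(bin_index(v, lo, hi, num_bins) in green_bins(k, secret, num_bins) for k, v in pairs)
    # Without a watermark, a value lands in a green bin with probability about 1/2 (num_bins even).
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(float(i), rng.random()) for i in range(200)]
    marked = embed(data, "secret", 0.0, 1.0)
    print(round(detect_z(marked, "secret", 0.0, 1.0), 2))  # 14.14, i.e. sqrt(200)
```

Snapping to bin centers is the crudest way to "encourage" values into green intervals. A quality-preserving scheme would move each value only as far as needed, or would bias the generator itself.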

arXiv:2408.16408 [pdf]

High-yield large-scale suspended graphene membranes over closed cavities for sensor applications

Authors: Sebastian Lukas, Ardeshir Esteki, Nico Rademacher, Vikas Jangra, Michael Gross, Zhenxing Wang, Ha Duong Ngo, Manuel Bäuscher, Piotr Mackowiak, Katrin Höppner, Dominique Wehenkel, Richard van Rijn, Max C. Lemme

Abstract: Suspended membranes of monoatomic graphene exhibit great potential for applications in electronic and nanoelectromechanical devices. In this work, a "hot and dry" transfer process is demonstrated to address the fabrication and patterning challenges of large-area graphene membranes on top of closed, sealed cavities. Here, "hot" refers to the use of high temperature during transfer, which promotes adhesion. Additionally, "dry" refers to the absence of liquids when graphene and the target substrate are brought into contact. The method leads to higher yields of intact suspended monolayer CVD graphene and artificially stacked double-layer CVD graphene membranes than previously reported. The yield evaluation is performed using neural-network-based object detection in SEM images, confirming high yields of intact membranes with high statistical accuracy. The suspended membranes are examined by Raman tomography and AFM. The method is verified by applying the suspended graphene devices as piezoresistive pressure sensors. Our technology advances the application of suspended graphene membranes and can be extended to other two-dimensional (2D) materials.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 30 pages of manuscript plus 17 pages of Supporting Information

arXiv:2407.01963 [pdf, other]

Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders

Authors: Phat Lam, Lam Pham, Truong Nguyen, Dat Ngo, Thinh Pham, Tin Nguyen, Loi Khanh Nguyen, Alexander Schindler

Abstract: Existing speaker diarization systems typically rely on large amounts of manually annotated data, which is labor-intensive and difficult to obtain, especially in real-world scenarios. Additionally, language-specific constraints in these systems significantly hinder their effectiveness and scalability in multilingual settings. In this paper, we propose a cluster-based speaker diarization system designed for multilingual telephone call applications. Our proposed system supports multiple languages and eliminates the need for large-scale annotated data during training by utilizing the multilingual Whisper model to extract speaker embeddings. Additionally, we introduce a network architecture called Mixture of Sparse Autoencoders (Mix-SAE) for unsupervised speaker clustering. Experimental results on the evaluation dataset derived from two-speaker subsets of the benchmark CALLHOME and CALLFRIEND telephonic speech corpora demonstrate the superior performance of the proposed Mix-SAE network over other autoencoder-based clustering methods. The overall performance of our proposed system also highlights the promising potential for developing unsupervised, multilingual speaker diarization systems within the context of limited annotated data. It also indicates the system's capability for integration into multi-task speech analysis applications based on general-purpose models, such as those that combine speech-to-text, language detection, and speaker diarization.

Submitted 12 September, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Preprint, 14 pages, 6 figures

arXiv:2405.19667 [pdf, other]

Reconciling Model Multiplicity for Downstream Decision Making

Authors: Ally Yalei Du, Dung Daniel Ngo, Zhiwei Steven Wu

Abstract: We consider the problem of model multiplicity in downstream decision-making, a setting where two predictive models of equivalent accuracy cannot agree on the best-response action for a downstream loss function. We show that even when the two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to differ on a substantial portion of the population. We address this issue by proposing a framework that calibrates the predictive models with regard to both the downstream decision-making problem and the individual probability prediction. Specifically, leveraging tools from multi-calibration, we provide an algorithm that, at each time-step, first reconciles the differences in individual probability prediction, then calibrates the updated models such that they are indistinguishable from the true probability distribution to the decision-maker. We extend our results to the setting where one does not have direct access to the true probability distribution and instead relies on a set of i.i.d. samples that define the empirical distribution. Finally, we provide a set of experiments to empirically evaluate our methods: compared to existing work, our proposed algorithm creates a pair of predictive models that both improve downstream decision-making losses and agree on their best-response actions almost everywhere.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 16 pages main body, 6 figures
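
The abstract above claims that near-identical predictions can still induce different best responses. The toy illustration below shows this. It is not the paper's reconciliation algorithm. It assumes a downstream loss whose best response is to act whenever the predicted probability reaches 0.5, and it uses invented prediction values. The two models never differ by more than about 0.02, yet they choose different actions for every individual whose probability sits near the threshold.

```python
from typing import Sequence, Tuple


def best_response(prob: float, threshold: float) -> int:
    """Act (1) when the predicted probability reaches the loss-dependent threshold."""
    return 1 if prob >= threshold else 0


def compare_models(preds_a: Sequence[float], preds_b: Sequence[float],
                   threshold: float = 0.5) -> Tuple[float, float]:
    """Return (largest prediction gap, fraction of individuals whose actions differ)."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("need two non-empty prediction lists of equal length")
    pairs = list(zip(preds_a, preds_b))
    max_gap = max(abs(a - b) for a, b in pairs)
    flips = sum(best_response(a, threshold) != best_response(b, threshold) for a, b in pairs)
    return max_gap, flips / len(pairs)


if __name__ == "__main__":
    # Invented predictions: four individuals sit near the 0.5 threshold, two do not.
    model_a = [0.49, 0.51, 0.495, 0.505, 0.2, 0.8]
    model_b = [0.51, 0.49, 0.505, 0.495, 0.2, 0.8]
    gap, disagreement = compare_models(model_a, model_b)
    print(f"max gap {gap:.3f}, action disagreement {disagreement:.2f}")  # max gap 0.020, action disagreement 0.67
```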

arXiv:2405.08843 [pdf, other]

FLEXIBLE: Forecasting Cellular Traffic by Leveraging Explicit Inductive Graph-Based Learning

Authors: Duc Thinh Ngo, Kandaraj Piamrat, Ons Aouedi, Thomas Hassan, Philippe Raipin-Parvédy

Abstract: From a telecommunication standpoint, the surge in users and services challenges next-generation networks with escalating traffic demands and limited resources. Accurate traffic prediction can offer network operators valuable insights into network conditions and suggest optimal allocation policies. Recently, spatio-temporal forecasting, employing Graph Neural Networks (GNNs), has emerged as a promising method for cellular traffic prediction. However, existing studies, inspired by road traffic forecasting formulations, overlook the dynamic deployment and removal of base stations, requiring the GNN-based forecaster to handle an evolving graph. This work introduces a novel inductive learning scheme and a generalizable GNN-based forecasting model that can process diverse graphs of cellular traffic with one-time training. We also demonstrate that this model can be easily leveraged by transfer learning with minimal effort, making it applicable to different areas. Experimental results show up to 9.8% performance improvement compared to the state-of-the-art, especially in rare-data settings with training data reduced to below 20%.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.00379 [pdf, other]

The Impact of Frequency Bands on Acoustic Anomaly Detection of Machines using Deep Learning Based Model

Authors: Tin Nguyen, Lam Pham, Phat Lam, Dat Ngo, Hieu Tang, Alexander Schindler

Abstract: In this paper, we propose a deep-learning-based model for Acoustic Anomaly Detection of Machines, the task of detecting abnormal machines by analysing machine sounds. Through extensive experiments, we show that multiple techniques, namely pseudo audios, audio segmentation, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effective at enhancing system performance. Among the evaluated techniques, the narrow frequency bands have a significant impact. Indeed, our proposed model, which focuses on the narrow frequency bands, outperforms the DCASE baseline on the benchmark DCASE 2022 Task 2 Development set. We hope the important role of narrow frequency bands identified in this paper encourages the research community working on Acoustic Anomaly Detection of Machines to further investigate and propose novel network architectures that focus on frequency bands.

Submitted 1 March, 2024; originally announced March 2024.
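
The abstract above lists Mahalanobis distance among the techniques evaluated. A minimal sketch of Mahalanobis-distance anomaly scoring follows, under these assumptions:

- A mean and covariance are fitted on features from normal machine sounds.
- A test clip is scored by its distance from that fitted distribution.

Real systems would use higher-dimensional embeddings and a linear-algebra library. Here the features are two-dimensional, so the closed-form 2x2 inverse keeps the sketch dependency-free. The feature choice and any decision threshold are assumptions, not details from the paper.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]
Cov2 = Tuple[float, float, float]  # (var_x, cov_xy, var_y)


def fit_gaussian(samples: Sequence[Vec2]) -> Tuple[Vec2, Cov2]:
    """Estimate the mean and unbiased covariance of normal-condition features."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point: Vec2, mean: Vec2, cov: Cov2) -> float:
    """Compute sqrt((p - mu)^T S^{-1} (p - mu)) with the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # S^{-1} = (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(q, 0.0))  # clamp tiny negative values caused by rounding


if __name__ == "__main__":
    mean, cov = fit_gaussian([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
    print(mahalanobis((3.0, 1.0), mean, cov))  # about 1.732, i.e. sqrt(3)
```

A clip would be flagged as abnormal when its distance exceeds a threshold chosen on held-out normal data.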

arXiv:2402.12179 [pdf, other]

Examining Monitoring System: Detecting Abnormal Behavior In Online Examinations

Authors: Dinh An Ngo, Thanh Dat Nguyen, Thi Le Chi Dang, Huy Hoan Le, Ton Bao Ho, Vo Thanh Khang Nguyen, Truong Thanh Hung Nguyen

Abstract: Cheating in online exams has become a prevalent issue over the past decade, especially during the COVID-19 pandemic. To address this issue of academic dishonesty, our "Exam Monitoring System: Detecting Abnormal Behavior in Online Examinations" is designed to assist proctors in identifying unusual student behavior. Our system demonstrates high accuracy and speed in detecting cheating in real-time scenarios, providing valuable information and aiding proctors in decision-making. This article outlines our methodology and the effectiveness of our system in mitigating the widespread problem of cheating in online exams.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2312.16307 [pdf, other]

Incentive-Aware Synthetic Control: Accurate Counterfactual Estimation via Incentivized Exploration

Authors: Daniel Ngo, Keegan Harris, Anish Agarwal, Vasilis Syrgkanis, Zhiwei Steven Wu

Abstract: We consider the setting of synthetic control methods (SCMs), a canonical approach used to estimate the treatment effect on the treated in a panel data setting. We shed light on a frequently overlooked but ubiquitous assumption made in SCMs, known as "overlap": a treated unit can be written as some combination (typically a convex or linear combination) of the units that remain under control. We show that if units select their own interventions, and there is sufficiently large heterogeneity between units that prefer different interventions, overlap will not hold. We address this issue by proposing a framework that incentivizes units with different preferences to take interventions they would not normally consider. Specifically, leveraging tools from information design and online learning, we propose an SCM that incentivizes exploration in panel data settings by providing incentive-compatible intervention recommendations to units. We establish that this estimator obtains valid counterfactual estimates without the need for an a priori overlap assumption. We extend our results to the setting of synthetic interventions, where the goal is to produce counterfactual outcomes under all interventions, not just control. Finally, we provide two hypothesis tests for determining whether unit overlap holds for a given panel dataset.

Submitted 13 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.10671 [pdf, other]

Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance

Authors: Phuc D. A. Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis, Chuang Gan, Anh Tran, Cuong Pham, Khoi Nguyen

Abstract: We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task. Recent advancements in Open-Vocabulary scene understanding have made significant strides in this area by employing class-agnostic 3D instance proposal networks for object localization and learning queryable features for each 3D mask. While these methods produce high-quality instance proposals, they struggle with identifying small-scale and geometrically ambiguous objects. The key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions as high-quality object proposals, addressing the above limitations. These are then combined with 3D class-agnostic instance proposals to include a wide range of objects in the real world. To validate our approach, we conducted experiments on three prominent datasets, namely ScanNet200, S3DIS, and Replica, demonstrating significant performance gains over state-of-the-art approaches in segmenting objects of diverse categories.

Submitted 5 April, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: CVPR 2024. Project page: https://open3dis.github.io/

arXiv:2310.00467 [pdf, ps, other]

New results on Erasure Combinatorial Batch Codes

Authors: Phuc-Lu Le, Son Hoang Dau, Hy Dinh Ngo, Thuc D. Nguyen

Abstract: In this work, we investigate the problem of Erasure Combinatorial Batch Codes, in which $n$ files are stored on $m$ servers so that every set of $n-r$ servers allows a client to retrieve at most $k$ distinct files by downloading at most $t$ files from each server. Previous studies have solved this problem for the special case of $t=1$ using Combinatorial Batch Codes. We tackle the general case $t \geq 1$ using a generalization of Hall's theorem. Additionally, we address a realistic scenario in which the retrieved files are consecutive according to some order and provide a simple and optimal solution for this case.

Submitted 30 September, 2023; originally announced October 2023.

Comments: Allerton conference

arXiv:2307.13251 [pdf, other]

GaPro: Box-Supervised 3D Point Cloud Instance Segmentation Using Gaussian Processes as Pseudo Labelers

Authors: Tuan Duc Ngo, Binh-Son Hua, Khoi Nguyen

Abstract: Instance segmentation on 3D point clouds (3DIS) is a longstanding challenge in computer vision, where state-of-the-art methods are mainly based on full supervision. As annotating ground truth dense instance masks is tedious and expensive, solving 3DIS with weak supervision has become more practical. In this paper, we propose GaPro, a new instance segmentation method for 3D point clouds using axis-aligned 3D bounding box supervision. Our two-step approach involves generating pseudo labels from box annotations and training a 3DIS network with the resulting labels. Additionally, we employ a self-training strategy to further improve the performance of our method. We devise an effective Gaussian Process to generate pseudo instance masks from the bounding boxes and resolve ambiguities when they overlap, resulting in pseudo instance masks with their uncertainty values. Our experiments show that GaPro outperforms previous weakly supervised 3D instance segmentation methods and has competitive performance compared to state-of-the-art fully supervised ones. Furthermore, we demonstrate the robustness of our approach, where we can adapt various state-of-the-art fully supervised methods to the weak supervision task by using our pseudo labels for training. The source code and trained models are available at https://github.com/VinAIResearch/GaPro.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to ICCV 2023
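
GaPro's pseudo-labeling starts from axis-aligned box annotations. The ambiguity the abstract mentions arises where boxes overlap. The sketch below covers only the box-containment step:

- A point inside exactly one box takes that box's label.
- A point inside no box is treated as background.
- A point inside several boxes is flagged as ambiguous.

In GaPro, a Gaussian Process resolves those ambiguous points. That step is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[Point, Point]  # (min corner, max corner) of an axis-aligned box


def inside(point: Point, box: Box) -> bool:
    """True if the point lies within the closed axis-aligned box."""
    lo, hi = box
    return all(lo_c <= c <= hi_c for c, lo_c, hi_c in zip(point, lo, hi))


def initial_pseudo_labels(points: Sequence[Point],
                          boxes: Sequence[Box]) -> Tuple[List[int], List[int]]:
    """Label each point with the index of the single box that contains it.

    Returns (labels, ambiguous). labels[i] is a box index, or -1 when the point
    lies in no box (background) or in several boxes (overlap). ambiguous lists
    the indices of points that fall in more than one box.
    """
    labels: List[int] = []
    ambiguous: List[int] = []
    for i, p in enumerate(points):
        hits = [b for b, box in enumerate(boxes) if inside(p, box)]
        if len(hits) == 1:
            labels.append(hits[0])
        else:
            labels.append(-1)
            if len(hits) > 1:
                ambiguous.append(i)  # left for a later disambiguation step
    return labels, ambiguous


if __name__ == "__main__":
    boxes = [((0.0, 0.0, 0.0), (2.0, 2.0, 2.0)), ((1.0, 1.0, 1.0), (3.0, 3.0, 3.0))]
    points = [(0.5, 0.5, 0.5), (1.5, 1.5, 1.5), (2.5, 2.5, 2.5), (5.0, 5.0, 5.0)]
    print(initial_pseudo_labels(points, boxes))  # ([0, -1, 1, -1], [1])
```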

arXiv:2306.14929 [pdf, other]

A Deep Learning Architecture with Spatio-Temporal Focusing for Detecting Respiratory Anomalies

Authors: Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi

Abstract: This paper presents a deep learning system for detecting anomalies in respiratory sound recordings. Our system first performs audio feature extraction using the Continuous Wavelet Transform. This transformation converts the respiratory sound input into a two-dimensional spectrogram in which both spectral and temporal features are represented. Then, our proposed deep learning architecture, inspired by an Inception-residual-based backbone, performs spatial-temporal focusing and applies a multi-head attention mechanism to classify respiratory anomalies. In this work, we evaluate our proposed models on the benchmark SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) database proposed by the IEEE BioCAS 2023 challenge. In terms of the Score, computed as the mean of the average score and the harmonic score, our robust system achieved Top-1 performance with Scores of 0.810, 0.667, 0.744, and 0.608 in Tasks 1-1, 1-2, 2-1, and 2-2, respectively.

Submitted 25 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.04104
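
The challenge Score in the abstract above combines an average score and a harmonic score. We understand these to be built from sensitivity SE and specificity SP, but this is an assumption to verify against the official BioCAS 2023 challenge rules. Under that assumption, the formulas are AS = (SE + SP) / 2, HS = 2 · SE · SP / (SE + SP), and Score = (AS + HS) / 2. A small sketch of the arithmetic:

```python
def challenge_score(sensitivity: float, specificity: float) -> float:
    """Mean of the arithmetic (AS) and harmonic (HS) means of SE and SP (assumed definition)."""
    average_score = (sensitivity + specificity) / 2
    denom = sensitivity + specificity
    # Guard the degenerate case SE = SP = 0, where the harmonic mean is taken as 0.
    harmonic_score = 2 * sensitivity * specificity / denom if denom > 0 else 0.0
    return (average_score + harmonic_score) / 2


if __name__ == "__main__":
    # Illustrative inputs only: AS = 0.75, HS = 0.72, Score = 0.735.
    print(round(challenge_score(0.6, 0.9), 3))
```

Because the harmonic mean penalizes imbalance between SE and SP, this Score rewards systems that do not trade one metric heavily against the other.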

arXiv:2305.09463 [pdf, other]

Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms

Authors: Lam Pham, Dat Ngo, Cam Le, Anahid Jalali, Alexander Schindler

Abstract: In this technical report, a low-complexity deep learning system for acoustic scene classification (ASC) is presented. The proposed system comprises two main phases: (Phase I) training a teacher network; and (Phase II) training a student network using distilled knowledge from the teacher. In the first phase, the teacher, a large-footprint model, is trained. After training the teacher, the embeddings, which are the feature maps of the second-to-last layer of the teacher, are extracted. In the second phase, the student network, a low-complexity model, is trained with the embeddings extracted from the teacher. Our experiments, conducted on the DCASE 2023 Task 1 Development dataset, met the low-complexity requirement and achieved a best classification accuracy of 57.4%, improving on the DCASE baseline by 14.5%.

Submitted 16 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2206.06057

arXiv:2305.06827 [pdf, other]

A Generic Approach to Integrating Time into Spatial-Temporal Forecasting via Conditional Neural Fields

Authors: Minh-Thanh Bui, Duc-Thinh Ngo, Demin Lu, Zonghua Zhang

Abstract: Self-awareness is the key capability of autonomous systems, e.g., autonomous driving networks, and relies on highly efficient time series forecasting algorithms that enable the system to reason about the future state of the environment, as well as its effect on system behavior as time progresses. Recently, a large number of forecasting algorithms using either convolutional neural networks or graph neural networks have been developed to exploit the complex temporal and spatial dependencies present in time series. While these solutions have shown significant advantages over statistical approaches, one open question is how to effectively incorporate the global information, which represents seasonality patterns via the time component of time series, into forecasting models to improve their accuracy. This paper presents a general approach to integrating the time component into forecasting models. The main idea is to employ conditional neural fields to represent the auxiliary features extracted from the time component to obtain global information, which is then combined with the local information extracted from autoregressive neural networks through a layer-wise gated fusion module. Extensive experiments on road traffic and cellular network traffic datasets demonstrate the effectiveness of the proposed approach.

Submitted 17 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.
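
The abstract above combines global (time-derived) and local (autoregressive) features through a gated fusion module. One common form of gated fusion computes a sigmoid gate from both inputs and takes a convex mix, gate * global + (1 - gate) * local. The sketch below uses a hypothetical parameterization with one scalar weight per input and a bias per feature. The paper's layer-wise module likely uses learned matrices and may differ in detail.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(global_feat: Sequence[float], local_feat: Sequence[float],
                 w_global: Sequence[float], w_local: Sequence[float],
                 bias: Sequence[float]) -> List[float]:
    """Per-feature gated mix: gate * global + (1 - gate) * local."""
    if not (len(global_feat) == len(local_feat) == len(w_global) == len(w_local) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for g_x, l_x, wg, wl, b in zip(global_feat, local_feat, w_global, w_local, bias):
        gate = sigmoid(wg * g_x + wl * l_x + b)  # gate in (0, 1), computed from both inputs
        fused.append(gate * g_x + (1.0 - gate) * l_x)
    return fused


if __name__ == "__main__":
    # With zero parameters every gate is 0.5, so the output is the plain average.
    print(gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]))  # [2.0, 2.0]
```

A learned gate lets the model decide, feature by feature, how much to trust the seasonal (global) signal over the recent (local) one.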

arXiv:2305.01476 [pdf, other]

Deep Learning Based Multimodal with Two-phase Training Strategy for Daily Life Video Classification

Authors: Lam Pham, Trang Le, Cam Le, Dat Ngo, Weissenfeld Axel, Alexander Schindler

Abstract: In this paper, we present a deep-learning-based multimodal system for classifying daily life videos. To train the system, we propose a two-phase training strategy. In the first training phase (Phase I), we extract the audio and visual (image) data from the original video. We then train independent deep-learning-based models on the audio data and the visual data. After training, we obtain audio embeddings and visual embeddings by extracting feature maps from the pre-trained models. In the second training phase (Phase II), we train a fusion layer to combine the audio/visual embeddings and a dense layer to classify the combined embedding into target daily scenes. Our extensive experiments, conducted on the benchmark DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) 2021 Task 1B Development dataset, achieved best classification accuracies of 80.5%, 91.8%, and 95.3% using only audio data, only visual data, and both audio and visual data, respectively. The highest classification accuracy of 95.3% represents an improvement of 17.9% over the DCASE baseline and is very competitive with state-of-the-art systems.

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2304.07459 [pdf, other]

doi 10.1109/TIP.2023.3267621

Instance-level Few-shot Learning with Class Hierarchy Mining

Authors: Anh-Khoa Nguyen Vu, Thanh-Toan Do, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Tam V. Nguyen

Abstract: Few-shot learning has been proposed to tackle the problem of scarce training data in novel classes. However, prior works in instance-level few-shot learning have paid less attention to effectively utilizing the relationships between categories. In this paper, we exploit hierarchical information to leverage discriminative and relevant features of base classes to effectively classify novel objects. These features are extracted from the abundant data of base classes and can be used to reasonably describe classes with scarce data. Specifically, we propose a novel superclass approach that automatically creates a hierarchy, treating base and novel classes as fine-grained classes, for few-shot instance segmentation (FSIS). Based on the hierarchical information, we design a novel framework called Soft Multiple Superclass (SMS) to extract relevant features or characteristics of classes in the same superclass. A new class assigned to a superclass is easier to classify by leveraging these relevant features. In addition, to effectively train the hierarchy-based detector in FSIS, we apply label refinement to further describe the associations between fine-grained classes. Extensive experiments demonstrate the effectiveness of our method on FSIS benchmarks. Code is available online.

Submitted 14 April, 2023; originally announced April 2023.

Comments: accepted by IEEE Transactions on Image Processing

arXiv:2304.07444 [pdf, other]

doi 10.1109/ACCESS.2024.3432873

The Art of Camouflage: Few-Shot Learning for Animal Detection and Segmentation

Authors: Thanh-Danh Nguyen, Anh-Khoa Nguyen Vu, Nhat-Duy Nguyen, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Minh-Triet Tran, Tam V. Nguyen

Abstract: Camouflaged object detection and segmentation is a new and challenging research topic in computer vision. A serious issue is the lack of data on concealed objects, such as camouflaged animals in natural scenes. In this paper, we address the problem of few-shot learning for camouflaged object detection and segmentation. To this end, we first collect a new dataset, CAMO-FS, for the benchmark. Camouflaged instances are hard to recognize because they resemble their surroundings, so we guide our models to obtain camouflaged features that clearly distinguish the instances from the background. In this work, we propose FS-CDIS, a framework that efficiently detects and segments camouflaged instances via two loss functions that contribute to the training process. First, an instance triplet loss operates at the instance level by differentiating the anchor (the mean of all camouflaged foreground points) from the background points. Second, to consolidate generalization at the class level, we present an instance memory storage that stores camouflaged features of the same category, allowing the model to capture further class-level information during learning. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance on the newly collected dataset. Code is available at https://github.com/danhntd/FS-CDIS.
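
The instance triplet loss described above can be illustrated with a small sketch. It rests on several assumptions that are not details from the abstract:

- per-point embeddings are plain float lists;
- distance is Euclidean;
- the loss is a hinge with a hypothetical `margin` parameter;
- every foreground point is paired with every background point.

The exact FS-CDIS formulation may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(points: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def instance_triplet_loss(foreground: List[Vector],
                          background: List[Vector],
                          margin: float = 1.0) -> float:
    """Hinge triplet loss with the foreground mean as the anchor.

    Assumed formulation: each foreground point is a positive, each background
    point is a negative, and the loss is averaged over all (positive, negative) pairs.
    """
    anchor = mean_vector(foreground)
    losses = [
        max(0.0, euclidean(anchor, pos) - euclidean(anchor, neg) + margin)
        for pos in foreground
        for neg in background
    ]
    return sum(losses) / len(losses)
```

When the background points lie farther from the foreground mean than every foreground point by at least the margin, the loss is zero. Background points that sit near the anchor are penalised.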

Submitted 5 August, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: IEEE Access 2024

arXiv:2303.04104 [pdf, other]

An Inception-Residual-Based Architecture with Multi-Objective Loss for Detecting Respiratory Anomalies

Authors: Dat Ngo, Lam Pham, Huy Phan, Minh Tran, Delaram Jarchi, Sefki Kolozali

Abstract: This paper presents a deep learning system for detecting anomalies in respiratory sound recordings. Our system first extracts audio features using Gammatone and Continuous Wavelet transformations. This step transforms the respiratory sound input into a two-dimensional spectrogram in which both spectral and temporal features are represented. The proposed system then integrates Inception-residual-based backbone models with multi-head attention and a multi-objective loss to classify respiratory anomalies. Instead of simply concatenating the results from the various spectrograms, we propose a Linear combination that can regulate the contribution of each individual spectrogram equally throughout the training process. To evaluate performance, we conducted experiments on the SPRSound (The Open-Source SJTU Paediatric Respiratory Sound) benchmark dataset proposed by the IEEE BioCAS 2022 challenge. In terms of the Score, computed as the average of the average score and the harmonic score, our proposed system achieved significant improvements of 9.7%, 15.8%, 17.8%, and 16.1% in Task 1-1, Task 1-2, Task 2-1, and Task 2-2, respectively, compared to the challenge baseline system. Notably, we achieved the Top-1 performance in Task 2-1 and Task 2-2, with the highest Scores of 74.5% and 53.9%, respectively.
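
The challenge Score mentioned above averages two summary metrics. The following is a minimal sketch. It assumes, as an ICBHI-style definition rather than one taken from the abstract, that the average score (AS) is the arithmetic mean of sensitivity (SE) and specificity (SP), and that the harmonic score (HS) is their harmonic mean.

```python
def challenge_score(sensitivity: float, specificity: float) -> float:
    """Score = (AS + HS) / 2 under assumed definitions of AS and HS.

    AS = (SE + SP) / 2 and HS = 2 * SE * SP / (SE + SP); inputs are fractions in [0, 1].
    """
    average_score = (sensitivity + specificity) / 2
    if sensitivity + specificity == 0:
        harmonic_score = 0.0  # avoid division by zero when both metrics are 0
    else:
        harmonic_score = 2 * sensitivity * specificity / (sensitivity + specificity)
    return (average_score + harmonic_score) / 2
```

The harmonic mean penalises imbalance between SE and SP. Under these assumptions, a system with SE = 1.0 and SP = 0.0 scores only 0.25, even though its average score is 0.5.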

Submitted 19 June, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.00246 [pdf, other]

ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution

Authors: Tuan Duc Ngo, Binh-Son Hua, Khoi Nguyen

Abstract: Existing 3D instance segmentation methods are dominated by the bottom-up design: a manually fine-tuned algorithm groups points into clusters, followed by a refinement network. However, because they rely on the quality of the clusters, these methods produce unreliable results when (1) nearby objects of the same semantic class are packed together, or (2) large objects have loosely connected regions. To address these limitations, we introduce ISBNet, a novel cluster-free method that represents instances as kernels and decodes instance masks via dynamic convolution. To efficiently generate high-recall and discriminative kernels, we propose a simple strategy named Instance-aware Farthest Point Sampling to sample candidates, and we leverage a local aggregation layer inspired by PointNet++ to encode candidate features. Moreover, we show that predicting and leveraging 3D axis-aligned bounding boxes in the dynamic convolution further boosts performance. Our method sets new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP and retains a fast inference time (237 ms per scene on ScanNetV2). The source code and trained models are available at https://github.com/VinAIResearch/ISBNet.
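
Instance-aware Farthest Point Sampling builds on standard farthest point sampling (FPS). The sketch below shows only plain FPS in pure Python, starting from an arbitrary seed index (`start`, an illustrative parameter). The instance-aware modification used by ISBNet is not detailed in the abstract and is not reproduced here.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(points: List[Sequence[float]], k: int,
                            start: int = 0) -> List[int]:
    """Return indices of k points chosen greedily to be far from each other.

    Each step picks the point whose distance to its nearest already-selected
    point is largest.
    """
    if not points or k <= 0:
        return []
    k = min(k, len(points))
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        next_idx = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(next_idx)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[next_idx]))
    return selected
```

On the 1-D points 0, 1, 2 and 10, sampling three points from index 0 first picks the outlier at 10, then the point at 2. This is why FPS tends to give good spatial coverage (high recall) with few samples.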

Submitted 26 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023

arXiv:2302.13028 [pdf, other]

A Light-weight Deep Learning Model for Remote Sensing Image Classification

Authors: Lam Pham, Cam Le, Dat Ngo, Anh Nguyen, Jasmin Lampert, Alexander Schindler, Ian McLoughlin

Abstract: In this paper, we present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the aerial scene of a remote sensing image. To this end, we first evaluate various benchmark convolutional neural network (CNN) architectures: MobileNet V1/V2, ResNet 50/151V2, InceptionV3/InceptionResNetV2, EfficientNet B0/B7, DenseNet 121/201, and ConvNeXt Tiny/Large. Then, the best-performing models are selected to train a compact model in a teacher-student arrangement. Knowledge distillation from the teacher aims to achieve high performance with significantly reduced complexity. In extensive experiments on the NWPU-RESISC45 benchmark, our proposed teacher-student models outperform the state-of-the-art systems and have the potential to be deployed on a wide range of edge devices.
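
The teacher-student arrangement above relies on knowledge distillation. Below is a minimal pure-Python sketch of the widely used Hinton-style distillation loss. The temperature, the weight `alpha`, and the KL-divergence formulation are illustrative assumptions, not necessarily the settings used in this paper.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student).

    The soft term matches temperature-softened distributions; the hard term is
    ordinary cross-entropy against the ground-truth label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student_t) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce
```

The T^2 factor keeps the soft term's gradient scale roughly comparable across temperatures. When the student already matches the teacher, only the hard-label term remains.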

Submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.08533 [pdf, other]

Federated Learning as a Network Effects Game

Authors: Shengyuan Hu, Dung Daniel Ngo, Shuran Zheng, Virginia Smith, Zhiwei Steven Wu

Abstract: Federated Learning (FL) aims to foster collaboration among a population of clients to improve the accuracy of machine learning without directly sharing local data. Although there is a rich literature on designing federated learning algorithms, most prior works implicitly assume that all clients are willing to participate in an FL scheme. In practice, clients may not benefit from joining FL, especially in light of potential costs related to issues such as privacy and computation. In this work, we study the clients' incentives in federated learning to help the service provider design better solutions and ensure clients make better decisions. We are the first to model clients' behaviors in FL as a network effects game, where each client's benefit depends on the other clients who also join the network. Using this setup, we analyze the dynamics of clients' participation and characterize the equilibrium, where no client has an incentive to alter their decision. Specifically, we show that dynamics in the population naturally converge to equilibrium without needing explicit interventions. Finally, we provide a cost-efficient payment scheme that incentivizes clients to reach a desired equilibrium when the initial network is empty.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 14 pages of main text, 26 pages in total

arXiv:2211.07833 [pdf]

doi 10.1016/j.apenergy.2023.120817

Optimal sizing of renewable energy storage: A comparative study of hydrogen and battery system considering degradation and seasonal storage

Authors: Son Tay Le, Tuan Ngoc Nguyen, Dac-Khuong Bui, Tuan Duc Ngo

Abstract: Renewable energy storage (RES) is essential to address the intermittency of renewable energy systems, thereby enhancing system stability and reliability. This study presents an optimisation of the sizing and operational strategy parameters of grid-connected photovoltaic (PV)-hydrogen/battery systems using a Multi-Objective Modified Firefly Algorithm (MOMFA). An operational strategy that exploits the ability of hydrogen to store energy over long periods was also investigated. The proposed method was applied to a real-world distributed energy project located in the tropical climate zone. To further demonstrate the robustness and versatility of the method, a synthetic test case was also examined for a location in the subtropical climate zone, which has a high seasonal mismatch. The performance of the proposed MOMFA method is compared with that of NSGA-II, which has been widely used in the literature to design renewable energy storage systems. The results show that MOMFA is more accurate and robust than NSGA-II owing to the complex and dynamic nature of energy storage systems. The optimisation results show that battery storage systems, as a mature technology, yield better economic performance than current hydrogen storage systems. However, hydrogen storage systems are shown to provide better techno-economic performance and can be a viable long-term storage solution when a high penetration of renewable energy is required.
The study also demonstrates that the proposed long-term operational strategy can lower component degradation, enhance efficiency, and increase the total economic performance of hydrogen storage systems. The findings of this study can support the implementation of energy storage systems for renewable energy.
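
Both MOMFA and NSGA-II are multi-objective optimisers whose output is a set of non-dominated (Pareto-optimal) designs. The Python sketch below illustrates only that shared building block, not either algorithm. It filters candidate designs, assuming every objective is to be minimised. The objective tuples are hypothetical placeholders, for example a (cost, unmet load) pair per design.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at
    least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the candidates not dominated by any other candidate (O(n^2) scan).

    A candidate never dominates itself, so no self-exclusion check is needed.
    """
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]
```

Each remaining design is a trade-off. Choosing one of them, such as the cheapest design that meets a reliability target, is a separate decision step.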

Submitted 14 November, 2022; originally announced November 2022.

arXiv:2209.03672 [pdf]

Observation of strange metal in hole-doped valley-spin insulator

Authors: Tuan Dung Nguyen, Baithi Mallesh, Seon Je Kim, Houcine Bouzid, Byeongwook Cho, Xuan Phu Le, Tien Dat Ngo, Won Jong Yoo, Young-Min Kim, Dinh Loc Duong, Young Hee Lee

Abstract: Temperature-linear resistance at low temperatures in strange metals is an exotic characteristic of strongly correlated systems, as observed in high-Tc superconducting cuprates, heavy fermions, Fe-based superconductors, ruthenates, and twisted bilayer graphene. Here, we introduce a hole-doped valley-spin insulator, V-doped WSe2, with hole pockets in the valence band. The strange-metal characteristic was observed in VxW1-xSe2 at a critical carrier concentration of 9.5 × 10^20 cm^-3 from 150 K down to 1.8 K. The unsaturated magnetoresistance is almost linearly proportional to the magnetic field. Using the ansatz R(H,T) - R(0,0) ~ [(α k_B T)^2 + (γ μ_B B)^2]^(1/2), the γ/α ratio is estimated to be approximately 4, distinct from that of the quasiparticles in LSCO and BaFe2(As1-xPx)2 (γ/α = 1) and of the bosons in YBCO (γ/α = 2). Our observation opens up possible routes to inducing strong correlation and superconductivity in two-dimensional materials with strong spin-orbit coupling.

Submitted 8 September, 2022; originally announced September 2022.

Comments: 8 pages, 4 figures + Supplemental Material

arXiv:2208.03403 [pdf, other]

Slice-level Detection of Intracranial Hemorrhage on CT Using Deep Descriptors of Adjacent Slices

Authors: Dat T. Ngo, Thao T. B. Nguyen, Hieu T. Nguyen, Dung B. Nguyen, Ha Q. Nguyen, Hieu H. Pham

Abstract: The rapid development of representation learning techniques such as deep neural networks, together with the availability of large-scale, well-annotated medical imaging datasets, has led to a rapid increase in the use of supervised machine learning in 3D medical image analysis and diagnosis. In particular, deep convolutional neural networks (D-CNNs) have been key players and were adopted by the medical imaging community to assist clinicians and medical experts in disease diagnosis and treatment. However, training and running inference with deep neural networks such as D-CNNs on high-resolution 3D volumes of Computed Tomography (CT) scans for diagnostic tasks poses formidable computational challenges. This raises the need to develop deep learning-based approaches that are robust in learning representations in 2D images instead of 3D scans. In this work, we propose for the first time a new strategy to train slice-level classifiers on CT scans based on the descriptors of the adjacent slices along the axis, each of which is extracted through a convolutional neural network (CNN). This method is applicable to CT datasets with per-slice labels, such as the RSNA Intracranial Hemorrhage (ICH) dataset, which aims to predict the presence of ICH and classify it into 5 different sub-types. We obtain a single model in the top 4% best-performing solutions of the RSNA ICH challenge, where model ensembles are allowed. Experiments also show that the proposed method significantly outperforms the baseline model on CQ500.
The proposed method is general and can be applied to other 3D medical diagnosis tasks such as MRI. To encourage new advances in the field, we will make our code and pre-trained model available upon acceptance of the paper.
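
The idea of classifying a slice from the descriptors of its neighbours can be sketched as a windowing step. In this sketch, the per-slice descriptors are assumed to be already extracted by a CNN and passed in as float lists. The window half-width `k` and the edge-replication padding at the first and last slices are illustrative assumptions.

```python
from typing import List

Descriptor = List[float]


def adjacent_slice_features(descriptors: List[Descriptor], k: int = 1) -> List[Descriptor]:
    """For each slice i, concatenate the descriptors of slices i-k .. i+k.

    Indices outside the scan are clamped to the first/last slice (edge padding),
    so every slice gets a feature vector of the same length (2k + 1) * d.
    """
    n = len(descriptors)
    features = []
    for i in range(n):
        window: Descriptor = []
        for j in range(i - k, i + k + 1):
            clamped = min(max(j, 0), n - 1)
            window.extend(descriptors[clamped])
        features.append(window)
    return features
```

The concatenated vectors would then go to a slice-level classifier head. In the RSNA ICH setting, that head would predict the presence of ICH and its five sub-types.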

Submitted 17 April, 2023; v1 submitted 5 August, 2022; originally announced August 2022.

Comments: Accepted for presentation at the 22nd IEEE Statistical Signal Processing (SSP) workshop

arXiv:2206.13392 [pdf, ps, other]

Remote Sensing Image Classification using Transfer Learning and Attention Based Deep Neural Network

Authors: Lam Pham, Khoa Tran, Dat Ngo, Jasmin Lampert, Alexander Schindler

Abstract: The task of remote sensing image scene classification (RSISC), which aims to classify remote sensing images into groups of semantic categories based on their contents, plays an important role in a wide range of applications such as urban planning, natural hazard detection, environment monitoring, vegetation mapping, and geospatial object detection. In recent years, the research community focusing on the RSISC task has made significant efforts to publish diverse datasets and to propose different approaches to the RSISC challenges. Recently, almost all proposed RSISC systems have been based on deep learning models, which have proven powerful and outperform traditional approaches using image processing and machine learning. In this paper, we also leverage the power of deep learning: we evaluate a variety of deep neural network architectures and identify the main factors affecting the performance of an RSISC system. Based on this comprehensive analysis, we propose a deep learning-based framework for RSISC that makes use of transfer learning and a multi-head attention scheme. The proposed framework is evaluated on the benchmark NWPU-RESISC45 dataset and achieves the best classification accuracy of 94.7%, which is competitive with the state-of-the-art systems and shows potential for real-life applications.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.06057 [pdf, ps, other]

Low-complexity deep learning frameworks for acoustic scene classification

Authors: Lam Pham, Dat Ngo, Anahid Jalali, Alexander Schindler

Abstract: In this report, we present low-complexity deep learning frameworks for acoustic scene classification (ASC). The proposed frameworks can be separated into four main steps: front-end spectrogram extraction, online data augmentation, back-end classification, and late fusion of predicted probabilities. In particular, we initially transform audio recordings into Mel, Gammatone, and CQT spectrograms. Next, the data augmentation methods Random Cropping, SpecAugment, and Mixup are applied to generate augmented spectrograms before they are fed into deep learning-based classifiers. Finally, to achieve the best performance, we fuse the probabilities obtained from three individual classifiers, each independently trained on one of the three types of spectrograms. Our experiments on the DCASE 2022 Task 1 Development dataset fulfilled the low-complexity requirement and achieved the best classification accuracy of 60.1%, improving on the DCASE baseline by 17.2%.
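
The abstract does not state which rule fuses the three classifiers' probabilities. The minimal sketch below assumes one of two common late-fusion rules applied to the per-class probability vectors: simple averaging ("mean") or an element-wise product ("prod").

```python
import math
from typing import List


def late_fusion(prob_vectors: List[List[float]], rule: str = "mean") -> List[float]:
    """Fuse per-class probabilities from several classifiers.

    rule="mean": arithmetic mean per class.
    rule="prod": product per class, renormalised to sum to 1.
    """
    n_classes = len(prob_vectors[0])
    if rule == "mean":
        return [sum(p[c] for p in prob_vectors) / len(prob_vectors) for c in range(n_classes)]
    if rule == "prod":
        fused = [math.prod(p[c] for p in prob_vectors) for c in range(n_classes)]
        total = sum(fused)
        return [f / total for f in fused]
    raise ValueError(f"unknown fusion rule: {rule}")
```

The predicted scene is then the arg-max of the fused vector. The product rule rewards classes on which all classifiers agree more strongly than the mean does.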

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2206.00494 [pdf, ps, other]

Incentivizing Combinatorial Bandit Exploration

Authors: Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu

Abstract: Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system. The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations. While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users. All published work on this problem, known as incentivized exploration, focuses on small, unstructured action sets and mainly targets the case when the users' beliefs are independent across actions. However, realistic exploration problems often feature large, structured action sets and highly correlated beliefs. We focus on a paradigmatic exploration problem with structure: combinatorial semi-bandits. We prove that Thompson Sampling, when applied to combinatorial semi-bandits, is incentive-compatible when initialized with a sufficient number of samples of each arm (where this number is determined in advance by the Bayesian prior). Moreover, we design incentive-compatible algorithms for collecting the initial samples.
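
Here is a minimal sketch of Thompson Sampling in a combinatorial semi-bandit, under simplifying assumptions that are not from the abstract:

- arm rewards are Bernoulli, with independent Beta priors;
- any set of `m` arms is a feasible action, so the chosen action is the `m` arms with the largest sampled means.

It illustrates only the sampling rule. The incentive-compatibility analysis and the initial-sample collection are not covered.

```python
import random
from typing import List, Tuple


def thompson_select(posteriors: List[Tuple[float, float]], m: int,
                    rng: random.Random) -> List[int]:
    """Pick m arms by sampling a mean from each arm's Beta(a, b) posterior."""
    samples = [rng.betavariate(a, b) for a, b in posteriors]
    ranked = sorted(range(len(posteriors)), key=lambda i: samples[i], reverse=True)
    return sorted(ranked[:m])


def update(posteriors: List[Tuple[float, float]], chosen: List[int],
           rewards: List[int]) -> None:
    """Semi-bandit feedback: a 0/1 reward is observed for every chosen arm."""
    for arm, r in zip(chosen, rewards):
        a, b = posteriors[arm]
        posteriors[arm] = (a + r, b + (1 - r))
```

Because feedback arrives per arm rather than only for the whole action, each chosen arm's posterior is updated separately.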

Submitted 1 June, 2022; originally announced June 2022.

Comments: 9 pages of main text, 21 pages in total

arXiv:2203.12314 [pdf, other]

Wider or Deeper Neural Network Architecture for Acoustic Scene Classification with Mismatched Recording Devices

Authors: Lam Pham, Khoa Dinh, Dat Ngo, Hieu Tang, Alexander Schindler

Abstract: In this paper, we present a robust and low-complexity system for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording. We first construct an ASC baseline system in which a novel inception-residual-based network architecture is proposed to deal with the mismatched recording device issue. To further improve performance while still satisfying the low-complexity constraint, we apply two techniques to the ASC baseline system: an ensemble of multiple spectrograms and channel reduction. In extensive experiments on the benchmark DCASE 2020 Task 1A Development dataset, our best model achieves an accuracy of 69.9% with a low complexity of 2.4M trainable parameters, which is competitive with the state-of-the-art ASC systems and shows potential for real-life applications on edge devices.

Submitted 23 March, 2022; originally announced March 2022.

Comments: This paper was submitted to INTERSPEECH 2022

arXiv:2203.05281 [pdf, other]

Multi-Agent Task Assignment in Vehicular Edge Computing: A Regret-Matching Learning-Based Approach

Authors: Bach Long Nguyen, Duong D. Nguyen, Hung X. Nguyen, Duy T. Ngo, Markus Wagner

Abstract: Vehicular edge computing has recently been proposed to support computation-intensive applications in Intelligent Transportation Systems (ITS), such as self-driving cars and augmented reality. Despite progress in this area, significant challenges remain in efficiently allocating limited computation resources to a range of time-critical ITS tasks. To this end, this paper develops a new task assignment scheme for vehicles on a highway. Because of the high speed of vehicles and the limited communication range of roadside units (RSUs), the computation tasks of participating vehicles must be dynamically migrated across multiple servers. We formulate a binary nonlinear programming (BNLP) problem of assigning computation tasks from vehicles to RSUs and a macrocell base station. To deal with the potentially large size of the formulated optimization problem, we develop a distributed multi-agent regret-matching learning algorithm. Based on the regret minimization principle, the proposed algorithm employs a forgetting method that allows the learning process to quickly adapt to and effectively handle the high mobility of vehicular networks. We theoretically prove that it converges to the correlated equilibrium solutions of the considered BNLP problem. Simulation results with practical parameter settings show that the proposed algorithm offers the lowest total delay and cost of processing tasks, as well as utility fairness among agents.
Importantly, our algorithm converges much faster than existing methods as the problem size grows, demonstrating its clear advantage in large-scale vehicular networks.
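
Regret matching, the principle the proposed algorithm builds on, plays each action with probability proportional to its positive cumulative regret. A minimal single-agent sketch follows. It rests on several assumptions:

- the multiplicative forgetting factor `rho` is a simplified stand-in for the paper's forgetting method;
- the utilities are placeholders, not the BNLP objective;
- this is the simpler unconditional variant. Convergence to correlated equilibria is usually associated with conditional (internal) regret matching, which tracks regret per pair of actions.

```python
from typing import List


class RegretMatcher:
    """Unconditional regret matching over a finite action set, with discounted regret."""

    def __init__(self, n_actions: int, rho: float = 0.9):
        self.regret = [0.0] * n_actions
        self.rho = rho  # 0 < rho <= 1; smaller values forget old regret faster

    def strategy(self) -> List[float]:
        """Mixed strategy proportional to positive regret (uniform if none is positive)."""
        positive = [max(r, 0.0) for r in self.regret]
        total = sum(positive)
        n = len(self.regret)
        return [p / total for p in positive] if total > 0 else [1.0 / n] * n

    def update(self, played: int, utilities: List[float]) -> None:
        """utilities[a] is the payoff the agent would have received by playing a."""
        for a, u in enumerate(utilities):
            self.regret[a] = self.rho * self.regret[a] + (u - utilities[played])
```

With forgetting, regret accumulated under old network conditions decays geometrically. The strategy can therefore track changes caused by vehicle movement instead of being anchored to the distant past.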

Submitted 16 December, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 10 pages, 12 figures, and 1 table

arXiv:2202.05626 [pdf, other]

Audio-Based Deep Learning Frameworks for Detecting COVID-19

Authors: Dat Ngo, Lam Pham, Truong Hoang, Sefki Kolozali, Delaram Jarchi

Abstract: This paper evaluates a wide range of audio-based deep learning frameworks applied to breathing, cough, and speech sounds for detecting COVID-19. In general, the audio recording inputs are transformed into low-level spectrogram features, which are then fed into pre-trained deep learning models to extract high-level embedding features. Next, the dimension of these high-level embedding features is reduced before fine-tuning with a Light Gradient Boosting Machine (LightGBM) as the back-end classifier. Our experiments on the Second DiCOVA Challenge achieved the highest Area Under the Curve (AUC), F1 score, sensitivity score, and specificity score of 89.03%, 64.41%, 63.33%, and 95.13%, respectively. Based on these scores, our method outperforms the state-of-the-art systems and improves on the challenge baseline by 4.33%, 6.00%, and 8.33% in terms of AUC, F1 score, and sensitivity score, respectively.

Submitted 2 March, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

arXiv:2202.01292 [pdf, other]

Improved Regret for Differentially Private Exploration in Linear MDP

Authors: Dung Daniel Ngo, Giuseppe Vietri, Zhiwei Steven Wu

Abstract: We study privacy-preserving exploration in sequential decision-making for environments that rely on sensitive data such as medical records. In particular, we focus on solving the problem of reinforcement learning (RL) subject to the constraint of (joint) differential privacy in the linear MDP setting, where both dynamics and rewards are given by linear functions. Prior work on this problem due to Luyo et al. (2021) achieves a regret rate that has a dependence of $O(K^{3/5})$ on the number of episodes $K$. We provide a private algorithm with an improved regret rate with an optimal dependence of $O(\sqrt{K})$ on the number of episodes. The key recipe for our stronger regret guarantee is the adaptivity in the policy update schedule, in which an update only occurs when sufficient changes in the data are detected. As a result, our algorithm benefits from low switching cost and only performs $O(\log(K))$ updates, which greatly reduces the amount of privacy noise. Finally, in the most prevalent privacy regimes, where the privacy parameter $ε$ is a constant, our algorithm incurs negligible privacy cost: in comparison with the existing non-private regret bounds, the additional regret due to privacy appears in lower-order terms.
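
A common rare-switching rule from linear bandits and linear MDPs illustrates the adaptive update schedule. The policy is recomputed only when the determinant of the regularised Gram matrix Λ = λI + Σ φφᵀ has more than doubled since the last update. This keeps the number of updates logarithmic in the number of episodes. Whether the paper uses exactly this trigger is an assumption, and the feature vectors below are placeholders.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def determinant(mat: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in mat]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def count_policy_updates(features: List[Sequence[float]], lam: float = 1.0) -> int:
    """Count policy recomputations under the determinant-doubling trigger."""
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    det_at_last_update = determinant(gram)
    updates = 0
    for phi in features:
        for i in range(d):  # gram += phi phi^T
            for j in range(d):
                gram[i][j] += phi[i] * phi[j]
        current = determinant(gram)
        if current > 2 * det_at_last_update:
            updates += 1  # the policy (and its privatised statistics) would be recomputed here
            det_at_last_update = current
    return updates
```

Each update at least doubles the reference determinant, so the count is bounded by log2 of the final-to-initial determinant ratio. For 1000 alternating unit features in two dimensions, that bound is 17 updates instead of 1000.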

Submitted 22 June, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

Comments: 13 pages of main text, 30 pages in total; typo corrected, references added

arXiv:2201.03054 [pdf, ps, other]

An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies

Authors: Lam Pham, Dat Ngo, Truong Hoang, Alexander Schindler, Ian McLoughlin

Abstract: In this paper, we evaluate various deep learning frameworks for detecting respiratory anomalies from input audio recordings. To this end, we first transform audio respiratory cycles collected from patients into spectrograms in which both temporal and spectral features are represented, referred to as the front-end feature extraction. We then feed the spectrograms into back-end deep learning networks that classify these respiratory cycles into certain categories. Finally, results from high-performing deep learning frameworks are fused to obtain the best score. Our experiments on the ICBHI benchmark dataset achieve the highest ICBHI score of 57.3 from a late fusion of inception-based and transfer-learning-based deep learning frameworks, which outperforms the state-of-the-art systems.

Submitted 9 January, 2022; originally announced January 2022.

arXiv:2201.00118 [pdf, ps, other]

Semantic Search for Large Scale Clinical Ontologies

Authors: Duy-Hoa Ngo, Madonna Kemp, Donna Truran, Bevan Koopman, Alejandro Metke-Jimenez

Abstract: Finding concepts in large clinical ontologies can be challenging when queries use different vocabularies. A search algorithm that overcomes this problem is useful in applications such as concept normalisation and ontology matching, where concepts can be referred to in different ways, using different synonyms. In this paper, we present a deep learning-based approach to build a semantic search system for large clinical ontologies. We propose a Triplet-BERT model and a method that generates training data directly from the ontologies. The model is evaluated using five real benchmark data sets, and the results show that our approach achieves strong results on both free-text-to-concept and concept-to-concept search tasks and outperforms all baseline methods.

Submitted 1 January, 2022; originally announced January 2022.
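The Triplet-BERT model above is trained on triplets. Below is a hedged sketch of a standard hinge triplet loss over cosine distance. It assumes the anchor is a query term, the positive is the matching concept, and the negative is an unrelated concept. The toy 3-d vectors stand in for encoder embeddings, and both the margin of 0.5 and the choice of cosine distance are assumptions, not the paper's reported configuration.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.5) -> float:
    """Hinge triplet loss: pull the anchor toward the positive, push it from the negative by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings standing in for encoder outputs of a query synonym (anchor),
# its concept label (positive), and an unrelated concept (negative).
query = [0.9, 0.1, 0.0]
matching_concept = [1.0, 0.0, 0.0]
other_concept = [0.0, 0.2, 1.0]
print(triplet_loss(query, matching_concept, other_concept))
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive. Only violating triplets then contribute gradient.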

arXiv:2112.11723 [pdf, other]

Energy-Efficient Massive MIMO for Federated Learning: Transmission Designs and Resource Allocations

Authors: Tung T. Vu, Hien Q. Ngo, Minh N. Dao, Duy T. Ngo, Erik G. Larsson, Tho Le-Ngoc

Abstract: This work proposes novel synchronous, asynchronous, and session-based designs for energy-efficient massive multiple-input multiple-output networks to support federated learning (FL). The synchronous design relies on strict synchronization among users when executing each FL communication round, while the asynchronous design allows more flexibility for users to save energy by using lower computing frequencies. The session-based design splits the downlink and uplink phases in each FL communication round into separate sessions. In this design, we assign users such that one of the participating users in each session finishes its transmission and does not join the next session. As such, more power and degrees of freedom will be allocated to unfinished users, leading to higher rates, lower transmission times, and hence, a higher energy efficiency. In all three designs, we use zero-forcing processing for both uplink and downlink, and develop algorithms that optimize user assignment, time allocation, power, and computing frequencies to minimize the energy consumption at the base station and users, while guaranteeing a predefined maximum execution time of one FL communication round.

Submitted 15 November, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: accepted to appear
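All three designs above use zero-forcing (ZF) processing. The sketch below illustrates the textbook downlink ZF precoder W = H^H (H H^H)^-1, where H is a K-user by M-antenna channel matrix with M >= K. With this precoder, H W is the identity, so no user's stream interferes with another's. Power normalization, channel estimation, the uplink combiner, and the paper's resource allocation are all omitted. The matrix helpers are written in plain Python only to keep the sketch dependency-free, and the 2-user, 3-antenna channel is hypothetical.

```python
from typing import List

Matrix = List[List[complex]]


def conj_transpose(a: Matrix) -> Matrix:
    """Hermitian transpose of a (rows x cols) matrix."""
    return [[a[r][c].conjugate() for r in range(len(a))] for c in range(len(a[0]))]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    aug = [[complex(v) for v in row] + [complex(1.0) if i == j else 0j for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def zero_forcing_precoder(h: Matrix) -> Matrix:
    """W = H^H (H H^H)^-1, so that H W = I (unnormalized)."""
    h_herm = conj_transpose(h)
    return matmul(h_herm, inverse(matmul(h, h_herm)))


# Hypothetical 2-user, 3-antenna channel.
H = [[1 + 1j, 0.5, 0.2j],
     [0.3, 1 - 0.5j, 0.7]]
W = zero_forcing_precoder(H)
HW = matmul(H, W)  # numerically close to the 2x2 identity
print([[round(abs(v), 6) for v in row] for row in HW])
```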

arXiv:2112.09172 [pdf, ps, other]

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

Authors: Lam Pham, Dat Ngo, Phu X. Nguyen, Truong Hoang, Alexander Schindler

Abstract: This paper presents a task of audio-visual scene classification (SC) where input videos are classified into one of five real-life crowded scenes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'. To this end, we first collect an audio-visual dataset (videos) of these five crowded contexts from YouTube (in-the-wild scenes). Then, a wide range of deep learning frameworks are proposed to use either audio or visual input data independently. Finally, results obtained from high-performing deep learning frameworks are fused to achieve the best accuracy score. Our experimental results indicate that audio and visual input factors independently contribute to the SC task's performance. Significantly, an ensemble of deep learning frameworks exploring either audio or visual input data can achieve the best accuracy of 95.7%.

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2110.03251 [pdf, other]

doi 10.1109/EMBC48229.2022.9871179

A Cough-based deep learning framework for detecting COVID-19

Authors: Truong Hoang, Lam Pham, Dat Ngo, Hoang D. Nguyen

Abstract: This paper presents a deep learning framework for detecting COVID-19 positive subjects from their cough sounds. In particular, the proposed approach comprises two main steps. In the first step, we generate a feature representing the cough sound by combining an embedding extracted from a pre-trained model and handcrafted features extracted from the raw audio recording, referred to as the front-end feature extraction. Then, in the second step, the combined features are fed into different back-end classification models for detecting COVID-19 positive subjects. Our experiments on the Track-2 dataset of the Second 2021 DiCOVA Challenge achieved the second-highest ranking with an AUC score of 81.21 and the top F1 score of 53.21 on a Blind Test set, improving the challenge baseline by 8.43% and 23.4%, respectively, and showing deployability, robustness and competitiveness with the state-of-the-art systems.

Submitted 30 September, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: COVID-19, EMBC-2022, DiCOVA, top 2nd, benchmark on Spec > 0.95%

MSC Class: 92-05; 68Txx ACM Class: J.3; I.5.4; I.5.2; H.5.5; C.3; K.5

Journal ref: EMBC 44 (2022) 3422-3425
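The results above are reported as AUC and F1 scores. The sketch below shows how these two metrics are computed from per-subject scores. AUC is taken as the probability that a random positive outscores a random negative, with ties counting one half. F1 is computed after thresholding. The 0.5 threshold and the toy labels are assumptions, and the challenge's official scoring procedure may differ, for example in how the operating threshold is chosen.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels: Sequence[int], scores: Sequence[float],
                    threshold: float = 0.5) -> float:
    """F1 of the positive class after thresholding scores (threshold is an assumption)."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = s >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels (1 = COVID-19 positive) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores), f1_at_threshold(labels, scores))
```

AUC is threshold-free, whereas F1 depends on the chosen operating point. This is one reason the two metrics can rank systems differently.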

arXiv:2108.13512 [pdf, ps, other]

Energy-Efficient Massive MIMO for Serving Multiple Federated Learning Groups

Authors: Tung T. Vu, Hien Quoc Ngo, Duy T. Ngo, Minh N Dao, Erik G. Larsson

Abstract: With its privacy preservation and communication efficiency, federated learning (FL) has emerged as a learning framework that suits beyond 5G and towards 6G systems. This work looks into a future scenario in which there are multiple groups with different learning purposes and participating in different FL processes. We give energy-efficient solutions to demonstrate that this scenario can be realistic. First, to ensure a stable operation of multiple FL processes over wireless channels, we propose to use a massive multiple-input multiple-output network to support the local and global FL training updates, and let the iterations of these FL processes be executed within the same large-scale coherence time. Then, we develop asynchronous and synchronous transmission protocols where these iterations are asynchronously and synchronously executed, respectively, using the downlink unicasting and conventional uplink transmission schemes. Zero-forcing processing is utilized for both uplink and downlink transmissions. Finally, we propose an algorithm that optimally allocates power and computation resources to save energy at both base station and user sides, while guaranteeing a given maximum execution time threshold of each FL iteration. Compared to the baseline schemes, the proposed algorithm significantly reduces the energy consumption, especially when the number of base station antennas is large.

Submitted 17 October, 2021; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: Accepted to appear in Proc. IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, Dec. 2021. (v2). arXiv admin note: text overlap with arXiv:2107.09577

arXiv:2107.10093 [pdf, other]

Incentivizing Compliance with Algorithmic Instruments

Authors: Daniel Ngo, Logan Stapleton, Vasilis Syrgkanis, Zhiwei Steven Wu

Abstract: Randomized experiments can be susceptible to selection bias due to potential non-compliance by the participants. While much of the existing work has studied compliance as a static behavior, we propose a game-theoretic model to study compliance as dynamic behavior that may change over time. In rounds, a social planner interacts with a sequence of heterogeneous agents who arrive with their unobserved private type that determines both their prior preferences across the actions (e.g., control and treatment) and their baseline rewards without taking any treatment. The planner provides each agent with a randomized recommendation that may alter their beliefs and their action selection. We develop a novel recommendation mechanism that views the planner's recommendation as a form of instrumental variable (IV) that only affects an agent's action selection, but not the observed rewards. We construct such IVs by carefully mapping the history (the interactions between the planner and the previous agents) to a random recommendation. Even though the initial agents may be completely non-compliant, our mechanism can incentivize compliance over time, thereby enabling the estimation of the treatment effect of each treatment, and minimizing the cumulative regret of the planner whose goal is to identify the optimal treatment.

Submitted 28 July, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

Comments: In Proceedings of the Thirty-eighth International Conference on Machine Learning (ICML 2021), 17 pages of main text, 53 pages total, 3 figures

arXiv:2107.09725 [pdf, other]

Registration of 3D Point Sets Using Correntropy Similarity Matrix

Authors: Ashutosh Singandhupe, Hung La, Trung Dung Ngo, Van Ho

Abstract: This work focuses on Registration or Alignment of 3D point sets. Although the Registration problem is well established and is solved using multiple variants of the Iterative Closest Point (ICP) algorithm, most current state-of-the-art approaches still suffer from misalignment when the Source and Target point sets are separated by large rotations and translations. In this work, we propose a variant of the Standard ICP algorithm in which we introduce a Correntropy Relationship Matrix into the computation of the rotation and translation components, which attempts to solve the large rotation and translation problem between the Source and Target point sets. This matrix is created through a correntropy criterion that is updated in every iteration. The correntropy criterion defined in this approach maintains the relationship between the points in the Source dataset and the Target dataset. Through our experiments and validation, we verify that our approach performs well under various rotations and translations in comparison to other well-known state-of-the-art methods available in the Point Cloud Library (PCL), as well as other open-source methods. We have uploaded our code to a GitHub repository so that readers can validate and verify our approach: https://github.com/aralab-unr/CoSM-ICP.

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2107.05762 [pdf, other]

Strategic Instrumental Variable Regression: Recovering Causal Relationships From Strategic Responses

Authors: Keegan Harris, Daniel Ngo, Logan Stapleton, Hoda Heidari, Zhiwei Steven Wu

Abstract: In settings where Machine Learning (ML) algorithms automate or inform consequential decisions about people, individual decision subjects are often incentivized to strategically modify their observable attributes to receive more favorable predictions. As a result, the distribution the assessment rule is trained on may differ from the one it operates on in deployment. While such distribution shifts, in general, can hinder accurate predictions, our work identifies a unique opportunity associated with shifts due to strategic responses: We show that we can use strategic responses effectively to recover causal relationships between the observable features and outcomes we wish to predict, even in the presence of unobserved confounding variables. Specifically, our work establishes a novel connection between strategic responses to ML models and instrumental variable (IV) regression by observing that the sequence of deployed models can be viewed as an instrument that affects agents' observable features but does not directly influence their outcomes. We show that our causal recovery method can be utilized to improve decision-making across several important criteria: individual fairness, agent outcomes, and predictive risk. In particular, we show that if decision subjects differ in their ability to modify non-causal attributes, any decision rule deviating from the causal coefficients can lead to (potentially unbounded) individual-level unfairness.

Submitted 8 June, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: In the 39th International Conference on Machine Learning (ICML 2022)
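The abstract above treats deployed models as an instrument. The minimal simulation below shows why an instrument helps. It assumes a single scalar instrument z (standing in for a shift in the deployed assessment rule), one feature x, an unobserved confounder u, and linear outcomes. Under these assumptions the single-instrument IV ratio Cov(z, y) / Cov(z, x) recovers the causal coefficient, while ordinary least squares is biased by u. The data-generating process and its coefficients are hypothetical, and the paper's estimator uses the full sequence of deployed models rather than this scalar case.

```python
import random
from typing import List, Sequence, Tuple


def covariance(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def iv_estimate(z: Sequence[float], x: Sequence[float], y: Sequence[float]) -> float:
    """Single-instrument IV estimate of the effect of x on y: Cov(z, y) / Cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


def ols_estimate(x: Sequence[float], y: Sequence[float]) -> float:
    """Naive slope Cov(x, y) / Var(x); biased when x and y share a confounder."""
    return covariance(x, y) / covariance(x, x)


def simulate(n: int = 50_000, beta: float = 2.0,
             seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical data: z shifts x only, u confounds both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)  # instrument: affects x, not y directly
        ui = rng.gauss(0.0, 1.0)  # unobserved confounder
        xi = zi + ui + rng.gauss(0.0, 0.5)
        yi = beta * xi + 3.0 * ui + rng.gauss(0.0, 0.5)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print("OLS:", round(ols_estimate(x, y), 3))  # pulled away from 2.0 by the confounder
print("IV: ", round(iv_estimate(z, x, y), 3))  # close to the true coefficient 2.0
```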

arXiv:2104.02523 [pdf, other]

An Analysis of State-of-the-art Activation Functions For Supervised Deep Neural Network

Authors: Anh Nguyen, Khoa Pham, Dat Ngo, Thanh Ngo, Lam Pham

Abstract: This paper provides an analysis of state-of-the-art activation functions with respect to supervised classification with deep neural networks. These activation functions comprise the Rectified Linear Unit (ReLU), Exponential Linear Unit (ELU), Scaled Exponential Linear Unit (SELU), Gaussian Error Linear Unit (GELU), and Inverse Square Root Linear Unit (ISRLU). For evaluation, experiments are conducted over two deep learning network architectures integrating these activation functions. The first model, based on a Multilayer Perceptron (MLP), is evaluated on the MNIST dataset to compare these activation functions. The second model, a VGGish-like architecture, is applied to Acoustic Scene Classification (ASC) Task 1A of the DCASE 2018 challenge to evaluate whether these activation functions work well across different datasets as well as different network architectures.

Submitted 5 April, 2021; originally announced April 2021.

Comments: 6 pages, 5 figures
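For reference, the five activation functions compared above can be written as scalar functions in their standard forms. The default alpha of 1.0 for ELU and ISRLU is an assumption, the SELU constants are the usual fixed-point values, and GELU uses the exact erf form rather than the tanh approximation.

```python
import math

# Standard SELU constants (alpha, scale) that yield self-normalizing activations.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x: float) -> float:
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x: float) -> float:
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def gelu(x: float) -> float:
    # Exact form x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def isrlu(x: float, alpha: float = 1.0) -> float:
    # Identity for x >= 0; smooth saturation toward -1/sqrt(alpha) for negative x.
    return x if x >= 0 else x / math.sqrt(1.0 + alpha * x * x)


for name, fn in [("ReLU", relu), ("ELU", elu), ("SELU", selu), ("GELU", gelu), ("ISRLU", isrlu)]:
    print(name, [round(fn(v), 4) for v in (-2.0, -0.5, 0.0, 0.5, 2.0)])
```

All five functions agree with the identity (up to SELU's scale) for large positive inputs. They differ mainly in how they treat negative inputs, which is where their training behavior diverges.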

arXiv:2012.15029 [pdf, other]

VinDr-CXR: An open dataset of chest X-rays with radiologist's annotations

Authors: Ha Q. Nguyen, Khanh Lam, Linh T. Le, Hieu H. Pham, Dat Q. Tran, Dung B. Nguyen, Dung D. Le, Chi M. Pham, Hang T. T. Tong, Diep H. Dinh, Cuong D. Do, Luu T. Doan, Cuong N. Nguyen, Binh T. Nguyen, Que V. Nguyen, Au D. Hoang, Hien N. Phan, Anh T. Nguyen, Phuong H. Ho, Dat T. Ngo, Nghia T. Nguyen, Nhan T. Nguyen, Minh Dao, Van Vu

Abstract: Most of the existing chest X-ray datasets include labels from a list of findings without specifying their locations on the radiographs. This limits the development of machine learning algorithms for the detection and localization of chest abnormalities. In this work, we describe a dataset of more than 100,000 chest X-ray scans that were retrospectively collected from two major hospitals in Vietnam. Out of this raw data, we release 18,000 images that were manually annotated by a total of 17 experienced radiologists with 22 local labels of rectangles surrounding abnormalities and 6 global labels of suspected diseases. The released dataset is divided into a training set of 15,000 and a test set of 3,000. Each scan in the training set was independently labeled by 3 radiologists, while each scan in the test set was labeled by the consensus of 5 radiologists. We designed and built a labeling platform for DICOM images to facilitate these annotation procedures. All images are made publicly available (https://www.physionet.org/content/vindr-cxr/1.0.0/) in DICOM format along with the labels of both the training set and the test set.

Submitted 20 March, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

Comments: 11 pages, under review by Nature Scientific Data

arXiv:2012.13668 [pdf, other]

Deep Learning Framework Applied for Predicting Anomaly of Respiratory Sounds

Authors: Dat Ngo, Lam Pham, Anh Nguyen, Ben Phan, Khoa Tran, Truong Nguyen

Abstract: This paper proposes a robust deep learning framework for classifying anomalies in respiratory cycles. The framework starts with a front-end feature extraction step, which transforms the respiratory input sound into a two-dimensional spectrogram in which both spectral and temporal features are well represented. Next, an ensemble of C-DNN and Autoencoder networks is applied to classify the cycles into four categories of respiratory anomaly. We conducted experiments on the 2017 International Conference on Biomedical and Health Informatics (ICBHI) benchmark dataset and achieved competitive performance, with an ICBHI average score of 0.49 and an ICBHI harmonic score of 0.42.

Submitted 25 December, 2020; originally announced December 2020.

Comments: 5 pages, 2 figures, 8 tables
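The ICBHI average and harmonic scores quoted above are commonly defined in terms of two rates. Sensitivity (Se) is the share of abnormal cycles (crackle, wheeze, or both) classified with their correct abnormal class, and specificity (Sp) is the share of normal cycles classified as normal. The average score is (Se + Sp) / 2 and the harmonic score is 2 * Se * Sp / (Se + Sp). The sketch below follows those definitions. The class-name strings are assumptions, and the official challenge scoring should be checked before comparing numbers.

```python
from typing import Sequence, Tuple

NORMAL = "normal"  # assumed label for normal cycles; abnormal: "crackle", "wheeze", "both"


def icbhi_scores(y_true: Sequence[str],
                 y_pred: Sequence[str]) -> Tuple[float, float, float, float]:
    """Return (sensitivity, specificity, average score, harmonic score)."""
    n_normal = sum(1 for t in y_true if t == NORMAL)
    n_abnormal = len(y_true) - n_normal
    if n_normal == 0 or n_abnormal == 0:
        raise ValueError("need both normal and abnormal cycles")
    correct_normal = sum(1 for t, p in zip(y_true, y_pred) if t == NORMAL and p == NORMAL)
    # An abnormal cycle counts only if its exact abnormal class is predicted.
    correct_abnormal = sum(1 for t, p in zip(y_true, y_pred) if t != NORMAL and p == t)
    se = correct_abnormal / n_abnormal
    sp = correct_normal / n_normal
    average = (se + sp) / 2
    harmonic = 2 * se * sp / (se + sp) if se + sp > 0 else 0.0
    return se, sp, average, harmonic


# Hypothetical ground truth and predictions for six cycles.
truth = ["normal", "normal", "crackle", "wheeze", "both", "crackle"]
preds = ["normal", "normal", "crackle", "both", "both", "normal"]
print(icbhi_scores(truth, preds))
```

The harmonic score is never larger than the average score and drops sharply when either rate is low. This penalizes models that trade all sensitivity for specificity, or the reverse.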

arXiv:2012.02471 [pdf, other]

doi 10.1145/3502297

Automated, Cost-effective, and Update-driven App Testing

Authors: Chanh Duc Ngo, Fabrizio Pastore, Lionel Briand

Abstract: Apps' pervasive role in our society led to the definition of test automation approaches to ensure their dependability. However, state-of-the-art approaches tend to generate large numbers of test inputs and are unlikely to achieve more than 50% method coverage. In this paper, we propose a strategy to achieve significantly higher coverage of the code affected by updates with a much smaller number of test inputs, thus alleviating the test oracle problem. More specifically, we present ATUA, a model-based approach that synthesizes App models with static analysis, integrates a dynamically-refined state abstraction function and combines complementary testing strategies, including (1) coverage of the model structure, (2) coverage of the App code, (3) random exploration, and (4) coverage of dependencies identified through information retrieval. Its model-based strategy enables ATUA to generate a small set of inputs that exercise only the code affected by the updates. In turn, this makes common test oracle solutions more cost-effective as they tend to involve human effort. A large empirical evaluation, conducted with 72 App versions belonging to nine popular Android Apps, has shown that ATUA is more effective and less effort intensive than state-of-the-art approaches when testing App updates.

Submitted 6 December, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

arXiv:2009.09619 [pdf, other]

Economic Theoretic LEO Satellite Coverage Control: An Auction-based Framework

Authors: Junghyun Kim, Thong D. Ngo, Paul S. Oh, Sean S. -C. Kwon, Changhee Han, Joongheon Kim

Abstract: Recently, ultra-dense low earth orbit (LEO) satellite constellations over high-frequency bands have been considered one of the promising solutions for providing coverage all over the world. Given satellite constellations, efficient beam coverage schemes should be employed at satellites to provide seamless services and full-view coverage. In LEO systems, hybrid wide and spot beam coverage schemes are generally used, where the LEO satellite provides a wide beam for large-area coverage and several additional steering spot beams for high-speed data access. In this setting, scheduling multiple spot beams is essential. To achieve this goal, this paper proposes a Vickrey-Clarke-Groves (VCG) auction-based truthful algorithm for scheduling multiple spot beams, enabling more efficient seamless services and full-view coverage.

Submitted 21 September, 2020; originally announced September 2020.

Comments: 3 pages

ACM Class: C.2.1
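The scheduler above is VCG-based. As an illustration only, the sketch below implements the textbook special case: k identical spot beams, and bidders (for example, ground regions) that each want at most one beam and report a value. VCG assigns beams to the k highest reports and charges each winner the welfare loss it imposes on the others. In this special case that loss equals the (k+1)-th highest bid, which is what makes truthful bidding a dominant strategy. The bidder names, values, and unit-demand model are assumptions; the paper's actual valuation and beam model are not reproduced here.

```python
from typing import Dict, List, Tuple


def vcg_identical_beams(bids: Dict[str, float], k: int) -> Tuple[List[str], Dict[str, float]]:
    """VCG allocation of k identical spot beams to unit-demand bidders.

    Winners are the k highest bidders; each pays the externality it imposes on
    the others, which here equals the (k+1)-th highest bid (0 if there is none).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winners = [name for name, _ in ranked[:k]]
    price = ranked[k][1] if len(ranked) > k else 0.0
    payments = {name: price for name in winners}
    return winners, payments


# Hypothetical valuations of four ground regions for 2 steerable spot beams.
bids = {"region_A": 9.0, "region_B": 7.0, "region_C": 5.0, "region_D": 2.0}
print(vcg_identical_beams(bids, k=2))
```

A winner's payment does not depend on its own bid, so overstating or understating a value cannot lower its price. Misreporting can only change whether it wins.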

arXiv:2009.02031 [pdf, ps, other]

Joint Resource Allocation to Minimize Execution Time of Federated Learning in Cell-Free Massive MIMO

Authors: Tung T. Vu, Duy T. Ngo, Hien Quoc Ngo, Minh N. Dao, Nguyen H. Tran, Richard H. Middleton

Abstract: Due to its communication efficiency and privacy-preserving capability, federated learning (FL) has emerged as a promising framework for machine learning in 5G-and-beyond wireless networks. Of great interest is the design and optimization of new wireless network structures that support the stable and fast operation of FL. Cell-free massive multiple-input multiple-output (CFmMIMO) turns out to be a suitable candidate, which allows each communication round in the iterative FL process to be stably executed within a large-scale coherence time. Aiming to reduce the total execution time of the FL process in CFmMIMO, this paper proposes choosing only a subset of available users to participate in FL. An optimal selection of users with favorable link conditions would minimize the execution time of each communication round, while limiting the total number of communication rounds required. Toward this end, we formulate a joint optimization problem of user selection, transmit power, and processing frequency, subject to a predefined minimum number of participating users to guarantee the quality of learning. We then develop a new algorithm that is proven to converge to the neighbourhood of the stationary points of the formulated problem. Numerical results confirm that our proposed approach significantly reduces the FL total execution time over baseline schemes. The time reduction is more pronounced when the density of access point deployments is moderately low.

Submitted 10 June, 2022; v1 submitted 4 September, 2020; originally announced September 2020.

Comments: accepted to appear in IEEE Internet of Things Journal, Jun. 2022

arXiv:2005.12779 [pdf, ps, other]

Sound Context Classification Basing on Join Learning Model and Multi-Spectrogram Features

Authors: Dat Ngo, Hao Hoang, Anh Nguyen, Tien Ly, Lam Pham

Abstract: In this paper, we present a deep learning framework for Acoustic Scene Classification (ASC), the task of classifying scene contexts from environmental input sounds. An ASC system generally comprises two main steps, referred to as front-end feature extraction and back-end classification. In the first step, an extractor is used to extract low-level features from raw audio signals. Next, the extracted discriminative features are fed into a classifier, which reports the accuracy results. Aiming to develop a robust ASC framework, we address existing issues in both the front-end and back-end components of an ASC system and present three main contributions. First, we carry out a comprehensive analysis of spectrogram representations extracted from sound scene input and propose the best multi-spectrogram combinations. In terms of back-end classification, we propose a novel joint learning architecture using parallel convolutional recurrent networks, which is effective at learning spatial features and temporal sequences of the spectrogram input. Finally, good experimental results obtained on benchmark datasets of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Task 1A & 1B, and LITIS Rouen show that the proposed framework is general and robust for the ASC task.

Submitted 26 May, 2020; originally announced May 2020.
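The front-end described above turns raw audio into spectrograms. The following is a minimal magnitude short-time Fourier transform (STFT) sketch. It assumes a periodic Hann window with a default frame length of 256 samples and a hop of 128 (both hypothetical defaults). For clarity it uses a direct O(N^2) DFT rather than an FFT, and it produces only a linear-frequency spectrogram, not any of the other spectrogram types a multi-spectrogram front-end might combine.

```python
import cmath
import math
from typing import List, Sequence


def hann_window(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def magnitude_spectrogram(signal: Sequence[float], frame_len: int = 256,
                          hop: int = 128) -> List[List[float]]:
    """Return |STFT| as a list of frames, each holding frame_len // 2 + 1 frequency bins."""
    window = hann_window(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(segment[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                      for t in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def log_compress(spec: List[List[float]], floor: float = 1e-10) -> List[List[float]]:
    """Natural-log magnitudes, floored to avoid log(0)."""
    return [[math.log(max(v, floor)) for v in frame] for frame in spec]


# A synthetic tone that falls exactly on bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = magnitude_spectrogram(tone, frame_len=64, hop=32)
print(len(spec), len(spec[0]))
```

The result is a time-by-frequency grid that a CNN or convolutional recurrent back-end can treat like an image. In practice, a library FFT would replace the direct DFT.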

arXiv:2005.12734 [pdf, other]

Interpreting Chest X-rays via CNNs that Exploit Hierarchical Disease Dependencies and Uncertainty Labels

Authors: Hieu H. Pham, Tung T. Le, Dat T. Ngo, Dat Q. Tran, Ha Q. Nguyen

Abstract: Chest X-rays (CXRs) are among the views most commonly ordered by radiologists (NHS) and are critical for the diagnosis of many different thoracic diseases. Accurately detecting the presence of multiple diseases from CXRs is still a challenging task. We present a multi-label classification framework based on deep convolutional neural networks (CNNs) for diagnosing the presence of 14 common thoracic diseases and observations. Specifically, we trained a strong set of CNNs that exploit dependencies among abnormality labels and used label smoothing regularization (LSR) for better handling of uncertain samples. Our deep networks were trained on over 200,000 CXRs of the recently released CheXpert dataset (Irvin et al., 2019), and the final model, an ensemble of the best-performing networks, achieved a mean area under the curve (AUC) of 0.940 in predicting 5 selected pathologies from the validation set. To the best of our knowledge, this is the highest AUC score reported to date. More importantly, the proposed method was also evaluated on an independent test set of the CheXpert competition, containing 500 CXR studies annotated by a panel of 5 experienced radiologists. The reported performance was on average better than 2.6 out of 3 other individual radiologists, with a mean AUC of 0.930, which led to the current state-of-the-art performance on the CheXpert test set.

Submitted 25 May, 2020; originally announced May 2020.

Comments: MIDL 2020 Accepted Short Paper. arXiv admin note: substantial text overlap with arXiv:1911.06475

Report number: MIDL/2020/ExtendedAbstract/4o1GLIIHlh
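The abstract above uses label smoothing regularization (LSR) to handle uncertain samples. The sketch below shows one way to do this. It assumes CheXpert-style labels in which 1 is positive, 0 is negative, and -1 is uncertain, and it replaces each uncertain label with a random soft target drawn from an interval. It then scores predictions with binary cross-entropy. The interval [0.55, 0.85] and the helper names are hypothetical choices for illustration, not necessarily the settings used in the paper.

```python
import math
import random
from typing import List, Optional, Sequence

UNCERTAIN = -1  # CheXpert-style encoding of an "uncertain" finding


def smooth_targets(labels: Sequence[int], low: float = 0.55, high: float = 0.85,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Keep 1/0 as hard targets; map uncertain labels to a random soft target in [low, high]."""
    rng = rng or random.Random()
    targets = []
    for y in labels:
        if y == UNCERTAIN:
            targets.append(rng.uniform(low, high))
        elif y in (0, 1):
            targets.append(float(y))
        else:
            raise ValueError(f"unexpected label {y!r}")
    return targets


def binary_cross_entropy(targets: Sequence[float], probs: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy across the label outputs."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(targets)


# Hypothetical labels for three findings of one study, and model probabilities.
targets = smooth_targets([1, 0, UNCERTAIN], rng=random.Random(0))
print(targets, binary_cross_entropy(targets, [0.9, 0.2, 0.6]))
```

Soft targets stop the network from being pushed to full confidence on findings the annotators marked as uncertain. Certain labels still provide hard supervision.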

arXiv:2005.09707 [pdf, other]

New Way of Generating Electromagnetic Waves

Authors: Ali Hosseini-Fahraji, Majid Manteghi, Khai d. t. Ngo

Abstract: This paper presents a new method for generating low-frequency electromagnetic waves for navigation and communication in challenging environments, such as underwater and underground. The main idea is to store magnetic energy in two different spaces using the interaction between a permanent magnet and a magnetic material. The magnetic reluctance of the medium around the permanent magnet is modulated to change the magnetic flux path. The nonlinear properties of the magnetic material, a critical phenomenon here, are used for effective modulation. As a result, a time-variant field is generated by the modulation of the permanent magnet flux. This non-resonant time-variant characterization means that the transmitter is not bound by the fundamental limits of antennas and can transmit at higher data rates. A proof-of-concept prototype transmitter is designed and tested based on the proposed idea. Compared to a rotating magnet, the prototype transmitter can modulate 50% of the stored energy of the permanent magnet with much lower power consumption.

Submitted 19 May, 2020; originally announced May 2020.

Comments: 8 pages, 9 figures
