
Streaming Deep Reinforcement Learning Finally Works

Authors: Mohamed Elsayed, Gautham Vasan, A. Rupam Mahmood

Abstract: Natural intelligence processes experience as a continuous stream, sensing, acting, and learning moment-by-moment in real time. Streaming learning, the modus operandi of classic reinforcement learning (RL) algorithms like Q-learning and TD, mimics natural learning by using the most recent sample without storing it. This approach is also ideal for resource-constrained, communication-limited, and privacy-sensitive applications. However, in deep RL, learners almost always use batch updates and replay buffers, making them computationally expensive and incompatible with streaming learning. Although the prevalence of batch deep RL is often attributed to its sample efficiency, a more critical reason for the absence of streaming deep RL is its frequent instability and failure to learn, which we refer to as stream barrier. This paper introduces the stream-x algorithms, the first class of deep RL algorithms to overcome stream barrier for both prediction and control and match sample efficiency of batch RL. Through experiments in Mujoco Gym, DM Control Suite, and Atari Games, we demonstrate stream barrier in existing algorithms and successful stable learning with our stream-x algorithms: stream Q, stream AC, and stream TD, achieving the best model-free performance in DM Control Dog environments. A set of common techniques underlies the stream-x algorithms, enabling their success with a single set of hyperparameters and allowing for easy extension to other algorithms, thereby reviving streaming RL.

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14242 [pdf, other]

Pseudo-label Refinement for Improving Self-Supervised Learning Systems

Authors: Zia-ur-Rehman, Arif Mahmood, Wenxiong Kang

Abstract: Self-supervised learning systems have gained significant attention in recent years by leveraging clustering-based pseudo-labels to provide supervision without the need for human annotations. However, the noise in these pseudo-labels caused by the clustering methods poses a challenge to the learning process, leading to degraded performance. In this work, we propose a pseudo-label refinement (SLR) algorithm to address this issue. The cluster labels from the previous epoch are projected to the current epoch cluster-labels space and a linear combination of the new label and the projected label is computed as a soft refined label containing the information from the previous epoch clusters as well as from the current epoch. In contrast to the common practice of using the maximum value as a cluster/class indicator, we employ hierarchical clustering on these soft pseudo-labels to generate refined hard-labels. This approach better utilizes the information embedded in the soft labels, outperforming the simple maximum value approach for hard label generation. The effectiveness of the proposed SLR algorithm is evaluated in the context of person re-identification (Re-ID) using unsupervised domain adaptation (UDA). Experimental results demonstrate that the modified Re-ID baseline, incorporating the SLR algorithm, achieves significantly improved mean Average Precision (mAP) performance in various UDA tasks, including real-to-synthetic, synthetic-to-real, and different real-to-real scenarios. These findings highlight the efficacy of the SLR algorithm in enhancing the performance of self-supervised learning systems.

Submitted 18 October, 2024; originally announced October 2024.
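The soft-label step described in the abstract above (project the previous epoch's cluster labels into the current label space, then take a linear combination with the new labels) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the overlap-count projection, the function name `refine_soft_labels`, and the mixing weight `alpha` are assumptions, and the hierarchical-clustering step that produces the hard labels is omitted.

```python
def refine_soft_labels(prev_labels, curr_labels, n_prev, n_curr, alpha=0.5):
    """Blend current cluster labels with projected previous-epoch labels.

    Assumption: the projection is the row-normalised overlap between
    previous and current clusters; the abstract does not specify it.
    """
    # overlap[i][j] = number of samples in previous cluster i and current cluster j
    overlap = [[0] * n_curr for _ in range(n_prev)]
    for p, c in zip(prev_labels, curr_labels):
        overlap[p][c] += 1

    # Row-normalise so each previous cluster maps to a distribution
    # over current clusters.
    proj = []
    for row in overlap:
        total = sum(row)
        proj.append([v / total if total else 0.0 for v in row])

    # Soft refined label: alpha * one-hot(current) + (1 - alpha) * projected(previous)
    soft = []
    for p, c in zip(prev_labels, curr_labels):
        one_hot = [1.0 if j == c else 0.0 for j in range(n_curr)]
        soft.append([alpha * one_hot[j] + (1 - alpha) * proj[p][j]
                     for j in range(n_curr)])
    return soft


# Four samples, two clusters in each epoch (toy values).
soft = refine_soft_labels([0, 0, 1, 1], [1, 1, 0, 1], n_prev=2, n_curr=2)
```

Each row of `soft` is a distribution over the current clusters; a sample whose previous and current assignments disagree ends up with a less peaked label than a one-hot vector.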

arXiv:2410.09968 [pdf]

Deep-Ace: LSTM-based Prokaryotic Lysine Acetylation Site Predictor

Authors: Maham Ilyas, Abida Yasmeen, Yaser Daanial Khan, Arif Mahmood

Abstract: Acetylation of lysine residues (K-Ace) is a post-translational modification occurring in both prokaryotes and eukaryotes. It plays a crucial role in disease pathology and cell biology; hence it is important to identify these K-Ace sites. In the past, many machine learning-based models using hand-crafted features and encodings have been used to find and analyze the characteristics of K-Ace sites; however, these methods ignore long-term relationships within sequences and therefore suffer performance degradation. In the current work we propose Deep-Ace, a deep learning-based framework using a Long Short-Term Memory (LSTM) network, which has the ability to understand and encode long-term relationships within a sequence. Such relations are vital for learning discriminative and effective sequence representations. In the work reported here, the use of LSTM to extract deep features as well as for prediction of K-Ace sites using fully connected layers for eight different species of prokaryotic models (including B. subtilis, C. glutamicum, E. coli, G. kaustophilus, S. eriocheiris, B. velezensis, S. typhimurium, and M. tuberculosis) has been explored. Our proposed method has outperformed existing state-of-the-art models, achieving accuracies of 0.80, 0.79, 0.71, 0.75, 0.80, 0.83, 0.756, and 0.82, respectively, for the eight bacterial species mentioned above. The method with minor modifications can be used for eukaryotic systems and can serve as a tool for the prognosis and diagnosis of various diseases in humans.

Submitted 20 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.09964 [pdf, other]

Lower-dimensional projections of cellular expression improves cell type classification from single-cell RNA sequencing

Authors: Muhammad Umar, Muhammad Asif, Arif Mahmood

Abstract: Single-cell RNA sequencing (scRNA-seq) enables the study of cellular diversity at the single-cell level. It provides a global view of cell-type specification during the onset of biological mechanisms such as developmental processes and human organogenesis. Various statistical, machine and deep learning-based methods have been proposed for cell-type classification. Most of the methods utilize unsupervised lower dimensional projections obtained from a large reference dataset. In this work, we propose a reference-based method for cell type classification, called EnProCell. EnProCell first computes lower dimensional projections that capture both the high variance and class separability through an ensemble of principal component analysis and multiple discriminant analysis. In the second phase, EnProCell trains a deep neural network on the lower dimensional representation of data to classify cell types. The proposed method outperformed the existing state-of-the-art methods when tested on four different data sets produced from different single-cell sequencing technologies. EnProCell showed higher accuracy (98.91) and F1 score (98.64) than other methods for predicting reference from reference datasets. Similarly, EnProCell also showed better performance than existing methods in predicting cell types for data with unknown cell types (query) from reference datasets (accuracy: 99.52; F1 score: 99.07). In addition to improved performance, the proposed methodology is simple and does not require more computational resources and time. EnProCell is available at https://github.com/umar1196/EnProCell.

Submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.09399 [pdf, other]

Text Classification using Graph Convolutional Networks: A Comprehensive Survey

Authors: Syed Mustafa Haider Rizvi, Ramsha Imran, Arif Mahmood

Abstract: Text classification is a quintessential and practical problem in natural language processing with applications in diverse domains such as sentiment analysis, fake news detection, medical diagnosis, and document classification. A sizable body of recent works exists where researchers have studied and tackled text classification from different angles with varying degrees of success. Graph convolution network (GCN)-based approaches have gained a lot of traction in this domain over the last decade with many implementations achieving state-of-the-art performance in more recent literature and thus, warranting the need for an updated survey. This work aims to summarize and categorize various GCN-based Text Classification approaches with regard to the architecture and mode of supervision. It identifies their strengths and limitations and compares their performance on various benchmark datasets. We also discuss future research directions and the challenges that exist in this domain.

Submitted 12 October, 2024; originally announced October 2024.

arXiv:2410.04574 [pdf, other]

Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion

Authors: Mehwish Ghafoor, Arif Mahmood, Muhammad Bilal

Abstract: In the field of 3D Human Pose Estimation from monocular videos, the presence of diverse occlusion types presents a formidable challenge. Prior research has made progress by harnessing spatial and temporal cues to infer 3D poses from 2D joint observations. This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to obtain a holistic 3D pose estimation, even in the presence of severe occlusions. Confronting the issue of occlusion-induced missing joint data, we propose a temporal interpolation-based occlusion guidance mechanism. To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views. Each intermediate-view undergoes spatial refinement through a self-refinement schema. Subsequently, these intermediate-views are fused to yield the final 3D human pose estimation. The entire system is end-to-end trainable. Through extensive experiments conducted on the Human3.6M and MPI-INF-3DHP datasets, our method's performance is rigorously evaluated. Notably, our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements. The code is available here: https://github.com/MehwishG/DTF.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.04256 [pdf, other]

Implicit to Explicit Entropy Regularization: Benchmarking ViT Fine-tuning under Noisy Labels

Authors: Maria Marrium, Arif Mahmood, Mohammed Bennamoun

Abstract: Automatic annotation of large-scale datasets can introduce noisy training data labels, which adversely affect the learning process of deep neural networks (DNNs). Consequently, Noisy Labels Learning (NLL) has become a critical research field for Convolutional Neural Networks (CNNs), though it remains less explored for Vision Transformers (ViTs). In this study, we evaluate the vulnerability of ViT fine-tuning to noisy labels and compare its robustness with CNNs. We also investigate whether NLL methods developed for CNNs are equally effective for ViTs. Using linear probing and MLP-K fine-tuning, we benchmark two ViT backbones (ViT-B/16 and ViT-L/16) using three commonly used classification losses: Cross Entropy (CE), Focal Loss (FL), and Mean Absolute Error (MAE), alongside six robust NLL methods: GCE, SCE, NLNL, APL, NCE+AGCE, and ANL-CE. The evaluation is conducted across six datasets including MNIST, CIFAR-10/100, WebVision, Clothing1M, and Food-101N. Furthermore, we explore whether implicit prediction entropy minimization contributes to ViT robustness against noisy labels, noting a general trend of prediction entropy reduction across most NLL methods. Building on this observation, we examine whether explicit entropy minimization could enhance ViT resilience to noisy labels. Our findings indicate that incorporating entropy regularization enhances the performance of established loss functions such as CE and FL, as well as the robustness of the six studied NLL methods across both ViT backbones.

Submitted 5 October, 2024; originally announced October 2024.
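As a rough illustration of what "explicit entropy minimization" can look like when added to a standard loss, the sketch below adds a weighted prediction-entropy term to cross entropy for a single example. The additive form, the function names, and the weight `lam` are assumptions made for illustration; the abstract above does not give the paper's exact formulation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy_regularized_ce(logits, label, lam=0.1):
    """Cross entropy plus lam * prediction entropy (lam is a hypothetical weight)."""
    p = softmax(logits)
    ce = -math.log(p[label])
    # Shannon entropy of the predicted distribution; minimising it
    # pushes the model toward confident predictions.
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return ce + lam * entropy


loss_uniform = entropy_regularized_ce([0.0, 0.0], label=0, lam=0.1)
loss_plain = entropy_regularized_ce([2.0, 0.0], label=0, lam=0.0)
```

With `lam=0.0` the function reduces to plain cross entropy, so the regularizer can be switched off for comparison.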

arXiv:2409.15495 [pdf, other]

From Our Lab to Their Homes: Learnings from Longitudinal Field Research with Older Adults

Authors: Amama Mahmood, Chien-Ming Huang

Abstract: Conducting research with older adults in their home environments presents unique opportunities and challenges that differ significantly from traditional lab-based studies. In this paper, we share our experiences from year-long research activities aiming to design and evaluate conversational voice assistants for older adults through longitudinal deployment, interviews, co-design workshops, and evaluation studies. We discuss the benefits of bringing the lab to their home, including producing realistic and contextual interactions, creating stronger researcher-participant bonds, and enabling participant growth with the research over time. We also detail the difficulties encountered in various aspects of the research process, including recruitment, scheduling, logistics, following study protocols, and study closure. These learnings highlight the complex, yet rewarding, nature of longitudinal home-based research with older adults, offering lessons for future studies aiming to achieve real-world applicability.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.15488 [pdf, other]

Voice Assistants for Health Self-Management: Designing for and with Older Adults

Authors: Amama Mahmood, Shiye Cao, Maia Stiber, Victor Nikhil Antony, Chien-Ming Huang

Abstract: Supporting older adults in health self-management is crucial for promoting independent aging, particularly given the growing strain on healthcare systems. While voice assistants (VAs) hold the potential to support aging in place, they often lack tailored assistance and present usability challenges. We addressed these issues through a five-stage design process with older adults to develop a personal health assistant. Starting with in-home interviews (N=17), we identified two primary challenges in older adults' health self-management: health awareness and medical adherence. To address these challenges, we developed a high-fidelity LLM-powered VA prototype to debrief doctor's visit notes and generate tailored medication reminders. We refined our prototype with feedback from co-design workshops (N=10) and validated its usability through in-home studies (N=5). Our work highlights key design features for personal health assistants and provides broader insights into desirable VA characteristics, including personalization, adapting to user context, and respect for user autonomy.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14345 [pdf]

Evaluation of drought tolerance of some almond genotypes by morphological, phytochemical and molecular markers in Sulaymaniyah governorate

Authors: Anwar Mohammed Raouf Mahmood

Abstract: The study was carried out during the 2017 to 2019 growing seasons at four locations in Sulaimani governorate and one location in Halabja governorate, in the Iraqi Kurdistan region (SH, M, Q, B and H). A large number of almond trees were observed at all locations; among them, the 38 trees with the best morphological characteristics were selected: 9, 3, 5, 7 and 14 trees from the respective locations. A simple experiment was conducted using a randomized complete block design (RCBD), and means were separated by Duncan's test. To evaluate drought tolerance in the glasshouse, an experiment was conducted at the Department of Horticulture, College of Agricultural Engineering Sciences, University of Sulaimani, in which seeds were taken from the selected genotype trees, stratified, and then sown in pots. A factorial RCBD experiment was used with two factors: genotype and irrigation interval. The thirty-eight seedling genotypes grown in pots under glasshouse conditions were exposed to three irrigation intervals (10, 20 and 40 days), starting 10 days after seedling emergence. The number of treatment combinations was therefore 114 seedlings per replicate, with a total of 342 seedlings for the whole experiment. Analysis of variance was carried out and the means were compared according to LSD (0.05). The seedlings showed different levels of adaptation to drought, which can be used in future breeding programs as rootstocks. The objectives of this study were to identify morphological, phytochemical and genetic diversity and relatedness among the most important almond genotypes in the Sulaimani region in relation to drought tolerance, and to examine the relationship between morphological, biochemical and molecular data.

Submitted 22 September, 2024; originally announced September 2024.
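The design counts quoted in the abstract above can be checked with a few lines of arithmetic. In this sketch the replicate count of 3 is not stated in the abstract; it is an assumption inferred from 342 / 114, and the variable names are illustrative.

```python
from itertools import product

# Trees selected per location, in the order the abstract lists them.
trees_per_location = {"SH": 9, "M": 3, "Q": 5, "B": 7, "H": 14}
n_genotypes = sum(trees_per_location.values())

irrigation_intervals_days = [10, 20, 40]
n_replicates = 3  # assumption: inferred from 342 total / 114 per replicate

# Factorial design: every genotype crossed with every irrigation interval.
treatments = list(product(range(n_genotypes), irrigation_intervals_days))
seedlings_per_replicate = len(treatments)
total_seedlings = seedlings_per_replicate * n_replicates
```

The per-location counts sum to the 38 selected genotypes, and 38 genotypes crossed with 3 intervals gives the 114 treatment combinations reported per replicate.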

arXiv:2409.06073 [pdf, other]

Integration of Beyond Diagonal RIS and UAVs in 6G NTNs: Enhancing Aerial Connectivity

Authors: Wali Ullah Khan, Eva Lagunas, Asad Mahmood, Muhammad Asif, Manzoor Ahmed, Symeon Chatzinotas

Abstract: The reconfigurable intelligent surface (RIS) technology shows great potential in sixth-generation (6G) terrestrial and non-terrestrial networks (NTNs) since it can effectively change wireless settings to improve connectivity. Extensive research has been conducted on traditional RIS systems with diagonal phase response matrices. The straightforward RIS architecture, while cost-effective, has restricted capabilities in manipulating the wireless channels. The beyond diagonal reconfigurable intelligent surface (BD-RIS) greatly improves control over the wireless environment by utilizing interconnected phase response elements. This work proposes the integration of unmanned aerial vehicle (UAV) communications and BD-RIS in 6G NTNs, which has the potential to further enhance wireless coverage and spectral efficiency. We begin with the preliminaries of UAV communications and then discuss the fundamentals of BD-RIS technology. Subsequently, we discuss the potential of BD-RIS and UAV communications integration. We then present a case study based on UAV-mounted transmissive BD-RIS communication. Finally, we highlight future research directions and conclude this work.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 7,4

arXiv:2408.15084 [pdf, other]

CR-Enabled NOMA Integrated Non-Terrestrial IoT Networks with Transmissive RIS

Authors: Wali Ullah Khan, Zain Ali, Asad Mahmood, Eva Lagunas, Syed Tariq Shah, Symeon Chatzinotas

Abstract: This work proposes a T-RIS-equipped LEO satellite communication in cognitive radio-enabled integrated NTNs. In the proposed system, a GEO satellite operates as a primary network, and a T-RIS-equipped LEO satellite operates as a secondary IoT network. The objective is to maximize the sum rate of T-RIS-equipped LEO satellite communication using downlink NOMA while ensuring the service quality of GEO cellular users. Our framework simultaneously optimizes the total transmit power of LEO, NOMA power allocation for LEO IoT (LIoT) and T-RIS phase shift design subject to the service quality of LIoT and interference temperature to the primary GEO network. To solve the non-convex sum rate maximization problem, we first adopt successive convex approximations to reduce the complexity of the formulated optimization. Then, we divide the problem into two parts, i.e., power allocation of LEO and phase shift design of T-RIS. The power allocation problem is solved using KKT conditions, while the phase shift problem is handled by Taylor approximation and semidefinite programming. Numerical results are provided to validate the proposed optimization framework.

Submitted 27 August, 2024; originally announced August 2024.

Comments: 7,5

arXiv:2408.12926 [pdf, other]

Balancing AoI and Rate for Mission-Critical and eMBB Coexistence with Puncturing, NOMA, and RSMA in Cellular Uplink

Authors: Farnaz Khodakhah, Aamir Mahmood, Čedomir Stefanović, Hossam Farag, Patrik Österberg, Mikael Gidlund

Abstract: Through the lens of average and peak age-of-information (AoI), this paper takes a fresh look into the uplink medium access solutions for mission-critical (MC) communication coexisting with enhanced mobile broadband (eMBB) service. Considering the stochastic packet arrivals from an MC user, we study three access schemes: orthogonal multiple access (OMA) with eMBB preemption (puncturing), non-orthogonal multiple access (NOMA), and rate-splitting multiple access (RSMA), the latter two both with concurrent eMBB transmissions. Puncturing is found to reduce both average AoI and peak AoI (PAoI) violation probability but at the expense of decreased eMBB user rates and increased signaling complexity. Conversely, NOMA and RSMA offer higher eMBB rates but may lead to MC packet loss and AoI degradation. The paper systematically investigates the conditions under which NOMA or RSMA can closely match the average AoI and PAoI violation performance of puncturing while maintaining data rate gains. Closed-form expressions for average AoI and PAoI violation probability are derived, and conditions on the eMBB and MC channel gain difference with respect to the base station are analyzed. Additionally, optimal power and rate splitting factors in RSMA are determined through an exhaustive search to minimize MC outage probability. Notably, our results indicate that with a small loss in the average AoI and PAoI violation probability the eMBB rate in NOMA and RSMA can be approximately five times higher than that achieved through puncturing.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 14 pages, 9 figures, under review for possible publication in IEEE TVT

arXiv:2407.15879 [pdf, other]

Decentralized Federated Anomaly Detection in Smart Grids: A P2P Gossip Approach

Authors: Muhammad Akbar Husnoo, Adnan Anwar, Md Enamul Haque, A. N. Mahmood

Abstract: The increasing security and privacy concerns in the Smart Grid sector have led to a significant demand for robust intrusion detection systems within critical smart grid infrastructure. To address the challenges posed by privacy preservation and decentralized power system zones with distinct data ownership, Federated Learning (FL) has emerged as a promising privacy-preserving solution which facilitates collaborative training of attack detection models without necessitating the sharing of raw data. However, FL presents several implementation limitations in the power system domain due to its heavy reliance on a centralized aggregator and the risks of privacy leakage during model update transmission. To overcome these technical bottlenecks, this paper introduces a novel decentralized federated anomaly detection scheme based on two main gossip protocols namely Random Walk and Epidemic. Our findings indicate that the Random Walk protocol exhibits superior performance compared to the Epidemic protocol, highlighting its efficacy in decentralized federated learning environments. Experimental validation of the proposed framework utilizing publicly available industrial control systems datasets demonstrates superior attack detection accuracy while safeguarding data confidentiality and mitigating the impact of communication latency and stragglers. Furthermore, our approach yields a notable 35% improvement in training time compared to conventional FL, underscoring the efficacy and robustness of our decentralized learning method.

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.15707 [pdf, other]

Predicting the Best of N Visual Trackers

Authors: Basit Alawode, Sajid Javed, Arif Mahmood, Jiri Matas

Abstract: We observe that the performance of SOTA visual trackers varies surprisingly strongly across different video attributes and datasets. No single tracker remains the best performer across all tracking attributes and datasets. To bridge this gap, for a given video sequence, we predict the "Best of the N Trackers", called the BofN meta-tracker. At its core, a Tracking Performance Prediction Network (TP2N) selects a predicted best performing visual tracker for the given video sequence using only a few initial frames. We also introduce a frame-level BofN meta-tracker which keeps predicting the best performer after regular temporal intervals. The TP2N is based on self-supervised learning architectures MocoV2, SwAv, BT, and DINO; experiments show that DINO with ViT-S as a backbone performs the best. The video-level BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine standard benchmarks: LaSOT, TrackingNet, GOT-10K, VOT2019, VOT2021, VOT2022, UAV123, OTB100, and WebUAV-3M. Further improvement is achieved by the frame-level BofN meta-tracker effectively handling variations in the tracking scenarios within long sequences. For instance, on GOT-10k, BofN meta-tracker average overlap is 88.7% and 91.1% with video and frame-level settings respectively. The best performing tracker, RTS, achieves 85.20% AO. On VOT2022, BofN expected average overlap is 67.88% and 70.98% with video and frame level settings, compared to the best performing ARTrack, 64.12%. This work also presents an extensive evaluation of competitive tracking methods on all commonly used benchmarks, following their protocols. The code, the trained models, and the results will soon be made publicly available on https://github.com/BasitAlawode/Best_of_N_Trackers.

Submitted 22 July, 2024; originally announced July 2024.
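The selection step of a "best of N" meta-tracker, as described in the abstract above, amounts to picking the tracker with the highest predicted performance, either once per video or again at each interval. The sketch below shows only that step; the scores are made-up placeholders standing in for the output of a performance-prediction network, and the function names are hypothetical rather than taken from the authors' code.

```python
def pick_best_tracker(predicted_scores):
    """Return the name of the tracker with the highest predicted score."""
    return max(predicted_scores, key=predicted_scores.get)


def frame_level_schedule(score_stream):
    """Re-select the best tracker once per temporal interval.

    score_stream: one dict of predicted scores per interval.
    """
    return [pick_best_tracker(scores) for scores in score_stream]


# Video-level: a single prediction from the first few frames (placeholder scores).
video_choice = pick_best_tracker({"RTS": 0.62, "ARTrack": 0.71})

# Frame-level: the prediction is refreshed at each interval (placeholder scores).
schedule = frame_level_schedule([
    {"RTS": 0.62, "ARTrack": 0.71},
    {"RTS": 0.80, "ARTrack": 0.55},
])
```

The frame-level variant lets the choice change when conditions change within a long sequence, which is the behavior the abstract credits for the additional gains.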

arXiv:2407.13355 [pdf, other]

EarlyMalDetect: A Novel Approach for Early Windows Malware Detection Based on Sequences of API Calls

Authors: Pascal Maniriho, Abdun Naser Mahmood, Mohammad Jabed Morshed Chowdhury

Abstract: In this work, we propose EarlyMalDetect, a novel approach for early Windows malware detection based on sequences of API calls. Our approach leverages generative transformer models and attention-guided deep recurrent neural networks to accurately identify and detect patterns of malicious behaviors in the early stage of malware execution. By analyzing the sequences of API calls invoked during execution, the proposed approach can classify executable files (programs) as malware or benign by predicting their behaviors based on a few shots (initial API calls) invoked during execution. EarlyMalDetect can predict and reveal what a malware program is going to perform on the target system before it occurs, which can help to stop it before executing its malicious payload and infecting the system. Specifically, EarlyMalDetect relies on a fine-tuned transformer model based on API calls which has the potential to predict the next API call functions to be used by a malware or benign executable program. Our extensive experimental evaluations show that the proposed approach is highly effective in predicting malware behaviors and can be used as a preventive measure against zero-day threats in Windows systems.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.10240 [pdf]

xLSTMTime : Long-term Time Series Forecasting With xLSTM

Authors: Musleh Alharthi, Ausif Mahmood

Abstract: In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting (LTSF), demonstrating significant advancements despite facing challenges such as high computational demands, difficulty in capturing temporal dynamics, and managing long-term dependencies. The emergence of LTSF-Linear, with its straightforward linear architecture, has notably outperformed transformer-based counterparts, prompting a reevaluation of the transformer's utility in time series forecasting. In response, this paper presents an adaptation of a recent architecture termed extended LSTM (xLSTM) for LTSF. xLSTM incorporates exponential gating and a revised memory structure with higher capacity that has good potential for LTSF. Our adopted architecture for LTSF, termed xLSTMTime, surpasses current approaches. We compare xLSTMTime's performance against various state-of-the-art models across multiple real-world datasets, demonstrating superior forecasting capabilities. Our findings suggest that refined recurrent architectures can offer competitive alternatives to transformer-based models in LTSF tasks, potentially redefining the landscape of time series forecasting.

Submitted 11 August, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.05260 [pdf, other]

Improved Channel Coding Performance Through Cost Variability

Authors: Adeel Mahmood, Aaron B. Wagner

Abstract: Channel coding for discrete memoryless channels (DMCs) with mean and variance cost constraints has been recently introduced. We show that there is an improvement in coding performance due to cost variability, both with and without feedback. We demonstrate this improvement over the traditional almost-sure cost constraint (also called the peak-power constraint) that prohibits any cost variation above a fixed threshold. Our result simultaneously shows that feedback does not improve the second-order coding rate of simple-dispersion DMCs under the peak-power constraint. This finding parallels similar results for unconstrained simple-dispersion DMCs, additive white Gaussian noise (AWGN) channels and parallel Gaussian channels.

Submitted 17 September, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.03316 [pdf, other]

An Upper Limit on the Photoproduction Cross Section of the Spin-Exotic $π_1(1600)$

Authors: F. Afzal, C. S. Akondi, M. Albrecht, M. Amaryan, S. Arrigo, V. Arroyave, A. Asaturyan, A. Austregesilo, Z. Baldwin, F. Barbosa, J. Barlow, E. Barriga, R. Barsotti, D. Barton, V. Baturin, V. V. Berdnikov, T. Black, W. Boeglin, M. Boer, W. J. Briscoe, T. Britton, S. Cao, E. Chudakov, G. Chung, P. L. Cole, et al. (124 additional authors not shown)

Abstract: The spin-exotic hybrid meson $π_{1}(1600)$ is predicted to have a large decay rate to the $ωππ$ final state. Using 76.6~pb$^{-1}$ of data collected with the GlueX detector, we measure the cross sections for the reactions $γp \to ωπ^+ π^- p$, $γp \to ωπ^0 π^0 p$, and $γp\toωπ^-π^0Δ^{++}$ in the range $E_γ=$ 8-10 GeV. Using isospin conservation, we set the first upper limits on the photoproduction cross sections of the $π^{0}_{1}(1600)$ and $π^{-}_{1}(1600)$. We combine these limits with lattice calculations of decay widths and find that photoproduction of $η'π$ is the most sensitive two-body system to search for the $π_1(1600)$.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures plus supplemental materials

arXiv:2407.01704 [pdf, other]

Weight Clipping for Deep Continual and Reinforcement Learning

Authors: Mohamed Elsayed, Qingfeng Lan, Clare Lyle, A. Rupam Mahmood

Abstract: Many failures in deep continual and reinforcement learning are associated with increasing magnitudes of the weights, making them hard to change and potentially causing overfitting. While many methods address these learning failures, they often change the optimizer or the architecture, a complexity that hinders widespread adoption in various systems. In this paper, we focus on learning failures that are associated with increasing weight norm and we propose a simple technique that can be easily added on top of existing learning systems: clipping neural network weights to limit them to a specific range. We study the effectiveness of weight clipping in a series of supervised and reinforcement learning experiments. Our empirical results highlight the benefits of weight clipping for generalization, addressing loss of plasticity and policy collapse, and facilitating learning with a large replay ratio.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Published in the First Reinforcement Learning Conference (RLC 2024). Code is available at https://github.com/mohmdelsayed/weight-clipping
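The weight-clipping idea in the abstract above (limiting network weights to a fixed range after each update) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `clip_weights` and `sgd_step_with_clipping`, the plain SGD update, and the single symmetric bound are all assumptions made for illustration.

```python
def clip_weights(weights, bound):
    """Clamp every weight into the range [-bound, bound].

    `bound` is a hypothetical symmetric clipping range chosen by the user.
    """
    if bound <= 0:
        raise ValueError("bound must be positive")
    return [max(-bound, min(bound, w)) for w in weights]


def sgd_step_with_clipping(weights, grads, lr, bound):
    """Take one plain SGD step, then clip the updated weights.

    The optimizer itself is untouched; clipping is applied on top of it.
    """
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return clip_weights(updated, bound)


if __name__ == "__main__":
    new_weights = sgd_step_with_clipping(
        weights=[0.9, -0.2, 0.0], grads=[-2.0, 0.5, 0.0], lr=0.1, bound=1.0
    )
    print(new_weights)
```

Because the clipping step runs after the update, a sketch like this can sit on top of any optimizer without modifying it, which matches the "easily added on top of existing learning systems" framing in the abstract.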

arXiv:2407.00324 [pdf, other]

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning

Authors: Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jagersand, A. Rupam Mahmood

Abstract: Many real-world robot learning problems, such as pick-and-place or arriving at a destination, can be seen as a problem of reaching a goal state as soon as possible. These problems, when formulated as episodic reinforcement learning tasks, can easily be specified to align well with our intended goal: -1 reward every time step with termination upon reaching the goal state, called minimum-time tasks. Despite this simplicity, such formulations are often overlooked in favor of dense rewards due to their perceived difficulty and lack of informativeness. Our studies contrast the two reward paradigms, revealing that the minimum-time task specification not only facilitates learning higher-quality policies but can also surpass dense-reward-based policies on their own performance metrics. Crucially, we also identify the goal-hit rate of the initial policy as a robust early indicator for learning success in such sparse feedback settings. Finally, using four distinct real-robotic platforms, we show that it is possible to learn pixel-based policies from scratch within two to three hours using constant negative rewards.

Submitted 8 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: In Proceedings of Reinforcement Learning Conference 2024. For a video demo, see https://youtu.be/a6zlVUuKzBc
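The minimum-time specification described in the abstract above (a reward of -1 on every time step, with termination upon reaching the goal) can be sketched as follows. The helper names `minimum_time_reward` and `episode_return` are hypothetical, and the episode is represented simply as a list of per-step goal flags, which is an assumption made to keep the example self-contained.

```python
def minimum_time_reward(reached_goal):
    """Return (reward, done) for one step of a minimum-time task.

    Every step costs -1; the episode terminates once the goal is reached.
    """
    return -1.0, bool(reached_goal)


def episode_return(goal_flags):
    """Sum the undiscounted rewards until the first step that reaches the goal.

    `goal_flags` holds one boolean per time step (True if that step
    reaches the goal). If the goal is never reached, every step is counted.
    """
    total = 0.0
    for reached in goal_flags:
        reward, done = minimum_time_reward(reached)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    # The goal is reached on the third step, so later steps are never taken.
    print(episode_return([False, False, True, False]))
```

Under this specification the undiscounted return is the negative of the number of steps taken, so maximizing return is the same as reaching the goal as quickly as possible.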

arXiv:2407.00148 [pdf, other]

Localizing Anomalies via Multiscale Score Matching Analysis

Authors: Ahsan Mahmood, Junier Oliva, Martin Styner

Abstract: Anomaly detection and localization in medical imaging remain critical challenges in healthcare. This paper introduces Spatial-MSMA (Multiscale Score Matching Analysis), a novel unsupervised method for anomaly localization in volumetric brain MRIs. Building upon the MSMA framework, our approach incorporates spatial information and conditional likelihoods to enhance anomaly detection capabilities. We employ a flexible normalizing flow model conditioned on patch positions and global image features to estimate patch-wise anomaly scores. The method is evaluated on a dataset of 1,650 T1- and T2-weighted brain MRIs from typically developing children, with simulated lesions added to the test set. Spatial-MSMA significantly outperforms existing methods, including reconstruction-based, generative-based, and interpretation-based approaches, in lesion detection and segmentation tasks. Our model achieves superior performance in both distance-based metrics (99th percentile Hausdorff Distance: $7.05 \pm 0.61$, Mean Surface Distance: $2.10 \pm 0.43$) and component-wise metrics (True Positive Rate: $0.83 \pm 0.01$, Positive Predictive Value: $0.96 \pm 0.01$). These results demonstrate Spatial-MSMA's potential for accurate and interpretable anomaly localization in medical imaging, with implications for improved diagnosis and treatment planning in clinical settings. Our code is available at https://github.com/ahsanMah/sade/.

Submitted 18 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.12829 [pdf, other]

Measurement of Spin-Density Matrix Elements in $Δ^{++}(1232)$ photoproduction

Authors: F. Afzal, C. S. Akondi, M. Albrecht, M. Amaryan, S. Arrigo, V. Arroyave, A. Asaturyan, A. Austregesilo, Z. Baldwin, F. Barbosa, J. Barlow, E. Barriga, R. Barsotti, D. Barton, V. Baturin, V. V. Berdnikov, T. Black, W. Boeglin, M. Boer, W. J. Briscoe, T. Britton, S. Cao, E. Chudakov, G. Chung, P. L. Cole, et al. (124 additional authors not shown)

Abstract: We measure the spin-density matrix elements (SDMEs) of the $Δ^{++}(1232)$ in the photoproduction reaction $γp \to π^-Δ^{++}(1232)$ with the GlueX experiment in Hall D at Jefferson Lab. The measurement uses a linearly polarized photon beam with energies from $8.2$ to $8.8$~GeV, and the statistical precision of the SDMEs exceeds the previous measurement by three orders of magnitude for the momentum transfer squared region below $1.4$ GeV$^2$. The data are sensitive to the previously undetermined relative sign between couplings in existing Regge-exchange models. Linear combinations of the extracted SDMEs allow for a decomposition into natural- and unnatural-exchange amplitudes. We find that the unnatural exchange plays an important role in the low momentum transfer region.

Submitted 26 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12241 [pdf, other]

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

Authors: Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, Pan Xu

Abstract: Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better compared to other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

Submitted 17 June, 2024; originally announced June 2024.

Comments: First two authors contributed equally. Accepted to the Reinforcement Learning Conference (RLC) 2024

arXiv:2406.10691 [pdf, other]

Beyond Diagonal RIS for 6G Non-Terrestrial Networks: Potentials and Challenges

Authors: Wali Ullah Khan, Asad Mahmood, Muhammad Ali Jamshed, Eva Lagunas, Manzoor Ahmed, Symeon Chatzinotas

Abstract: Reconfigurable intelligent surface (RIS) has emerged as a promising technology in both terrestrial and non-terrestrial networks (NTNs) due to its ability to manipulate wireless environments for better connectivity. Significant studies have been focused on conventional RIS with diagonal phase response matrices. This simple RIS architecture, though less expensive, has limited flexibility in engineering the wireless channels. As the latest member of RIS technology, beyond diagonal RIS (BD-RIS) has recently been proposed in terrestrial setups. Due to the interconnected phase response elements (PREs), BD-RIS significantly enhances the control over the wireless environment. This work examines the potential and challenges of BD-RIS in NTNs. We begin with the motivation and recent advances in BD-RIS. Subsequently, we discuss the fundamentals of BD-RIS and NTNs. We then outline the application of BD-RIS in NTNs, followed by a case study on BD-RIS enabled non-orthogonal multiple access low earth orbit satellite communication. Finally, we highlight challenges and research directions with concluding remarks.

Submitted 22 September, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: 10,4

arXiv:2406.05205 [pdf, other]

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment

Authors: Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi, Fayaz Ali Dharejo, Naoufel Werghi, Mohammed Bennamoun

Abstract: This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://cplip.github.io/

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.03276 [pdf, other]

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Authors: Mohamed Elsayed, Homayoon Farrahi, Felix Dangel, A. Rupam Mahmood

Abstract: Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient. In the absence of efficient exact computation schemes for Hessian diagonals, we revisit an early approximation scheme proposed by Becker and LeCun (1989, BL89), which has a cost similar to gradients and appears to have been overlooked by the community. We introduce HesScale, an improvement over BL89, which adds negligible extra computation. On small networks, we find that this improvement is of higher quality than all alternatives, even those with theoretical guarantees, such as unbiasedness, while being much cheaper to compute. We use this insight in reinforcement learning problems where small networks are used and demonstrate HesScale in second-order optimization and scaling the step-size parameter. In our experiments, HesScale optimizes faster than existing methods and improves stability through step-size scaling. These findings are promising for scaling second-order methods in larger models in the future.

Submitted 3 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Published in the Proceedings of the 41st International Conference on Machine Learning (ICML 2024). Code is available at https://github.com/mohmdelsayed/HesScale. arXiv admin note: substantial text overlap with arXiv:2210.11639

arXiv:2405.21043 [pdf, other]

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

Authors: Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

Abstract: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.

Submitted 4 October, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Journal ref: Proceedings of the 41st International Conference on Machine Learning, 2024

arXiv:2405.14881 [pdf, other]

DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Authors: Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar

Abstract: Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels, resulting in misleading supervisory signals. To address these limitations, we propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images, supervised by our bespoke conditional prompts. First, a concatenation of a partial natural image and its generated counterpart is obtained, which helps in avoiding the generation of unrealistic images or label ambiguities. Then, to enhance resilience against adversarial attacks and improve safety measures, a randomly selected structural pattern from a set of fractal images is blended into the concatenated image to form the final augmented image for training. Our empirical results on seven different datasets reveal that DiffuseMix achieves superior performance compared to existing state-of-the-art methods on tasks including general classification, fine-grained classification, fine-tuning, data scarcity, and adversarial robustness. Augmented datasets and codes are available here: https://diffusemix.github.io/

Submitted 5 April, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024

arXiv:2405.11122 [pdf]

doi 10.1002/adfm.202408542

Imaging Local Effects of Voltage and Boron Doping on Spin Reversal in Antiferromagnetic Magnetoelectric Cr2O3 Thin Films and Devices

Authors: Adam Erickson, Syed Qamar Abbas Shah, Ather Mahmood, Pratyush Buragohain, Ilja Fescenko, Alexei Gruverman, Christian Binek, Abdelghani Laraoui

Abstract: Chromia (Cr2O3) is a magnetoelectric oxide which permits voltage control of the antiferromagnetic (AFM) order, but it suffers technological constraints due to its low Néel temperature (TN ~307 K) and the need for a symmetry-breaking applied magnetic field to achieve reversal of the Néel vector. Recently, boron (B) doping of Cr2O3 films led to an increase of TN to > 400 K and allowed the realization of magnetic-field-free, voltage-controlled Néel vector rotation. Here, we directly image the impact of B doping on the formation of AFM domains in Cr2O3 thin films and elucidate the mechanism of voltage-controlled manipulation of the spin structure using nitrogen vacancy (NV) scanning probe magnetometry. We find a stark reduction and thickness dependence of domain size in B-doped Cr2O3 (B:Cr2O3) films, explained by the increased germ density, likely associated with the B doping. By reconstructing the surface magnetization from the NV stray-field maps, we find a qualitative distinction between the undoped and B-doped Cr2O3 films, manifested by the histogram distribution of the AFM ordering, i.e., 180 degree domains for pure films, and 90 degree domains for B:Cr2O3 films. Additionally, NV imaging of voltage-controlled B-doped Cr2O3 devices corroborates the 90 degree rotation of the AFM domains observed in magnetotransport measurement.

Submitted 17 May, 2024; originally announced May 2024.

Journal ref: Advanced Functional Materials 2408542 (2024)

arXiv:2404.00781 [pdf, other]

Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning

Authors: Mohamed Elsayed, A. Rupam Mahmood

Abstract: Deep representation learning methods struggle with continual learning, suffering from both catastrophic forgetting of useful units and loss of plasticity, often due to rigid and unuseful units. While many methods address these two issues separately, only a few currently deal with both simultaneously. In this paper, we introduce Utility-based Perturbed Gradient Descent (UPGD) as a novel approach for the continual learning of representations. UPGD combines gradient updates with perturbations, where it applies smaller modifications to more useful units, protecting them from forgetting, and larger modifications to less useful units, rejuvenating their plasticity. We use a challenging streaming learning setup where continual learning problems have hundreds of non-stationarities and unknown task boundaries. We show that many existing methods suffer from at least one of the issues, predominantly manifested by their decreasing accuracy over tasks. On the other hand, UPGD continues to improve performance and surpasses or is competitive with all methods in all problems. Finally, in extended reinforcement learning experiments with PPO, we show that while Adam exhibits a performance drop after initial learning, UPGD avoids it by addressing both continual learning issues.

Submitted 30 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: Published in the Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). Code is available at https://github.com/mohmdelsayed/upgd
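The mechanism the abstract above describes (gradient updates combined with perturbations, with smaller modifications for more useful units and larger ones for less useful units) can be sketched as a utility-gated update. This is only an illustration, not the published algorithm: the abstract does not say how utility is computed, so the sketch assumes each weight already has a utility score scaled to [0, 1]. The gating form `(1 - utility)`, the Gaussian noise, and the name `upgd_style_step` are likewise assumptions.

```python
import random


def upgd_style_step(weights, grads, utilities, lr, noise_std, rng):
    """Apply one utility-gated, perturbed gradient step.

    Each weight moves by lr * (gradient + noise) * (1 - utility), so a
    weight with utility 1 is left untouched and a weight with utility 0
    receives the full perturbed update. Utilities are assumed given.
    """
    updated = []
    for w, g, u in zip(weights, grads, utilities):
        if not 0.0 <= u <= 1.0:
            raise ValueError("utilities must lie in [0, 1]")
        noise = rng.gauss(0.0, noise_std)  # perturbation term
        updated.append(w - lr * (g + noise) * (1.0 - u))
    return updated


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatability
    print(
        upgd_style_step(
            weights=[1.0, 1.0],
            grads=[0.5, 0.5],
            utilities=[1.0, 0.0],
            lr=0.1,
            noise_std=0.01,
            rng=rng,
        )
    )
```

In this sketch the first weight (utility 1) is protected from change, while the second (utility 0) receives both the gradient step and the perturbation, mirroring the protect-and-rejuvenate behavior the abstract attributes to UPGD.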

arXiv:2403.17913 [pdf, ps, other]

Enhancing Indoor and Outdoor THz Communications with Beyond Diagonal-IRS: Optimization and Performance Analysis

Authors: Asad Mahmood, Thang X. Vu, Symeon Chatzinotas, Björn Ottersten

Abstract: This work investigates the application of Beyond Diagonal Intelligent Reflective Surface (BD-IRS) to enhance THz downlink communication systems, operating in a hybrid reflective and transmissive mode, to simultaneously provide services to indoor and outdoor users. We propose an optimization framework that jointly optimizes the beamforming vectors and phase shifts in the hybrid reflective/transmissive mode, aiming to maximize the system sum rate. To tackle the challenges in solving the joint design problem, we employ the conjugate gradient method and propose an iterative algorithm that successively optimizes the hybrid beamforming vectors and the phase shifts. Through comprehensive numerical simulations, our findings demonstrate a significant improvement in rate when compared to existing benchmark schemes, including time- and frequency-divided approaches, by approximately $30.5\%$ and $69.9\%$, respectively. The approach even outperforms the STAR-IRS system by $76.99\%$. This underscores the significant influence of IRS elements on system performance relative to that of base station antennas, highlighting their pivotal role in advancing the communication system efficacy.

Submitted 9 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.16194 [pdf, other]

Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery

Authors: Siddharth Tourani, Ahmed Alwheibi, Arif Mahmood, Muhammad Haris Khan

Abstract: Unsupervised landmark discovery (ULD) for an object category is a challenging computer vision problem. In pursuit of developing a robust ULD framework, we explore the potential of a recent paradigm of self-supervised learning algorithms, known as diffusion models. Some recent works have shown that these models implicitly contain important correspondence cues. Towards harnessing the potential of diffusion models for the ULD task, we make the following core contributions. First, we propose a ZeroShot ULD baseline based on simple clustering of random pixel locations with nearest neighbour matching. It delivers better results than existing ULD methods. Second, motivated by the ZeroShot performance, we develop a ULD algorithm based on diffusion features using self-training and clustering which also outperforms prior methods by notable margins. Third, we introduce a new proxy task based on generating latent pose codes and also propose a two-stage clustering mechanism to facilitate effective pseudo-labeling, resulting in a significant performance improvement. Overall, our approach consistently outperforms state-of-the-art methods on four challenging benchmarks (AFLW, MAFL, CatHeads and LS3D) by significant margins.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Accepted in CVPR 2024

arXiv:2403.14743 [pdf, other]

VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

Authors: Ahmad Mahmood, Ashmal Vayani, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Abstract: Recent studies have demonstrated the effectiveness of Large Language Models (LLMs) as reasoning modules that can deconstruct complex tasks into more manageable sub-tasks, particularly when applied to visual reasoning tasks for images. In contrast, this paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs. Ours is a novel approach to extend the utility of LLMs in the context of video tasks, leveraging their capacity to generalize from minimal input and output demonstrations within a contextual framework. By presenting LLMs with pairs of instructions and their corresponding high-level programs, we harness their contextual learning capabilities to generate executable visual programs for video understanding. To enhance program's accuracy and robustness, we implement two important strategies. Firstly, we employ a feedback-generation approach, powered by GPT-3.5, to rectify errors in programs utilizing unsupported functions. Secondly, taking motivation from recent works on self refinement of LLM outputs, we introduce an iterative procedure for improving the quality of the in-context examples by aligning the initial outputs to the outputs that would have been generated had the LLM not been bound by the structure of the in-context examples. Our results on several video-specific tasks, including visual QA, video anticipation, pose estimation and multi-video QA illustrate the efficacy of these enhancements in improving the performance of visual programming approaches for video tasks.

Submitted 24 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.02421 [pdf, other]

Situated Understanding of Errors in Older Adults' Interactions with Voice Assistants: A Month-Long, In-Home Study

Authors: Amama Mahmood, Junxiang Wang, Chien-Ming Huang

Abstract: Our work addresses the challenges older adults face with commercial Voice Assistants (VAs), notably in conversation breakdowns and error handling. Traditional methods of collecting user experiences (usage logs and post-hoc interviews) do not fully capture the intricacies of older adults' interactions with VAs, particularly regarding their reactions to errors. To bridge this gap, we equipped 15 older adults' homes with smart speakers integrated with custom audio recorders to collect "in-the-wild" audio interaction data for detailed error analysis. Recognizing the conversational limitations of current VAs, our study also explored the capabilities of Large Language Models (LLMs) to handle natural and imperfect text for improving VAs. Midway through our study, we deployed a ChatGPT-powered VA to investigate its efficacy for older adults. Our research suggests leveraging vocal and verbal responses combined with LLMs' contextual capabilities for enhanced error prevention and management in VAs, while proposing design considerations to align VA capabilities with older adults' expectations.

Submitted 23 September, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.16417 [pdf, ps, other]

Channel Coding with Mean and Variance Cost Constraints

Authors: Adeel Mahmood, Aaron B. Wagner

Abstract: We consider channel coding for discrete memoryless channels (DMCs) with a novel cost constraint that constrains both the mean and the variance of the cost of the codewords. We show that the maximum (asymptotically) achievable rate under the new cost formulation is equal to the capacity-cost function; in particular, the strong converse holds. We further characterize the optimal second-order coding rate of these cost-constrained codes; in particular, the optimal second-order coding rate is finite. We then show that the second-order coding performance is strictly improved with feedback using a new variation of timid/bold coding, significantly broadening the applicability of timid/bold coding schemes from unconstrained compound-dispersion channels to all cost-constrained channels. Equivalent results on the minimum average probability of error are also given.

Submitted 12 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.06193 [pdf, other]

Dark Energy Compact Stars in Extended Teleparallel Gravity

Authors: Allah Ditta, Xia Tiecheng, G. Mustafa, Değer Sofuoğlu, Asif Mahmood

Abstract: This paper presents the study of dark-energy compact stars in the context of modified Rastall teleparallel gravity. It is the first time that dark energy celestial phenomena have been explored in this modified gravitational theory. Employing the torsion-based functions, $f(T)$ and $h(T)$, we analyzed their effects in a spherically symmetric spacetime chosen as the interior geometry, while using the Schwarzschild geometry as an outer spacetime. In this study, we explored various dark energy stellar properties, including dark energy pressure components, energy conditions, and equation of state components. Our findings reveal that the observed negative behavior of these stellar properties served as compelling evidence, validating the presence of dark energy in stellar configurations. Detailed investigations of the energy conditions, pressure profiles, sound speeds, TOV equation, adiabatic index, gradients, mass function, compactness, and redshift function provide a comprehensive assessment, affirming the acceptability and realism of the investigated stellar configuration.

Submitted 26 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 18 pages, 9 figures, 2 tables

arXiv:2312.15339 [pdf, other]

MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

Authors: Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu

Abstract: The visual world provides an abundance of information, but many input pixels received by agents often contain distracting stimuli. Autonomous agents need the ability to distinguish useful information from task-irrelevant perceptions, enabling them to generalize to unseen environments with new distractions. Existing works approach this problem using data augmentation or large auxiliary networks with additional loss functions. We introduce MaDi, a novel algorithm that learns to mask distractions by the reward signal only. In MaDi, the conventional actor-critic structure of deep reinforcement learning agents is complemented by a small third sibling, the Masker. This lightweight neural network generates a mask to determine what the actor and critic will receive, such that they can focus on learning the task. The masks are created dynamically, depending on the current input. We run experiments on the DeepMind Control Generalization Benchmark, the Distracting Control Suite, and a real UR5 Robotic Arm. Our algorithm improves the agent's focus with useful masks, while its efficient Masker network only adds 0.2% more parameters to the original structure, in contrast to previous work. MaDi consistently achieves generalization results better than or competitive to state-of-the-art methods.

Submitted 23 December, 2023; originally announced December 2023.

Comments: Accepted as full-paper (oral) at AAMAS 2024. Code is available at https://github.com/bramgrooten/mask-distractions and see our 40-second video at https://youtu.be/2oImF0h1k48

arXiv:2312.07454 [pdf, other]

"You Might Like It": How People Respond to Small Talk During Human-Robot Collaboration

Authors: Kaitlynn Taylor Pineda, Amama Mahmood, Juo-Tung Chen, Chien-Ming Huang

Abstract: Social communication between people and social robots has been studied extensively and found to have various notable benefits, including the enhancement of human-robot team cohesion and the development of rapport and trust. However, the potential of social communication between people and non-social robots, such as non-anthropomorphic robot manipulators commonly used in work settings (e.g., warehouse and factory), is less explored and not well established. In this work, we investigate people's engagement and attitudes towards a non-anthropomorphic robot manipulator that initiates small talk during a collaborative assembly task and explore how the presence of negative team feedback may affect team dynamics and blame attribution. Through an in-person study with 20 participants, we observed a response rate of 77.60% in response to the robot's small talk attempts. Nine participants continued engaging with the robot by initiating their own questions, indicating sustained interest in the conversation. However, we also found that the first negative feedback decreased the participants' willingness to extend the conversation. We additionally present participants' initial perceptions of small talk for physical robot manipulators and discuss design implications for integrating small talk into non-social robots, along with various aspects of small talk that may influence physical human-robot interactions.

Submitted 8 May, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 25 pages, 6 figures, 7 tables

ACM Class: I.2.9

arXiv:2311.07199 [pdf, ps, other]

Joint Computation and Communication Resource Optimization for Beyond Diagonal UAV-IRS Empowered MEC Networks

Authors: Asad Mahmood, Thang X. Vu, Wali Ullah Khan, Symeon Chatzinotas, Björn Ottersten

Abstract: Recent advancements in 6G systems signal a leap towards universal connectivity and ultra-reliable, low-latency communications for real-time data devices. Yet, these advancements encounter obstacles such as limited device battery life and computational power, along with urban signal blockages. To counter these, Intelligent Reconfigurable Surfaces (IRS) within Mobile Edge Cloud (MEC) infrastructures offer enhanced computing to overcome device limitations and create alternative communication paths. Despite these improvements, connectivity issues remain for remote areas. Our paper presents the Beyond Diagonal IRS (BD-IRS or IRS 2.0), integrated with UAVs in MEC networks (BD-IRS-UAV), providing on-demand links for remote users to offload tasks, tackling resource and battery limitations. We propose a joint optimization strategy to reduce system's worst-case latency and UAV hovering time by optimizing BD-IRS-UAV deployment and resource allocation. This challenge is approached by dividing it into two sub-problems: BD-IRS-UAV Placement and Computational Resource Optimization, and Communication Resource Optimization, each solved iteratively. This design significantly enhances system performance, showing a $17.75\%$ increase over traditional diagonal IRS and a $25.43\%$ improvement over IRS on buildings, with a $13.44\%$ enhancement in worst-case latency compared to binary offloading schemes.

Submitted 15 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.19173 [pdf, other]

Can we Quantify Trust? Towards a Trust-based Resilient SIoT Network

Authors: Subhash Sagar, Adnan Mahmood, Quan Z. Sheng, Munazza Zaib, Farhan Sufyan

Abstract: The emerging yet promising paradigm of the Social Internet of Things (SIoT) integrates the notion of the Internet of Things with human social networks. In SIoT, objects, i.e., things, have the capability to socialize with the other objects in the SIoT network and can establish their social network autonomously by modeling human behaviour. The notion of trust is imperative in realizing these characteristics of socialization in order to assess the reliability of autonomous collaboration. The perception of trust is evolving in the era of SIoT as an extension to traditional security triads in an attempt to offer secure and reliable services, and is considered as an imperative aspect of any SIoT system for minimizing the probable risk of autonomous decision-making. This research investigates the idea of trust quantification by employing trust measurement in terms of direct trust, indirect trust as a recommendation, and the degree of SIoT relationships in terms of social similarities (community-of-interest, friendship, and co-work relationships). A weighted sum approach is subsequently employed to synthesize all the trust features in order to ascertain a single trust score. The experimental evaluation demonstrates the effectiveness of the proposed model in segregating trustworthy and untrustworthy objects and via identifying the dynamic behaviour (i.e., trust-related attacks) of the SIoT objects.

Submitted 12 May, 2023; originally announced October 2023.

Comments: 18 Pages

arXiv:2310.13074 [pdf, other]

doi 10.1145/3637337

Gender Biases in Error Mitigation by Voice Assistants

Authors: Amama Mahmood, Chien-Ming Huang

Abstract: Commercial voice assistants are largely feminized and associated with stereotypically feminine traits such as warmth and submissiveness. As these assistants continue to be adopted for everyday uses, it is imperative to understand how the portrayed gender shapes the voice assistant's ability to mitigate errors, which are still common in voice interactions. We report a study (N=40) that examined the effects of voice gender (feminine, ambiguous, masculine), error mitigation strategies (apology, compensation) and participant's gender on people's interaction behavior and perceptions of the assistant. Our results show that AI assistants that apologized appeared warmer than those that offered compensation. Moreover, male participants preferred apologetic feminine assistants over apologetic masculine ones. Furthermore, male participants interrupted AI assistants regardless of perceived gender more frequently than female participants when errors occurred. Our results suggest that the perceived gender of a voice assistant biases user behavior, especially for male users, and that an ambiguous voice has the potential to reduce biases associated with gender-specific traits.

Submitted 19 October, 2023; originally announced October 2023.

Journal ref: Proceedings of the ACM on Human-Computer Interaction, Volume 8, Issue CSCW1, 2024; Article No.: 60, Pages 1 - 27

arXiv:2310.05853 [pdf, other]

"Mango Mango, How to Let The Lettuce Dry Without A Spinner?'': Exploring User Perceptions of Using An LLM-Based Conversational Assistant Toward Cooking Partner

Authors: Szeyi Chan, Jiachen Li, Bingsheng Yao, Amama Mahmood, Chien-Ming Huang, Holly Jimison, Elizabeth D Mynatt, Dakuo Wang

Abstract: The rapid advancement of the Large Language Model (LLM) has created numerous potentials for integration with conversational assistants (CAs) assisting people in their daily tasks, particularly due to their extensive flexibility. However, users' real-world experiences interacting with these assistants remain unexplored. In this research, we chose cooking, a complex daily task, as a scenario to investigate people's successful and unsatisfactory experiences while receiving assistance from an LLM-based CA, Mango Mango. We discovered that participants value the system's ability to provide extensive information beyond the recipe, offer customized instructions based on context, and assist them in dynamically planning the task. However, they expect the system to be more adaptive to oral conversation and provide more suggestive responses to keep users actively involved. Recognizing that users began treating our LLM-CA as a personal assistant or even a partner rather than just a recipe-reading tool, we propose several design considerations for future development.

Submitted 9 October, 2023; originally announced October 2023.

Comments: Under submission to CHI2024

arXiv:2310.01365 [pdf, other]

Elephant Neural Networks: Born to Be a Continual Learner

Authors: Qingfeng Lan, A. Rupam Mahmood

Abstract: Catastrophic forgetting has remained a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieve excellent performance on the Split MNIST dataset in just one single pass, without using a replay buffer, task boundary information, or pre-training.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.13879 [pdf, other]

LLM-Powered Conversational Voice Assistants: Interaction Patterns, Opportunities, Challenges, and Design Guidelines

Authors: Amama Mahmood, Junxiang Wang, Bingsheng Yao, Dakuo Wang, Chien-Ming Huang

Abstract: Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality is changed to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and debate) with varied constraints, stakes, and objectivity. We observe that LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance.

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.12507 [pdf, other]

Deep Reinforcement Learning for Backscatter Communications: Augmenting Intelligence in Future Internet of Things

Authors: Wali Ullah Khan, Eva Lagunas, Zain Ali, Asad Mahmood, Chandan Kumar Sheemar, Manzoor Ahmed, Symeon Chatzinotas, Björn Ottersten

Abstract: Backscatter communication (BC) technology offers sustainable solutions for next-generation Internet-of-Things (IoT) networks, where devices can transmit data by reflecting and adjusting incident radio frequency signals. In parallel to BC, deep reinforcement learning (DRL) has recently emerged as a promising tool to augment intelligence and optimize low-powered IoT devices. This article commences by elucidating the foundational principles underpinning BC systems, subsequently delving into the diverse array of DRL techniques and their respective practical implementations. Subsequently, it investigates potential domains and presents recent advancements in the realm of DRL-BC systems. A use case of RIS-aided non-orthogonal multiple access BC systems leveraging DRL is meticulously examined to highlight its potential. Lastly, this study identifies and investigates salient challenges and proffers prospective avenues for future research endeavors.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 7, 3

arXiv:2309.12493 [pdf]

doi 10.1002/apxr.202300061

Post deposition interfacial Néel temperature tuning in magnetoelectric B:Cr2O3

Authors: Ather Mahmood, Jamie Weaver, Syed Qamar Abbas Shah, Will Echtenkamp, Jeffrey W. Lynn, Peter A. Dowben, Christian Binek

Abstract: Boron (B) alloying transforms the magnetoelectric antiferromagnet Cr2O3 into a multifunctional single-phase material which enables electric field driven π/2 rotation of the Néel vector. Nonvolatile, voltage-controlled Néel vector rotation is a much-desired material property in the context of antiferromagnetic spintronics enabling ultra-low power, ultra-fast, nonvolatile memory, and logic device applications. Néel vector rotation is detected with the help of heavy metal (Pt) Hall-bars in proximity of pulsed laser deposited B:Cr2O3 films. To facilitate operation of B:Cr2O3-based devices in CMOS environments, the Néel temperature, TN, of the functional film must be tunable to values significantly above room temperature. Cold neutron depth profiling and x-ray photoemission spectroscopy depth profiling reveal thermally activated B-accumulation at the B:Cr2O3/ vacuum interface in thin films deposited on Al2O3 substrates. We attribute the B-enrichment to surface segregation. Magnetotransport data confirm B-accumulation at the interface within a layer of about 50 nm thick where the device properties reside. Here TN enhances from 334 K prior to annealing, to 477 K after annealing for several hours. Scaling analysis determines TN as a function of the annealing temperature. Stability of post-annealing device properties is evident from reproducible Néel vector rotation at 370 K performed over the course of weeks.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.10518 [pdf, other]

Unsupervised Landmark Discovery Using Consistency Guided Bottleneck

Authors: Mamona Awan, Muhammad Haris Khan, Sanoojan Baliah, Muhammad Ahmad Waseem, Salman Khan, Fahad Shahbaz Khan, Arif Mahmood

Abstract: We study a challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps; however, these are limited in generating informed heatmaps while training, presumably due to the lack of effective structural cues. Also, it is assumed that all predicted landmarks are semantically relevant despite having no ground truth supervision. In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency, a measure of compatibility score with the pseudo-ground truth, to generate adaptive heatmaps. We propose obtaining pseudo-supervision via forming landmark correspondence across images. The consistency then modulates the uncertainty of the discovered landmarks in the generation of adaptive heatmaps which rank consistent landmarks above their noisy counterparts, providing effective structural information for improved robustness. Evaluations on five diverse datasets including MAFL, AFLW, LS3D, Cats, and Shoes demonstrate excellent performance of the proposed approach compared to the existing state-of-the-art methods. Our code is publicly available at https://github.com/MamonaAwan/CGB_ULD.

Submitted 19 September, 2023; originally announced September 2023.

Comments: Accepted ORAL at BMVC 2023 ; Code: https://github.com/MamonaAwan/CGB_ULD

ACM Class: I.4

arXiv:2309.09727 [pdf, other]

When Large Language Models Meet Citation: A Survey

Authors: Yang Zhang, Yufei Wang, Kai Wang, Quan Z. Sheng, Lina Yao, Adnan Mahmood, Wei Emma Zhang, Rongying Zhao

Abstract: Citations in scholarly work serve the essential purpose of acknowledging and crediting the original sources of knowledge that have been incorporated or referenced. Depending on their surrounding textual context, these citations are used for different motivations and purposes. Large Language Models (LLMs) could be helpful in capturing this fine-grained citation information via the corresponding textual context, thereby enabling a better understanding of the literature. Furthermore, these citations also establish connections among scientific papers, providing high-quality inter-document relationships and human-constructed knowledge. Such information could be incorporated into LLM pre-training and improve the text representation in LLMs. Therefore, in this paper, we offer a preliminary review of the mutually beneficial relationship between LLMs and citation analysis. Specifically, we review the application of LLMs for in-text citation analysis tasks, including citation classification, citation-based summarization, and citation recommendation. We then summarize the research pertinent to leveraging citation linkage knowledge to improve text representations of LLMs via citation prediction, network structure information, and inter-document relationship. We finally provide an overview of these contemporary methods and put forth potential promising avenues in combining LLMs and citation analysis for further investigation.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.09236 [pdf, other]

Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures

Authors: Arif Mahmood, Abdul Basit, M. Akhtar Munir, Mohsen Ali

Abstract: Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8%) compared to the baseline approach (AP=63.1%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios.

Submitted 17 September, 2023; originally announced September 2023.

Comments: This paper is accepted in IEEE Transactions on Computational Social Systems

Showing 1–50 of 249 results for author: Mahmood, A