Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

Authors: Yekun Ke, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Abstract: Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers. This observation has spurred the development of practical methodologies, such as accelerating inference by bypassing certain layers once the hidden state stabilizes, selectively fine-tuning layers to modify the iteration process, and implementing loops of specific layers to maintain fixed point iterations. Despite these advancements, the understanding of fixed point iterations remains superficial, particularly in high-dimensional spaces, due to the inadequacy of current analytical tools. In this study, we conduct a detailed analysis of fixed point iterations in a vector-valued function modeled by neural networks. We establish a sufficient condition for the existence of multiple fixed points of looped neural networks based on varying input regions. Additionally, we expand our examination to include a robust version of fixed point iterations. To demonstrate the effectiveness and insights provided by our approach, we provide case studies showing that looped neural networks may have $2^d$ robust fixed points under exponentiation or polynomial activation functions, where $d$ is the feature dimension. Furthermore, our preliminary empirical results support our theoretical findings. Our methodology enriches the toolkit available for analyzing fixed point iterations of deep neural networks and may enhance our comprehension of neural network mechanisms.

Submitted 15 October, 2024; originally announced October 2024.
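
Illustration (not from the paper): the idea of a looped layer whose hidden state stabilizes at one of $2^d$ fixed points can be sketched with a toy coordinate-wise polynomial map. The map `layer`, the starting points, and the tolerance below are assumptions chosen for illustration; they are not the authors' construction. Each coordinate of the map t -> 1.5 t - 0.5 t^3 has attracting fixed points at -1 and +1, so a d-dimensional state can settle at any of the 2^d sign patterns.

```python
from itertools import product


def layer(x):
    # Hypothetical coordinate-wise polynomial "layer": t -> 1.5 t - 0.5 t^3.
    # Per coordinate, +1 and -1 are attracting fixed points and 0 is repelling.
    return [1.5 * t - 0.5 * t ** 3 for t in x]


def iterate_to_fixed_point(f, x0, tol=1e-10, max_iter=1000):
    """Apply f repeatedly until the state changes by less than tol."""
    x = list(x0)
    for step in range(1, max_iter + 1):
        nxt = f(x)
        change = max(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if change < tol:
            return x, step
    raise RuntimeError("state did not stabilize within max_iter steps")


def count_attracting_fixed_points(d):
    """Start from every sign pattern and count the distinct limits reached."""
    limits = set()
    for x0 in product([-0.5, 0.5], repeat=d):
        x, _ = iterate_to_fixed_point(layer, x0)
        limits.add(tuple(round(t) for t in x))
    return len(limits)
```

Calling `count_attracting_fixed_points(d)` for small `d` shows how the number of reachable fixed points grows with the feature dimension in this toy setting.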

arXiv:2410.10117 [pdf, other]

StegaINR4MIH: steganography by implicit neural representation for multi-image hiding

Authors: Weina Dong, Jia Liu, Lifeng Chen, Wenquan Sun, Xiaozhong Pan, Yan Ke

Abstract: Multi-image hiding, which embeds multiple secret images into a cover image and is able to recover these images with high quality, has gradually become a research hotspot in the field of image steganography. However, due to the need to embed a large amount of data in a limited cover image space, issues such as contour shadowing or color distortion often arise, posing significant challenges for multi-image hiding. In this paper, we propose StegaINR4MIH, a novel implicit neural representation steganography framework that enables the hiding of multiple images within a single implicit representation function. In contrast to traditional methods that use multiple encoders to achieve multi-image embedding, our approach leverages the redundancy of implicit representation function parameters and employs magnitude-based weight selection and secret weight substitution on pre-trained cover image functions to effectively hide and independently extract multiple secret images. We conduct experiments on images with a resolution of from three different datasets: CelebA-HQ, COCO, and DIV2K. When hiding two secret images, the PSNR values of both the secret images and the stego images exceed 42. When hiding five secret images, the PSNR values of both the secret images and the stego images exceed 39. Extensive experiments demonstrate the superior performance of the proposed method in terms of visual quality and undetectability.

Submitted 13 October, 2024; originally announced October 2024.

Comments: 46 pages, 14 figures

arXiv:2410.09855 [pdf, other]

Text4Seg: Reimagining Image Segmentation as Text Generation

Authors: Mengcheng Lan, Chaofeng Chen, Yue Zhou, Jiaxing Xu, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

Abstract: Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of segmentation masks where each image patch is mapped to its corresponding text label. This unified representation allows seamless integration into the auto-regressive training pipeline of MLLMs for easier optimization. We demonstrate that representing an image with $16\times16$ semantic descriptors yields competitive segmentation performance. To enhance efficiency, we introduce the Row-wise Run-Length Encoding (R-RLE), which compresses redundant text sequences, reducing the length of semantic descriptors by 74% and accelerating inference by $3\times$, without compromising performance. Extensive experiments across various vision tasks, such as referring expression segmentation and comprehension, show that Text4Seg achieves state-of-the-art performance on multiple datasets by fine-tuning different MLLM backbones. Our approach provides an efficient, scalable solution for vision-centric tasks within the MLLM framework.

Submitted 13 October, 2024; originally announced October 2024.

Comments: Code is available at https://github.com/mc-lan/Text4Seg
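
Illustration (not from the paper): the row-wise run-length idea in the abstract above can be sketched on a small grid of per-patch text labels. The output format used here (`label*count` runs joined by `, `, rows joined by ` | `) and the function names are assumptions made for this sketch; the authors' actual serialization may differ. The sketch also assumes labels do not themselves contain `, `, ` | ` or a trailing `*count`.

```python
from itertools import groupby


def rrle_encode(grid):
    """Compress each row of labels into 'label*count' runs (hypothetical format)."""
    rows = []
    for row in grid:
        runs = [f"{label}*{len(list(group))}" for label, group in groupby(row)]
        rows.append(", ".join(runs))
    return " | ".join(rows)


def rrle_decode(text):
    """Invert rrle_encode, expanding every run back into per-patch labels."""
    grid = []
    for row in text.split(" | "):
        labels = []
        for run in row.split(", "):
            label, count = run.rsplit("*", 1)
            labels.extend([label] * int(count))
        grid.append(labels)
    return grid
```

Runs are kept within a row rather than across the flattened grid, so each row of the mask can be decoded independently of the others.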

arXiv:2410.08431 [pdf]

Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness

Authors: Yu He Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Chang-Fu Kuo, Shao-Chun Wu, Vesela P. Kovacheva, Daniel Shu Wei Ting

Abstract: Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using Llamaindex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), with no hallucinations and producing correct instructions comparable to clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.

Submitted 10 October, 2024; originally announced October 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2402.01733

arXiv:2410.08228 [pdf, other]

Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion

Authors: Jiaxing Xu, Mengcheng Lan, Xia Dong, Kai He, Wei Zhang, Qingtian Bian, Yiping Ke

Abstract: In the realm of neuroscience, identifying distinctive patterns associated with neurological disorders via brain networks is crucial. Resting-state functional magnetic resonance imaging (fMRI) serves as a primary tool for mapping these networks by correlating blood-oxygen-level-dependent (BOLD) signals across different brain regions, defined as regions of interest (ROIs). Constructing these brain networks involves using atlases to parcellate the brain into ROIs based on various hypotheses of brain division. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Some recent methods have proposed utilizing multiple atlases, but they neglect consistency across atlases and lack ROI-level information exchange. To tackle these limitations, we propose an Atlas-Integrated Distillation and Fusion network (AIDFusion) to improve brain network classification using fMRI data. AIDFusion addresses the challenge of utilizing multiple atlases by employing a disentangle Transformer to filter out inconsistent atlas-specific information and distill distinguishable connections across atlases. It also incorporates subject- and population-level consistency constraints to enhance cross-atlas consistency. Additionally, AIDFusion employs an inter-atlas message-passing mechanism to fuse complementary information across brain regions. Experimental results on four datasets of different diseases demonstrate the effectiveness and efficiency of AIDFusion compared to state-of-the-art methods. A case study illustrates that AIDFusion extracts patterns that are both interpretable and consistent with established neuroscience findings.

Submitted 28 September, 2024; originally announced October 2024.

arXiv:2410.05739 [pdf, other]

Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals

Authors: Cheng Chi, Xiaoyu Li, Andong Li, Yuxuan Ke, Xiaodong Li, Chengshi Zheng

Abstract: Telepresence technology aims to provide an immersive virtual presence for remote conference applications, and it is extremely important to synthesize high-quality binaural audio signals for this aim. Because ambient noise is often inevitable in practical application scenarios, it is highly desirable to obtain noise-free binaural audio signals directly from microphone-array signals. For this purpose, this paper proposes a new end-to-end noise-immune binaural audio synthesis framework from microphone-array signals, abbreviated as Array2BR. Experimental results show that binaural cues can be correctly mapped and noise can be well suppressed simultaneously using the proposed framework. Compared with existing methods, the proposed method achieved better performance in terms of both objective and subjective metric scores.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05142 [pdf, other]

Nonadiabatic Quantum Dynamics of Molecules Scattering from Metal Surfaces

Authors: Riley J. Preston, Yaling Ke, Samuel L. Rudge, Nils Hertl, Raffaele Borrelli, Reinhard J. Maurer, Michael Thoss

Abstract: Nonadiabatic coupling between electrons and molecular motion at metal surfaces leads to energy dissipation and dynamical steering effects during chemical surface dynamics. We present a theoretical approach to the scattering of molecules from metal surfaces that incorporates all nonadiabatic and quantum nuclear effects due to the coupling of the molecular degrees of freedom to the electrons in the metal. This is achieved with the hierarchical equations of motion (HEOM) approach combined with a matrix product state representation in twin space. The method is applied to the scattering of nitric oxide from Au(111), for which strongly nonadiabatic energy loss during scattering has been experimentally observed, thus presenting a significant theoretical challenge. Since the HEOM approach treats the molecule-surface coupling exactly, it captures the interplay between nonadiabatic and quantum nuclear effects. Finally, the data obtained by the HEOM approach is used as a rigorous benchmark to assess various mixed quantum-classical methods, from which we derive insights into the mechanisms of energy dissipation and the suitable working regimes of each method.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2409.10944 [pdf, other]

Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification

Authors: Jiaxing Xu, Kai He, Mengcheng Lan, Qingtian Bian, Wei Li, Tieying Li, Yiping Ke, Miao Qiao

Abstract: Understanding neurological disorders is a fundamental problem in neuroscience, which often requires the analysis of brain networks derived from functional magnetic resonance imaging (fMRI) data. Despite the prevalence of Graph Neural Networks (GNNs) and Graph Transformers in various domains, applying them to brain networks faces challenges. Specifically, the datasets are severely impacted by the noise caused by distribution shifts across sub-populations and by the neglect of node identities, both of which obstruct the identification of disease-specific patterns. To tackle these challenges, we propose Contrasformer, a novel contrastive brain network Transformer. It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism. A cross attention with identity embedding highlights the identity of nodes, and three auxiliary losses ensure group consistency. Evaluated on 4 functional brain network datasets over 4 different diseases, Contrasformer outperforms the state-of-the-art methods for brain networks by achieving up to 10.8% improvement in accuracy, which demonstrates its efficacy in neurological disorder identification. Case studies illustrate its interpretability, especially in the context of neuroscience. This paper provides a solution for analyzing brain networks, offering valuable insights into neurological disorders. Our code is available at https://github.com/AngusMonroe/Contrasformer.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.06213 [pdf, other]

BACKRUNNER: Mitigating Smart Contract Attacks in the Real World

Authors: Chaofan Shou, Yuanyu Ke, Yupeng Yang, Qi Su, Or Dadosh, Assaf Eli, David Benchimol, Doudou Lu, Daniel Tong, Dex Chen, Zoey Tan, Jacob Chia, Koushik Sen, Wenke Lee

Abstract: Billions of dollars have been lost due to vulnerabilities in smart contracts. To counteract this, researchers have proposed attack frontrunning protections designed to preempt malicious transactions by inserting "whitehat" transactions ahead of them to protect the assets. In this paper, we demonstrate that existing frontrunning protections have become ineffective in real-world scenarios. Specifically, we collected 158 recent real-world attack transactions and discovered that 141 of them can bypass state-of-the-art frontrunning protections. We systematically analyze these attacks and show how inherent limitations of existing frontrunning techniques hinder them from protecting valuable assets in the real world. We then propose a new approach involving 1) preemptive hijack, and 2) attack backrunning, which circumvent the existing limitations and can help protect assets before and after an attack. Our approach adapts the exploit used in the attack to the same or similar contracts before and after the attack to safeguard the assets. We conceptualize adapting exploits as a program repair problem and apply established techniques to implement our approach into a full-fledged framework, BACKRUNNER. Running on previous attacks in 2023, BACKRUNNER can successfully rescue more than \$410M. In the real world, it has helped rescue over \$11.2M worth of assets in 28 separate incidents within two months.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2408.15281 [pdf]

NeR-VCP: A Video Content Protection Method Based on Implicit Neural Representation

Authors: Yangping Lin, Yan Ke, Ke Niu, Jia Liu, Xiaoyuan Yang

Abstract: With the popularity of video applications, the security of video content has emerged as a pressing issue that demands urgent attention. Most video content protection methods mainly rely on encryption technology, which needs to be manually designed or implemented in an experience-based manner. To address this problem, we propose an automatic encryption technique for video content protection based on implicit neural representation. We design a key-controllable module, which serves as a key for encryption and decryption. NeR-VCP first pre-distributes the key-controllable module trained by the sender to the recipients, and then uses Implicit Neural Representation (INR) with the pre-distributed key-controllable module to encrypt plain video as an implicit neural network. Legal recipients use the pre-distributed key-controllable module to decrypt this cipher neural network (the corresponding implicit neural network). Under the guidance of the key-controllable design, our method can improve the security of video content and provide a novel video encryption scheme. Moreover, using model compression techniques, this method can achieve video content protection while effectively reducing the amount of encrypted data transferred. We experimentally find that it has superior performance in terms of visual representation, imperceptibility to illegal users, and security from a cryptographic viewpoint.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.11808 [pdf, ps, other]

Distance Correlation in Multiple Biased Sampling Models

Authors: Yuwei Ke, Hok Kan Ling, Yanglei Song

Abstract: Testing the independence between random vectors is a fundamental problem in statistics. Distance correlation, a recently popular dependence measure, is universally consistent for testing independence against all distributions with finite moments. However, when data are subject to selection bias or collected from multiple sources or schemes, spurious dependence may arise. This creates a need for methods that can effectively utilize data from different sources and correct these biases. In this paper, we study the estimation of distance covariance and distance correlation under multiple biased sampling models, which provide a natural framework for addressing these issues. Theoretical properties, including the strong consistency and asymptotic null distributions of the distance covariance and correlation estimators, and the rate at which the test statistic diverges under sequences of alternatives approaching the null, are established. A weighted permutation procedure is proposed to determine the critical value of the independence test. Simulation studies demonstrate that our approach improves both the estimation of distance correlation and the power of the test.

Submitted 21 August, 2024; originally announced August 2024.
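
Illustration (not from the paper): for readers unfamiliar with the dependence measure named above, the standard unweighted sample distance correlation can be sketched as follows. This is the classical estimator built from double-centered pairwise distance matrices; it does not implement the biased-sampling corrections or the weighted permutation procedure studied by the authors. Function names are chosen for this sketch, and each observation is assumed to be a tuple of floats.

```python
import math


def _centered_distances(points):
    """Pairwise Euclidean distances, double-centered by row, column and grand means."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means: d is symmetric
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)]
            for i in range(n)]


def distance_covariance_sq(xs, ys):
    """Squared sample distance covariance (V-statistic form)."""
    a, b = _centered_distances(xs), _centered_distances(ys)
    n = len(xs)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


def distance_correlation(xs, ys):
    """Sample distance correlation in [0, 1]; 0.0 if either sample is constant."""
    dvar = distance_covariance_sq(xs, xs) * distance_covariance_sq(ys, ys)
    if dvar <= 0:
        return 0.0
    # max(...) guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(distance_covariance_sq(xs, ys), 0.0) / math.sqrt(dvar))
```

For example, with `xs = [(1.0,), (2.0,), (3.0,), (4.0,)]` and `ys` an exact linear function of `xs`, the value is 1 up to rounding, since the two centered distance matrices are proportional.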

arXiv:2408.04883 [pdf, other]

ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation

Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

Abstract: Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task.

Submitted 9 August, 2024; originally announced August 2024.

Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ProxyCLIP

arXiv:2407.12822 [pdf]

Lightweight Large Language Model for Medication Enquiry: Med-Pal

Authors: Kabilan Elangovan, Jasmine Chiat Ling Ong, Liyuan Jin, Benjamin Jun Jie Seng, Yu Heng Kwan, Lit Soo Tan, Ryan Jian Zhong, Justina Koi Li Ma, YuHe Ke, Nan Liu, Kathleen M Giacomini, Daniel Shu Wei Ting

Abstract: Large Language Models (LLMs) have emerged as a potential solution to assist digital health development with patient education, commonly medication-related enquiries. We trained and validated Med-Pal, a medication domain-specific LLM-chatbot fine-tuned with a fine-grained and expert-curated dataset from a selection of five light-weighted open-source LLMs of smaller parameter size (7 billion or less), in view of computational constraints and prioritizing operational efficiency. A multi-disciplinary team performed a clinical evaluation of LLM responses using the SCORE criteria, focusing on safety, accuracy, bias, reproducibility, and ease of understanding. The best-performing light-weighted LLM was chosen as Med-Pal for further engineering with guard-railing using adversarial prompting. Med-Pal and existing light-weighted LLMs, including pretrained Biomistral and finetuned Meerkat, were validated on an independent dataset on a broad range of medication-related questions (231 in total), 12 different question types across 14 different medication classes. Mistral-7b emerged as the top performer among selected lightweight LLMs, achieving the highest median score of 14 and 71.9% high-quality responses in accuracy and safety domains, hence chosen as the backbone LLM for Med-Pal. When compared against Biomistral, Med-Pal outperformed in generating responses appropriate for patient communication, with significant reductions in bias and errors typical of general LLMs. Comparable performance was observed when comparing Med-Pal with Meerkat. Med-Pal showcases the feasibility of developing and employing fine-tuned light-weighted LLMs to enhance digital health communications.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.12442 [pdf, other]

ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference

Authors: Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang

Abstract: Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially CLIP in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades segmentation quality. With a comparative analysis of statistical properties in the residual connection and the attention output across different pretrained models, we discover that CLIP's image-text contrastive training paradigm emphasizes global features at the expense of local discriminability, leading to noisy segmentation results. In response, we propose ClearCLIP, a novel approach that decomposes CLIP's representations to enhance open-vocabulary semantic segmentation. We introduce three simple modifications to the final layer: removing the residual connection, implementing the self-self attention, and discarding the feed-forward network. ClearCLIP consistently generates clearer and more accurate segmentation maps and outperforms existing approaches across multiple benchmarks, affirming the significance of our discoveries.

Submitted 17 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024. Code available at https://github.com/mc-lan/ClearCLIP

arXiv:2407.11034 [pdf]

Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis

Authors: Siqi Li, Xin Li, Kunyu Yu, Di Miao, Mingcheng Zhu, Mengying Yan, Yuhe Ke, Danny D'Agostino, Yilin Ning, Qiming Wu, Ziwen Wang, Yuqing Shang, Molei Liu, Chuan Hong, Nan Liu

Abstract: Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pre-trained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. In our review of TL applications in structured clinical and biomedical data, we screened 3,515 papers, with 55 meeting the inclusion criteria. Among these, only 2% (one out of 55) utilized external studies, and 7% (four out of 55) addressed scenarios involving multi-site collaborations with privacy constraints. To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.09385 [pdf, other]

Towards Vision-Language Geo-Foundation Model: A Survey

Authors: Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang

Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs fine-tuned on them have been proposed recently. These new approaches aim to leverage large-scale, multimodal geospatial data to build versatile intelligent models with diverse geo-perceptive capabilities, which we refer to as Vision-Language Geo-Foundation Models (VLGFMs). This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field. In particular, we introduce the background and motivation behind the rise of VLGFMs, highlighting their unique research significance. Then, we systematically summarize the core technologies employed in VLGFMs, including data construction, model architectures, and applications of various multimodal geospatial tasks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To the best of our knowledge, this is the first comprehensive literature review of VLGFMs. We keep tracing related works at https://github.com/zytx121/Awesome-VLGFM.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 18 pages, 4 figures

arXiv:2406.05130 [pdf, other]

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Authors: Xiongtao Zhou, Jie He, Yuhua Ke, Guangyao Zhu, Víctor Gutiérrez-Basulto, Jeff Z. Pan

Abstract: Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for enhancing the performance of MLLMs in scenarios where only a limited number of parameters are trained. This paper conducts empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. We present a comprehensive analysis that encompasses various aspects, including the impact of PEFT methods on various models, parameters and location of the PEFT module, size of fine-tuning data, model stability based on PEFT methods, MLLM's generalization, and hallucination. We evaluated four PEFT methods on seven datasets from two different categories: unseen and seen datasets. Across all experiments, we show that the adapter is the best-performing PEFT method. At the same time, fine-tuning the connector layers leads to improved performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.

Submitted 7 June, 2024; originally announced June 2024.

Comments: ACL Findings 2024

arXiv:2405.08428 [pdf, other]

A Low-Power Spike Detector Using In-Memory Computing for Event-based Neural Frontend

Authors: Ye Ke, Arindam Basu

Abstract: With the sensor scaling of next-generation Brain-Machine Interface (BMI) systems, the massive A/D conversion and analog multiplexing at the neural frontend pose a challenge in terms of power and data rates for wireless and implantable BMIs. While previous works have reported the neuromorphic compression of neural signals, further compression requires integration of spike detectors on chip. In this work, we propose an efficient HRAM-based spike detector using in-memory computing for a compressive event-based neural frontend. Our proposed method involves detecting spikes from event pulses without reconstructing the signal and uses a 10T hybrid in-memory computing bitcell for the accumulation and thresholding operations. We show that our method ensures a spike detection accuracy of 92-99% for neural signal inputs while consuming only 13.8 nW per channel in 65 nm CMOS.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Originally submitted at IEEE ISCAS 2024

arXiv:2404.01804 [pdf, other]

Neuromorphic Wireless Device-Edge Co-Inference via the Directed Information Bottleneck

Authors: Yuzhen Ke, Zoran Utkovski, Mehdi Heshmati, Osvaldo Simeone, Johannes Dommel, Slawomir Stanczak

Abstract: An important use case of next-generation wireless systems is device-edge co-inference, where a semantic task is partitioned between a device and an edge server. The device carries out data collection and partial processing of the data, while the remote server completes the given task based on information received from the device. It is often required that processing and communication be run as efficiently as possible at the device, while more computing resources are available at the edge. To address such scenarios, we introduce a new system solution, termed neuromorphic wireless device-edge co-inference. According to it, the device runs sensing, processing, and communication units using neuromorphic hardware, while the server employs conventional radio and computing technologies. The proposed system is designed using a transmitter-centric information-theoretic criterion that targets a reduction of the communication overhead, while retaining the most relevant information for the end-to-end semantic task of interest. Numerical results on standard data sets validate the proposed architecture, and a preliminary testbed realization is reported.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 8 pages

arXiv:2403.12762 [pdf, ps, other]

Smooth helically symmetric transonic flows with nonzero vorticity in a concentric cylinder

Authors: Yi Ke, Shangkun Weng

Abstract: This paper concerns the structural stability of smooth cylindrically symmetric transonic flows in a concentric cylinder under helically symmetric perturbations of suitable boundary conditions. The deformation-curl decomposition developed by the second author and his collaborator is utilized to effectively decouple the elliptic-hyperbolic mixed structure in the steady compressible Euler equation. A key parameter in the helical symmetry is the step (denoted by $\sigma$), which denotes the magnitude of the translation along the symmetry axis after rotating one full turn. It is shown that the step determines the type of the first order partial differential system satisfied by the radial and vertical velocity. There exists a critical number $\sigma_{*}$ depending only on the background transonic flows, such that if $0<\sigma<\sigma_{*}$, one can prove the existence and uniqueness of smooth helically symmetric transonic flows with nonzero vorticity.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.05881 [pdf, other]

KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques

Authors: Rui Yang, Haoran Liu, Edison Marrese-Taylor, Qingcheng Zeng, Yu He Ke, Wanxin Li, Lechao Cheng, Qingyu Chen, James Caverlee, Yutaka Matsuo, Irene Li

Abstract: Large language models (LLMs) have demonstrated impressive generative capabilities with the potential to innovate in medicine. However, the application of LLMs in real clinical settings remains challenging due to the lack of factual consistency in the generated content. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) along with ranking and re-ranking techniques, to improve the factuality of long-form question answering (QA) in the medical domain. Specifically, when receiving a question, KG-Rank automatically identifies medical entities within the question and retrieves the related triples from the medical KG to gather factual information. Subsequently, KG-Rank innovatively applies multiple ranking techniques to refine the ordering of these triples, providing more relevant and precise information for LLM inference. To the best of our knowledge, KG-Rank is the first application of KG combined with ranking models in medical QA specifically for generating long answers. Evaluation on four selected medical QA datasets demonstrates that KG-Rank achieves an improvement of over 18% in ROUGE-L score. Additionally, we extend KG-Rank to open domains, including law, business, music, and history, where it realizes a 14% improvement in ROUGE-L score, indicating the effectiveness and great potential of KG-Rank.

Submitted 4 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures, 8 tables

arXiv:2403.05235 [pdf]

Fairness-Aware Interpretable Modeling (FAIM) for Trustworthy Machine Learning in Healthcare

Authors: Mingxuan Liu, Yilin Ning, Yuhe Ke, Yuqing Shang, Bibhas Chakraborty, Marcus Eng Hock Ong, Roger Vaughan, Nan Liu

Abstract: The escalating integration of machine learning in high-stakes fields such as healthcare raises substantial concerns about model fairness. We propose an interpretable framework, Fairness-Aware Interpretable Modeling (FAIM), to improve model fairness without compromising performance, featuring an interactive interface to identify a "fairer" model from a set of high-performing models and promoting the integration of data-driven evidence and clinical expertise to enhance contextualized fairness. We demonstrated FAIM's value in reducing sex and race biases by predicting hospital admission with two real-world databases, MIMIC-IV-ED and SGH-ED. We show that for both datasets, FAIM models not only exhibited satisfactory discriminatory performance but also significantly mitigated biases as measured by well-established fairness metrics, outperforming commonly used bias-mitigation methods. Our approach demonstrates the feasibility of improving fairness without sacrificing performance and provides a modeling mode that invites domain experts to engage, fostering a multidisciplinary effort toward tailored AI fairness.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01704 [pdf]

Giant second harmonic generation in supertwisted WS2 spirals grown in step edge particle induced non-Euclidean surfaces

Authors: Tong Tong, Ruijie Chen, Yuxuan Ke, Qian Wang, Xinchao Wang, Qinjun Sun, Jie Chen, Zhiyuan Gu, Ying Yu, Hongyan Wei, Yuying Hao, Xiaopeng Fan, Qing Zhang

Abstract: In moiré crystals resulting from the stacking of twisted two-dimensional (2D) layered materials, a subtle adjustment in the twist angle surprisingly gives rise to a wide range of correlated optical and electrical properties. Herein, we report the synthesis of supertwisted WS2 spirals and the observation of giant second harmonic generation (SHG) in these spirals. Supertwisted WS2 spirals featuring different twist angles are synthesized on a Euclidean or step-edge particle-induced non-Euclidean surface using a carefully designed water-assisted chemical vapor deposition. We observed an oscillatory dependence of SHG intensity on layer number, attributed to atomically phase-matched nonlinear dipoles within layers of supertwisted spiral crystals where inversion symmetry is restored. Through an investigation into the twist angle evolution of SHG intensity, we discovered that the stacking model between layers plays a crucial role in determining the nonlinearity, and the SHG signals in supertwisted spirals exhibit enhancements by a factor of 2 to 136 when compared with the SHG of the single-layer structure. These findings provide an efficient method for the rational growth of 2D twisted structures and the implementation of twist-angle adjustability, endowing them with great potential for exploring strong coupling correlation physics and applications in the field of twistronics.

Submitted 19 July, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: 26 pages, 4 figures

arXiv:2402.11273 [pdf, other]

Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies

Authors: Yifei Chen, Chenyan Zhang, Yifan Ke, Yiyu Huang, Xuezhou Dai, Feiwei Qin, Yongquan Zhang, Xiaodong Zhang, Changmiao Wang

Abstract: Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances the model's performance and generalizability through data augmentation processing, employing varied strategies for unlabeled data. Concurrently, the model design gives appropriate emphasis to the generation, filtration, and refinement processes of pseudo-labels. The novel concept of cross-pseudo-supervision is introduced, integrating consistency learning with self-training. This enables the model to fully leverage pseudo-labels from multiple perspectives, thereby enhancing training diversity. The DFCPS model is compared with both baseline and advanced models using the publicly accessible Kvasir-SEG dataset. Across all four subdivisions containing different proportions of unlabeled data, our model consistently exhibits superior performance. Our source code is available at https://github.com/JustlfC03/DFCPS.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 5 pages, 2 figures, accepted at ISBI 2024

Journal ref: ISBI 2024

arXiv:2402.10083 [pdf]

Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4

Authors: Ting Fang Tan, Kabilan Elangovan, Liyuan Jin, Yao Jie, Li Yong, Joshua Lim, Stanley Poh, Wei Yan Ng, Daniel Lim, Yuhe Ke, Nan Liu, Daniel Shu Wei Ting

Abstract: Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We fine-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, an additional 8 glaucoma QnA pairs were included. 200 responses to the testing dataset were generated by 5 fine-tuned LLMs for evaluation. A customized clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4 evaluation was then compared against ranking by 5 clinicians for clinical alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest (87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%), LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4 evaluation demonstrated significant agreement with human clinician rankings, with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80 respectively, while correlation based on Cohen Kappa was more modest at 0.50. Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical inaccuracies in the LLM-generated responses, which were appropriately identified by the GPT-4 evaluation.
Conclusion: The notable clinical alignment of GPT-4 evaluation highlighted its potential to streamline the clinical evaluation of LLM chatbot responses to healthcare-related queries. By complementing the existing clinician-dependent manual grading, this efficient and automated evaluation could assist the validation of future developments in LLM applications for healthcare.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 13 Pages, 1 Figure, 8 Tables

arXiv:2402.01741 [pdf]

Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties

Authors: Jasmine Chiat Ling Ong, Liyuan Jin, Kabilan Elangovan, Gilbert Yong San Lim, Daniel Yan Zheng Lim, Gerald Gui Ren Sng, Yuhe Ke, Joshua Yi Min Tung, Ryan Jian Zhong, Christopher Ming Yao Koh, Keane Zhi Hao Lee, Xiang Chen, Jack Kian Chng, Aung Than, Ken Junyang Goh, Daniel Shu Wei Ting

Abstract: Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription. Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a human expert panel derived ground truth. We compared performance under 2 different CDSS practical healthcare integration modalities: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode). Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using the revised NCC MERP medication error index. We compared. Results: RAG-LLM performed better compared to LLM alone. When employed in a co-pilot mode, accuracy, recall, and F1 scores were optimized, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories but at the expense of lower precision.
Conclusions: This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS systems.

Submitted 17 February, 2024; v1 submitted 29 January, 2024; originally announced February 2024.

arXiv:2402.01733 [pdf]

Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report

Authors: YuHe Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Daniel Shu Wei Ting

Abstract: Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine. Methods: We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses, with a total of 1260 responses evaluated. The RAG process involved converting clinical documents into text using Python-based frameworks like LangChain and Llamaindex, and processing these texts into chunks for embedding and retrieval. Vector storage techniques and selected embedding models were used to optimize data retrieval, using Pinecone for vector storage with a dimensionality of 1536 and cosine similarity for loss metrics. Human-generated answers, provided by junior doctors, were used as a comparison. Results: The LLM-RAG model generated answers within an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans. Among the basic LLMs, GPT4.0 exhibited the best accuracy of 80.1%. This accuracy was further increased to 91.4% when the model was enhanced with RAG. Compared to the human-generated instructions, which had an accuracy of 86.3%, the performance of the GPT4.0 RAG model demonstrated non-inferiority (p=0.610). Conclusions: In this case study, we demonstrated an LLM-RAG model for healthcare implementation.
The pipeline shows the advantages of grounded knowledge, upgradability, and scalability as important aspects of healthcare LLM deployment.

Submitted 29 January, 2024; originally announced February 2024.

Comments: NA

arXiv:2401.17081 [pdf, ps, other]

Shortcuts to adiabatic Thouless pumping

Authors: Wenjie Liu, Yongguan Ke, Chaohong Lee

Abstract: Thouless pumping, the quantized transport of particles in a cyclic adiabatic evolution, faces a challenge: slow driving may exceed the coherence time, while fast driving may break quantization. To address this dilemma, we propose to speed up Thouless pumping using shortcuts to adiabaticity. By using counterdiabatic theory, we analytically derive the controlled Hamiltonian for implementing dispersion-suppressed Thouless pumping beyond the adiabatic regime. Compared to traditional Thouless pumping methods, our fast topological pumping approach offers remarkable advantages. Firstly, it reduces the pumping time by up to 11 orders of magnitude compared with the traditional approach. Secondly, our method effectively suppresses wavepacket diffusion, further enhancing its efficiency. Furthermore, we demonstrate the resilience of our protocol against moderate noise levels. Our study offers a practical and efficient method for achieving fast topological pumping beyond the adiabatic regime.

Submitted 7 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Comments and suggestions are welcome

arXiv:2401.14589 [pdf]

Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias

Authors: Yu He Ke, Rui Yang, Sui An Lie, Taylor Xin Yi Lim, Hairil Rizal Abdullah, Daniel Shu Wei Ting, Nan Liu

Abstract: Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) to make the final diagnosis after considering the discussions, 2) to act as the devil's advocate and correct confirmation and anchoring bias, 3) to tutor and facilitate the discussion to reduce premature closure bias, and 4) to record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses. Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80).
Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.

Submitted 12 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 21 pages, 3 figures

arXiv:2401.10906 [pdf, other]

Topological pumping induced by spatiotemporal modulation of interaction

Authors: Boning Huang, Yongguan Ke, Wenjie Liu, Chaohong Lee

Abstract: Particle-particle interaction provides a new degree of freedom to induce novel topological phenomena. Here, we propose to use spatiotemporal modulation of interaction to realize topological pumping without a single-particle counterpart. Because the modulation breaks time-reversal symmetry, the multiparticle energy bands of bound states have nonzero Chern numbers and support topological bound edge states. In a Thouless pump, a bound state that uniformly occupies a topological energy band can be shifted by integer unit cells per cycle, consistent with the corresponding Chern number. We can also realize topological pumping of a bound edge state from one end to another. The entanglement entropy between particles rapidly increases at transition points, which is related to the spatial spread of a bound pair. In addition, we propose to realize hybridized pumping with fractional displacement per cycle by adding an extra tilt potential to separate topological pumping of the bound state and Bloch oscillations of a single particle. Our work could trigger further studies of correlated topological phenomena that do not have a single-particle counterpart.

Submitted 7 January, 2024; originally announced January 2024.

Comments: Comments and suggestions are welcome

arXiv:2312.15664 [pdf, other]

Interaction-induced multiparticle bound states in the continuum

Authors: Boning Huang, Yongguan Ke, Honghua Zhong, Yuri S. Kivshar, Chaohong Lee

Abstract: Bound states in the continuum (BICs) are localized modes residing in the radiation continuum. They were first predicted for single-particle states, and became a general feature of many wave systems. In many-body quantum physics, it is still unclear what would be a close analog of BICs, and whether interparticle interaction may induce BICs. Here, we predict a novel type of multiparticle state in the interaction-modulated Bose-Hubbard model that can be associated with the BIC concept. Under periodic boundary conditions, a so-called quasi-BIC appears as a bound pair residing in a standing wave formed by the third particle. Under open boundary conditions, such a hybrid state becomes an eigenstate of the system. We demonstrate that the Thouless pumping of the quasi-BICs can be realized by modulating the onsite interactions in space and time. Surprisingly, while the center-of-mass of the quasi-BIC is shifted by a unit cell in one cycle, the bound pair moves in the opposite direction with the standing wave.

Submitted 14 September, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted by Phys. Rev. Lett

arXiv:2312.05868 [pdf, other]

doi 10.1007/s11433-023-2362-x

A discovery of Two Slow Pulsars with FAST: "Ronin" from the Globular Cluster M15

Authors: Dengke Zhou, Pei Wang, Di Li, Jianhua Fang, Chenchen Miao, Paulo C. C. Freire, Lei Zhang, Dandan Zhang, Huaxi Chen, Yi Feng, Yifan Xiao, Jintao Xie, Xu Zhang, Chenwu Jin, Han Wang, Yinan Ke, Xuerong Guo, Rushuang Zhao, Chenhui Niu, Weiwei Zhu, Mengyao Xue, Yabiao Wang, Jiafu Wu, Zhenye Gan, Zhongyi Sun, et al. (4 additional authors not shown)

Abstract: Globular clusters harbor numerous millisecond pulsars, but long-period pulsars ($P \gtrsim 100$ ms) are rarely found. In this study, we employed a fast folding algorithm to analyze observational data from multiple globular clusters obtained by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), aiming to detect the existence of long-period pulsars. We estimated the impact of the median filtering algorithm in eliminating red noise on the minimum detectable flux density ($S_{\rm min}$) of pulsars. Subsequently, we successfully discovered two isolated long-period pulsars in M15 with periods approximately equal to 1.928451 seconds and 3.960716 seconds, respectively. On the $P-\dot{P}$ diagram, both pulsars are positioned below the spin-up line, suggesting a possible history of partial recycling in X-ray binary systems disrupted by dynamical encounters later on. According to timing results, these two pulsars exhibit remarkably strong magnetic fields. If the magnetic fields were weakened during the accretion process, then a short duration of accretion might explain the strong magnetic fields of these pulsars.

Submitted 18 April, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted by SCIENCE CHINA Physics, Mechanics & Astronomy

Journal ref: Sci. China-Phys. Mech. Astron. 67, 269512 (2024)

arXiv:2312.04743 [pdf, other]

Hiding Functions within Functions: Steganography by Implicit Neural Representations

Authors: Jia Liu, Peng Luo, Yan Ke

Abstract: Deep steganography utilizes the powerful capabilities of deep neural networks to embed and extract messages, but its reliance on an additional message extractor limits its practical use due to the added suspicion it can raise from steganalyzers. To address this problem, we propose StegaINR, which utilizes Implicit Neural Representation (INR) to implement steganography. StegaINR embeds a secret function into a stego function, which serves as both the message extractor and the stego media for secure transmission on a public channel. Recipients need only use a shared key to recover the secret function from the stego function, allowing them to obtain the secret message. Our approach makes use of continuous functions, enabling it to handle various types of messages. To our knowledge, this is the first work to introduce INR into steganography. We performed evaluations on image and climate data to test our method in different deployment contexts.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.18161 [pdf, other]

Random Green's function method for large-scale electronic structure calculation

Authors: Mingfa Tang, Chang Liu, Aixia Zhang, Qingyun Zhang, Shengjun Yuan, Youqi Ke

Abstract: We report a linear-scaling random Green's function (rGF) method for large-scale electronic structure calculation. In this method, the rGF is defined on a set of random states to stochastically express the density matrix, and the rGF is calculated with linear-scaling computational cost. We show that the rGF method is generally applicable to nonorthogonal localized bases and circumvents the large Chebyshev expansion for the density matrix. As a demonstration, we implement rGF with the density-functional tight-binding method and apply it to self-consistently calculate water clusters of up to 9984 H2Os. We find that the rGF method combined with a simple fragment correction can reach an error of ~1 meV per H2O in total energy, compared to deterministic calculations, due to self-averaging. The development of the rGF method advances stochastic electronic structure theory to a new stage of efficiency and applicability.

Submitted 3 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures

arXiv:2311.17066 [pdf]

Cluster trajectory of SOFA score in predicting mortality in sepsis

Authors: Yuhe Ke, Matilda Swee Sun Tang, Celestine Jia Ling Loh, Hairil Rizal Abdullah, Nicholas Brian Shannon

Abstract: Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes. Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python. Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken. Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward.
Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 26 pages, 4 figures, 2 tables

arXiv:2311.11771 [pdf, ps, other]

Floquet Engineering of Hilbert Space Fragmentation in Stark Lattices

Authors: Li Zhang, Yongguan Ke, Ling Lin, Chaohong Lee

Abstract: The concept of Hilbert space fragmentation (HSF) has recently been put forward as a routine to break quantum ergodicity. While HSF widely exists in dynamical constraint models, it is still challenging to tune HSF. Here, we propose a scheme to tune HSF in a one-dimensional tilted lattice of interacting spinless fermions with periodically driven tunneling. The dynamics is governed by effective Hamiltonians with kinetic constraints, which appear as density-dependent tunneling in the weak-tunneling perturbation expansion. The kinetic constraint can be tuned via changing the driving frequency, and three different kinds of strong HSF can be engineered. In general, the system is strongly constrained and exhibits a strong HSF. Two partial resonance frequencies are analytically given by a time-dependent perturbation theory for Floquet systems, at which some kinetic constraints are released and the system exhibits another two different strong HSF. We demonstrate the perturbation analysis with exact numerical simulation of the entanglement entropy, the density correlation functions and the saturated local density profiles. Our result provides a promising way to control HSF through Floquet engineering.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2310.19132

Solitary solutions of a nonlinear Dirac equation with different frequencies

Authors: Qi Guo, Yuanyuan Ke

Abstract: We study the existence and nonexistence of solitary solutions with different frequencies for a type of nonlinear extension of the Dirac-Slater model. There are three main ingredients in this paper. The first is the Pohozaev identity for nonlinear Dirac equations. Combined with a variational identity, it yields nonexistence results when the frequency $ω$ is greater than $m$. The second is a critical point theorem for strongly indefinite functionals. With this, we obtain an existence result for $ω\in (-m,m)$. The third, which is the new main ingredient of this paper, is a perturbation of the functional from the second ingredient. With it, we can show the existence of solitary solutions when $ω=-m$. An interesting outcome of our result is that the left and right are completely different in the {\it Spectrum Zero Problem}, which implies a new phenomenon in quantum theory.

Submitted 13 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

Comments: Need modification

arXiv:2310.17874 [pdf, other]

SmooSeg: Smoothness Prior for Unsupervised Semantic Segmentation

Authors: Mengcheng Lan, Xinjiang Wang, Yiping Ke, Jiaxing Xu, Litong Feng, Wayne Zhang

Abstract: Unsupervised semantic segmentation is a challenging task that segments images into semantic groups without manual annotation. Prior works have primarily focused on leveraging prior knowledge of semantic consistency or priori concepts from self-supervised learning methods, which often overlook the coherence property of image segments. In this paper, we demonstrate that the smoothness prior, asserting that close features in a metric space share the same semantics, can significantly simplify segmentation by casting unsupervised semantic segmentation as an energy minimization problem. Under this paradigm, we propose a novel approach called SmooSeg that harnesses self-supervised learning methods to model the closeness relationships among observations as smoothness signals. To effectively discover coherent semantic segments, we introduce a novel smoothness loss that promotes piecewise smoothness within segments while preserving discontinuities across different segments. Additionally, to further enhance segmentation quality, we design an asymmetric teacher-student style predictor that generates smoothly updated pseudo labels, facilitating an optimal fit between observations and labeling outputs. Thanks to the rich supervision cues of the smoothness prior, our SmooSeg significantly outperforms STEGO in terms of pixel accuracy on three datasets: COCOStuff (+14.9%), Cityscapes (+13.0%), and Potsdam-3 (+5.7%).

Submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023. Code available: https://github.com/mc-lan/SmooSeg

arXiv:2310.11686 [pdf, ps, other]

Deflation conjecture and local dimensions of Brent equations

Authors: Xin Li, Liping Zhang, Yifen Ke

Abstract: In this paper, a classical deflation process raised by Dayton, Li and Zeng is realized for the Brent equations, which provides new bounds for local dimensions of the solution set. Originally, this deflation process focuses on isolated solutions. We generalize it to the case of irreducible components and a related conjecture is given. We analyze its realization and apply it to the Brent equations. The decrease of the nullities is easily observed, so the deflation process can serve as a useful tool for determining the local dimensions. In addition, our result implies that as the tensor rank decreases, singular solutions become more and more numerous.

Submitted 2 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: The previous Remark 5.1 is clarified

arXiv:2310.06437 [pdf, other]

Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks

Authors: Cong Yang, Bipin Indurkhya, John See, Bo Gao, Yan Ke, Zeyd Boukhers, Zhenyu Yang, Marcin Grzegorzek

Abstract: Skeleton Ground Truth (GT) is critical to the success of supervised skeleton extraction methods, especially with the popularity of deep learning techniques. Furthermore, we see skeleton GTs used not only for training skeleton detectors with Convolutional Neural Networks (CNN) but also for evaluating skeleton-related pruning and matching algorithms. However, most existing shape and image datasets suffer from the lack of skeleton GT and inconsistency of GT standards. As a result, it is difficult to evaluate and reproduce CNN-based skeleton detectors and algorithms on a fair basis. In this paper, we present a heuristic strategy for object skeleton GT extraction in binary shapes and natural images. Our strategy is built on an extended theory of diagnosticity hypothesis, which enables encoding human-in-the-loop GT extraction based on clues from the target's context, simplicity, and completeness. Using this strategy, we developed a tool, SkeView, to generate skeleton GT of 17 existing shape and image datasets. The GTs are then structurally evaluated with representative methods to build viable baselines for fair comparisons. Experiments demonstrate that GTs generated by our strategy yield promising quality with respect to standard consistency, and also provide a balance between simplicity and completeness.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted for publication in the International Journal of Computer Vision (IJCV)

arXiv:2310.02778 [pdf, other]

Integrating UMLS Knowledge into Large Language Models for Medical Question Answering

Authors: Rui Yang, Edison Marrese-Taylor, Yuhe Ke, Lechao Cheng, Qingyu Chen, Irene Li

Abstract: Large language models (LLMs) have demonstrated powerful text generation capabilities, bringing unprecedented innovation to the healthcare field. While LLMs hold immense promise for applications in healthcare, applying them to real clinical scenarios presents significant challenges, as these models may generate content that deviates from established medical facts and even exhibit potential biases. In our research, we develop an augmented LLM framework based on the Unified Medical Language System (UMLS), aiming to better serve the healthcare community. We employ LLaMa2-13b-chat and ChatGPT-3.5 as our benchmark models, and conduct automatic evaluations using the ROUGE Score and BERTScore on 104 questions from the LiveQA test set. Additionally, we establish criteria for physician-evaluation based on four dimensions: Factuality, Completeness, Readability and Relevancy. ChatGPT-3.5 is used for physician evaluation with 20 questions on the LiveQA test set. Multiple resident physicians conducted blind reviews to evaluate the generated content, and the results indicate that this framework effectively enhances the factuality, completeness, and relevance of generated content. Our research demonstrates the effectiveness of using UMLS-augmented LLMs and highlights the potential application value of LLMs in medical question-answering.

Submitted 13 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: 12 pages, 3 figures

arXiv:2309.11747 [pdf, other]

MarkNerf:Watermarking for Neural Radiance Field

Authors: Lifeng Chen, Jia Liu, Yan Ke, Wenquan Sun, Weina Dong, Xiaozhong Pan

Abstract: A watermarking algorithm is proposed in this paper to address the copyright protection issue of implicit 3D models. The algorithm involves embedding watermarks into the images in the training set through an embedding network, and subsequently utilizing the NeRF model for 3D modeling. A copyright verifier is employed to generate a backdoor image by providing a secret perspective as input to the neural radiation field. Subsequently, a watermark extractor is devised using the hyperparameterization method of the neural network to extract the embedded watermark image from that perspective. In a black box scenario, if there is a suspicion that the 3D model has been used without authorization, the verifier can extract watermarks from a secret perspective to verify network copyright. Experimental results demonstrate that the proposed algorithm effectively safeguards the copyright of 3D models. Furthermore, the extracted watermarks exhibit favorable visual effects and demonstrate robust resistance against various types of noise attacks.

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.10503 [pdf, other]

Steganography for Neural Radiance Fields by Backdooring

Authors: Weina Dong, Jia Liu, Yan Ke, Lifeng Chen, Wenquan Sun, Xiaozhong Pan

Abstract: The utilization of implicit representation for visual data (such as images, videos, and 3D models) has recently gained significant attention in computer vision research. In this letter, we propose a novel model steganography scheme with implicit neural representation. The message sender leverages Neural Radiance Fields (NeRF) and its viewpoint synthesis capabilities by introducing a viewpoint as a key. The NeRF model generates a secret viewpoint image, which serves as a backdoor. Subsequently, we train a message extractor using overfitting to establish a one-to-one mapping between the secret message and the secret viewpoint image. The sender delivers the trained NeRF model and the message extractor to the receiver over the open channel, and the receiver utilizes the key shared by both parties to obtain the rendered image in the secret view from the NeRF model, and then obtains the secret message through the message extractor. The inherent complexity of the viewpoint information prevents attackers from stealing the secret message accurately. Experimental results demonstrate that the message extractor trained in this letter achieves high-capacity steganography with fast performance, achieving a 100% accuracy in message extraction. Furthermore, the extensive viewpoint key space of NeRF ensures the security of the steganography scheme.

Submitted 19 September, 2023; originally announced September 2023.

Comments: 6 pages, 7 figures

arXiv:2309.04802 [pdf, other]

doi 10.1145/3583780.3615512

CPMR: Context-Aware Incremental Sequential Recommendation with Pseudo-Multi-Task Learning

Authors: Qingtian Bian, Jiaxing Xu, Hui Fang, Yiping Ke

Abstract: The motivations of users to make interactions can be divided into static preference and dynamic interest. To accurately model user representations over time, recent studies in sequential recommendation utilize information propagation and evolution to mine from batches of arriving interactions. However, they ignore the fact that people are easily influenced by the recent actions of other users in the contextual scenario, and applying evolution across all historical interactions dilutes the importance of recent ones, thus failing to model the evolution of dynamic interest accurately. To address this issue, we propose a Context-Aware Pseudo-Multi-Task Recommender System (CPMR) to model the evolution in both historical and contextual scenarios by creating three representations for each user and item under different dynamics: static embedding, historical temporal states, and contextual temporal states. To dually improve the performance of temporal states evolution and incremental recommendation, we design a Pseudo-Multi-Task Learning (PMTL) paradigm by stacking the incremental single-target recommendations into one multi-target task for joint optimization. Within the PMTL paradigm, CPMR employs a shared-bottom network to conduct the evolution of temporal states across historical and contextual scenarios, as well as the fusion of them at the user-item level. In addition, CPMR incorporates one real tower for incremental predictions, and two pseudo towers dedicated to updating the respective temporal states based on new batches of interactions.
Experimental results on four benchmark recommendation datasets show that CPMR consistently outperforms state-of-the-art baselines and achieves significant gains on three of them. The code is available at: https://github.com/DiMarzioBian/CPMR.

Submitted 16 September, 2023; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: Accepted by CIKM 2023. Alias: "Modeling Context-Aware Temporal Dynamics via Pseudo-Multi-Task Learning"

Journal ref: ACM International Conference on Information and Knowledge Management (CIKM '23), October 21-25, 2023, Birmingham, United Kingdom

arXiv:2308.14051 [pdf]

3D Printed Multilayer Structures for High Numerical Aperture Achromatic Metalenses

Authors: Cheng-Feng Pan, Hao Wang, Hongtao Wang, Parvathi Nair S, Qifeng Ruan, Simon Wredh, Yujie Ke, John You En Chan, Wang Zhang, Cheng-Wei Qiu, Joel K. W. Yang

Abstract: Flat optics consisting of nanostructures of high-refractive-index materials produce lenses with thin form factors that tend to operate only at specific wavelengths. Recent attempts to achieve achromatic lenses uncover a trade-off between the numerical aperture (NA) and bandwidth, which limits performance. Here we propose a new approach to design high NA, broadband and polarization-insensitive multilayer achromatic metalenses (MAM). We combine topology optimization and full wave simulations to inversely design MAMs and fabricate the structures in low-refractive-index materials by two-photon polymerization lithography. MAMs measuring 20 micrometers in diameter operating in the visible range of 400-800 nm with 0.5 NA and 0.7 NA were achieved with efficiencies of up to 42%. We demonstrate broadband imaging performance of the fabricated MAM under white light and RGB narrowband illuminations. These results highlight the potential of the 3D printed multilayer structures for realizing broadband and multi-functional meta-devices with inverse design.

Submitted 27 August, 2023; originally announced August 2023.

Comments: 37 pages, 15 figures

arXiv:2308.04164 [pdf, other]

doi 10.1103/PhysRevB.108.174204

Calculations of Chern number: equivalence of real-space and twisted-boundary-condition formulae

Authors: Ling Lin, Yongguan Ke, Li Zhang, Chaohong Lee

Abstract: Chern number is a crucial invariant for characterizing topological feature of two-dimensional quantum systems. Real-space Chern number allows us to extract topological properties of systems without involving translational symmetry, and hence plays an important role in investigating topological systems with disorder or impurity. On the other hand, the twisted boundary condition (TBC) can also be used to define the Chern number in the absence of translational symmetry. Based on the perturbative nature of the TBC under appropriate gauges, we derive the two real-space formulae of Chern number (namely the non-commutative Chern number and the Bott index formula), which are numerically confirmed for the Chern insulator and the quantum spin Hall insulator. Our results not only establish the equivalence between the real-space and TBC formula of the Chern number, but also provide concrete and instructive examples for deriving the real-space topological invariant through the twisted boundary condition.

Submitted 7 November, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: none

arXiv:2307.11133 [pdf, other]

doi 10.1109/TMI.2024.3392988

Contrastive Graph Pooling for Explainable Classification of Brain Networks

Authors: Jiaxing Xu, Qingtian Bian, Xinhang Li, Aihu Zhang, Yiping Ke, Miao Qiao, Wei Zhang, Wei Khang Jeremy Sim, Balázs Gulyás

Abstract: Functional magnetic resonance imaging (fMRI) is a commonly used technique to measure neural activation. Its application has been particularly important in identifying underlying neurodegenerative conditions such as Parkinson's, Alzheimer's, and Autism. Recent analysis of fMRI data models the brain as a graph and extracts features by graph neural networks (GNNs). However, the unique characteristics of fMRI data require a special design of GNN. Tailoring GNN to generate effective and domain-explainable features remains challenging. In this paper, we propose a contrastive dual-attention block and a differentiable graph pooling method called ContrastPool to better utilize GNN for brain networks, meeting fMRI-specific requirements. We apply our method to 5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its superiority over state-of-the-art baselines. Our case study confirms that the patterns extracted by our method match the domain knowledge in neuroscience literature, and disclose direct and interesting insights. Our contributions underscore the potential of ContrastPool for advancing the understanding of brain networks and neurodegenerative conditions. The source code is available at https://github.com/AngusMonroe/ContrastPool.

Submitted 6 September, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

Journal ref: IEEE Transactions on Medical Imaging, vol. 43, no. 9, pp. 3292-3305, Sept. 2024

arXiv:2307.08979 [pdf, ps, other]

Scalable Auction Algorithms for Bipartite Maximum Matching Problems

Authors: Quanquan C. Liu, Yiduo Ke, Samir Khuller

Abstract: In this paper, we give new auction algorithms for maximum weighted bipartite matching (MWM) and maximum cardinality bipartite $b$-matching (MCbM). Our algorithms run in $O\left(\log n/\varepsilon^8\right)$ and $O\left(\log n/\varepsilon^2\right)$ rounds, respectively, in the blackboard distributed setting. We show that our MWM algorithm can be implemented in the distributed, interactive setting using $O(\log^2 n)$ and $O(\log n)$ bit messages, respectively, directly answering the open question posed by Demange, Gale and Sotomayor [DNO14]. Furthermore, we implement our algorithms in a variety of other models including the semi-streaming model, the shared-memory work-depth model, and the massively parallel computation model. Our semi-streaming MWM algorithm uses $O(1/\varepsilon^8)$ passes in $O(n \log n \cdot \log(1/\varepsilon))$ space and our MCbM algorithm runs in $O(1/\varepsilon^2)$ passes using $O\left(\left(\sum_{i \in L} b_i + |R|\right)\log(1/\varepsilon)\right)$ space (where parameters $b_i$ represent the degree constraints on the $b$-matching and $L$ and $R$ represent the left and right side of the bipartite graph, respectively). Both of these algorithms improve \emph{exponentially} the dependence on $\varepsilon$ in the space complexity in the semi-streaming model against the best-known algorithms for these problems, in addition to improvements in round complexity for MCbM.
Finally, our algorithms eliminate the large polylogarithmic dependence on $n$ in depth and number of rounds in the work-depth and massively parallel computation models, respectively, improving on previous results which have large polylogarithmic dependence on $n$ (and exponential dependence on $\varepsilon$ in the MPC model).

Submitted 18 July, 2023; originally announced July 2023.

Comments: To appear in APPROX 2023

arXiv:2305.17880 [pdf]

Ultra-small topological spin textures with size of 1.3nm at above room temperature in Fe78Si9B13 amorphous alloy

Authors: Weiwei Wu, Huaping Zhang, Hong Wang, Chao Chang, Hongyu Jiang, Jinfeng Li, Zhichao Lv, Laiquan Shen, Hanqiu Jiang, Chunyong He, Yubin Ke, Yuhua Su, Kosuke Hiroi, Zhendong Fu, Zi-An Li, Lin Gu, Maozhi Li, Dong Ma, Haiyang Bai

Abstract: Topologically protected spin textures, such as skyrmions [1,2] and vortices [3,4], are robust against perturbations, serving as the building blocks for a range of topological devices [5-9]. In order to implement these topological devices, it is necessary to find ultra-small topological spin textures at room temperature, because small size implies higher topological charge density, a stronger topological transport signal [10,11], and higher memory density or integration for topological quantum devices [5-9]. However, finding ultra-small topological spin textures at high temperatures remains a great challenge. Here we find ultra-small topological spin textures in Fe78Si9B13 amorphous alloy. We measured a large topological Hall effect (THE) up to above room temperature, indicating the existence of highly dense, ultra-small topological spin textures in the samples. Further measurements by small-angle neutron scattering (SANS) reveal that the average size of the ultra-small magnetic texture is around 1.3 nm. Our Monte Carlo simulations show that such an ultra-small spin texture is topologically equivalent to skyrmions, which originate from competing frustration and Dzyaloshinskii-Moriya interaction [12,13] coming from the amorphous structure [14-17]. Taking a single topological spin texture as one bit and ignoring the distance between them, we evaluated the ideal memory density of Fe78Si9B13, which reaches up to $4.44 \times 10^4$ gigabits (43.4 TB) per in$^2$ and is twice the value of GdRu2Si2 [18] at 5 K.
More importantly, such high memory density can be obtained above room temperature, and is 4 orders of magnitude larger than the value of other materials at the same temperature. These findings provide a unique candidate for magnetic memory devices with ultra-high density.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 26 pages, 4 figures

arXiv:2305.15747 [pdf, other]

Union Subgraph Neural Networks

Authors: Jiaxing Xu, Aihu Zhang, Qingtian Bian, Vijay Prakash Dwivedi, Yiping Ke

Abstract: Graph Neural Networks (GNNs) are widely used for graph representation learning in many application domains. The expressiveness of vanilla GNNs is upper-bounded by the 1-dimensional Weisfeiler-Leman (1-WL) test as they operate on rooted subtrees through iterative message passing. In this paper, we empower GNNs by injecting neighbor-connectivity information extracted from a new type of substructure. We first investigate different kinds of connectivities existing in a local neighborhood and identify a substructure called union subgraph, which is able to capture the complete picture of the 1-hop neighborhood of an edge. We then design a shortest-path-based substructure descriptor that possesses three nice properties and can effectively encode the high-order connectivities in union subgraphs. By infusing the encoded neighbor connectivities, we propose a novel model, namely Union Subgraph Neural Network (UnionSNN), which is proven to be strictly more powerful than 1-WL in distinguishing non-isomorphic graphs. Additionally, the local encoding from union subgraphs can also be injected into arbitrary message-passing neural networks (MPNNs) and Transformer-based models as a plugin. Extensive experiments on 18 benchmarks of both graph-level and node-level tasks demonstrate that UnionSNN outperforms state-of-the-art baseline models, with competitive computational efficiency. Injecting our local encoding into existing models boosts performance by up to 11.09%. Our code is available at https://github.com/AngusMonroe/UnionSNN.

Submitted 9 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Showing 1–50 of 183 results for author: Ke, Y