-
Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study
Authors:
Yekun Ke,
Xiaoyu Li,
Yingyu Liang,
Zhenmei Shi,
Zhao Song
Abstract:
Recent empirical studies have identified fixed point iteration phenomena in deep neural networks, where the hidden state tends to stabilize after several layers, showing minimal change in subsequent layers. This observation has spurred the development of practical methodologies, such as accelerating inference by bypassing certain layers once the hidden state stabilizes, selectively fine-tuning layers to modify the iteration process, and implementing loops of specific layers to maintain fixed point iterations. Despite these advancements, the understanding of fixed point iterations remains superficial, particularly in high-dimensional spaces, due to the inadequacy of current analytical tools. In this study, we conduct a detailed analysis of fixed point iterations in a vector-valued function modeled by neural networks. We establish a sufficient condition for the existence of multiple fixed points of looped neural networks based on varying input regions. Additionally, we expand our examination to include a robust version of fixed point iterations. To demonstrate the effectiveness and insights provided by our approach, we provide case studies showing that looped neural networks may have $2^d$ robust fixed points under exponentiation or polynomial activation functions, where $d$ is the feature dimension. Furthermore, our preliminary empirical results support our theoretical findings. Our methodology enriches the toolkit available for analyzing fixed point iterations of deep neural networks and may enhance our comprehension of neural network mechanisms.
Submitted 15 October, 2024;
originally announced October 2024.
-
StegaINR4MIH: steganography by implicit neural representation for multi-image hiding
Authors:
Weina Dong,
Jia Liu,
Lifeng Chen,
Wenquan Sun,
Xiaozhong Pan,
Yan Ke
Abstract:
Multi-image hiding, which embeds multiple secret images into a cover image and is able to recover these images with high quality, has gradually become a research hotspot in the field of image steganography. However, due to the need to embed a large amount of data in a limited cover image space, issues such as contour shadowing or color distortion often arise, posing significant challenges for multi-image hiding. In this paper, we propose StegaINR4MIH, a novel implicit neural representation steganography framework that enables the hiding of multiple images within a single implicit representation function. In contrast to traditional methods that use multiple encoders to achieve multi-image embedding, our approach leverages the redundancy of implicit representation function parameters and employs magnitude-based weight selection and secret weight substitution on pre-trained cover image functions to effectively hide and independently extract multiple secret images. We conduct experiments on images with a resolution of from three different datasets: CelebA-HQ, COCO, and DIV2K. When hiding two secret images, the PSNR values of both the secret images and the stego images exceed 42. When hiding five secret images, the PSNR values of both the secret images and the stego images exceed 39. Extensive experiments demonstrate the superior performance of the proposed method in terms of visual quality and undetectability.
Submitted 13 October, 2024;
originally announced October 2024.
-
Text4Seg: Reimagining Image Segmentation as Text Generation
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yue Zhou,
Jiaxing Xu,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that casts image segmentation as a text generation problem, eliminating the need for additional decoders and significantly simplifying the segmentation process. Our key innovation is semantic descriptors, a new textual representation of segmentation masks where each image patch is mapped to its corresponding text label. This unified representation allows seamless integration into the auto-regressive training pipeline of MLLMs for easier optimization. We demonstrate that representing an image with $16\times16$ semantic descriptors yields competitive segmentation performance. To enhance efficiency, we introduce the Row-wise Run-Length Encoding (R-RLE), which compresses redundant text sequences, reducing the length of semantic descriptors by 74% and accelerating inference by $3\times$, without compromising performance. Extensive experiments across various vision tasks, such as referring expression segmentation and comprehension, show that Text4Seg achieves state-of-the-art performance on multiple datasets by fine-tuning different MLLM backbones. Our approach provides an efficient, scalable solution for vision-centric tasks within the MLLM framework.
Submitted 13 October, 2024;
originally announced October 2024.
-
Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness
Authors:
Yu He Ke,
Liyuan Jin,
Kabilan Elangovan,
Hairil Rizal Abdullah,
Nan Liu,
Alex Tiong Heng Sia,
Chai Rick Soh,
Joshua Yi Min Tung,
Jasmine Chiat Ling Ong,
Chang-Fu Kuo,
Shao-Chun Wu,
Vesela P. Kovacheva,
Daniel Shu Wei Ting
Abstract:
Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows customization with domain-specific information, making it suitable for healthcare. This study evaluates the accuracy, consistency, and safety of RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using Llamaindex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated answers serving as comparisons. The LLM-RAG models generated responses within 20 seconds, significantly faster than clinicians (10 minutes). The GPT4 LLM-RAG model achieved the highest accuracy (96.4% vs. 86.6%, p=0.016), with no hallucinations and producing correct instructions comparable to clinicians. Results were consistent across both local and international guidelines. This study demonstrates the potential of LLM-RAG models for preoperative healthcare tasks, highlighting their efficiency, scalability, and reliability.
Submitted 10 October, 2024;
originally announced October 2024.
-
Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion
Authors:
Jiaxing Xu,
Mengcheng Lan,
Xia Dong,
Kai He,
Wei Zhang,
Qingtian Bian,
Yiping Ke
Abstract:
In the realm of neuroscience, identifying distinctive patterns associated with neurological disorders via brain networks is crucial. Resting-state functional magnetic resonance imaging (fMRI) serves as a primary tool for mapping these networks by correlating blood-oxygen-level-dependent (BOLD) signals across different brain regions, defined as regions of interest (ROIs). Constructing these brain networks involves using atlases to parcellate the brain into ROIs based on various hypotheses of brain division. However, there is no standard atlas for brain network classification, leading to limitations in detecting abnormalities in disorders. Some recent methods have proposed utilizing multiple atlases, but they neglect consistency across atlases and lack ROI-level information exchange. To tackle these limitations, we propose an Atlas-Integrated Distillation and Fusion network (AIDFusion) to improve brain network classification using fMRI data. AIDFusion addresses the challenge of utilizing multiple atlases by employing a disentangle Transformer to filter out inconsistent atlas-specific information and distill distinguishable connections across atlases. It also incorporates subject- and population-level consistency constraints to enhance cross-atlas consistency. Additionally, AIDFusion employs an inter-atlas message-passing mechanism to fuse complementary information across brain regions. Experimental results on four datasets of different diseases demonstrate the effectiveness and efficiency of AIDFusion compared to state-of-the-art methods. A case study illustrates that AIDFusion extracts patterns that are both interpretable and consistent with established neuroscience findings.
Submitted 28 September, 2024;
originally announced October 2024.
-
Array2BR: An End-to-End Noise-immune Binaural Audio Synthesis from Microphone-array Signals
Authors:
Cheng Chi,
Xiaoyu Li,
Andong Li,
Yuxuan Ke,
Xiaodong Li,
Chengshi Zheng
Abstract:
Telepresence technology aims to provide an immersive virtual presence for remote conference applications, and it is extremely important to synthesize high-quality binaural audio signals for this aim. Because the ambient noise is often inevitable in practical application scenarios, it is highly desired that binaural audio signals without noise can be obtained from microphone-array signals directly. For this purpose, this paper proposes a new end-to-end noise-immune binaural audio synthesis framework from microphone-array signals, abbreviated as Array2BR, and experimental results show that binaural cues can be correctly mapped and noise can be well suppressed simultaneously using the proposed framework. Compared with existing methods, the proposed method achieved better performance in terms of both objective and subjective metric scores.
Submitted 8 October, 2024;
originally announced October 2024.
-
Nonadiabatic Quantum Dynamics of Molecules Scattering from Metal Surfaces
Authors:
Riley J. Preston,
Yaling Ke,
Samuel L. Rudge,
Nils Hertl,
Raffaele Borrelli,
Reinhard J. Maurer,
Michael Thoss
Abstract:
Nonadiabatic coupling between electrons and molecular motion at metal surfaces leads to energy dissipation and dynamical steering effects during chemical surface dynamics. We present a theoretical approach to the scattering of molecules from metal surfaces that incorporates all nonadiabatic and quantum nuclear effects due to the coupling of the molecular degrees of freedom to the electrons in the metal. This is achieved with the hierarchical equations of motion (HEOM) approach combined with a matrix product state representation in twin space. The method is applied to the scattering of nitric oxide from Au(111), for which strongly nonadiabatic energy loss during scattering has been experimentally observed, thus presenting a significant theoretical challenge. Since the HEOM approach treats the molecule-surface coupling exactly, it captures the interplay between nonadiabatic and quantum nuclear effects. Finally, the data obtained by the HEOM approach is used as a rigorous benchmark to assess various mixed quantum-classical methods, from which we derive insights into the mechanisms of energy dissipation and the suitable working regimes of each method.
Submitted 7 October, 2024;
originally announced October 2024.
-
Contrasformer: A Brain Network Contrastive Transformer for Neurodegenerative Condition Identification
Authors:
Jiaxing Xu,
Kai He,
Mengcheng Lan,
Qingtian Bian,
Wei Li,
Tieying Li,
Yiping Ke,
Miao Qiao
Abstract:
Understanding neurological disorders is a fundamental problem in neuroscience, which often requires the analysis of brain networks derived from functional magnetic resonance imaging (fMRI) data. Despite the prevalence of Graph Neural Networks (GNNs) and Graph Transformers in various domains, applying them to brain networks faces challenges. Specifically, the datasets are severely impacted by the noise caused by distribution shifts across sub-populations and the neglect of node identities, both of which obstruct the identification of disease-specific patterns. To tackle these challenges, we propose Contrasformer, a novel contrastive brain network Transformer. It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism. A cross attention with identity embedding highlights the identity of nodes, and three auxiliary losses ensure group consistency. Evaluated on 4 functional brain network datasets over 4 different diseases, Contrasformer outperforms the state-of-the-art methods for brain networks by achieving up to 10.8% improvement in accuracy, which demonstrates its efficacy in neurological disorder identification. Case studies illustrate its interpretability, especially in the context of neuroscience. This paper provides a solution for analyzing brain networks, offering valuable insights into neurological disorders. Our code is available at https://github.com/AngusMonroe/Contrasformer.
Submitted 17 September, 2024;
originally announced September 2024.
-
BACKRUNNER: Mitigating Smart Contract Attacks in the Real World
Authors:
Chaofan Shou,
Yuanyu Ke,
Yupeng Yang,
Qi Su,
Or Dadosh,
Assaf Eli,
David Benchimol,
Doudou Lu,
Daniel Tong,
Dex Chen,
Zoey Tan,
Jacob Chia,
Koushik Sen,
Wenke Lee
Abstract:
Billions of dollars have been lost due to vulnerabilities in smart contracts. To counteract this, researchers have proposed attack frontrunning protections designed to preempt malicious transactions by inserting "whitehat" transactions ahead of them to protect the assets. In this paper, we demonstrate that existing frontrunning protections have become ineffective in real-world scenarios. Specifically, we collected 158 recent real-world attack transactions and discovered that 141 of them can bypass state-of-the-art frontrunning protections. We systematically analyze these attacks and show how inherent limitations of existing frontrunning techniques hinder them from protecting valuable assets in the real world. We then propose a new approach involving 1) preemptive hijack, and 2) attack backrunning, which circumvent the existing limitations and can help protect assets before and after an attack. Our approach adapts the exploit used in the attack to the same or similar contracts before and after the attack to safeguard the assets. We conceptualize adapting exploits as a program repair problem and apply established techniques to implement our approach into a full-fledged framework, BACKRUNNER. Running on previous attacks in 2023, BACKRUNNER can successfully rescue more than \$410M. In the real world, it has helped rescue over \$11.2M worth of assets in 28 separate incidents within two months.
Submitted 10 September, 2024;
originally announced September 2024.
-
NeR-VCP: A Video Content Protection Method Based on Implicit Neural Representation
Authors:
Yangping Lin,
Yan Ke,
Ke Niu,
Jia Liu,
Xiaoyuan Yang
Abstract:
With the popularity of video applications, the security of video content has emerged as a pressing issue that demands urgent attention. Most video content protection methods mainly rely on encryption technology, which needs to be manually designed or implemented in an experience-based manner. To address this problem, we propose an automatic encryption technique for video content protection based on implicit neural representation. We design a key-controllable module, which serves as a key for encryption and decryption. NeR-VCP first pre-distributes the key-controllable module trained by the sender to the recipients, and then uses Implicit Neural Representation (INR) with a (pre-distributed) key-controllable module to encrypt plain video as an implicit neural network, and the legal recipients use a pre-distributed key-controllable module to decrypt this cipher neural network (the corresponding implicit neural network). Under the guidance of the key-controllable design, our method can improve the security of video content and provide a novel video encryption scheme. Moreover, using model compression techniques, this method can achieve video content protection while effectively reducing the amount of encrypted data transferred. We experimentally find that it has superior performance in terms of visual representation, imperceptibility to illegal users, and security from a cryptographic viewpoint.
Submitted 20 August, 2024;
originally announced August 2024.
-
Distance Correlation in Multiple Biased Sampling Models
Authors:
Yuwei Ke,
Hok Kan Ling,
Yanglei Song
Abstract:
Testing the independence between random vectors is a fundamental problem in statistics. Distance correlation, a recently popular dependence measure, is universally consistent for testing independence against all distributions with finite moments. However, when data are subject to selection bias or collected from multiple sources or schemes, spurious dependence may arise. This creates a need for methods that can effectively utilize data from different sources and correct these biases. In this paper, we study the estimation of distance covariance and distance correlation under multiple biased sampling models, which provide a natural framework for addressing these issues. Theoretical properties, including the strong consistency and asymptotic null distributions of the distance covariance and correlation estimators, and the rate at which the test statistic diverges under sequences of alternatives approaching the null, are established. A weighted permutation procedure is proposed to determine the critical value of the independence test. Simulation studies demonstrate that our approach improves both the estimation of distance correlation and the power of the test.
Submitted 21 August, 2024;
originally announced August 2024.
-
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Open-vocabulary semantic segmentation requires models to effectively integrate visual representations with open-vocabulary semantic labels. While Contrastive Language-Image Pre-training (CLIP) models shine in recognizing visual concepts from text, they often struggle with segment coherence due to their limited localization ability. In contrast, Vision Foundation Models (VFMs) excel at acquiring spatially consistent local visual representations, yet they fall short in semantic understanding. This paper introduces ProxyCLIP, an innovative framework designed to harmonize the strengths of both CLIP and VFMs, facilitating enhanced open-vocabulary semantic segmentation. ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity. We propose an adaptive normalization and masking strategy to get the proxy attention from VFMs, allowing for adaptation across different VFMs. Remarkably, as a training-free approach, ProxyCLIP significantly improves the average mean Intersection over Union (mIoU) across eight benchmarks from 40.3 to 44.4, showcasing its exceptional efficacy in bridging the gap between spatial precision and semantic richness for the open-vocabulary segmentation task.
Submitted 9 August, 2024;
originally announced August 2024.
-
Lightweight Large Language Model for Medication Enquiry: Med-Pal
Authors:
Kabilan Elangovan,
Jasmine Chiat Ling Ong,
Liyuan Jin,
Benjamin Jun Jie Seng,
Yu Heng Kwan,
Lit Soo Tan,
Ryan Jian Zhong,
Justina Koi Li Ma,
YuHe Ke,
Nan Liu,
Kathleen M Giacomini,
Daniel Shu Wei Ting
Abstract:
Large Language Models (LLMs) have emerged as a potential solution to assist digital health development with patient education, commonly medication-related enquiries. We trained and validated Med-Pal, a medication domain-specific LLM-chatbot fine-tuned with a fine-grained and expert curated dataset from a selection of five light-weighted open-source LLMs of smaller parameter size (7 billion or less) regarding computational constraints and prioritizing operational efficiency. A multi-disciplinary team performed a clinical evaluation of LLM responses using the SCORE criteria, focusing on safety, accuracy, bias, reproducibility, and ease of understanding. The best-performing light-weighted LLM was chosen as Med-Pal for further engineering with guard-railing using adversarial prompting. Med-Pal and existing light-weighted LLMs, including pretrained Biomistral and finetuned Meerkat, were validated on an independent dataset on a broad range of medication-related questions (231 in total), 12 different question types across 14 different medication classes. Mistral-7b emerged as the top performer among selected lightweight LLMs, achieving the highest median score of 14 and 71.9% high-quality responses in accuracy and safety domains, hence chosen as the backbone LLM for Med-Pal. When compared against Biomistral, Med-Pal outperformed in generating responses appropriate for patient communication, with significant reductions in bias and errors typical of general LLMs. Comparable performance was observed when comparing Med-Pal with Meerkat. Med-Pal showcases the feasibility of developing and employing fine-tuned light-weighted LLMs to enhance digital health communications.
Submitted 1 July, 2024;
originally announced July 2024.
-
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Authors:
Mengcheng Lan,
Chaofeng Chen,
Yiping Ke,
Xinjiang Wang,
Litong Feng,
Wayne Zhang
Abstract:
Despite the success of large-scale pretrained Vision-Language Models (VLMs) especially CLIP in various open-vocabulary tasks, their application to semantic segmentation remains challenging, producing noisy segmentation maps with mis-segmented regions. In this paper, we carefully re-investigate the architecture of CLIP, and identify residual connections as the primary source of noise that degrades segmentation quality. With a comparative analysis of statistical properties in the residual connection and the attention output across different pretrained models, we discover that CLIP's image-text contrastive training paradigm emphasizes global features at the expense of local discriminability, leading to noisy segmentation results. In response, we propose ClearCLIP, a novel approach that decomposes CLIP's representations to enhance open-vocabulary semantic segmentation. We introduce three simple modifications to the final layer: removing the residual connection, implementing the self-self attention, and discarding the feed-forward network. ClearCLIP consistently generates clearer and more accurate segmentation maps and outperforms existing approaches across multiple benchmarks, affirming the significance of our discoveries.
Submitted 17 July, 2024;
originally announced July 2024.
-
Bridging Data Gaps in Healthcare: A Scoping Review of Transfer Learning in Biomedical Data Analysis
Authors:
Siqi Li,
Xin Li,
Kunyu Yu,
Di Miao,
Mingcheng Zhu,
Mengying Yan,
Yuhe Ke,
Danny D'Agostino,
Yilin Ning,
Qiming Wu,
Ziwen Wang,
Yuqing Shang,
Molei Liu,
Chuan Hong,
Nan Liu
Abstract:
Clinical and biomedical research in low-resource settings often faces significant challenges due to the need for high-quality data with sufficient sample sizes to construct effective models. These constraints hinder robust model training and prompt researchers to seek methods for leveraging existing knowledge from related studies to support new research efforts. Transfer learning (TL), a machine learning technique, emerges as a powerful solution by utilizing knowledge from pre-trained models to enhance the performance of new models, offering promise across various healthcare domains. Despite its conceptual origins in the 1990s, the application of TL in medical research has remained limited, especially beyond image analysis. In our review of TL applications in structured clinical and biomedical data, we screened 3,515 papers, with 55 meeting the inclusion criteria. Among these, only 2% (one out of 55) utilized external studies, and 7% (four out of 55) addressed scenarios involving multi-site collaborations with privacy constraints. To achieve actionable TL with structured medical data while addressing regional disparities, inequality, and privacy constraints in healthcare research, we advocate for the careful identification of appropriate source data and models, the selection of suitable TL frameworks, and the validation of TL models with proper baselines.
Submitted 4 July, 2024;
originally announced July 2024.
-
Towards Vision-Language Geo-Foundation Model: A Survey
Authors:
Yue Zhou,
Litong Feng,
Yiping Ke,
Xue Jiang,
Junchi Yan,
Xue Yang,
Wayne Zhang
Abstract:
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding. However, most methods rely on training with general image datasets, and the lack of geospatial data leads to poor performance on earth observation. Numerous geospatial image-text pair datasets and VLFMs fine-tuned on them have been proposed recently. These new approaches aim to leverage large-scale, multimodal geospatial data to build versatile intelligent models with diverse geo-perceptive capabilities, which we refer to as Vision-Language Geo-Foundation Models (VLGFMs). This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field. In particular, we introduce the background and motivation behind the rise of VLGFMs, highlighting their unique research significance. Then, we systematically summarize the core technologies employed in VLGFMs, including data construction, model architectures, and applications of various multimodal geospatial tasks. Finally, we conclude with insights, issues, and discussions regarding future research directions. To the best of our knowledge, this is the first comprehensive literature review of VLGFMs. We keep tracing related works at https://github.com/zytx121/Awesome-VLGFM.
Submitted 13 June, 2024;
originally announced June 2024.
-
An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models
Authors:
Xiongtao Zhou,
Jie He,
Yuhua Ke,
Guangyao Zhu,
Víctor Gutiérrez-Basulto,
Jeff Z. Pan
Abstract:
Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for enhancing the performance of MLLMs in scenarios where only a limited number of parameters are trained. This paper conducts empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. We present a comprehensive analysis that encompasses various aspects, including the impact of PEFT methods on various models, parameters and location of the PEFT module, size of fine-tuning data, model stability based on PEFT methods, MLLM's generalization, and hallucination. We evaluated four PEFT methods on seven datasets from two different categories: unseen and seen datasets. Across all experiments, we show that the adapter is the best-performing PEFT method. At the same time, fine-tuning the connector layers leads to improved performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.
Submitted 7 June, 2024;
originally announced June 2024.
-
A Low-Power Spike Detector Using In-Memory Computing for Event-based Neural Frontend
Authors:
Ye Ke,
Arindam Basu
Abstract:
With the sensor scaling of next-generation Brain-Machine Interface (BMI) systems, the massive A/D conversion and analog multiplexing at the neural frontend poses a challenge in terms of power and data rates for wireless and implantable BMIs. While previous works have reported the neuromorphic compression of neural signal, further compression requires integration of spike detectors on chip. In this work, we propose an efficient HRAM-based spike detector using In-memory computing for compressive event-based neural frontend. Our proposed method involves detecting spikes from event pulses without reconstructing the signal and uses a 10T hybrid in-memory computing bitcell for the accumulation and thresholding operations. We show that our method ensures a spike detection accuracy of 92-99% for neural signal inputs while consuming only 13.8 nW per channel in 65 nm CMOS.
Submitted 14 May, 2024;
originally announced May 2024.
-
Neuromorphic Wireless Device-Edge Co-Inference via the Directed Information Bottleneck
Authors:
Yuzhen Ke,
Zoran Utkovski,
Mehdi Heshmati,
Osvaldo Simeone,
Johannes Dommel,
Slawomir Stanczak
Abstract:
An important use case of next-generation wireless systems is device-edge co-inference, where a semantic task is partitioned between a device and an edge server. The device carries out data collection and partial processing of the data, while the remote server completes the given task based on information received from the device. It is often required that processing and communication be run as efficiently as possible at the device, while more computing resources are available at the edge. To address such scenarios, we introduce a new system solution, termed neuromorphic wireless device-edge co-inference. According to it, the device runs sensing, processing, and communication units using neuromorphic hardware, while the server employs conventional radio and computing technologies. The proposed system is designed using a transmitter-centric information-theoretic criterion that targets a reduction of the communication overhead, while retaining the most relevant information for the end-to-end semantic task of interest. Numerical results on standard data sets validate the proposed architecture, and a preliminary testbed realization is reported.
Submitted 2 April, 2024;
originally announced April 2024.
-
Smooth helically symmetric transonic flows with nonzero vorticity in a concentric cylinder
Authors:
Yi Ke,
Shangkun Weng
Abstract:
This paper concerns the structural stability of smooth cylindrical symmetric transonic flows in a concentric cylinder under helically symmetric perturbation of suitable boundary conditions. The deformation-curl decomposition developed by the second author and his collaborator is utilized to effectively decouple the elliptic-hyperbolic mixed structure in the steady compressible Euler equation. A key parameter in the helical symmetry is the step (denoted by $σ$), which denotes the magnitude of the translation along the symmetry axis after rotating one full turn. It is shown that the step determines the type of the first order partial differential system satisfied by the radial and vertical velocity. There exists a critical number $σ_{*}$ depending only on the background transonic flows, such that if $0<σ<σ_{*}$, one can prove the existence and uniqueness of smooth helically symmetric transonic flows with nonzero vorticity.
Submitted 19 March, 2024;
originally announced March 2024.
-
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques
Authors:
Rui Yang,
Haoran Liu,
Edison Marrese-Taylor,
Qingcheng Zeng,
Yu He Ke,
Wanxin Li,
Lechao Cheng,
Qingyu Chen,
James Caverlee,
Yutaka Matsuo,
Irene Li
Abstract:
Large language models (LLMs) have demonstrated impressive generative capabilities with the potential to innovate in medicine. However, the application of LLMs in real clinical settings remains challenging due to the lack of factual consistency in the generated content. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) along with ranking and re-ranking techniques, to improve the factuality of long-form question answering (QA) in the medical domain. Specifically, when receiving a question, KG-Rank automatically identifies medical entities within the question and retrieves the related triples from the medical KG to gather factual information. Subsequently, KG-Rank innovatively applies multiple ranking techniques to refine the ordering of these triples, providing more relevant and precise information for LLM inference. To the best of our knowledge, KG-Rank is the first application of KG combined with ranking models in medical QA specifically for generating long answers. Evaluation on four selected medical QA datasets demonstrates that KG-Rank achieves an improvement of over 18% in ROUGE-L score. Additionally, we extend KG-Rank to open domains, including law, business, music, and history, where it realizes a 14% improvement in ROUGE-L score, indicating the effectiveness and great potential of KG-Rank.
Submitted 4 July, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Fairness-Aware Interpretable Modeling (FAIM) for Trustworthy Machine Learning in Healthcare
Authors:
Mingxuan Liu,
Yilin Ning,
Yuhe Ke,
Yuqing Shang,
Bibhas Chakraborty,
Marcus Eng Hock Ong,
Roger Vaughan,
Nan Liu
Abstract:
The escalating integration of machine learning in high-stakes fields such as healthcare raises substantial concerns about model fairness. We propose an interpretable framework - Fairness-Aware Interpretable Modeling (FAIM), to improve model fairness without compromising performance, featuring an interactive interface to identify a "fairer" model from a set of high-performing models and promoting the integration of data-driven evidence and clinical expertise to enhance contextualized fairness. We demonstrated FAIM's value in reducing sex and race biases by predicting hospital admission with two real-world databases, MIMIC-IV-ED and SGH-ED. We show that for both datasets, FAIM models not only exhibited satisfactory discriminatory performance but also significantly mitigated biases as measured by well-established fairness metrics, outperforming commonly used bias-mitigation methods. Our approach demonstrates the feasibility of improving fairness without sacrificing performance and provides a modeling mode that invites domain experts to engage, fostering a multidisciplinary effort toward tailored AI fairness.
Submitted 8 March, 2024;
originally announced March 2024.
-
Giant second harmonic generation in supertwisted WS2 spirals grown in step edge particle induced non-Euclidean surfaces
Authors:
Tong Tong,
Ruijie Chen,
Yuxuan Ke,
Qian Wang,
Xinchao Wang,
Qinjun Sun,
Jie Chen,
Zhiyuan Gu,
Ying Yu,
Hongyan Wei,
Yuying Hao,
Xiaopeng Fan,
Qing Zhang
Abstract:
In moiré crystals resulting from the stacking of twisted two-dimensional (2D) layered materials, a subtle adjustment in the twist angle surprisingly gives rise to a wide range of correlated optical and electrical properties. Herein, we report the synthesis of supertwisted WS2 spirals and the observation of giant second harmonic generation (SHG) in these spirals. Supertwisted WS2 spirals featuring different twist angles are synthesized on a Euclidean or step-edge particle-induced non-Euclidean surface using a carefully designed water-assisted chemical vapor deposition. We observed an oscillatory dependence of SHG intensity on layer number, attributed to atomically phase-matched nonlinear dipoles within layers of supertwisted spiral crystals where inversion symmetry is restored. Through an investigation into the twist angle evolution of SHG intensity, we discovered that the stacking model between layers plays a crucial role in determining the nonlinearity, and the SHG signals in supertwisted spirals exhibit enhancements by a factor of 2 to 136 when compared with the SHG of the single-layer structure. These findings provide an efficient method for the rational growth of 2D twisted structures and the implementation of twist-angle adjustability, endowing them with great potential for exploring strong coupling correlation physics and applications in the field of twistronics.
Submitted 19 July, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies
Authors:
Yifei Chen,
Chenyan Zhang,
Yifan Ke,
Yiyu Huang,
Xuezhou Dai,
Feiwei Qin,
Yongquan Zhang,
Xiaodong Zhang,
Changmiao Wang
Abstract:
Traditional supervised learning methods have historically encountered certain constraints in medical image segmentation due to the challenging collection process, high labeling cost, low signal-to-noise ratio, and complex features characterizing biomedical images. This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept. This significantly enhances the model's performance and generalizability through data augmentation processing, employing varied strategies for unlabeled data. Concurrently, the model design gives appropriate emphasis to the generation, filtration, and refinement processes of pseudo-labels. The novel concept of cross-pseudo-supervision is introduced, integrating consistency learning with self-training. This enables the model to fully leverage pseudo-labels from multiple perspectives, thereby enhancing training diversity. The DFCPS model is compared with both baseline and advanced models using the publicly accessible Kvasir-SEG dataset. Across all four subdivisions containing different proportions of unlabeled data, our model consistently exhibits superior performance. Our source code is available at https://github.com/JustlfC03/DFCPS.
Submitted 17 February, 2024;
originally announced February 2024.
-
Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4
Authors:
Ting Fang Tan,
Kabilan Elangovan,
Liyuan Jin,
Yao Jie,
Li Yong,
Joshua Lim,
Stanley Poh,
Wei Yan Ng,
Daniel Lim,
Yuhe Ke,
Nan Liu,
Daniel Shu Wei Ting
Abstract:
Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We fine-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, an additional 8 glaucoma QnA pairs were included. 200 responses to the testing dataset were generated by 5 fine-tuned LLMs for evaluation. A customized clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4 evaluation was then compared against ranking by 5 clinicians for clinical alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest (87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-chat (75.5%), LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4 evaluation demonstrated significant agreement with human clinician rankings, with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80 respectively, while correlation based on Cohen Kappa was more modest at 0.50. Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical inaccuracies in the LLM-generated responses, which were appropriately identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment of GPT-4 evaluation highlighted its potential to streamline the clinical evaluation of LLM chatbot responses to healthcare-related queries. By complementing the existing clinician-dependent manual grading, this efficient and automated evaluation could assist the validation of future developments in LLM applications for healthcare.
Submitted 15 February, 2024;
originally announced February 2024.
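The abstract above reports agreement between two rankings using Spearman and Kendall Tau coefficients. As a rough illustration of what those two statistics measure, here is a minimal pure-Python sketch. The two rankings below are invented toy values, not data from the study, and the function names (`ranks`, `spearman_rho`, `kendall_tau`) are hypothetical helpers that assume no tied values.

```python
from itertools import combinations


def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)); no-ties formula."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Invented toy rankings of five items (1 = best); NOT taken from the paper.
automated_ranking = [1, 2, 3, 4, 5]
clinician_ranking = [1, 2, 3, 5, 4]

print(spearman_rho(automated_ranking, clinician_ranking))
print(kendall_tau(automated_ranking, clinician_ranking))
```

With five items, a single swap of two adjacent ranks leaves nine of the ten pairs concordant, so the toy rankings give a rho of about 0.9 and a tau of about 0.8. Both coefficients equal 1 only when the two rankings are identical.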
-
Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties
Authors:
Jasmine Chiat Ling Ong,
Liyuan Jin,
Kabilan Elangovan,
Gilbert Yong San Lim,
Daniel Yan Zheng Lim,
Gerald Gui Ren Sng,
Yuhe Ke,
Joshua Yi Min Tung,
Ryan Jian Zhong,
Christopher Ming Yao Koh,
Keane Zhi Hao Lee,
Xiang Chen,
Jack Kian Chng,
Aung Than,
Ken Junyang Goh,
Daniel Shu Wei Ting
Abstract:
Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support Systems (CDSS) to support safe medication prescription.
Objective: To evaluate the efficacy of LLM-based CDSS in correctly identifying medication errors in different patient case vignettes from diverse medical and surgical sub-disciplines, against a ground truth derived from a human expert panel. We compared performance under 2 different practical healthcare integration modalities for CDSS: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode).
Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using revised NCC MERP medication error index. We compared.
Results: RAG-LLM performed better compared to LLM alone. When employed in a co-pilot mode, accuracy, recall, and F1 scores were optimized, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories but at the expense of lower precision.
Conclusions: This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS systems.
Submitted 17 February, 2024; v1 submitted 29 January, 2024;
originally announced February 2024.
-
Development and Testing of Retrieval Augmented Generation in Large Language Models -- A Case Study Report
Authors:
YuHe Ke,
Liyuan Jin,
Kabilan Elangovan,
Hairil Rizal Abdullah,
Nan Liu,
Alex Tiong Heng Sia,
Chai Rick Soh,
Joshua Yi Min Tung,
Jasmine Chiat Ling Ong,
Daniel Shu Wei Ting
Abstract:
Purpose: Large Language Models (LLMs) hold significant promise for medical applications. Retrieval Augmented Generation (RAG) emerges as a promising approach for customizing domain knowledge in LLMs. This case study presents the development and evaluation of an LLM-RAG pipeline tailored for healthcare, focusing specifically on preoperative medicine.
Methods: We developed an LLM-RAG model using 35 preoperative guidelines and tested it against human-generated responses, with a total of 1260 responses evaluated. The RAG process involved converting clinical documents into text using Python-based frameworks like LangChain and Llamaindex, and processing these texts into chunks for embedding and retrieval. Vector storage techniques and embedding models were selected to optimize data retrieval, using Pinecone for vector storage with a dimensionality of 1536 and cosine similarity for loss metrics. Human-generated answers, provided by junior doctors, were used as a comparison.
Results: The LLM-RAG model generated answers within an average of 15-20 seconds, significantly faster than the 10 minutes typically required by humans. Among the basic LLMs, GPT4.0 exhibited the best accuracy of 80.1%. This accuracy was further increased to 91.4% when the model was enhanced with RAG. Compared to the human-generated instructions, which had an accuracy of 86.3%, the performance of the GPT4.0 RAG model demonstrated non-inferiority (p=0.610).
Conclusions: In this case study, we demonstrated an LLM-RAG model for healthcare implementation. The pipeline shows the advantages of grounded knowledge, upgradability, and scalability as important aspects of healthcare LLM deployment.
Submitted 29 January, 2024;
originally announced February 2024.
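The retrieval step summarized in the abstract above (split documents into chunks, embed them, look up the closest chunks by cosine similarity) can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a learned embedding model, a plain Python list stands in for a vector store, and neither LangChain, Llamaindex, nor Pinecone is used. All function names and the placeholder document text are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text, size=8):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(word.strip(".,;:?!").lower() for word in text.split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=1):
    """Return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


# Invented placeholder text, not taken from any guideline.
document = (
    "Section one covers fasting rules before an operation. "
    "Section two covers medication review before an operation. "
    "Section three covers transport home after an operation."
)
chunks = chunk_text(document, size=8)
print(retrieve("medication review", chunks))
```

In a full pipeline of the kind the abstract describes, the retrieved chunks would then be placed into the prompt given to the language model. That generation step is omitted here.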
-
Shortcuts to adiabatic Thouless pumping
Authors:
Wenjie Liu,
Yongguan Ke,
Chaohong Lee
Abstract:
Thouless pumping, the quantized transport of particles in a cyclic adiabatic evolution, faces a challenge: slow driving may exceed the coherence time, while fast driving may break quantization. To address this dilemma, we propose to speed up Thouless pumping using shortcuts to adiabaticity. By using counterdiabatic theory, we analytically derive the controlled Hamiltonian for implementing dispersion-suppressed Thouless pumping beyond the adiabatic regime. Compared to traditional Thouless pumping methods, our fast topological pumping approach offers remarkable advantages. Firstly, it enables a substantial reduction of pumping time, up to 11 orders of magnitude faster than the traditional approach. Secondly, our method effectively suppresses wavepacket diffusion, further enhancing its efficiency. Furthermore, we demonstrate the resilience of our protocol against moderate noise levels. Our study offers a practical and efficient method for achieving fast topological pumping beyond the adiabatic regime.
Submitted 7 February, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias
Authors:
Yu He Ke,
Rui Yang,
Sui An Lie,
Taylor Xin Yi Lim,
Hairil Rizal Abdullah,
Daniel Shu Wei Ting,
Nan Liu
Abstract:
Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.
Objective: This study explores the role of large language models (LLMs) in mitigating these biases through the utilization of a multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy.
Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 to facilitate interactions among four simulated agents to replicate clinical team dynamics. Each agent has a distinct role: 1) To make the final diagnosis after considering the discussions, 2) The devil's advocate and correct confirmation and anchoring bias, 3) The tutor and facilitator of the discussion to reduce premature closure bias, and 4) To record and summarize the findings. A total of 80 simulations were evaluated for the accuracy of initial diagnosis, top differential diagnosis and final two differential diagnoses.
Results: In a total of 80 responses evaluating both initial and final diagnoses, the initial diagnosis had an accuracy of 0% (0/80), but following multi-agent discussions, the accuracy for the top differential diagnosis increased to 71.3% (57/80), and for the final two differential diagnoses, to 80.0% (64/80).
Conclusions: The framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. The LLM-driven multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
Submitted 12 May, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Topological pumping induced by spatiotemporal modulation of interaction
Authors:
Boning Huang,
Yongguan Ke,
Wenjie Liu,
Chaohong Lee
Abstract:
Particle-particle interaction provides a new degree of freedom to induce novel topological phenomena. Here, we propose to use spatiotemporal modulation of interaction to realize topological pumping without a single-particle counterpart. Because the modulation breaks time-reversal symmetry, the multiparticle energy bands of bound states have nonzero Chern number, and support topological bound edge states. In a Thouless pump, a bound state that uniformly occupies a topological energy band can be shifted by integer unit cells per cycle, consistent with the corresponding Chern number. We can also realize topological pumping of a bound edge state from one end to another. The entanglement entropy between particles rapidly increases at transition points, which is related to the spatial spread of a bound pair. In addition, we propose to realize hybridized pumping with fractional displacement per cycle by adding an extra tilt potential to separate topological pumping of the bound state and Bloch oscillations of a single particle. Our work could trigger further studies of correlated topological phenomena that do not have a single-particle counterpart.
Submitted 7 January, 2024;
originally announced January 2024.
-
Interaction-induced multiparticle bound states in the continuum
Authors:
Boning Huang,
Yongguan Ke,
Honghua Zhong,
Yuri S. Kivshar,
Chaohong Lee
Abstract:
Bound states in the continuum (BICs) are localized modes residing in the radiation continuum. They were first predicted for single-particle states, and became a general feature of many wave systems. In many-body quantum physics, it is still unclear what would be a close analog of BICs, and whether interparticle interaction may induce BICs. Here, we predict a novel type of multiparticle states in the interaction-modulated Bose-Hubbard model that can be associated with the BIC concept. Under periodic boundary conditions, a so-called quasi-BIC appears as a bound pair residing in a standing wave formed by the third particle. Under open boundary conditions, such a hybrid state becomes an eigenstate of the system. We demonstrate that the Thouless pumping of the quasi-BICs can be realized by modulating the onsite interactions in space and time. Surprisingly, while the center-of-mass of the quasi-BIC is shifted by a unit cell in one cycle, the bound pair moves in the opposite direction with the standing wave.
Submitted 14 September, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
A discovery of Two Slow Pulsars with FAST: "Ronin" from the Globular Cluster M15
Authors:
Dengke Zhou,
Pei Wang,
Di Li,
Jianhua Fang,
Chenchen Miao,
Paulo C. C. Freire,
Lei Zhang,
Dandan Zhang,
Huaxi Chen,
Yi Feng,
Yifan Xiao,
Jintao Xie,
Xu Zhang,
Chenwu Jin,
Han Wang,
Yinan Ke,
Xuerong Guo,
Rushuang Zhao,
Chenhui Niu,
Weiwei Zhu,
Mengyao Xue,
Yabiao Wang,
Jiafu Wu,
Zhenye Gan,
Zhongyi Sun
, et al. (4 additional authors not shown)
Abstract:
Globular clusters harbor numerous millisecond pulsars, but long-period pulsars ($P \gtrsim 100$ ms) are rarely found. In this study, we employed a fast folding algorithm to analyze observational data from multiple globular clusters obtained by the Five-hundred-meter Aperture Spherical radio Telescope (FAST), aiming to detect the existence of long-period pulsars. We estimated the impact of the median filtering algorithm in eliminating red noise on the minimum detectable flux density ($S_{\rm min}$) of pulsars. Subsequently, we successfully discovered two isolated long-period pulsars in M15 with periods approximately equal to 1.928451 seconds and 3.960716 seconds, respectively. On the $P-\dot{P}$ diagram, both pulsars are positioned below the spin-up line, suggesting a possible history of partial recycling in X-ray binary systems disrupted by dynamical encounters later on. According to timing results, these two pulsars exhibit remarkably strong magnetic fields. If the magnetic fields were weakened during the accretion process, then a short duration of accretion might explain the strong magnetic fields of these pulsars.
Submitted 18 April, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Hiding Functions within Functions: Steganography by Implicit Neural Representations
Authors:
Jia Liu,
Peng Luo,
Yan Ke
Abstract:
Deep steganography utilizes the powerful capabilities of deep neural networks to embed and extract messages, but its reliance on an additional message extractor limits its practical use due to the added suspicion it can raise from steganalyzers. To address this problem, we propose StegaINR, which utilizes Implicit Neural Representation (INR) to implement steganography. StegaINR embeds a secret function into a stego function, which serves as both the message extractor and the stego media for secure transmission on a public channel. Recipients need only use a shared key to recover the secret function from the stego function, allowing them to obtain the secret message. Our approach makes use of continuous functions, enabling it to handle various types of messages. To our knowledge, this is the first work to introduce INR into steganography. We performed evaluations on image and climate data to test our method in different deployment contexts.
Submitted 7 December, 2023;
originally announced December 2023.
-
Random Green's function method for large-scale electronic structure calculation
Authors:
Mingfa Tang,
Chang Liu,
Aixia Zhang,
Qingyun Zhang,
Shengjun Yuan,
Youqi Ke
Abstract:
We report a linear-scaling random Green's function (rGF) method for large-scale electronic structure calculation. In this method, the rGF is defined on a set of random states to stochastically express the density matrix, and the rGF is calculated at linear-scaling computational cost. We show that the rGF method is generally applicable to nonorthogonal localized bases and circumvents the large Chebyshev expansion for the density matrix. As a demonstration, we implement rGF with the density-functional tight-binding method and apply it to self-consistent calculations of water clusters of up to 9984 H2O molecules. We find that the rGF method combined with a simple fragment correction can reach an error of ~1 meV per H2O in total energy, compared to deterministic calculations, owing to self-averaging. The development of the rGF method advances stochastic electronic structure theory to a new stage of efficiency and applicability.
Submitted 3 March, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Cluster trajectory of SOFA score in predicting mortality in sepsis
Authors:
Yuhe Ke,
Matilda Swee Sun Tang,
Celestine Jia Ling Loh,
Hairil Rizal Abdullah,
Nicholas Brian Shannon
Abstract:
Objective: Sepsis is a life-threatening condition. Sequential Organ Failure Assessment (SOFA) score is commonly used to assess organ dysfunction and predict ICU mortality, but it is taken as a static measurement and fails to capture dynamic changes. This study aims to investigate the relationship between dynamic changes in SOFA scores over the first 72 hours of ICU admission and patient outcomes.
Design, setting, and participants: 3,253 patients in the Medical Information Mart for Intensive Care IV database who met the sepsis-3 criteria and were admitted from the emergency department with at least 72 hours of ICU admission and full-active resuscitation status were analysed. Group-based trajectory modelling with dynamic time warping and k-means clustering identified distinct trajectory patterns in dynamic SOFA scores. They were subsequently compared using Python.
Main outcome measures: Outcomes including hospital and ICU mortality, length of stay in hospital and ICU, and readmission during hospital stay, were collected. Discharge time from ICU to wards and cut-offs at 7-day and 14-day were taken.
Results: Four clusters were identified: A (consistently low SOFA scores), B (rapid increase followed by a decline in SOFA scores), C (higher baseline scores with gradual improvement), and D (persistently elevated scores). Cluster D had the longest ICU and hospital stays, highest ICU and hospital mortality. Discharge rates from ICU were similar for Clusters A and B, while Cluster C had initially comparable rates but a slower transition to ward.
Conclusion: Monitoring dynamic changes in SOFA score is valuable for assessing sepsis severity and treatment responsiveness.
Submitted 23 November, 2023;
originally announced November 2023.
-
Floquet Engineering of Hilbert Space Fragmentation in Stark Lattices
Authors:
Li Zhang,
Yongguan Ke,
Ling Lin,
Chaohong Lee
Abstract:
The concept of Hilbert space fragmentation (HSF) has recently been put forward as a routine to break quantum ergodicity. While HSF widely exists in dynamical constraint models, it is still challenging to tune HSF. Here, we propose a scheme to tune HSF in a one-dimensional tilted lattice of interacting spinless fermions with periodically driven tunneling. The dynamics is governed by effective Hamiltonians with kinetic constraints, which appear as density-dependent tunneling in the weak-tunneling perturbation expansion. The kinetic constraint can be tuned via changing the driving frequency, and three different kinds of strong HSF can be engineered. In general, the system is strongly constrained and exhibits a strong HSF. Two partial resonance frequencies are analytically given by a time-dependent perturbation theory for Floquet systems, at which some kinetic constraints are released and the system exhibits another two different strong HSF. We demonstrate the perturbation analysis with exact numerical simulation of the entanglement entropy, the density correlation functions and the saturated local density profiles. Our result provides a promising way to control HSF through Floquet engineering.
Submitted 20 November, 2023;
originally announced November 2023.
-
Solitary solutions of a nonlinear Dirac equation with different frequencies
Authors:
Qi Guo,
Yuanyuan Ke
Abstract:
We study the existence and nonexistence of solitary solutions with different frequencies for a type of nonlinear extension of the Dirac-Slater model. There are three main ingredients in this paper. The first is the Pohozaev identity for nonlinear Dirac equations. Combined with a variational identity, it yields nonexistence results when the frequency $ω$ is greater than $m$. The second is a critical point theorem for strongly indefinite functionals, with which we obtain an existence result for $ω\in (-m,m)$. The third, which is the new main ingredient of this paper, is a perturbation of the functional from the second ingredient, which allows us to show the existence of solitary solutions when $ω=-m$. An interesting outcome of our result is that the left and right are completely different in the {\it Spectrum Zero Problem}, which implies a new phenomenon in quantum theory.
Submitted 13 November, 2023; v1 submitted 29 October, 2023;
originally announced October 2023.
-
SmooSeg: Smoothness Prior for Unsupervised Semantic Segmentation
Authors:
Mengcheng Lan,
Xinjiang Wang,
Yiping Ke,
Jiaxing Xu,
Litong Feng,
Wayne Zhang
Abstract:
Unsupervised semantic segmentation is a challenging task that segments images into semantic groups without manual annotation. Prior works have primarily focused on leveraging prior knowledge of semantic consistency or priori concepts from self-supervised learning methods, which often overlook the coherence property of image segments. In this paper, we demonstrate that the smoothness prior, asserting that close features in a metric space share the same semantics, can significantly simplify segmentation by casting unsupervised semantic segmentation as an energy minimization problem. Under this paradigm, we propose a novel approach called SmooSeg that harnesses self-supervised learning methods to model the closeness relationships among observations as smoothness signals. To effectively discover coherent semantic segments, we introduce a novel smoothness loss that promotes piecewise smoothness within segments while preserving discontinuities across different segments. Additionally, to further enhance segmentation quality, we design an asymmetric teacher-student style predictor that generates smoothly updated pseudo labels, facilitating an optimal fit between observations and labeling outputs. Thanks to the rich supervision cues of the smoothness prior, our SmooSeg significantly outperforms STEGO in terms of pixel accuracy on three datasets: COCOStuff (+14.9%), Cityscapes (+13.0%), and Potsdam-3 (+5.7%).
Submitted 26 October, 2023;
originally announced October 2023.
-
Deflation conjecture and local dimensions of Brent equations
Authors:
Xin Li,
Liping Zhang,
Yifen Ke
Abstract:
In this paper, a classical deflation process proposed by Dayton, Li and Zeng is realized for the Brent equations, which provides new bounds for the local dimensions of the solution set. Originally, this deflation process focused on isolated solutions. We generalize it to the case of irreducible components and give a related conjecture. We analyze its realization and apply it to the Brent equations. The decrease of the nullities is easily observed, so the deflation process can serve as a useful tool for determining the local dimensions. In addition, our result implies that the singular solutions become more numerous as the tensor rank decreases.
Submitted 2 June, 2024; v1 submitted 17 October, 2023;
originally announced October 2023.
-
Skeleton Ground Truth Extraction: Methodology, Annotation Tool and Benchmarks
Authors:
Cong Yang,
Bipin Indurkhya,
John See,
Bo Gao,
Yan Ke,
Zeyd Boukhers,
Zhenyu Yang,
Marcin Grzegorzek
Abstract:
Skeleton Ground Truth (GT) is critical to the success of supervised skeleton extraction methods, especially with the popularity of deep learning techniques. Furthermore, we see skeleton GTs used not only for training skeleton detectors with Convolutional Neural Networks (CNN) but also for evaluating skeleton-related pruning and matching algorithms. However, most existing shape and image datasets suffer from the lack of skeleton GT and inconsistency of GT standards. As a result, it is difficult to evaluate and reproduce CNN-based skeleton detectors and algorithms on a fair basis. In this paper, we present a heuristic strategy for object skeleton GT extraction in binary shapes and natural images. Our strategy is built on an extended theory of diagnosticity hypothesis, which enables encoding human-in-the-loop GT extraction based on clues from the target's context, simplicity, and completeness. Using this strategy, we developed a tool, SkeView, to generate skeleton GT of 17 existing shape and image datasets. The GTs are then structurally evaluated with representative methods to build viable baselines for fair comparisons. Experiments demonstrate that GTs generated by our strategy yield promising quality with respect to standard consistency, and also provide a balance between simplicity and completeness.
Submitted 10 October, 2023;
originally announced October 2023.
-
Integrating UMLS Knowledge into Large Language Models for Medical Question Answering
Authors:
Rui Yang,
Edison Marrese-Taylor,
Yuhe Ke,
Lechao Cheng,
Qingyu Chen,
Irene Li
Abstract:
Large language models (LLMs) have demonstrated powerful text generation capabilities, bringing unprecedented innovation to the healthcare field. While LLMs hold immense promise for applications in healthcare, applying them to real clinical scenarios presents significant challenges, as these models may generate content that deviates from established medical facts and even exhibit potential biases. In our research, we develop an augmented LLM framework based on the Unified Medical Language System (UMLS), aiming to better serve the healthcare community. We employ LLaMa2-13b-chat and ChatGPT-3.5 as our benchmark models, and conduct automatic evaluations using the ROUGE Score and BERTScore on 104 questions from the LiveQA test set. Additionally, we establish criteria for physician-evaluation based on four dimensions: Factuality, Completeness, Readability and Relevancy. ChatGPT-3.5 is used for physician evaluation with 20 questions on the LiveQA test set. Multiple resident physicians conducted blind reviews to evaluate the generated content, and the results indicate that this framework effectively enhances the factuality, completeness, and relevance of generated content. Our research demonstrates the effectiveness of using UMLS-augmented LLMs and highlights the potential application value of LLMs in medical question-answering.
Submitted 13 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
MarkNerf: Watermarking for Neural Radiance Field
Authors:
Lifeng Chen,
Jia Liu,
Yan Ke,
Wenquan Sun,
Weina Dong,
Xiaozhong Pan
Abstract:
A watermarking algorithm is proposed in this paper to address the copyright protection issue of implicit 3D models. The algorithm involves embedding watermarks into the images in the training set through an embedding network, and subsequently utilizing the NeRF model for 3D modeling. A copyright verifier is employed to generate a backdoor image by providing a secret perspective as input to the neural radiance field. Subsequently, a watermark extractor is devised using the hyperparameterization method of the neural network to extract the embedded watermark image from that perspective. In a black box scenario, if there is a suspicion that the 3D model has been used without authorization, the verifier can extract watermarks from a secret perspective to verify network copyright. Experimental results demonstrate that the proposed algorithm effectively safeguards the copyright of 3D models. Furthermore, the extracted watermarks exhibit favorable visual effects and demonstrate robust resistance against various types of noise attacks.
Submitted 20 September, 2023;
originally announced September 2023.
-
Steganography for Neural Radiance Fields by Backdooring
Authors:
Weina Dong,
Jia Liu,
Yan Ke,
Lifeng Chen,
Wenquan Sun,
Xiaozhong Pan
Abstract:
The utilization of implicit representation for visual data (such as images, videos, and 3D models) has recently gained significant attention in computer vision research. In this letter, we propose a novel model steganography scheme with implicit neural representation. The message sender leverages Neural Radiance Fields (NeRF) and its viewpoint synthesis capabilities by introducing a viewpoint as a key. The NeRF model generates a secret viewpoint image, which serves as a backdoor. Subsequently, we train a message extractor using overfitting to establish a one-to-one mapping between the secret message and the secret viewpoint image. The sender delivers the trained NeRF model and the message extractor to the receiver over the open channel, and the receiver utilizes the key shared by both parties to obtain the rendered image in the secret view from the NeRF model, and then obtains the secret message through the message extractor. The inherent complexity of the viewpoint information prevents attackers from stealing the secret message accurately. Experimental results demonstrate that the message extractor trained in this letter achieves high-capacity steganography with fast performance, achieving 100% accuracy in message extraction. Furthermore, the extensive viewpoint key space of NeRF ensures the security of the steganography scheme.
Submitted 19 September, 2023;
originally announced September 2023.
-
CPMR: Context-Aware Incremental Sequential Recommendation with Pseudo-Multi-Task Learning
Authors:
Qingtian Bian,
Jiaxing Xu,
Hui Fang,
Yiping Ke
Abstract:
The motivations of users to make interactions can be divided into static preference and dynamic interest. To accurately model user representations over time, recent studies in sequential recommendation utilize information propagation and evolution to mine from batches of arriving interactions. However, they ignore the fact that people are easily influenced by the recent actions of other users in the contextual scenario, and applying evolution across all historical interactions dilutes the importance of recent ones, thus failing to model the evolution of dynamic interest accurately. To address this issue, we propose a Context-Aware Pseudo-Multi-Task Recommender System (CPMR) to model the evolution in both historical and contextual scenarios by creating three representations for each user and item under different dynamics: static embedding, historical temporal states, and contextual temporal states. To dually improve the performance of temporal states evolution and incremental recommendation, we design a Pseudo-Multi-Task Learning (PMTL) paradigm by stacking the incremental single-target recommendations into one multi-target task for joint optimization. Within the PMTL paradigm, CPMR employs a shared-bottom network to conduct the evolution of temporal states across historical and contextual scenarios, as well as the fusion of them at the user-item level. In addition, CPMR incorporates one real tower for incremental predictions, and two pseudo towers dedicated to updating the respective temporal states based on new batches of interactions. Experimental results on four benchmark recommendation datasets show that CPMR consistently outperforms state-of-the-art baselines and achieves significant gains on three of them. The code is available at: https://github.com/DiMarzioBian/CPMR.
Submitted 16 September, 2023; v1 submitted 9 September, 2023;
originally announced September 2023.
-
3D Printed Multilayer Structures for High Numerical Aperture Achromatic Metalenses
Authors:
Cheng-Feng Pan,
Hao Wang,
Hongtao Wang,
Parvathi Nair S,
Qifeng Ruan,
Simon Wredh,
Yujie Ke,
John You En Chan,
Wang Zhang,
Cheng-Wei Qiu,
Joel K. W. Yang
Abstract:
Flat optics consisting of nanostructures of high-refractive-index materials produce lenses with thin form factors that tend to operate only at specific wavelengths. Recent attempts to achieve achromatic lenses uncover a trade-off between the numerical aperture (NA) and bandwidth, which limits performance. Here we propose a new approach to design high-NA, broadband and polarization-insensitive multilayer achromatic metalenses (MAMs). We combine topology optimization and full-wave simulations to inversely design MAMs and fabricate the structures in low-refractive-index materials by two-photon polymerization lithography. MAMs measuring 20 micrometers in diameter and operating in the visible range of 400-800 nm with 0.5 NA and 0.7 NA were achieved with efficiencies of up to 42%. We demonstrate broadband imaging performance of the fabricated MAM under white light and RGB narrowband illumination. These results highlight the potential of 3D printed multilayer structures for realizing broadband and multi-functional meta-devices with inverse design.
Submitted 27 August, 2023;
originally announced August 2023.
-
Calculations of Chern number: equivalence of real-space and twisted-boundary-condition formulae
Authors:
Ling Lin,
Yongguan Ke,
Li Zhang,
Chaohong Lee
Abstract:
The Chern number is a crucial invariant for characterizing the topological features of two-dimensional quantum systems. The real-space Chern number allows us to extract topological properties of systems without involving translational symmetry, and hence plays an important role in investigating topological systems with disorder or impurities. On the other hand, the twisted boundary condition (TBC) can also be used to define the Chern number in the absence of translational symmetry. Based on the perturbative nature of the TBC under appropriate gauges, we derive the two real-space formulae of the Chern number (namely the non-commutative Chern number and the Bott index formula), which are numerically confirmed for the Chern insulator and the quantum spin Hall insulator. Our results not only establish the equivalence between the real-space and TBC formulae of the Chern number, but also provide concrete and instructive examples for deriving real-space topological invariants through the twisted boundary condition.
Submitted 7 November, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Contrastive Graph Pooling for Explainable Classification of Brain Networks
Authors:
Jiaxing Xu,
Qingtian Bian,
Xinhang Li,
Aihu Zhang,
Yiping Ke,
Miao Qiao,
Wei Zhang,
Wei Khang Jeremy Sim,
Balázs Gulyás
Abstract:
Functional magnetic resonance imaging (fMRI) is a commonly used technique to measure neural activation. Its application has been particularly important in identifying underlying neurodegenerative conditions such as Parkinson's, Alzheimer's, and Autism. Recent analysis of fMRI data models the brain as a graph and extracts features by graph neural networks (GNNs). However, the unique characteristics of fMRI data require a special design of GNN. Tailoring GNN to generate effective and domain-explainable features remains challenging. In this paper, we propose a contrastive dual-attention block and a differentiable graph pooling method called ContrastPool to better utilize GNN for brain networks, meeting fMRI-specific requirements. We apply our method to 5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its superiority over state-of-the-art baselines. Our case study confirms that the patterns extracted by our method match the domain knowledge in neuroscience literature, and disclose direct and interesting insights. Our contributions underscore the potential of ContrastPool for advancing the understanding of brain networks and neurodegenerative conditions. The source code is available at https://github.com/AngusMonroe/ContrastPool.
Submitted 6 September, 2024; v1 submitted 7 July, 2023;
originally announced July 2023.
-
Scalable Auction Algorithms for Bipartite Maximum Matching Problems
Authors:
Quanquan C. Liu,
Yiduo Ke,
Samir Khuller
Abstract:
In this paper, we give new auction algorithms for maximum weighted bipartite matching (MWM) and maximum cardinality bipartite $b$-matching (MCbM). Our algorithms run in $O\left(\log n/\varepsilon^8\right)$ and $O\left(\log n/\varepsilon^2\right)$ rounds, respectively, in the blackboard distributed setting. We show that our MWM algorithm can be implemented in the distributed, interactive setting using $O(\log^2 n)$ and $O(\log n)$ bit messages, respectively, directly answering the open question posed by Demange, Gale and Sotomayor [DNO14]. Furthermore, we implement our algorithms in a variety of other models including the semi-streaming model, the shared-memory work-depth model, and the massively parallel computation model. Our semi-streaming MWM algorithm uses $O(1/\varepsilon^8)$ passes in $O(n \log n \cdot \log(1/\varepsilon))$ space and our MCbM algorithm runs in $O(1/\varepsilon^2)$ passes using $O\left(\left(\sum_{i \in L} b_i + |R|\right)\log(1/\varepsilon)\right)$ space (where parameters $b_i$ represent the degree constraints on the $b$-matching and $L$ and $R$ represent the left and right side of the bipartite graph, respectively). Both of these algorithms improve \emph{exponentially} the dependence on $\varepsilon$ in the space complexity in the semi-streaming model against the best-known algorithms for these problems, in addition to improvements in round complexity for MCbM. Finally, our algorithms eliminate the large polylogarithmic dependence on $n$ in depth and number of rounds in the work-depth and massively parallel computation models, respectively, improving on previous results which have large polylogarithmic dependence on $n$ (and exponential dependence on $\varepsilon$ in the MPC model).
Submitted 18 July, 2023;
originally announced July 2023.
-
Ultra-small topological spin textures with size of 1.3nm at above room temperature in Fe78Si9B13 amorphous alloy
Authors:
Weiwei Wu,
Huaping Zhang,
Hong Wang,
Chao Chang,
Hongyu Jiang,
Jinfeng Li,
Zhichao Lv,
Laiquan Shen,
Hanqiu Jiang,
Chunyong He,
Yubin Ke,
Yuhua Su,
Kosuke Hiroi,
Zhendong Fu,
Zi-An Li,
Lin Gu,
Maozhi Li,
Dong Ma,
Haiyang Bai
Abstract:
Topologically protected spin textures, such as skyrmions [1,2] and vortices [3,4], are robust against perturbations, serving as the building blocks for a range of topological devices [5-9]. In order to implement these topological devices, it is necessary to find ultra-small topological spin textures at room temperature, because small size implies a higher topological charge density, a stronger signal of topological transport [10,11], and a higher memory density or integration for topological quantum devices [5-9]. However, finding ultra-small topological spin textures at high temperatures remains a great challenge. Here we find ultra-small topological spin textures in Fe78Si9B13 amorphous alloy. We measured a large topological Hall effect (THE) up to above room temperature, indicating the existence of highly dense and ultra-small topological spin textures in the samples. Further measurements by small-angle neutron scattering (SANS) reveal that the average size of the ultra-small magnetic textures is around 1.3 nm. Our Monte Carlo simulations show that such ultra-small spin textures are topologically equivalent to skyrmions, which originate from competing frustration and Dzyaloshinskii-Moriya interaction [12,13] arising from the amorphous structure [14-17]. Taking a single topological spin texture as one bit and ignoring the distance between them, we evaluated the ideal memory density of Fe78Si9B13, which reaches up to $4.44 \times 10^4$ gigabits (43.4 TB) per in$^2$ and is twice the value of GdRu2Si2 [18] at 5 K. More importantly, such high memory density can be obtained above room temperature, which is 4 orders of magnitude larger than the value of other materials at the same temperature. These findings provide a unique candidate for magnetic memory devices with ultra-high density.
Submitted 29 May, 2023;
originally announced May 2023.
-
Union Subgraph Neural Networks
Authors:
Jiaxing Xu,
Aihu Zhang,
Qingtian Bian,
Vijay Prakash Dwivedi,
Yiping Ke
Abstract:
Graph Neural Networks (GNNs) are widely used for graph representation learning in many application domains. The expressiveness of vanilla GNNs is upper-bounded by the 1-dimensional Weisfeiler-Leman (1-WL) test, as they operate on rooted subtrees through iterative message passing. In this paper, we empower GNNs by injecting neighbor-connectivity information extracted from a new type of substructure. We first investigate the different kinds of connectivities existing in a local neighborhood and identify a substructure called the union subgraph, which captures the complete picture of the 1-hop neighborhood of an edge. We then design a shortest-path-based substructure descriptor that possesses three nice properties and can effectively encode the high-order connectivities in union subgraphs. By infusing the encoded neighbor connectivities, we propose a novel model, namely Union Subgraph Neural Network (UnionSNN), which is proven to be strictly more powerful than 1-WL in distinguishing non-isomorphic graphs. Additionally, the local encoding from union subgraphs can be injected into arbitrary message-passing neural networks (MPNNs) and Transformer-based models as a plugin. Extensive experiments on 18 benchmarks covering both graph-level and node-level tasks demonstrate that UnionSNN outperforms state-of-the-art baseline models with competitive computational efficiency. Injecting our local encoding into existing models boosts their performance by up to 11.09%. Our code is available at https://github.com/AngusMonroe/UnionSNN.
Submitted 9 January, 2024; v1 submitted 25 May, 2023;
originally announced May 2023.