When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?

Authors: Shang Wang, Tianqing Zhu, Dayong Ye, Wanlei Zhou

Abstract: The deployment of large language models (LLMs) like ChatGPT and Gemini has shown their powerful natural language generation capabilities. However, these models can inadvertently learn and retain sensitive information and harmful content during training, raising significant ethical and legal concerns. To address these issues, machine unlearning has been introduced as a potential solution. While existing unlearning methods take into account the specific characteristics of LLMs, they often suffer from high computational demands, limited applicability, or the risk of catastrophic forgetting. To address these limitations, we propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG) technology. By modifying the external knowledge base of RAG, we simulate the effects of forgetting without directly interacting with the unlearned LLM. We approach the construction of unlearned knowledge as a constrained optimization problem, deriving two key components that underpin the effectiveness of RAG-based unlearning. This RAG-based approach is particularly effective for closed-source LLMs, where existing unlearning methods often fail. We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2. The results demonstrate that our approach meets five key unlearning criteria: effectiveness, universality, harmlessness, simplicity, and robustness. Meanwhile, this approach can extend to multimodal large language models and LLM-based agents.

Submitted 19 October, 2024; originally announced October 2024.

Comments: 15 pages, 9 figures, 9 tables

arXiv:2410.15262 [pdf, other]

HyQE: Ranking Contexts with Hypothetical Query Embeddings

Authors: Weichao Zhou, Jiaxin Zhang, Hilaf Hasson, Anu Singh, Wenchao Li

Abstract: In retrieval-augmented systems, context ranking techniques are commonly employed to reorder the retrieved contexts based on their relevance to a user query. A standard approach is to measure this relevance through the similarity between contexts and queries in the embedding space. However, such similarity often fails to capture the relevance. Alternatively, large language models (LLMs) have been used for ranking contexts. However, they can encounter scalability issues when the number of candidate contexts grows and the context window sizes of the LLMs remain constrained. Additionally, these approaches require fine-tuning LLMs with domain-specific data. In this work, we introduce a scalable ranking framework that combines embedding similarity and LLM capabilities without requiring LLM fine-tuning. Our framework uses a pre-trained LLM to hypothesize the user query based on the retrieved contexts and ranks the context based on the similarity between the hypothesized queries and the user query. Our framework is efficient at inference time and is compatible with many other retrieval and ranking techniques. Experimental results show that our method improves the ranking performance across multiple benchmarks. The complete code and data are available at https://github.com/zwc662/hyqe

Submitted 19 October, 2024; originally announced October 2024.
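
The ranking idea in the HyQE abstract above (score each context by how similar its hypothesized queries are to the real user query) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words `embed` function stands in for a real embedding model, the hypothesized queries are passed in by hand rather than generated by an LLM, and taking the maximum similarity per context is an assumed aggregation rule. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_contexts(user_query, hypothesized):
    """Rank contexts by similarity between the user query and hypothesized queries.

    `hypothesized` maps each context to a list of queries that an LLM might
    have produced for it (supplied by hand here). Using the maximum
    similarity as the context score is an assumption of this sketch.
    """
    q = embed(user_query)
    scores = {
        ctx: max((cosine(q, embed(h)) for h in hyps), default=0.0)
        for ctx, hyps in hypothesized.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


hypothesized = {
    "ctx_a": ["what is the capital of france"],
    "ctx_b": ["how do plants make food"],
}
ranking = rank_contexts("capital of france", hypothesized)
```

In this toy example `ctx_a` ranks first because its hypothesized query shares terms with the user query, while `ctx_b` shares none.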

arXiv:2410.15172 [pdf, other]

Efficient and Adaptive Reconfiguration of Light Structure in Optical Fibers with Programmable Silicon Photonics

Authors: Wu Zhou, Zengqi Chen, Kaihang Lu, Hao Chen, Mingyuan Zhang, Wenzhang Tian, Yeyu Tong

Abstract: The demand for structured light with a reconfigurable spatial and polarization distribution has been increasing across a wide range of fundamental and advanced photonics applications, including microscopy, imaging, sensing, communications, and quantum information processing. Nevertheless, the unique challenge in manipulating light structure after optical fiber transmission is the necessity to dynamically address the inherent unknown fiber transmission matrix, which can be affected by factors like variations in the fiber stress and inter-modal coupling. In this study, we demonstrated that the beam structure at the fiber end including its spatial and polarization distribution can be precisely and adaptively reconfigured by a programmable silicon photonic processor, without prior knowledge of the optical fiber systems and their changes in the transmission matrices. Our demonstrated photonic chip can generate and control the full set of spatial and polarization modes or their superposition in a two-mode few-mode optical fiber. High-quality beam structures can be obtained in experiments. In addition, efficient generation is achieved by our proposed chip-to-fiber emitter while using a complementary metal-oxide-semiconductor compatible fabrication technology. Our findings present a scalable pathway towards achieving a portable and reliable system capable of achieving precise control, efficient emission, and adaptive reconfiguration for structured light in optical fibers.

Submitted 19 October, 2024; originally announced October 2024.

arXiv:2410.15034 [pdf, other]

Revisiting the Velocity Dispersion-Size Relation in Molecular Cloud Structures

Authors: Haoran Feng, Zhiwei Chen, Zhibo Jiang, Yuehui Ma, Yang Yang, Shuling Yu, Dongqing Ge, Wei Zhou, Fujun Du, Chen Wang, Shiyu Zhang, Yang Su, Ji Yang

Abstract: Structures in the molecular ISM are observed to follow a power-law relation between the velocity dispersion and spatial size, known as Larson's first relation, which is often attributed to the turbulent nature of the molecular ISM and imprints the dynamics of molecular cloud structures. Using the ${}^{13}\mathrm{CO}~(J=1-0)$ data from the Milky Way Imaging Scroll Painting survey, we built a sample with 360 structures having relatively accurate distances obtained from either the reddened background stars with Gaia parallaxes or associated maser parallaxes, spanning from $0.4$ to $\sim 15~\mathrm{kpc}$. Using this sample and about 0.3 million pixels, we analyzed the correlations between velocity dispersion, surface/column density, and spatial scales. Our structure-wise results show power-law indices smaller than 0.5 in both the $\sigma_v$-$R_{\mathrm{eff}}$ and $\sigma_v$-$R_{\mathrm{eff}} \cdot \Sigma$ relations. In the pixel-wise results, $\sigma_v^{\mathrm{pix}}$ statistically scales with the beam physical size ($R_{\mathrm{s}} \equiv \Theta D/2$) in the form of $\sigma_v^{\mathrm{pix}} \propto R_{\mathrm{s}}^{0.43 \pm 0.03}$. Meanwhile, $\sigma_v^{\mathrm{pix}}$ in the inner Galaxy is statistically larger than in the outer Galaxy. We also analyzed correlations between $\sigma_v^{\mathrm{pix}}$ and the $\mathrm{H_2}$ column density $N(\mathrm{H_2})$, finding that $\sigma_v^{\mathrm{pix}}$ stops increasing with $N(\mathrm{H_2})$ after $\gtrsim 10^{22}~\mathrm{cm^{-2}}$. The structures with and without high-column-density ($> 10^{22}~\mathrm{cm^{-2}}$) pixels show different $\sigma_v^{\mathrm{pix}} \propto N(\mathrm{H_2})^{\xi}$ relations, where the mean (std) $\xi$ values are $0.38~(0.14)$ and $0.62~(0.27)$, respectively.

Submitted 19 October, 2024; originally announced October 2024.

Comments: 23 pages, 12 figures, accepted for publication in Research in Astronomy and Astrophysics
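
A power-law index such as the one quoted in the abstract above (velocity dispersion scaling as size to the power 0.43) is commonly estimated by a straight-line fit in log-log space. The sketch below shows that generic procedure on synthetic, noise-free data. It is not the authors' pipeline: the function name, the input values, and the use of ordinary least squares without error weighting are all assumptions made for illustration.

```python
import math


def fit_power_law(sizes, dispersions):
    """Fit dispersion = coeff * size**index by least squares on log10 values.

    Returns (index, coeff). Ordinary (unweighted) least squares is an
    assumption of this sketch; real analyses usually account for
    measurement uncertainties.
    """
    xs = [math.log10(r) for r in sizes]
    ys = [math.log10(s) for s in dispersions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    index = sxy / sxx                      # slope in log-log space
    coeff = 10 ** (mean_y - index * mean_x)  # intercept converted back to linear units
    return index, coeff


# Synthetic data generated from an exact power law (hypothetical values).
sizes = [1.0, 2.0, 4.0, 8.0]
dispersions = [2.0 * r ** 0.43 for r in sizes]
index, coeff = fit_power_law(sizes, dispersions)
```

Because the synthetic data follow an exact power law, the fit recovers the generating index and coefficient up to floating-point error.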

arXiv:2410.13986 [pdf, other]

Recurrent Neural Goodness-of-Fit Test for Time Series

Authors: Aoran Zhang, Wenbin Zhou, Liyan Xie, Shixiang Zhu

Abstract: Time series data are crucial across diverse domains such as finance and healthcare, where accurate forecasting and decision-making rely on advanced modeling techniques. While generative models have shown great promise in capturing the intricate dynamics inherent in time series, evaluating their performance remains a major challenge. Traditional evaluation metrics fall short due to the temporal dependencies and potential high dimensionality of the features. In this paper, we propose the REcurrent NeurAL (RENAL) Goodness-of-Fit test, a novel and statistically rigorous framework for evaluating generative time series models. By leveraging recurrent neural networks, we transform the time series into conditionally independent data pairs, enabling the application of a chi-square-based goodness-of-fit test to the temporal dependencies within the data. This approach offers a robust, theoretically grounded solution for assessing the quality of generative models, particularly in settings with limited time sequences. We demonstrate the efficacy of our method across both synthetic and real-world datasets, outperforming existing methods in terms of reliability and accuracy. Our method fills a critical gap in the evaluation of time series generative models, offering a tool that is both practical and adaptable to high-stakes applications.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 27 pages, 4 figures

arXiv:2410.13785 [pdf, other]

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment

Authors: Zekun Moore Wang, Shawn Wang, Kang Zhu, Jiaheng Liu, Ke Xu, Jie Fu, Wangchunshu Zhou, Wenhao Huang

Abstract: Alignment of large language models (LLMs) involves training models on preference-contrastive output pairs to adjust their responses according to human preferences. To obtain such contrastive pairs, traditional methods like RLHF and RLAIF rely on limited contrasting patterns, such as varying model variants or decoding temperatures. This singularity leads to two issues: (1) alignment is not comprehensive; and thereby (2) models are susceptible to jailbreaking attacks. To address these issues, we investigate how to construct more comprehensive and diversified contrasting patterns to enhance preference data (RQ1) and verify the impact of the diversification of contrasting patterns on model alignment (RQ2). For RQ1, we propose PopAlign, a framework that integrates diversified contrasting patterns across the prompt, model, and pipeline levels, introducing six contrasting strategies that do not require additional feedback labeling procedures. Regarding RQ2, we conduct thorough experiments demonstrating that PopAlign significantly outperforms existing methods, leading to more comprehensive alignment.

Submitted 17 October, 2024; originally announced October 2024.

Comments: 28 pages

arXiv:2410.13639 [pdf, other]

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

Authors: Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu

Abstract: Enabling Large Language Models (LLMs) to handle a wider range of complex tasks (e.g., coding, math) has drawn great attention from many researchers. As LLMs continue to evolve, merely increasing the number of model parameters yields diminishing performance improvements and heavy computational costs. Recently, OpenAI's o1 model has shown that inference strategies (i.e., Test-time Compute methods) can also significantly enhance the reasoning capabilities of LLMs. However, the mechanisms behind these methods are still unexplored. In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, commonsense reasoning). Specifically, first, our experiments show that the o1 model has achieved the best performance on most datasets. Second, as for the methods of searching diverse responses (e.g., BoN), we find the reward models' capability and the search space both limit the upper boundary of these methods. Third, as for the methods that break the problem into many sub-problems, the Agent Workflow has achieved better performance than Step-wise BoN due to the domain-specific system prompt for planning better reasoning processes. Fourth, it is worth mentioning that we have summarized six reasoning patterns of o1, and provided a detailed analysis on several reasoning benchmarks.

Submitted 17 October, 2024; originally announced October 2024.
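
Best-of-N (BoN), one of the Test-time Compute baselines named in the abstract above, can be sketched generically: sample N candidate responses and keep the one a reward model scores highest. The code below is an illustration under stated assumptions, not the paper's setup. The `generate` and `reward` callables are hypothetical stand-ins for an LLM sampler and a reward model.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    reward: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Return the highest-reward response among n sampled candidates.

    `generate(prompt, i)` produces the i-th candidate and `reward(prompt,
    response)` scores it; both are placeholders for real models.
    """
    candidates = [generate(prompt, i) for i in range(n)]
    return max(candidates, key=lambda response: reward(prompt, response))


# Toy stand-ins: candidates are tagged with their sample index, and the
# reward prefers the candidate whose index is closest to 2.
def toy_generate(prompt: str, i: int) -> str:
    return f"{prompt}-{i}"


def toy_reward(prompt: str, response: str) -> float:
    return -abs(int(response.rsplit("-", 1)[1]) - 2)


chosen = best_of_n("q", toy_generate, toy_reward, n=4)
```

The sketch makes the abstract's second finding concrete: the selected answer can be no better than the best candidate in the sampled set, and no better than the reward function's ability to identify it.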

arXiv:2410.11345 [pdf, other]

Visual Manipulation with Legs

Authors: Xialin He, Chengjing Yuan, Wenxuan Zhou, Ruihan Yang, David Held, Xiaolong Wang

Abstract: Animals use limbs for both locomotion and manipulation. We aim to equip quadruped robots with similar versatility. This work introduces a system that enables quadruped robots to interact with objects using their legs, inspired by non-prehensile manipulation. The system has two main components: a visual manipulation policy module and a loco-manipulator module. The visual manipulation policy, trained with reinforcement learning (RL) using point cloud observations and object-centric actions, decides how the leg should interact with the object. The loco-manipulator controller manages leg movements and body pose adjustments, based on impedance control and Model Predictive Control (MPC). Besides manipulating objects with a single leg, the system can select from the left or right leg based on critic maps and move objects to distant goals through base adjustment. Experiments evaluate the system on object pose alignment tasks in both simulation and the real world, demonstrating more versatile object manipulation skills with legs than previous work. Videos can be found at https://legged-manipulation.github.io/

Submitted 16 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

Comments: More details can be found on our project page: https://legged-manipulation.github.io/

arXiv:2410.11046 [pdf]

SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data

Authors: Liang Tao, Yixin Xie, Jeffrey D Deng, Hui Shen, Hong-Wen Deng, Weihua Zhou, Chen Zhao

Abstract: Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis, which is inefficient and may be unnecessary. To reduce the clinical cost and improve the accuracy of AD diagnosis using multi-omics data, we propose a novel staged graph convolutional network with uncertainty quantification (SGUQ). SGUQ begins with mRNA and progressively incorporates DNA methylation and miRNA data only when necessary, reducing overall costs and exposure to harmful tests. Experimental results indicate that 46.23% of the samples can be reliably predicted using only single-modal omics data (mRNA), while an additional 16.04% of the samples can achieve reliable predictions when combining two omics data types (mRNA + DNA methylation). In addition, the proposed staged SGUQ achieved an accuracy of 0.858 on the ROSMAP dataset, which outperformed existing methods significantly. The proposed SGUQ can not only be applied to AD diagnosis using multi-omics data but also has the potential for clinical decision-making using multi-viewed data. Our implementation is publicly available at https://github.com/chenzhao2023/multiomicsuncertainty.

Submitted 14 October, 2024; originally announced October 2024.

Comments: 20 pages, 2 figures

arXiv:2410.10728 [pdf, other]

Towards LLM-guided Efficient and Interpretable Multi-linear Tensor Network Rank Selection

Authors: Giorgos Iacovides, Wuyang Zhou, Danilo Mandic

Abstract: We propose a novel framework that leverages large language models (LLMs) to guide the rank selection in tensor network models for higher-order data analysis. By utilising the intrinsic reasoning capabilities and domain knowledge of LLMs, our approach offers enhanced interpretability of the rank choices and can effectively optimise the objective function. This framework enables users without specialised domain expertise to utilise tensor network decompositions and understand the underlying rationale within the rank selection process. Experimental results validate our method on financial higher-order datasets, demonstrating interpretable reasoning, strong generalisation to unseen test data, and its potential for self-enhancement over successive iterations. This work is placed at the intersection of large language models and higher-order data analysis.

Submitted 14 October, 2024; originally announced October 2024.

arXiv:2410.10244 [pdf, other]

Capture Artifacts via Progressive Disentangling and Purifying Blended Identities for Deepfake Detection

Authors: Weijie Zhou, Xiaoqing Luo, Zhancheng Zhang, Jiachen He, Xiaojun Wu

Abstract: The Deepfake technology has raised serious concerns regarding privacy breaches and trust issues. To tackle these challenges, Deepfake detection technology has emerged. Current methods over-rely on the global feature space, which contains redundant information independent of the artifacts. As a result, existing Deepfake detection techniques suffer performance degradation when encountering unknown datasets. To reduce information redundancy, the current methods use disentanglement techniques to roughly separate the fake faces into artifacts and content information. However, these methods lack a solid disentanglement foundation and cannot guarantee the reliability of their disentangling process. To address these issues, a Deepfake detection method based on progressive disentangling and purifying blended identities is innovatively proposed in this paper. Based on the artifact generation mechanism, the coarse- and fine-grained strategies are combined to ensure the reliability of the disentanglement method. Our method aims to more accurately capture and separate artifact features in fake faces. Specifically, we first perform the coarse-grained disentangling on fake faces to obtain a pair of blended identities that require no additional annotation to distinguish between source face and target face. Then, the artifact features from each identity are separated to achieve fine-grained disentanglement. To obtain pure identity information and artifacts, an Identity-Artifact Correlation Compression module (IACC) is designed based on the information bottleneck theory, effectively reducing the potential correlation between identity information and artifacts. Additionally, an Identity-Artifact Separation Contrast Loss is designed to enhance the independence of artifact features post-disentangling. Finally, the classifier only focuses on pure artifact features to achieve a generalized Deepfake detector.

Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: TCSVT(Under Review)

arXiv:2410.10122 [pdf, other]

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Authors: Yue Zhang, Minhao Liu, Zhaokang Chen, Bin Wu, Yubin Zeng, Chao Zhan, Yingjie He, Junxin Huang, Wenjiang Zhou

Abstract: Achieving high-resolution, identity consistency, and accurate lip-speech synchronization in face visual dubbing presents significant challenges, particularly for real-time applications like live video streaming. We propose MuseTalk, which generates lip-sync targets in a latent space encoded by a Variational Autoencoder, enabling high-fidelity talking face video generation with efficient inference. Specifically, we project the occluded lower half of the face image and itself as a reference into a low-dimensional latent space and use a multi-scale U-Net to fuse audio and visual features at various levels. We further propose a novel sampling strategy during training, which selects reference images with head poses closely matching the target, allowing the model to focus on precise lip movement by filtering out redundant information. Additionally, we analyze the mechanism of lip-sync loss and reveal its relationship with input information volume. Extensive experiments show that MuseTalk consistently outperforms recent state-of-the-art methods in visual fidelity and achieves comparable lip-sync accuracy. As MuseTalk supports the online generation of face at 256x256 at more than 30 FPS with negligible starting latency, it paves the way for real-time applications.

Submitted 16 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

Comments: 15 pages, 4 figures

Report number: RV-10-16

arXiv:2410.10120 [pdf, other]

Evaluating of Machine Unlearning: Robustness Verification Without Prior Modifications

Authors: Heng Xu, Tianqing Zhu, Wanlei Zhou

Abstract: Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. While extensive research has focused on developing efficient unlearning strategies, the critical aspect of unlearning verification has been largely overlooked. Existing verification methods mainly rely on machine learning attack techniques, such as membership inference attacks (MIAs) or backdoor attacks. However, these methods, not being formally designed for verification purposes, exhibit limitations in robustness and only support a small, predefined subset of samples. Moreover, dependence on prepared sample-level modifications of MIAs or backdoor attacks restricts their applicability in Machine Learning as a Service (MLaaS) environments. To address these limitations, we propose a novel robustness verification scheme that requires no prior modifications and can support verification on a much larger set. Our scheme employs an optimization-based method to recover the actual training samples from the model. By comparative analysis of recovered samples extracted pre- and post-unlearning, MLaaS users can verify the unlearning process. This verification scheme, operating exclusively through model parameters, avoids the need for any sample-level modifications prior to model training while supporting verification on a much larger set and maintaining robustness. The effectiveness of our proposed approach is demonstrated through theoretical analysis and experiments involving diverse models on various datasets in different scenarios.

Submitted 13 October, 2024; originally announced October 2024.

arXiv:2410.09207 [pdf, other]

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains

Authors: Simeng Han, Aaron Yu, Rui Shen, Zhenting Qi, Martin Riddell, Wenfei Zhou, Yujie Qiao, Yilun Zhao, Semih Yavuz, Ye Liu, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Dragomir Radev, Rex Ying, Arman Cohan

Abstract: Existing methods for understanding the capabilities of LLMs in logical reasoning rely on binary entailment classification or synthetically derived rationales, which are not sufficient for proper investigation of models' capabilities. We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains for a set of realistic logical reasoning stories also written by humans. P-FOLIO is collected with an annotation protocol that facilitates humans to annotate well-structured natural language proofs for first-order logic reasoning problems in a step-by-step manner. The number of reasoning steps in P-FOLIO spans from 0 to 20. We further use P-FOLIO to evaluate and improve large-language-model (LLM) reasoning capabilities. We evaluate LLM reasoning capabilities at a fine granularity via single-step inference rule classification, with more diverse inference rules of more diverse and higher levels of complexities than previous works. Given that a single model-generated reasoning chain could take a completely different path than the human-annotated one, we sample multiple reasoning chains from a model and use pass@k metrics for evaluating the quality of model-generated reasoning chains. We show that human-written reasoning chains significantly boost the logical reasoning capabilities of LLMs via many-shot prompting and fine-tuning. Furthermore, fine-tuning Llama3-7B on P-FOLIO improves the model performance by 10% or more on three other out-of-domain logical reasoning datasets. We also conduct detailed analysis to show where the most powerful LLMs fall short in reasoning. We will release the dataset and code publicly.

Submitted 11 October, 2024; originally announced October 2024.
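
The pass@k metric mentioned in the P-FOLIO abstract above is often computed with the unbiased estimator popularized by code-generation benchmarks: given n sampled chains of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that standard estimator. Whether the paper uses exactly this form is an assumption, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn without replacement
    from n generated samples (c of them correct) is correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 2 sampled chains, 1 correct, evaluated at k = 1.
example = pass_at_k(n=2, c=1, k=1)
```

With two samples of which one is correct, a single draw succeeds half the time, so `example` equals 0.5.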

arXiv:2410.09102 [pdf, other]

Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou

Abstract: Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and data. As a result, lower-priority user prompts may override more critical system instructions, including safety protocols. Existing approaches to achieving instruction hierarchy, such as delimiters and instruction-based training, do not address this issue at the architectural level. We introduce the Instructional Segment Embedding (ISE) technique, inspired by BERT, to modern large language models, which embeds instruction priority information directly into the model. This approach enables models to explicitly differentiate and prioritize various instruction types, significantly improving safety against malicious prompts that attempt to override priority rules. Our experiments on the Structured Query and Instruction Hierarchy benchmarks demonstrate an average robust accuracy increase of up to 15.75% and 18.68%, respectively. Furthermore, we observe an improvement in instruction-following capability of up to 4.1% evaluated on AlpacaEval. Overall, our approach offers a promising direction for enhancing the safety and effectiveness of LLM architectures.

Submitted 9 October, 2024; originally announced October 2024.

Comments: Preprint

arXiv:2410.07643 [pdf, other]

Rethinking Adversarial Inverse Reinforcement Learning: From the Angles of Policy Imitation and Transferable Reward Recovery

Authors: Yangchun Zhang, Wang Zhou, Yirui Zhou

Abstract: In scenarios of inverse reinforcement learning (IRL) with a single expert, adversarial inverse reinforcement learning (AIRL) serves as a foundational approach to providing comprehensive and transferable task descriptions by restricting the reward class, e.g., to state-only rewards. However, AIRL faces practical challenges, primarily stemming from the difficulty of verifying the unobservable transition matrix - often encountered in practice - under the specific conditions necessary for effective transfer. This paper reexamines AIRL in light of the unobservable transition matrix or limited informative priors. By applying random matrix theory (RMT), we demonstrate that AIRL can disentangle rewards for effective transfer with high probability, irrespective of specific conditions. This perspective reframes inadequate transfer in certain contexts. Specifically, it is attributed to the selection problem of the reinforcement learning algorithm employed by AIRL, which is characterized by training variance. Based on this insight, we propose a hybrid framework that integrates on-policy proximal policy optimization (PPO) in the source environment with off-policy soft actor-critic (SAC) in the target environment, leading to significant improvements in reward transfer effectiveness.

Submitted 10 October, 2024; originally announced October 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.14593

arXiv:2410.07035 [pdf, other]

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Authors: Zekun Wang, Feiyu Duan, Yibo Zhang, Wangchunshu Zhou, Ke Xu, Wenhao Huang, Jie Fu

Abstract: Large Language Models (LLMs) demonstrate impressive capabilities across various domains, including role-playing, creative writing, mathematical reasoning, and coding. Despite these advancements, LLMs still encounter challenges with length control, frequently failing to adhere to specific length constraints due to their token-level operations and insufficient training on data with strict length limitations. We identify this issue as stemming from a lack of positional awareness and propose novel approaches--PositionID Prompting and PositionID Fine-Tuning--to address it. These methods enhance the model's ability to continuously monitor and manage text length during generation. Additionally, we introduce PositionID CP Prompting to enable LLMs to perform copy and paste operations accurately. Furthermore, we develop two benchmarks for evaluating length control and copy-paste abilities. Our experiments demonstrate that our methods significantly improve the model's adherence to length constraints and copy-paste accuracy without compromising response quality.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 39 pages. CP-Bench and LenCtrl-Bench are available in https://huggingface.co/datasets/ZenMoore/CP-Bench and https://huggingface.co/datasets/ZenMoore/LenCtrl-Bench

arXiv:2410.06513 [pdf, other]

MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning

Authors: Xiaoyang Liu, Yunyao Mao, Wengang Zhou, Houqiang Li

Abstract: We introduce MotionRL, the first approach to utilize Multi-Reward Reinforcement Learning (RL) for optimizing text-to-motion generation tasks and aligning them with human preferences. Previous works focused on improving numerical performance metrics on the given datasets, often neglecting the variability and subjectivity of human feedback. In contrast, our novel approach uses reinforcement learning to fine-tune the motion generator based on human preferences prior knowledge of the human perception model, allowing it to generate motions that better align with human preferences. In addition, MotionRL introduces a novel multi-objective optimization strategy to approximate Pareto optimality between text adherence, motion quality, and human preferences. Extensive experiments and user studies demonstrate that MotionRL not only allows control over the generated results across different objectives but also significantly enhances performance across these metrics compared to other algorithms.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.06489 [pdf]

High proton conductivity through angstrom-porous titania

Authors: Y. Ji, G. -P. Hao, Y. -T. Tan, W. Q. Xiong, Y. Liu, W. Z. Zhou, D. -M. Tang, R. Z. Ma, S. J. Yuan, T. Sasaki, M. Lozada-Hidalgo, A. K. Geim, Pengzhan Sun

Abstract: Two-dimensional (2D) crystals have attracted strong interest as a new class of proton conducting materials that can block atoms, molecules and ions while allowing proton transport through the atomically thin basal planes. Although 2D materials exhibit this perfect selectivity, the reported proton conductivities have been relatively low. Here we show that vacancy-rich titania monolayers are highly permeable to protons while remaining impermeable to helium, with proton conductivity exceeding 100 S cm$^{-2}$ at 200 °C and surpassing targets set by industry roadmaps. The fast and selective proton transport is attributed to an extremely high density of titanium-atom vacancies (one per square nm), which effectively turns titania monolayers into angstrom-scale sieves. Our findings highlight the potential of 2D oxides as membrane materials for hydrogen-based technologies.

Submitted 8 October, 2024; originally announced October 2024.

arXiv:2410.05650 [pdf, other]

doi 10.1145/3664647.3680642

SIA-OVD: Shape-Invariant Adapter for Bridging the Image-Region Gap in Open-Vocabulary Detection

Authors: Zishuo Wang, Wenhao Zhou, Jinglin Xu, Yuxin Peng

Abstract: Open-vocabulary detection (OVD) aims to detect novel objects without instance-level annotations to achieve open-world object detection at a lower cost. Existing OVD methods mainly rely on the powerful open-vocabulary image-text alignment capability of Vision-Language Pretrained Models (VLM) such as CLIP. However, CLIP is trained on image-text pairs and lacks the perceptual ability for local regions within an image, resulting in the gap between image and region representations. Directly using CLIP for OVD causes inaccurate region classification. We find the image-region gap is primarily caused by the deformation of region feature maps during region of interest (RoI) extraction. To mitigate the inaccurate region classification in OVD, we propose a new Shape-Invariant Adapter named SIA-OVD to bridge the image-region gap in the OVD task. SIA-OVD learns a set of feature adapters for regions with different shapes and designs a new adapter allocation mechanism to select the optimal adapter for each region. The adapted region representations can align better with text representations learned by CLIP. Extensive experiments demonstrate that SIA-OVD effectively improves the classification accuracy for regions by addressing the gap between images and regions caused by shape deformation. SIA-OVD achieves substantial improvements over representative methods on the COCO-OVD benchmark. The code is available at https://github.com/PKU-ICST-MIPL/SIA-OVD_ACMMM2024.

Submitted 7 October, 2024; originally announced October 2024.

Comments: 9 pages, 7 figures

ACM Class: I.2.10

arXiv:2410.05567 [pdf, other]

With random regressors, least squares inference is robust to correlated errors with unknown correlation structure

Authors: Zifeng Zhang, Peng Ding, Wen Zhou, Haonan Wang

Abstract: Linear regression is arguably the most widely used statistical method. With fixed regressors and correlated errors, the conventional wisdom is to modify the variance-covariance estimator to accommodate the known correlation structure of the errors. We depart from the literature by showing that with random regressors, linear regression inference is robust to correlated errors with unknown correlation structure. The existing theoretical analyses for linear regression are no longer valid because even the asymptotic normality of the least-squares coefficients breaks down in this regime. We first prove the asymptotic normality of the t statistics by establishing their Berry-Esseen bounds based on a novel probabilistic analysis of self-normalized statistics. We then study the local power of the corresponding t tests and show that, perhaps surprisingly, error correlation can even enhance power in the regime of weak signals. Overall, our results show that linear regression is applicable more broadly than the conventional theory suggests, and further demonstrate the value of randomization to ensure robustness of inference.

Submitted 10 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.05248 [pdf, other]

SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe

Authors: Yuxin Xiao, Shujian Zhang, Wenxuan Zhou, Marzyeh Ghassemi, Sanqiang Zhao

Abstract: To induce desired behaviors in large language models (LLMs) for interaction-driven tasks, the instruction-tuning stage typically trains LLMs on instruction-response pairs using the next-token prediction (NTP) loss. Previous work aiming to improve instruction-tuning performance often emphasizes the need for higher-quality supervised fine-tuning (SFT) datasets, which typically involves expensive data filtering with proprietary LLMs or labor-intensive data generation by human annotators. However, these approaches do not fully leverage the datasets' intrinsic properties, resulting in high computational and labor costs, thereby limiting scalability and performance gains. In this paper, we propose SFTMix, a novel recipe that elevates instruction-tuning performance beyond the conventional NTP paradigm, without the need for well-curated datasets. Observing that LLMs exhibit uneven confidence across the semantic representation space, we argue that examples with different confidence levels should play distinct roles during the instruction-tuning process. Based on this insight, SFTMix leverages training dynamics to identify examples with varying confidence levels, then applies a Mixup-based regularization to mitigate overfitting on confident examples while propagating supervision signals to improve learning on relatively unconfident ones. This approach enables SFTMix to significantly outperform NTP across a wide range of instruction-following and healthcare domain-specific SFT tasks, demonstrating its adaptability to diverse LLM families and scalability to datasets of any size. Comprehensive ablation studies further verify the robustness of SFTMix's design choices, underscoring its versatility in consistently enhancing performance across different LLMs and datasets in broader natural language processing applications.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04922 [pdf, other]

Random-projection ensemble dimension reduction

Authors: Wenxing Zhou, Timothy I. Cannings

Abstract: We introduce a new framework for dimension reduction in the context of high-dimensional regression. Our proposal is to aggregate an ensemble of random projections, which have been carefully chosen based on the empirical regression performance after being applied to the covariates. More precisely, we consider disjoint groups of independent random projections, apply a base regression method after each projection, and retain the projection in each group based on the empirical performance. We aggregate the selected projections by taking the singular value decomposition of their empirical average and then output the leading order singular vectors. A particularly appealing aspect of our approach is that the singular values provide a measure of the relative importance of the corresponding projection directions, which can be used to select the final projection dimension. We investigate in detail (and provide default recommendations for) various aspects of our general framework, including the projection distribution and the base regression method, as well as the number of random projections used. Additionally, we investigate the possibility of further reducing the dimension by applying our algorithm twice in cases where the projection dimension recommended in the initial application is too large. Our theoretical results show that the error of our algorithm stabilises as the number of groups of projections increases. We demonstrate the excellent empirical performance of our proposal in a large numerical study using simulated and real data.

Submitted 7 October, 2024; originally announced October 2024.

Comments: 37 pages, 12 figures and 6 tables
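
Illustration (not from the paper): the procedure outlined in the abstract above can be sketched roughly as follows. This is only a minimal reading of the abstract, under several assumptions that are not stated in the listing: projections are drawn as Gaussian matrices orthonormalised by QR, the base regression method is ordinary least squares scored on a held-out half of the data, and the "empirical average" is taken over the p x p matrices P P^T of the selected projections. The function name `rp_ensemble_directions` and all of its parameters are hypothetical.

```python
import numpy as np

def rp_ensemble_directions(X, y, d=2, n_groups=20, group_size=10, d_out=1, seed=0):
    """Sketch of random-projection ensemble dimension reduction (assumed details)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: score projections on a held-out half of the sample.
    perm = rng.permutation(n)
    train, val = perm[: n // 2], perm[n // 2:]
    avg = np.zeros((p, p))
    for _group in range(n_groups):
        best_err, best_P = np.inf, None
        for _draw in range(group_size):
            # Random p x d matrix with orthonormal columns (assumed distribution).
            P, _r = np.linalg.qr(rng.standard_normal((p, d)))
            # Assumed base method: least squares with intercept on projected data.
            Z_tr = np.column_stack([np.ones(len(train)), X[train] @ P])
            Z_va = np.column_stack([np.ones(len(val)), X[val] @ P])
            beta, *_rest = np.linalg.lstsq(Z_tr, y[train], rcond=None)
            err = np.mean((y[val] - Z_va @ beta) ** 2)
            if err < best_err:
                best_err, best_P = err, P
        # Keep the best projection of the group and accumulate the average of P P^T.
        avg += best_P @ best_P.T / n_groups
    # SVD of the averaged matrix; leading singular vectors are the output directions.
    U, s, _vt = np.linalg.svd(avg)
    return U[:, :d_out], s
```

In this sketch the returned singular values sum to d, and their relative sizes give a rough indication of how consistently each direction was selected across groups, which is the role the abstract assigns to them when choosing the final projection dimension.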

arXiv:2410.04354 [pdf, other]

StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting

Authors: Xiao Cui, Weicai Ye, Yifan Wang, Guofeng Zhang, Wengang Zhou, Houqiang Li

Abstract: Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long and narrow camera trajectories, occlusion, complex object relationships, and data sparsity across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and ensure scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.

Submitted 19 October, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

arXiv:2410.03752 [pdf, other]

Efficient Streaming LLM for Speech Recognition

Authors: Junteng Jia, Gil Keren, Wei Zhou, Egor Lakomkin, Xiaohui Zhang, Chunyang Wu, Frank Seide, Jay Mahadeokar, Ozlem Kalinli

Abstract: Recent works have shown that prompting large language models with audio encodings can unlock speech recognition capabilities. However, existing techniques do not scale efficiently, especially while handling long-form streaming audio inputs -- not only do they extrapolate poorly beyond the audio length seen during training, but they are also computationally inefficient due to the quadratic cost of attention. In this work, we introduce SpeechLLM-XL, a linear scaling decoder-only model for streaming speech recognition. We process audio in configurable chunks using a limited attention window for reduced computation, and the text tokens for each audio chunk are generated auto-regressively until an EOS is predicted. During training, the transcript is segmented into chunks, using a CTC forced alignment estimated from encoder output. SpeechLLM-XL with a 1.28-second chunk size achieves 2.7%/6.7% WER on LibriSpeech test clean/other, and it shows no quality degradation on long-form utterances 10x longer than the training utterances.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2410.02798 [pdf, other]

Joint multifractality in the cross-correlations between grains & oilseeds indices and external uncertainties

Authors: Ying-Hui Shao, Xing-Lu Gao, Yan-Hong Yang, Wei-Xing Zhou

Abstract: This study investigates the relationships between agricultural spot markets and external uncertainties via the multifractal detrending moving-average cross-correlation analysis (MF-X-DMA). The dataset contains the Grains & Oilseeds Index (GOI) and its five sub-indices of wheat, maize, soyabeans, rice, and barley. Moreover, we use three uncertainty proxies, namely, economic policy uncertainty (EPU), geopolitical risk (GPR), and volatility Index (VIX). We observe the presence of multifractal cross-correlations between agricultural markets and uncertainties. Further, statistical tests show that maize has intrinsic joint multifractality with all the uncertainty proxies, exhibiting a high degree of sensitivity. Additionally, intrinsic multifractality among GOI-GPR, wheat-GPR and soyabeans-VIX is illustrated. However, other series have apparent multifractal cross-correlations with high possibilities. Moreover, our analysis suggests that among the three kinds of external uncertainties, geopolitical risk has a relatively stronger association with grain prices.

Submitted 18 September, 2024; originally announced October 2024.

Comments: 30 pages, 21 figures

arXiv:2410.01162 [pdf, other]

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

Authors: Wonjune Kang, Junteng Jia, Chunyang Wu, Wei Zhou, Egor Lakomkin, Yashesh Gaur, Leda Sari, Suyoun Kim, Ke Li, Jay Mahadeokar, Ozlem Kalinli

Abstract: As speech becomes an increasingly common modality for interacting with large language models (LLMs), it is becoming desirable to develop systems where LLMs can take into account users' emotions or speaking styles when providing their responses. In this work, we study the potential of an LLM to understand these aspects of speech without fine-tuning its weights. To do this, we utilize an end-to-end system with a speech encoder; the encoder is trained to produce token embeddings such that the LLM's response to an expressive speech prompt is aligned with its response to a semantically matching text prompt where the speaker's emotion has also been specified. We find that this training framework allows the encoder to generate tokens that capture both semantic and paralinguistic information in speech and effectively convey it to the LLM, even when the LLM remains completely frozen. We also explore training on additional emotion and style-related response alignment tasks, finding that they further increase the amount of paralinguistic information explicitly captured in the speech tokens. Experiments demonstrate that our system is able to produce higher quality and more empathetic responses to expressive speech prompts compared to several baselines.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2410.01002 [pdf, other]

The currently observed clumps cannot be the "direct" precursors of the currently observed open clusters

Authors: J. W. Zhou, Sami Dib

Abstract: We categorized clumps, embedded clusters, and open clusters, and conducted a comparative analysis of their physical properties. Overall, the radii of open clusters are significantly larger than those of embedded clusters and clumps. The radii of embedded clusters are larger than those of clumps, which may be due to the expansion of embedded clusters. The open clusters have significantly larger masses than embedded clusters, by about one order of magnitude. Given the current mass distribution of clumps in the Milky Way, the evolutionary sequence from a single clump evolving into an embedded cluster and subsequently into an open cluster cannot account for the observed open clusters with old ages and large masses, which is also supported by N-body simulations of individual embedded clusters. To explain the mass and radius distributions of the observed open clusters, initial embedded clusters with masses larger than 3000 M$_{\odot}$ are necessary. However, the upper limit of the embedded cluster sample is less than 1000 M$_{\odot}$, and only a few ATLASGAL clumps have a mass larger than 3000 M$_{\odot}$. Thus, the currently observed clumps cannot be the "direct" precursors of the currently observed open clusters. If the Milky Way has a burst-like and time-dependent star formation history, the currently observed open clusters with old ages and large masses may come from massive clumps in the past. There is also a great possibility that these open clusters originate from mergers of multiple embedded clusters. We compared the separation of open clusters and the typical size of molecular clouds, and found that most molecular clouds may only form one open cluster, which supports the merger scenario. Further study is necessary to distinguish between different scenarios.

Submitted 1 October, 2024; originally announced October 2024.

Comments: Accepted for publication in A&A, 8 pages, 6 figures. arXiv admin note: text overlap with arXiv:2409.20271

Journal ref: 2024, Article reference: aa51728-24

arXiv:2410.00648 [pdf, ps, other]

A strengthening on consecutive odd cycles in graphs of given minimum degree

Authors: Hao Lin, Guanghui Wang, Wenling Zhou

Abstract: Liu and Ma [J. Combin. Theory Ser. B, 2018] conjectured that every $2$-connected non-bipartite graph with minimum degree at least $k+1$ contains $\lceil k/2\rceil$ cycles with consecutive odd lengths. In particular, they showed that this conjecture holds when $k$ is even. In this paper, we confirm this conjecture for any $k\in \mathbb N$. Moreover, we also improve some previous results about cycles of consecutive lengths.

Submitted 1 October, 2024; originally announced October 2024.

Comments: 10 pages

arXiv:2410.00022 [pdf, other]

TREB: a BERT attempt for imputing tabular data imputation

Authors: Shuyue Wang, Wenjun Zhou, Han drk-m-s Jiang, Shuo Wang, Ren Zheng

Abstract: TREB, a novel tabular imputation framework utilizing BERT, introduces a groundbreaking approach for handling missing values in tabular data. Unlike traditional methods that often overlook the specific demands of imputation, TREB leverages the robust capabilities of BERT to address this critical task. While many BERT-based approaches for tabular data have emerged, they frequently under-utilize the language model's full potential. To rectify this, TREB employs a BERT-based model fine-tuned specifically for the task of imputing real-valued continuous numbers in tabular datasets. The paper comprehensively addresses the unique challenges posed by tabular data imputation, emphasizing the importance of context-based interconnections. The effectiveness of TREB is validated through rigorous evaluation using the California Housing dataset. The results demonstrate its ability to preserve feature interrelationships and accurately impute missing values. Moreover, the authors shed light on the computational efficiency and environmental impact of TREB, quantifying the floating-point operations (FLOPs) and carbon footprint associated with its training and deployment.

Submitted 15 September, 2024; originally announced October 2024.

Comments: 12 pages, 7 figures

arXiv:2409.20370 [pdf, other]

The Perfect Blend: Redefining RLHF with Mixture of Judges

Authors: Tengyu Xu, Eryk Helenowski, Karthik Abinav Sankararaman, Di Jin, Kaiyan Peng, Eric Han, Shaoliang Nie, Chen Zhu, Hejia Zhang, Wenxuan Zhou, Zhouhao Zeng, Yun He, Karishma Mandyam, Arya Talabzadeh, Madian Khabsa, Gabriel Cohen, Yuandong Tian, Hao Ma, Sinong Wang, Han Fang

Abstract: Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLMs). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we call Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a Pareto-optimal point across an extremely large number of objectives. Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.

Submitted 30 September, 2024; originally announced September 2024.

Comments: submitted to conference

arXiv:2409.19542 [pdf, other]

doi 10.1016/j.eswa.2024.125460

BiPC: Bidirectional Probability Calibration for Unsupervised Domain Adaption

Authors: Wenlve Zhou, Zhiheng Zhou, Junyuan Shang, Chang Niu, Mingyue Zhang, Xiyuan Tao, Tianlei Wang

Abstract: Unsupervised Domain Adaptation (UDA) leverages a labeled source domain to solve tasks in an unlabeled target domain. While Transformer-based methods have shown promise in UDA, their application is limited to plain Transformers, excluding Convolutional Neural Networks (CNNs) and hierarchical Transformers. To address this issue, we propose Bidirectional Probability Calibration (BiPC) from a probability space perspective. We demonstrate that the probability outputs from a pre-trained head, after extensive pre-training, are robust against domain gaps and can adjust the probability distribution of the task head. Moreover, the task head can enhance the pre-trained head during adaptation training, improving model performance through bidirectional complementation. Technically, we introduce Calibrated Probability Alignment (CPA) to adjust the pre-trained head's probabilities, such as those from an ImageNet-1k pre-trained classifier. Additionally, we design a Calibrated Gini Impurity (CGI) loss to refine the task head, with calibrated coefficients learned from the pre-trained classifier. BiPC is a simple yet effective method applicable to various networks, including CNNs and Transformers. Experimental results demonstrate its remarkable performance across multiple UDA tasks. Our code will be available at: https://github.com/Wenlve-Zhou/BiPC.

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.19307 [pdf, other]

Quantile connectedness across BRICS and international grain futures markets: Insights from the Russia-Ukraine conflict

Authors: Yan-Hong Yang, Ying-Hui Shao, Wei-Xing Zhou

Abstract: This study examines the quantile connectedness among grain futures markets in BRICS and international markets, with a particular focus on the ongoing and escalating impacts of the Russia-Ukraine conflict. The findings reveal significant heterogeneity in spillover effects across different quantiles and market conditions. Specifically, the time-varying total connectedness index (TCI) consistently fluctuated around 95% under both extreme bearish and bullish market conditions, markedly higher than in normal market conditions. Moreover, across all quantile levels, the TCI was higher during the pre-outbreak period than in the post-outbreak period. This systemic risk has notably decreased following the onset of the Russia-Ukraine conflict and the subsequent changes to the Black Sea Grain Initiative. Apart from rice, U.S. grain futures maintained a dominant position as benchmarks for international grain prices, exerting substantial influence over the grain futures markets in BRICS throughout most of the period. Finally, the study highlights that the influence of grain type and regional proximity strengthens pairwise connectedness among futures markets, with short-term spillovers being dominant and the spillover effect generally symmetric across quantiles.

Submitted 28 September, 2024; originally announced September 2024.

Comments: 42 pages, 31 figures

arXiv:2409.18853 [pdf, other]

Interaction between Unruh-Dewitt detectors exclusively due to acceleration: A Parallel to the FDU Effect

Authors: Wenting Zhou, Shijing Cheng, Hongwei Yu

Abstract: We have discovered an interaction between two detectors in a vacuum that emerges exclusively due to acceleration, akin to the spontaneous excitation of a single detector as predicted by the Fulling-Davies-Unruh (FDU) effect. However, this interaction contrasts sharply with the FDU effect, which suggests that a uniformly accelerated detector behaves as if it were in a thermal bath, as the discovered interaction does not manifest in a thermal environment. The novel interaction displays unique dependencies on the separation between detectors: it can be either attractive or repulsive, with the potential to transition between these behaviors as the inter-detector separation changes. More intriguingly, it exhibits a surprising large-small duality in its dependence on acceleration, suggesting the existence of an optimal acceleration at which the interaction is strongest, in contrast to the monotonic acceleration-dependence of the FDU effect.

Submitted 27 September, 2024; originally announced September 2024.

Comments: 18 pages, 1 figure

arXiv:2409.18422 [pdf, other]

The resilience of China's financial markets: With a focus on the impact of its climate policy uncertainty

Authors: Si-yao Wei, Wei-xing Zhou

Abstract: Resilience serves to assess the ability of financial markets to resist external shocks. The intensity and duration, used to indicate resilience, are calculated for China's financial markets in this paper, focusing on the performance of each financial market during and after several crises. Given that climate issues have been recognized as an important source of risk by financial markets, we also investigate the spillover effects and mechanism of China's climate policy uncertainty on the resilience of its financial markets. We find that the two resilience indicators of each market follow a relatively consistent trend, but spillovers among markets have different sensitivities to the two. In addition, China's climate policy uncertainty shocks the resilience of its financial markets by increasing the investor sentiment index and the non-performing loan ratio of commercial banks and by reducing the capital and financial account balance. It is further found that there is a consensus in China's financial markets on the unswerving implementation of climate policy, which provides a reference for other countries on how to balance the introduction of climate policies with the development of financial markets.

Submitted 26 September, 2024; originally announced September 2024.

arXiv:2409.17692 [pdf, other]

MIO: A Foundation Model on Multimodal Tokens

Authors: Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang

Abstract: In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner. While the emergence of large language models (LLMs) and multimodal large language models (MM-LLMs) propels advancements in artificial general intelligence through their versatile capabilities, they still lack true any-to-any understanding and generation. Recently, the release of GPT-4o has showcased the remarkable potential of any-to-any LLMs for complex real-world tasks, enabling omnidirectional input and output across images, speech, and text. However, it is closed-source and does not support the generation of multimodal interleaved sequences. To address this gap, we present MIO, which is trained on a mixture of discrete tokens across four modalities using causal multimodal modeling. MIO undergoes a four-stage training process: (1) alignment pre-training, (2) interleaved pre-training, (3) speech-enhanced pre-training, and (4) comprehensive supervised fine-tuning on diverse textual, visual, and speech tasks. Our experimental results indicate that MIO exhibits competitive, and in some cases superior, performance compared to previous dual-modal baselines, any-to-any model baselines, and even modality-specific baselines. Moreover, MIO demonstrates advanced capabilities inherent to its any-to-any feature, such as interleaved video-text generation, chain-of-visual-thought reasoning, visual guideline generation, instructional image editing, etc.

Submitted 26 September, 2024; originally announced September 2024.

Comments: Technical Report. Codes and models will be available soon

arXiv:2409.16191 [pdf, other]

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Authors: Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen

Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks (e.g., long-context understanding), and many benchmarks have been proposed. However, we observe that long text generation capabilities are not well investigated. Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text. Based on Bloom's Taxonomy, HelloBench categorizes long text generation tasks into five subtasks: open-ended QA, summarization, chat, text completion, and heuristic text generation. Besides, we propose Hierarchical Long Text Evaluation (HelloEval), a human-aligned evaluation method that significantly reduces the time and effort required for human evaluation while maintaining a high correlation with human evaluation. We have conducted extensive experiments across around 30 mainstream LLMs and observed that the current LLMs lack long text generation capabilities. Specifically, first, regardless of whether the instructions include explicit or implicit length constraints, we observe that most LLMs cannot generate text that is longer than 4000 words. Second, we observe that while some LLMs can generate longer text, many issues exist (e.g., severe repetition and quality degradation). Third, to demonstrate the effectiveness of HelloEval, we compare HelloEval with traditional metrics (e.g., ROUGE and BLEU) and LLM-as-a-Judge methods, which shows that HelloEval has the highest correlation with human evaluation. We release our code at https://github.com/Quehry/HelloBench.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15974 [pdf, other]

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

Authors: Fengrun Zhang, Wangjin Zhou, Yiming Liu, Wang Geng, Yahui Shan, Chen Zhang

Abstract: There has been an increasing research interest in cross-age speaker verification (CASV). However, existing speaker verification systems perform poorly in CASV due to the great individual differences in voice caused by aging. In this paper, we propose a disentangled representation learning framework for CASV based on mutual information (MI) minimization. In our method, a backbone model is trained to disentangle the identity- and age-related embeddings from speaker information, and an MI estimator is trained to minimize the correlation between age- and identity-related embeddings via MI minimization, resulting in age-invariant speaker embeddings. Furthermore, by using the age gaps between positive and negative samples, we propose an aging-aware MI minimization loss function that allows the backbone model to focus more on the vocal changes with large age gaps. Experimental results show that the proposed method outperforms other methods on multiple Cross-Age test sets of Vox-CA.

Submitted 24 September, 2024; originally announced September 2024.

Comments: Interspeech 2024

arXiv:2409.14163 [pdf, other]

PromptTA: Prompt-driven Text Adapter for Source-free Domain Generalization

Authors: Haoran Zhang, Shuanghao Bai, Wanqi Zhou, Jingwen Fu, Badong Chen

Abstract: Source-free domain generalization (SFDG) tackles the challenge of adapting models to unseen target domains without access to source domain data. To deal with this challenging task, recent advances in SFDG have primarily focused on leveraging the text modality of vision-language models such as CLIP. These methods involve developing a transferable linear classifier based on diverse style features extracted from the text and learned prompts or deriving domain-unified text representations from domain banks. However, both style features and domain banks have limitations in capturing comprehensive domain knowledge. In this work, we propose the Prompt-Driven Text Adapter (PromptTA) method, which is designed to better capture the distribution of style features and employs resampling to ensure thorough coverage of domain knowledge. To further leverage this rich domain information, we introduce a text adapter that learns from these style features for efficient domain information storage. Extensive experiments conducted on four benchmark datasets demonstrate that PromptTA achieves state-of-the-art performance. The code is available at https://github.com/zhanghr2001/PromptTA.

Submitted 21 September, 2024; originally announced September 2024.

arXiv:2409.13265 [pdf, other]

Towards LifeSpan Cognitive Systems

Authors: Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, Nafis Sadeq, Xiusi Chen, Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley

Abstract: Building a human-like system that continuously interacts with complex environments -- whether simulated digital worlds or human society -- presents several key challenges. Central to this is enabling continuous, high-frequency interactions, where the interactions are termed experiences. We refer to this envisioned system as the LifeSpan Cognitive System (LSCS). A critical feature of LSCS is its ability to engage in incremental and rapid updates while retaining and accurately recalling past experiences. We identify two major challenges in achieving this: (1) Abstraction and Experience Merging, and (2) Long-term Retention with Accurate Recall. These properties are essential for storing new experiences, organizing past experiences, and responding to the environment in ways that leverage relevant historical data. Unlike language models with continual learning, which typically rely on large corpora for fine-tuning and focus on improving performance within specific domains or tasks, LSCS must rapidly and incrementally update with new information from its environment at a high frequency. Existing technologies with the potential to solve the above two major challenges can be classified into four classes based on a conceptual metric called Storage Complexity, which measures the relative space required to store past experiences. Each of these four classes of technologies has its own strengths and limitations. Given that none of the existing technologies can achieve LSCS alone, we propose a novel paradigm for LSCS that integrates all four classes of technologies. The new paradigm operates through two core processes: Absorbing Experiences and Generating Responses.

Submitted 20 September, 2024; originally announced September 2024.

arXiv:2409.12993 [pdf, other]

CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair

Authors: Mingjie Liu, Yun-Da Tsai, Wenfei Zhou, Haoxing Ren

Abstract: Despite the significant progress made in code generation with large language models, challenges persist, especially with hardware description languages such as Verilog. This paper first presents an analysis of fine-tuned LLMs on Verilog coding, with synthetic data from prior methods. We identify two main issues: difficulties in handling non-textual representations (Karnaugh maps, state-transition diagrams and waveforms) and significant variability during training with models randomly making "minor" mistakes. To address these limitations, we enhance data curation by creating correct-by-construction data targeting non-textual representations. Additionally, we introduce an automated framework that generates error reports from various model checkpoints and injects these errors into open-source code to create targeted code repair data. Our fine-tuned Starcoder2-15B outperforms prior state-of-the-art results by 3.8%, 10.9%, and 6.6% for pass@1 on VerilogEval-Machine, VerilogEval-Human, and RTLLM, respectively.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.12536 [pdf, ps, other]

Necessary and sufficient condition for CLT of linear spectral statistics of sample correlation matrices

Authors: Yanpeng Li, Guangming Pan, Jiahui Xie, Wang Zhou

Abstract: In this paper, we establish the central limit theorem (CLT) for the linear spectral statistics (LSS) of the sample correlation matrix $R$, constructed from a $p\times n$ data matrix $X$ with independent and identically distributed (i.i.d.) entries having mean zero, variance one, and infinite fourth moments in the high-dimensional regime $n/p\rightarrow φ\in \mathbb{R}_+\backslash \{1\}$. We derive a necessary and sufficient condition for the CLT. More precisely, under the assumption that the identical distribution $ξ$ of the entries in $X$ satisfies $\mathbb{P}(|ξ|>x)\sim l(x)x^{-α}$ when $x\rightarrow \infty$ for $α\in (2,4]$, where $l(x)$ is a slowly varying function, we conclude that: (i) When $α\in(3,4]$, the universal asymptotic normality for the LSS of the sample correlation matrix holds, with the same asymptotic mean and variance as in the finite fourth moment scenario; (ii) We identify a necessary and sufficient condition $\lim_{x\rightarrow\infty}x^3\mathbb{P}(|ξ|>x)=0$ for the universal CLT; (iii) We establish a local law for $α\in (2, 4]$. Overall, our proof strategy follows the routine of matrix resampling, intermediate local law, Green function comparison, and characteristic function estimation. In various parts of the proof, we are required to come up with new approaches and ideas to solve the challenges posed by the special structure of the sample correlation matrix. Our results also demonstrate that the symmetry condition is unnecessary for the CLT of LSS for the sample correlation matrix, but the tail index $α$ plays a crucial role in determining the asymptotic behaviors of LSS for $α\in (2, 3)$.

Submitted 19 September, 2024; originally announced September 2024.

Comments: 112 pages

MSC Class: 60B20; 60F05; 62E20; 62H20; 15B52

arXiv:2409.11701 [pdf, other]

Bias Reduction in Matched Observational Studies with Continuous Treatments: Calipered Non-Bipartite Matching and Bias-Corrected Estimation and Inference

Authors: Anthony Frazier, Siyu Heng, Wen Zhou

Abstract: Matching is a commonly used causal inference framework in observational studies. By pairing individuals with different treatment values but with the same values of covariates (i.e., exact matching), the sample average treatment effect (SATE) can be consistently estimated and inferred using the classic Neyman-type (difference-in-means) estimator and confidence interval. However, inexact matching typically exists in practice and may cause substantial bias for the downstream treatment effect estimation and inference. Many methods have been proposed to reduce bias due to inexact matching in the binary treatment case. However, to our knowledge, no existing work has systematically investigated bias due to inexact matching in the continuous treatment case. To fill this gap, we propose a general framework for reducing bias in inexactly matched observational studies with continuous treatments. In the matching stage, we propose a carefully formulated caliper that incorporates the information of both the paired covariates and treatment doses to better tailor matching for the downstream SATE estimation and inference. In the estimation and inference stage, we propose a bias-corrected Neyman estimator paired with the corresponding bias-corrected variance estimator to leverage the information on propensity density discrepancies after inexact matching to further reduce the bias due to inexact matching. We apply our proposed framework to COVID-19 social mobility data to showcase differences between classic and bias-corrected SATE estimation and inference.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.11279 [pdf, other]

P-RAG: Progressive Retrieval Augmented Generation For Planning on Embodied Everyday Task

Authors: Weiye Xu, Min Wang, Wengang Zhou, Houqiang Li

Abstract: Embodied Everyday Task is a popular task in the embodied AI community, requiring agents to make a sequence of actions based on natural language instructions and visual observations. Traditional learning-based approaches face two challenges. Firstly, natural language instructions often lack explicit task planning. Secondly, extensive training is required to equip models with knowledge of the task environment. Previous works based on Large Language Models (LLMs) either suffer from poor performance due to the lack of task-specific knowledge or rely on ground truth as few-shot samples. To address the above limitations, we propose a novel approach called Progressive Retrieval Augmented Generation (P-RAG), which not only effectively leverages the powerful language processing capabilities of LLMs but also progressively accumulates task-specific knowledge without ground truth. Compared to the conventional RAG methods, which retrieve relevant information from the database in a one-shot manner to assist generation, P-RAG introduces an iterative approach to progressively update the database. In each iteration, P-RAG retrieves the latest database and obtains historical information from the previous interaction as experiential references for the current interaction. Moreover, we also introduce a more granular retrieval scheme that not only retrieves similar tasks but also incorporates retrieval of similar situations to provide more valuable reference experiences. Extensive experiments reveal that P-RAG achieves competitive results without utilizing ground truth and can even further improve performance through self-iterations.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.10011 [pdf, other]

HALO: Hallucination Analysis and Learning Optimization to Empower LLMs with Retrieval-Augmented Context for Guided Clinical Decision Making

Authors: Sumera Anjum, Hanzhi Zhang, Wenjun Zhou, Eun Jin Paek, Xiaopeng Zhao, Yunhe Feng

Abstract: Large language models (LLMs) have significantly advanced natural language processing tasks, yet they are susceptible to generating inaccurate or unreliable responses, a phenomenon known as hallucination. In critical domains such as health and medicine, these hallucinations can pose serious risks. This paper introduces HALO, a novel framework designed to enhance the accuracy and reliability of medical question-answering (QA) systems by focusing on the detection and mitigation of hallucinations. Our approach generates multiple variations of a given query using LLMs and retrieves relevant information from external open knowledge bases to enrich the context. We utilize maximum marginal relevance scoring to prioritize the retrieved context, which is then provided to LLMs for answer generation, thereby reducing the risk of hallucinations. The integration of LangChain further streamlines this process, resulting in a notable and robust increase in the accuracy of both open-source and commercial LLMs, such as Llama-3.1 (from 44% to 65%) and ChatGPT (from 56% to 70%). This framework underscores the critical importance of addressing hallucinations in medical QA systems, ultimately improving clinical decision-making and patient care. The open-source HALO is available at: https://github.com/ResponsibleAILab/HALO.

Submitted 18 September, 2024; v1 submitted 16 September, 2024; originally announced September 2024.

Comments: 10 pages, 4 figures

arXiv:2409.08582 [pdf, other]

ChangeChat: An Interactive Model for Remote Sensing Change Analysis via Multimodal Instruction Tuning

Authors: Pei Deng, Wenqian Zhou, Hanlin Wu

Abstract: Remote sensing (RS) change analysis is vital for monitoring Earth's dynamic processes by detecting alterations in images over time. Traditional change detection excels at identifying pixel-level changes but lacks the ability to contextualize these alterations. While recent advancements in change captioning offer natural language descriptions of changes, they do not support interactive, user-specific queries. To address these limitations, we introduce ChangeChat, the first bitemporal vision-language model (VLM) designed specifically for RS change analysis. ChangeChat utilizes multimodal instruction tuning, allowing it to handle complex queries such as change captioning, category-specific quantification, and change localization. To enhance the model's performance, we developed the ChangeChat-87k dataset, which was generated using a combination of rule-based methods and GPT-assisted techniques. Experiments show that ChangeChat offers a comprehensive, interactive solution for RS change analysis, achieving performance comparable to or even better than state-of-the-art (SOTA) methods on specific tasks, and significantly surpassing the latest general-domain model, GPT-4. Code and pre-trained weights are available at https://github.com/hanlinwu/ChangeChat.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 5 pages, 2 figures

arXiv:2409.08575 [pdf, ps, other]

A Simple approach for precision calculation of Bethe logarithm

Authors: San-Jiang Yang, Jing Chi, Wan-Ping Zhou, Li-Yan Tang, Zhen-Xiang Zhong, Ting-Yun Shi, Hao-Xue Qiao

Abstract: In this article, we propose a simple approach for the precision calculation of the Bethe logarithm. The leading contributions are obtained using specific operators, while the remaining terms are eliminated by adjusting the parameter $λ$. Through the use of dimensional regularization, singular divergences are algebraically canceled. Compared to the standard form of the Bethe logarithm, our approach significantly reduces the complexity of constructing pseudostates in numerical evaluations. Using this approach, we obtain a highly precise result for the Bethe logarithm of the ground state of hydrogen, achieving 49 significant digits. For multi-electron systems, this approach appears simple and efficient as well.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 8 pages, 5 tables

arXiv:2409.08039 [pdf, other]

Zero-Shot Sing Voice Conversion: built upon clustering-based phoneme representations

Authors: Wangjin Zhou, Fengrun Zhang, Yiming Liu, Wenhao Guan, Yi Zhao, Tatsuya Kawahara

Abstract: This study presents an innovative Zero-Shot any-to-any Singing Voice Conversion (SVC) method, leveraging a novel clustering-based phoneme representation to effectively separate content, timbre, and singing style. This approach enables precise voice characteristic manipulation. We discovered that datasets with fewer recordings per artist are more susceptible to timbre leakage. Extensive testing on over 10,000 hours of singing and user feedback revealed our model significantly improves sound quality and timbre accuracy, aligning with our objectives and advancing voice conversion technology. Furthermore, this research advances zero-shot SVC and sets the stage for future work on discrete speech representation, emphasizing the preservation of rhyme.

Submitted 14 October, 2024; v1 submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.06195 [pdf, ps, other]

The non-relativistic expansion of Dirac-Coulomb Hamiltonian up to $α^8$ order

Authors: Wanping Zhou, Sanjiang Yang, Haoxue Qiao

Abstract: This paper calculates the relativistic corrections for the Dirac-Coulomb system through the method of non-relativistic expansion. By expanding the large and small components of the Dirac wave function and the energy eigenvalues in terms of $α^2$ (where $α$ is the fine-structure constant), we obtain iterative equations for calculating the higher-order relativistic corrections of non-relativistic systems. For a single-electron system, the operator results of the iterative equations are consistent with those in the literature Ref[Zhou et al 2023 J. Phys. B At. Mol. Opt. Phys. {\bf 56} 045001]. Using these iterative equations, we numerically calculate the relativistic corrections up to the order of $α^{20}$ for the hydrogen atom using Slater basis, which converge rapidly to the analytical results of the hydrogen atom. For the two-electron Dirac-Coulomb system, we also present iterative equations for calculating high-order energy corrections, as well as numerical energy corrections of ground state up to the order of $α^8$.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04430 [pdf, other]

Highly efficient path-integral molecular dynamics simulations with GPUMD using neuroevolution potentials: Case studies on thermal properties of materials

Authors: Penghua Ying, Wenjiang Zhou, Lucas Svensson, Esmée Berger, Erik Fransson, Fredrik Eriksson, Ke Xu, Ting Liang, Jianbin Xu, Bai Song, Shunda Chen, Paul Erhart, Zheyong Fan

Abstract: Path-integral molecular dynamics (PIMD) simulations are crucial for accurately capturing nuclear quantum effects in materials. However, their computational intensity and reliance on multiple software packages often limit their applicability at large scales. Here, we present an integration of PIMD methods, including thermostatted ring-polymer molecular dynamics (TRPMD), into the open-source GPUMD package, combined with highly accurate and efficient machine-learned neuroevolution potential (NEP) models. This approach achieves almost the accuracy of first-principles calculations with the computational efficiency of empirical potentials, enabling large-scale atomistic simulations that incorporate nuclear quantum effects. We demonstrate the efficacy of the combined NEP-PIMD approach by examining various thermal properties of diverse materials, including lithium hydride (LiH), three porous metal-organic frameworks (MOFs), liquid water, and elemental aluminum. For LiH, our NEP-PIMD simulations successfully capture the isotope effect, reproducing the experimentally observed dependence of the lattice parameter on the reduced mass. For MOFs, our results reveal that achieving good agreement with experimental data requires consideration of both nuclear quantum effects and dispersive interactions. For water, our PIMD simulations capture the significant impact of nuclear quantum effects on its microscopic structure. For aluminum, the TRPMD method effectively captures thermal expansion and phonon properties, aligning well with quantum mechanical predictions. This efficient NEP-PIMD approach opens new avenues for exploring complex material properties influenced by nuclear quantum effects, with potential applications across a broad range of materials.

Submitted 28 September, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

Comments: 16 pages, 9 figures in the main text; 1 table and 8 figures in the SI

Showing 1–50 of 1,657 results for author: Zhou, W