-
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning
Authors:
Yihong Tang,
Ao Qu,
Zhaokai Wang,
Dingyi Zhuang,
Zhaofeng Wu,
Wei Ma,
Shenhao Wang,
Yunhan Zheng,
Zhan Zhao,
Jinhua Zhao
Abstract:
Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction with physical environments. Specifically, much of the spatial reasoning in these tasks occurs in two-dimensional (2D) environments, and our evaluation reveals that state-of-the-art VLMs frequently generate implausible and incorrect responses to composite spatial reasoning problems, including simple pathfinding tasks that humans can solve effortlessly at a glance. To address this, we explore an effective approach to enhance 2D spatial reasoning within VLMs by training the model on basic spatial capabilities. We begin by disentangling the key components of 2D spatial reasoning: direction comprehension, distance estimation, and localization. Our central hypothesis is that mastering these basic spatial capabilities can significantly enhance a model's performance on composite spatial tasks requiring advanced spatial understanding and combinatorial problem-solving. To investigate this hypothesis, we introduce Sparkle, a framework that fine-tunes VLMs on these three basic spatial capabilities by synthetic data generation and targeted supervision to form an instruction dataset for each capability. Our experiments demonstrate that VLMs fine-tuned with Sparkle achieve significant performance gains, not only in the basic tasks themselves but also in generalizing to composite and out-of-distribution spatial reasoning tasks (e.g., improving from 13.5% to 40.0% on the shortest path problem). These findings underscore the effectiveness of mastering basic spatial capabilities in enhancing composite spatial problem-solving, offering insights for improving VLMs' spatial reasoning capabilities.
Submitted 21 October, 2024;
originally announced October 2024.
-
A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models
Authors:
Yue Deng,
Weiyu Ma,
Yuxin Fan,
Yin Zhang,
Haifeng Zhang,
Jian Zhao
Abstract:
StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental environments in multi-agent reinforcement learning (MARL), where the specific task is to control a set number of allied units to defeat enemy forces. Traditional MARL algorithms often require interacting with the environment for up to 1 million steps to train a model, and the resulting policies are typically non-interpretable with weak transferability. In this paper, we propose a novel approach to solving SMAC tasks called LLM-SMAC. In our framework, agents leverage large language models (LLMs) to generate decision tree code from provided task descriptions. The model further performs self-reflection using feedback from the rewards provided by the environment. We conduct experiments on SMAC and demonstrate that our method can produce high-quality, interpretable decision trees with minimal environmental exploration. Moreover, these models exhibit strong transferability, successfully applying to similar SMAC environments without modification. We believe this approach offers a new direction for solving decision-making tasks in the future.
Submitted 21 October, 2024;
originally announced October 2024.
-
Inverse scattering transform for the defocusing-defocusing coupled Hirota equations with non-parallel boundary conditions at infinity
Authors:
Peng-Fei Han,
Wen-Xiu Ma,
Yi Zhang
Abstract:
The inverse scattering transform for the defocusing-defocusing coupled Hirota equations is rigorously discussed with non-zero boundary conditions at infinity including non-parallel boundary conditions, specifically referring to the asymptotic polarization vectors. To address the non-analyticity encountered in some of the Jost eigenfunctions, the "adjoint" Lax pair is employed. The inverse problem is formulated as an appropriate matrix Riemann-Hilbert problem. A key difference between non-parallel and parallel boundary conditions lies in the asymptotic behavior of the scattering coefficients, which significantly impacts the normalization of the eigenfunctions and the properties of sectionally meromorphic matrices within the Riemann-Hilbert problem framework. When the asymptotic polarization vectors are non-orthogonal, two distinct methodologies are introduced to convert the Riemann-Hilbert problem into a series of linear algebraic-integral equations. In contrast, when the asymptotic polarization vectors are orthogonal, only one method is feasible. Ultimately, it is demonstrated that pure soliton solutions exist in neither the orthogonal nor the non-orthogonal polarization vector case. This study provides a comprehensive framework for analyzing the defocusing-defocusing coupled Hirota equations using the inverse scattering transform, offering new insights into the characteristics and solutions of the equations.
Submitted 21 October, 2024;
originally announced October 2024.
-
Long Term Memory: The Foundation of AI Self-Evolution
Authors:
Xun Jiang,
Feng Li,
Han Zhao,
Jiaying Wang,
Jun Shao,
Shihao Xu,
Shu Zhang,
Weiling Chen,
Xavier Tang,
Yize Chen,
Mengyue Wu,
Weizhi Ma,
Mengdi Wang,
Tianqiao Chen
Abstract:
Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning, achieving human-level performance in various tasks. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. While training stronger models is important, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution. Unlike large-scale training, self-evolution may rely on limited data or interactions. Inspired by the columnar organization of the human cerebral cortex, we hypothesize that AI models could develop cognitive abilities and build internal representations through iterative interactions with their environment. To achieve this, models need long-term memory (LTM) to store and manage processed interaction data. LTM supports self-evolution by representing diverse experiences across environments and agents. In this report, we explore AI self-evolution and its potential to enhance models during inference. We examine LTM's role in lifelong learning, allowing models to evolve based on accumulated interactions. We outline the structure of LTM and the systems needed for effective data retention and representation. We also classify approaches for building personalized models with LTM data and show how these models achieve self-evolution through interaction. Using LTM, our multi-agent framework OMNE achieved first place on the GAIA benchmark, demonstrating LTM's potential for AI self-evolution. Finally, we present a roadmap for future research, emphasizing the importance of LTM for advancing AI technology and its practical applications.
Submitted 21 October, 2024;
originally announced October 2024.
-
Advancing Large Language Model Attribution through Self-Improving
Authors:
Lei Huang,
Xiaocheng Feng,
Weitao Ma,
Liang Zhao,
Yuchun Fan,
Weihong Zhong,
Dongliang Xu,
Qing Yang,
Hongtao Liu,
Bing Qin
Abstract:
Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations or more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.
Submitted 17 October, 2024;
originally announced October 2024.
-
A Simplifying and Learnable Graph Convolutional Attention Network for Unsupervised Knowledge Graphs Alignment
Authors:
Weishan Cai,
Wenjun Ma,
Yuncheng Jiang
Abstract:
The success of current Entity Alignment (EA) methods depends largely on the supervision information provided by labeled data. Considering the cost of labeled data, most supervised methods are difficult to apply in practical scenarios. Therefore, a growing body of work based on contrastive learning, active learning, or other deep learning techniques has been developed to address the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still have limitations: either their modeling complexity is high or they cannot balance the effectiveness and practicality of alignment. To overcome these issues, we propose a Simplifying and Learnable graph convolutional attention network for Unsupervised Knowledge Graphs alignment method (SLU). Specifically, we first introduce LCAT, a new and simple framework, as the backbone network to model the graph structure of two KGs. Then we design a reconstruction method of relation structure based on potential matching relations to efficiently filter invalid neighborhood information of aligned entities, improving the usability and scalability of SLU. Furthermore, a similarity function based on consistency is proposed to better measure the similarity of candidate entity pairs. Finally, we conduct extensive experiments on three datasets of different sizes (15K and 100K) and different types (cross-lingual and monolingual) to verify the superiority of SLU. Experimental results show that SLU significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods, and improving Hits@1 by 6.4% over the best baseline in the best case.
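For readers unfamiliar with the Hits@1 metric quoted above, the following is a minimal sketch of how Hits@k is commonly computed in entity alignment: the fraction of source entities whose correct counterpart appears among the k highest-scoring candidates. The function name `hits_at_k`, the toy similarity matrix, and the gold mapping are illustrative assumptions, not taken from the SLU paper or its code.

```python
def hits_at_k(similarity, gold, k=1):
    """Fraction of source entities whose gold target is among the k most similar candidates.

    similarity[i][j] is the score between source entity i and candidate target j;
    gold[i] is the index of the correct target for source entity i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices from most to least similar.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold[i] in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Hypothetical toy data: three source entities, three candidate targets.
sim = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
gold = [0, 1, 2]
score = hits_at_k(sim, gold, k=1)
```

In this toy case the second source entity ranks the wrong candidate first, so it counts as a miss at k=1 but as a hit at k=2.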
Submitted 17 October, 2024;
originally announced October 2024.
-
Statistical Inference in Tensor Completion: Optimal Uncertainty Quantification and Statistical-Computational Gaps
Authors:
Wanteng Ma,
Dong Xia
Abstract:
This paper presents a simple yet efficient method for statistical inference of tensor linear forms with incomplete and noisy observations. Under the Tucker low-rank tensor model, we utilize an appropriate initial estimate, along with a debiasing technique followed by a one-step power iteration, to construct an asymptotically normal test statistic. This method is suitable for various statistical inference tasks, including confidence interval prediction, inference under heteroskedastic and sub-exponential noise, and simultaneous testing. Furthermore, the approach reaches the Cramér-Rao lower bound for statistical estimation on Riemannian manifolds, indicating its optimality for uncertainty quantification. We comprehensively discuss the statistical-computational gaps and investigate the relationship between initialization and bias-correction approaches. The findings demonstrate that with independent initialization, statistically optimal sample sizes and signal-to-noise ratios are sufficient for accurate inferences. Conversely, when initialization depends on the observations, computationally optimal sample sizes and signal-to-noise ratios also guarantee asymptotic normality without the need for data-splitting. The phase transition of computational and statistical limits is presented. Numerical simulation results agree with the theoretical findings.
Submitted 14 October, 2024;
originally announced October 2024.
-
UniGEM: A Unified Approach to Generation and Property Prediction for Molecules
Authors:
Shikun Feng,
Yuyan Ni,
Yan Lu,
Zhi-Ming Ma,
Wei-Ying Ma,
Yanyan Lan
Abstract:
Molecular generation and molecular property prediction are both crucial for drug discovery, but they are often developed independently. Inspired by recent studies demonstrating that diffusion models, a prominent generative approach, can learn meaningful data representations that enhance predictive tasks, we explore the potential for developing a unified generative model in the molecular domain that effectively addresses both molecular generation and property prediction tasks. However, the integration of these tasks is challenging due to inherent inconsistencies, making simple multi-task learning ineffective. To address this, we propose UniGEM, the first unified model to successfully integrate molecular generation and property prediction, delivering superior performance in both tasks. Our key innovation lies in a novel two-phase generative process, where predictive tasks are activated in the later stages, after the molecular scaffold is formed. We further enhance task balance through innovative training strategies. Rigorous theoretical analysis and comprehensive experiments demonstrate our significant improvements in both tasks. The principles behind UniGEM hold promise for broader applications, including natural language processing and computer vision.
Submitted 14 October, 2024;
originally announced October 2024.
-
Large Language Model-Enhanced Reinforcement Learning for Generic Bus Holding Control Strategies
Authors:
Jiajie Yu,
Yuhong Wang,
Wei Ma
Abstract:
Bus holding control is a widely adopted strategy for maintaining stability and improving the operational efficiency of bus systems. Traditional model-based methods often face challenges with the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL determines the optimal control strategies in order to maximize the cumulative reward, which reflects the overall control goals. However, translating sparse and delayed control goals in real-world tasks into dense and real-time rewards for RL is challenging, normally requiring extensive manual trial-and-error. In view of this, this study introduces an automatic reward generation paradigm by leveraging the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed LLM-enhanced RL, comprises several LLM-based modules: reward initializer, reward modifier, performance analyzer, and reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to the feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents' performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to various bus holding control scenarios, including a synthetic single-line system and a real-world multi-line system. The results demonstrate the superiority and robustness of the proposed paradigm compared to vanilla RL strategies, the LLM-based controller, and conventional space headway-based feedback control. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications.
Submitted 14 October, 2024;
originally announced October 2024.
-
The Epochal Sawtooth Effect: Unveiling Training Loss Oscillations in Adam and Other Optimizers
Authors:
Qi Liu,
Wanjing Ma
Abstract:
In this paper, we identify and analyze a recurring training loss pattern, which we term the Epochal Sawtooth Effect (ESE), commonly observed during training with adaptive gradient-based optimizers, particularly the Adam optimizer. This pattern is characterized by a sharp drop in loss at the beginning of each epoch, followed by a gradual increase, resulting in a sawtooth-shaped loss curve. Through empirical observations, we demonstrate that while this effect is most pronounced with Adam, it persists, although less severely, with other optimizers such as RMSProp.
We provide an in-depth explanation of the underlying mechanisms that lead to the Epochal Sawtooth Effect. We study how factors such as \(β\), batch size, and data shuffling influence this pattern. We quantify the influence of \(β_2\) on the shape of the loss curve, showing that higher values of \(β_2\) result in a nearly linear increase in loss, while lower values create a concave upward trend. Our analysis reveals that this behavior stems from the adaptive learning rate controlled by the second moment estimate, with \(β_1\) playing a minimal role when \(β_2\) is large.
To support our analysis, we replicate this phenomenon through a controlled quadratic minimization task. By incrementally solving a series of quadratic optimization problems using Adam, we demonstrate that the Epochal Sawtooth Effect can emerge even in simple optimization scenarios, reinforcing the generality of this pattern. This paper provides both theoretical insights and quantitative analysis, offering a comprehensive understanding of this ubiquitous phenomenon in modern optimization techniques.
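As background for the roles of \(β_1\) and \(β_2\) mentioned in this abstract, here is a minimal sketch of the standard scalar Adam update applied to a single quadratic. It is not the authors' experimental setup and does not attempt to reproduce the sawtooth pattern; the function name `adam_minimize`, the toy objective, and all hyperparameter values are illustrative assumptions.

```python
import math


def adam_minimize(grad, theta0, steps, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run scalar Adam for `steps` iterations and return the parameter trajectory."""
    theta, m, v = theta0, 0.0, 0.0
    path = [theta]
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g      # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)         # bias-corrected second moment
        # The effective step size is lr / sqrt(v_hat): the adaptive learning rate.
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        path.append(theta)
    return path


# Hypothetical toy objective f(theta) = 0.5 * (theta - 3)^2, with gradient theta - 3.
path = adam_minimize(lambda th: th - 3.0, theta0=0.0, steps=2000, lr=0.05)
```

Because `v` is an exponential moving average with decay \(β_2\), a large \(β_2\) makes the effective step size respond slowly to changes in gradient magnitude, which is the quantity the abstract attributes the effect to.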
Submitted 13 October, 2024;
originally announced October 2024.
-
Boltzmann-Aligned Inverse Folding Model as a Predictor of Mutational Effects on Protein-Protein Interactions
Authors:
Xiaoran Jiao,
Weian Mao,
Wengong Jin,
Peiyuan Yang,
Hao Chen,
Chunhua Shen
Abstract:
Predicting the change in binding free energy ($ΔΔG$) is crucial for understanding and modulating protein-protein interactions, which are critical in drug design. Due to the scarcity of experimental $ΔΔG$ data, existing methods focus on pre-training, while neglecting the importance of alignment. In this work, we propose the Boltzmann Alignment technique to transfer knowledge from pre-trained inverse folding models to $ΔΔG$ prediction. We begin by analyzing the thermodynamic definition of $ΔΔG$ and introducing the Boltzmann distribution to connect energy with protein conformational distribution. However, the protein conformational distribution is intractable; therefore, we employ Bayes' theorem to circumvent direct estimation and instead utilize the log-likelihood provided by protein inverse folding models for $ΔΔG$ estimation. Compared to previous inverse folding-based methods, our method explicitly accounts for the unbound state of the protein complex in the $ΔΔG$ thermodynamic cycle, introducing a physical inductive bias and achieving both supervised and unsupervised state-of-the-art (SoTA) performance. Experimental results on SKEMPI v2 indicate that our method achieves Spearman coefficients of 0.3201 (unsupervised) and 0.5134 (supervised), significantly surpassing the previously reported SoTA values of 0.2632 and 0.4324, respectively. Furthermore, we demonstrate the capability of our method on binding energy prediction, protein-protein docking and antibody optimization tasks.
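For orientation, the thermodynamic quantities this abstract refers to can be written out under a simple two-state (bound versus unbound) picture. This is the textbook relation between free energy and Boltzmann probabilities, not necessarily the exact formulation used in the paper; the subscripts "mut" (mutant) and "wt" (wild type) and the symbols $p_{\text{bound}}$, $p_{\text{unbound}}$ are notation assumed here for illustration.

```latex
\begin{align}
\Delta G &= G_{\text{bound}} - G_{\text{unbound}}
          = -k_B T \,\ln \frac{p_{\text{bound}}}{p_{\text{unbound}}}, \\
\Delta\Delta G &= \Delta G_{\text{mut}} - \Delta G_{\text{wt}}.
\end{align}
```

The first line follows from $p \propto e^{-G/(k_B T)}$ for each state, which is why both the bound and the unbound state enter the estimate.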
Submitted 12 October, 2024;
originally announced October 2024.
-
SegGrasp: Zero-Shot Task-Oriented Grasping via Semantic and Geometric Guided Segmentation
Authors:
Haosheng Li,
Weixin Mao,
Weipeng Deng,
Chenyu Meng,
Rui Zhang,
Fan Jia,
Tiancai Wang,
Haoqiang Fan,
Hongan Wang,
Xiaoming Deng
Abstract:
Task-oriented grasping, which involves grasping specific parts of objects based on their functions, is crucial for developing advanced robotic systems capable of performing complex tasks in dynamic environments. In this paper, we propose a training-free framework that incorporates both semantic and geometric priors for zero-shot task-oriented grasp generation. The proposed framework, SegGrasp, first leverages vision-language models such as GLIP for coarse segmentation. It then uses detailed geometric information from convex decomposition to improve segmentation quality through a fusion policy named GeoFusion. A grasping network can then generate an effective grasp pose from the improved segmentation. We conducted experiments on both a segmentation benchmark and real-world robot grasping. The experimental results show that SegGrasp surpasses the baseline by more than 15% in grasp and segmentation performance.
Submitted 14 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Generalizable autoregressive modeling of time series through functional narratives
Authors:
Ran Liu,
Wenrui Ma,
Ellen Zippi,
Hadi Pouransari,
Jingyun Xiao,
Chris Sandino,
Behrooz Mahasseni,
Juri Minxha,
Erdrin Azemi,
Eva L. Dyer,
Ali Moin
Abstract:
Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Based on the new set of generated sequences, we train an autoregressive transformer that progressively recovers the original sample from the most simplified variant. Analogous to the next word prediction task in languages that learns narratives by connecting different words, our autoregressive transformer aims to learn the Narratives of Time Series (NoTS) by connecting different functions in time. Theoretically, we justify the construction of the alternative sequence through its advantages in approximating functions. When learning time series data with transformers, constructing sequences of temporal functions allows for a broader class of approximable functions (e.g., differentiation) compared to sequences of time periods, leading to a 26% performance improvement in synthetic feature regression experiments. Experimentally, we validate NoTS in 3 different tasks across 22 real-world datasets, where we show that NoTS significantly outperforms other pre-training methods by up to 6%. Additionally, combining NoTS with existing transformer architectures consistently boosts performance. Our results demonstrate the potential of NoTS as a general-purpose dynamic learner, offering a viable alternative for developing foundation models for time series analysis.
Submitted 10 October, 2024;
originally announced October 2024.
-
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs
Authors:
Yuanqing Yu,
Zhefan Wang,
Weizhi Ma,
Zhicheng Guo,
Jingtao Zhan,
Shuai Wang,
Chuhan Wu,
Zhiqiang Guo,
Min Zhang
Abstract:
Despite having powerful reasoning and inference capabilities, Large Language Models (LLMs) still need external tools to retrieve real-time information or acquire domain-specific expertise to solve complex tasks, which is referred to as tool learning. Existing tool learning methods primarily rely on tuning with expert trajectories, focusing on token-sequence learning from a linguistic perspective. However, there are several challenges: 1) imitating static trajectories limits their ability to generalize to new tasks; 2) even expert trajectories can be suboptimal, and better solution paths may exist. In this work, we introduce StepTool, a novel step-grained reinforcement learning framework to improve tool learning in LLMs. It consists of two components: Step-grained Reward Shaping, which assigns rewards at each tool interaction based on tool invocation success and its contribution to the task, and Step-grained Optimization, which uses policy gradient methods to optimize the model in a multi-step manner. Experimental results demonstrate that StepTool significantly outperforms existing methods in multi-step, tool-based tasks, providing a robust solution for complex task environments. Code is available at https://github.com/yuyq18/StepTool.
Submitted 10 October, 2024;
originally announced October 2024.
-
Mitigating the Risk of Health Inequity Exacerbated by Large Language Models
Authors:
Yuelyu Ji,
Wenhe Ma,
Sonish Sivarajkumar,
Hang Zhang,
Eugene Mathew Sadhu,
Zhuochun Li,
Xizhi Wu,
Shyam Visweswaran,
Yanshan Wang
Abstract:
Recent advancements in large language models have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question answering for clinical decision support. However, our study shows that incorporating non-decisive sociodemographic factors such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment into the input of LLMs can lead to incorrect and harmful outputs for these populations. These discrepancies risk exacerbating existing health disparities if LLMs are widely adopted in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM-based medical applications. Our evaluation demonstrates its efficacy in promoting equitable outcomes across diverse populations.
Submitted 14 October, 2024; v1 submitted 7 October, 2024;
originally announced October 2024.
-
$\texttt{dattri}$: A Library for Efficient Data Attribution
Authors:
Junwei Deng,
Ting-Wei Li,
Shiyuan Zhang,
Shixuan Liu,
Yijun Pan,
Hao Huang,
Xinhe Wang,
Pingbang Hu,
Xingjian Zhang,
Jiaqi W. Ma
Abstract:
Data attribution methods aim to quantify the influence of individual training samples on the prediction of artificial intelligence (AI) models. As training data plays an increasingly crucial role in the modern development of large-scale AI models, data attribution has found broad applications in improving AI performance and safety. However, despite a surge of new data attribution methods being developed recently, a comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods is still lacking. In this work, we introduce $\texttt{dattri}$, an open-source data attribution library that addresses the above needs. Specifically, $\texttt{dattri}$ highlights three novel design features. Firstly, $\texttt{dattri}$ proposes a unified and easy-to-use API, allowing users to integrate different data attribution methods into their PyTorch-based machine learning pipeline with a few lines of code changed. Secondly, $\texttt{dattri}$ modularizes low-level utility functions that are commonly used in data attribution methods, such as Hessian-vector product, inverse-Hessian-vector product or random projection, making it easier for researchers to develop new data attribution methods. Thirdly, $\texttt{dattri}$ provides a comprehensive benchmark framework with pre-trained models and ground truth annotations for a variety of benchmark settings, including generative AI settings. We have implemented a variety of state-of-the-art efficient data attribution methods that can be applied to large-scale neural network models, and will continuously update the library in the future. Using the developed $\texttt{dattri}$ library, we are able to perform a comprehensive and fair benchmark analysis across a wide range of data attribution methods. The source code of $\texttt{dattri}$ is available at https://github.com/TRAIS-Lab/dattri.
Submitted 6 October, 2024;
originally announced October 2024.
-
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization
Authors:
Yangfan Ye,
Xiachong Feng,
Xiaocheng Feng,
Weitao Ma,
Libo Qin,
Dongliang Xu,
Qing Yang,
Hongtao Liu,
Bing Qin
Abstract:
News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. Nevertheless, the lack of a benchmark inhibits researchers from adequately studying this invaluable problem. To tackle this, we have meticulously constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into an event-centric format. Additionally, we introduce the method of protocol-guided prompting for high-quality and cost-effective reference annotation. In MCMS, we also highlight the challenge of conflicts between news reports, in addition to the issues of redundancies and omissions, further enhancing the complexity of GLOBESUMM. Through extensive experimental analysis, we validate the quality of our dataset and elucidate the inherent challenges of the task. We firmly believe that GLOBESUMM, given its challenging nature, will greatly contribute to the multilingual communities and the evaluation of LLMs.
Submitted 5 October, 2024;
originally announced October 2024.
-
Double-Strand Break Clustering: An Economical and Effective Strategy for DNA Repair
Authors:
Junyi Chen,
Wenzong Ma,
Yuqi Ma,
Gen Yang
Abstract:
In mammalian cells, repair centers for DNA double-strand breaks (DSBs) have been identified. However, previous research predominantly relies on methods that induce specific DSBs by cutting particular DNA sequences. The clustering of non-specifically induced DSBs and its spatiotemporal properties, especially for DSBs induced by environmental stresses such as irradiation, remain unclear. In this study, we used Dragonfly microscopy to induce high-precision damage in cells and discovered that DSBs cluster during the early stages of the DNA damage response (DDR) and repair, but not during the repair plateau phase. Early in DDR, DSBs clustered into existing 53BP1 foci. DSB clustering at different stages has different implications for DNA repair. By controlling the distance between adjacent damage points, we found that the probability of DSB clustering remains constant at distances of 0.8-1.4 μm, while clustering does not occur beyond 1.4 μm. Within the 0.8 μm range, the probability of clustering significantly increases due to the phase separation effect of 53BP1. Using a Monte Carlo approach, we developed a dynamic model of 53BP1 foci formation, fission, and fusion. This model accurately predicts experimental outcomes and further demonstrates the temporal and spatial influences on DSB clustering. These results show that, similarly to specifically induced DSBs, non-specifically induced DSBs can also cluster. The extent of DSB clustering is influenced by both temporal and spatial factors, which provides new insights into the dynamics of DSB clustering and the role of 53BP1 in DNA repair processes. Such findings could enhance our understanding of DNA damage responses and help us improve DNA repair therapies in disease.
Submitted 4 October, 2024;
originally announced October 2024.
-
Adaptive high-precision sound source localization at low frequencies based on convolutional neural network
Authors:
Wenbo Ma,
Yan Lu,
Yijun Liu
Abstract:
Sound source localization (SSL) technology plays a crucial role in various application areas such as fault diagnosis, speech separation, and vibration noise reduction. Although beamforming algorithms are widely used in SSL, their resolution at low frequencies is limited. In recent years, deep learning-based SSL methods have significantly improved their accuracy by employing large microphone arrays and training case-specific neural networks; however, this could lead to narrow applicability. To address these issues, this paper proposes a convolutional neural network-based method for high-precision SSL, which is adaptive in the lower frequency range under 1 kHz with varying numbers of sound sources and microphone array-to-scanning grid distances. It takes the pressure distribution on a relatively small microphone array as input to the neural network, and employs customized training labels and a customized loss function to train the model. The prediction accuracy, adaptability and robustness of the trained model under certain signal-to-noise ratios (SNRs) are evaluated using randomly generated test datasets and compared with the classical beamforming algorithms CLEAN-SC and DAMAS. Results for both planar and spatial sound source distributions show that the proposed neural network model significantly improves low-frequency localization accuracy, demonstrating its effectiveness and potential in SSL.
Submitted 30 September, 2024;
originally announced September 2024.
-
Search for proton decay via $p\rightarrow{e^+η}$ and $p\rightarrow{μ^+η}$ with a 0.37 Mton-year exposure of Super-Kamiokande
Authors:
Super-Kamiokande Collaboration,
N. Taniuchi,
K. Abe,
S. Abe,
Y. Asaoka,
C. Bronner,
M. Harada,
Y. Hayato,
K. Hiraide,
K. Hosokawa,
K. Ieki,
M. Ikeda,
J. Kameda,
Y. Kanemura,
R. Kaneshima,
Y. Kashiwagi,
Y. Kataoka,
S. Miki,
S. Mine,
M. Miura,
S. Moriyama,
M. Nakahata,
S. Nakayama,
Y. Noguchi
, et al. (267 additional authors not shown)
Abstract:
A search for proton decay into $e^+/μ^+$ and an $η$ meson has been performed using data from a 0.373 Mton$\cdot$year exposure (6050.3 live days) of Super-Kamiokande. Compared to previous searches, this work introduces an improved model of the intranuclear $η$ interaction cross section, resulting in a factor of two reduction in uncertainties from this source and a $\sim$10\% increase in signal efficiency. No significant data excess was found above the expected number of atmospheric neutrino background events, resulting in no indication of proton decay into either mode. Lower limits on the proton partial lifetime of $1.4\times\mathrm{10^{34}~years}$ for $p\rightarrow e^+η$ and $7.3\times\mathrm{10^{33}~years}$ for $p\rightarrow μ^+η$ at the 90$\%$ C.L. were set. These limits are around 1.5 times longer than those of our previous study and are the most stringent to date.
Submitted 29 September, 2024;
originally announced September 2024.
-
RoboNurse-VLA: Robotic Scrub Nurse System based on Vision-Language-Action Model
Authors:
Shunlei Li,
Jin Wang,
Rui Dai,
Wanyu Ma,
Wing Yin Ng,
Yingbai Hu,
Zheng Li
Abstract:
In modern healthcare, the demand for autonomous robotic assistants has grown significantly, particularly in the operating room, where surgical tasks require precision and reliability. Robotic scrub nurses have emerged as a promising solution to improve efficiency and reduce human error during surgery. However, challenges remain in terms of accurately grasping and handing over surgical instruments, especially when dealing with complex or difficult objects in dynamic environments. In this work, we introduce a novel robotic scrub nurse system, RoboNurse-VLA, built on a Vision-Language-Action (VLA) model by integrating the Segment Anything Model 2 (SAM 2) and the Llama 2 language model.
The proposed RoboNurse-VLA system enables highly precise grasping and handover of surgical instruments in real-time based on voice commands from the surgeon. Leveraging state-of-the-art vision and language models, the system can address key challenges for object detection, pose optimization, and the handling of complex and difficult-to-grasp instruments. Through extensive evaluations, RoboNurse-VLA demonstrates superior performance compared to existing models, achieving high success rates in surgical instrument handovers, even with unseen tools and challenging items. This work presents a significant step forward in autonomous surgical assistance, showcasing the potential of integrating VLA models for real-world medical applications. More details can be found at https://robonurse-vla.github.io.
Submitted 29 September, 2024;
originally announced September 2024.
-
Movable Antenna Enabled Near-Field Communications: Channel Modeling and Performance Optimization
Authors:
Lipeng Zhu,
Wenyan Ma,
Zhenyu Xiao,
Rui Zhang
Abstract:
Movable antenna (MA) technology offers promising potential to enhance wireless communication by allowing flexible antenna movement. To maximize spatial degrees of freedom (DoFs), larger movable regions are required, which may render the conventional far-field assumption for channels between transceivers invalid. In light of this, we investigate MA-enabled near-field communications in this paper, where a base station (BS) with multiple movable subarrays serves multiple users, each equipped with a fixed-position antenna (FPA). First, we extend the field response channel model for MA systems to the near-field propagation scenario. Next, we examine MA-aided multiuser communication systems under both digital and analog beamforming architectures. For digital beamforming, spatial division multiple access (SDMA) is utilized, where an upper bound on the minimum signal-to-interference-plus-noise ratio (SINR) across users is derived in closed form. A low-complexity algorithm based on zero-forcing (ZF) is then proposed to jointly optimize the antenna position vector (APV) and digital beamforming matrix (DBFM) to approach this bound. For analog beamforming, orthogonal frequency division multiple access (OFDMA) is employed, and an upper bound on the minimum signal-to-noise ratio (SNR) among users is derived. An alternating optimization (AO) algorithm is proposed to iteratively optimize the APV, analog beamforming vector (ABFV), and power allocation until convergence. For both architectures, we further explore MA design strategies based on statistical channel state information (CSI), with the APV updated less frequently to reduce the antenna movement overhead. Simulation results demonstrate that our proposed algorithms achieve performance close to the derived bounds and also outperform the benchmark schemes using dense or sparse arrays with FPAs.
Submitted 28 September, 2024;
originally announced September 2024.
-
Exploring the Diversity of Nuclear Density through Information Entropy
Authors:
Wei-Hu Ma,
Yu-Gang Ma
Abstract:
This study explores the role of information entropy in understanding nuclear density distributions, including both stable configurations and non-traditional structures such as neutron halos and $α$-clustering. By quantifying the uncertainty and disorder inherent in nucleon distributions in nuclear many-body systems, information entropy provides a macroscopic measure of the physical properties of the system. A more dispersed and disordered density distribution results in a higher value of information entropy. This intrinsic relationship between information entropy and system complexity allows us to quantify uncertainty and disorder in nuclear structures by analyzing various geometric parameters such as nuclear radius, diffuseness, neutron skin, and cluster structural features.
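The claim that a more dispersed density yields a higher information entropy can be illustrated numerically. The sketch below is not taken from the paper: it assumes the Shannon form $S = -\int ρ \ln ρ \, d^3r$ for a density normalized to one, and a two-parameter Fermi profile with radius `radius` and diffuseness `diffuseness`. The function names and parameter values are hypothetical.

```python
import numpy as np


def fermi_density(r, radius, diffuseness):
    """Unnormalized two-parameter Fermi profile 1 / (1 + exp((r - R) / a))."""
    return 1.0 / (1.0 + np.exp((r - radius) / diffuseness))


def information_entropy(radius, diffuseness, r_max=30.0, n=6000):
    """Shannon entropy S = -int rho ln(rho) d^3r of a unit-normalized density.

    Assumption: this functional and the Fermi profile are illustrative
    choices, not the definitions used in the paper.
    """
    r = np.linspace(1e-6, r_max, n)
    dr = r[1] - r[0]
    shell = 4.0 * np.pi * r**2  # spherical volume element per unit radius
    rho = fermi_density(r, radius, diffuseness)
    rho /= np.sum(rho * shell) * dr  # normalize so the density integrates to 1
    safe_rho = np.clip(rho, 1e-300, None)  # avoid log(0) in the far tail
    return float(np.sum(-rho * np.log(safe_rho) * shell) * dr)


# Same radius, two diffuseness values (arbitrary illustrative numbers in fm).
compact = information_entropy(radius=5.0, diffuseness=0.5)
diffuse = information_entropy(radius=5.0, diffuseness=1.0)
print(f"compact: {compact:.3f}, diffuse: {diffuse:.3f}")
```

Under these assumptions, the profile with the larger diffuseness spreads the same normalized density over a larger volume, so its entropy comes out higher.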
Submitted 28 September, 2024;
originally announced September 2024.
-
A Novel Proposal for Exploring Spacetime Microstructure with Scaling
Authors:
Weihu Ma,
Yu-Gang Ma
Abstract:
The study of physics at the Planck scale has garnered significant attention due to its implications for understanding the fundamental nature of the universe. At this scale, quantum fluctuations in spacetime become apparent, as suggested by the Heisenberg uncertainty principle. These fluctuations indicate that spacetime is not a smooth manifold but rather has a more complex structure that might be fractal-like, exhibiting self-similarity across different scales. This study investigates the scaling behavior of spacetime microstructure at the Planck scale. By introducing the concept of self-similar spacetime microelement measurements, a scaling-characterized metric tensor is derived from analyzing the Lorentz scalar line element, which imposes constraints on spacetime dimensions and components. In linear scale measurements of spacetime, equivalent equations - such as the geodesic equations, Einstein field equations, Klein-Gordon equation, and Dirac equation - can be expressed in scaling form. Notably, the golden ratio naturally appears in the microstructure of linear scale measurements, suggesting a potential explanation for the Planck length's significance. This work provides insights into quantum fluctuations and proposes modifications to classical geometric intuitions, potentially advancing our understanding of the microstructure of spacetime.
Submitted 28 September, 2024;
originally announced September 2024.
-
Most Influential Subset Selection: Challenges, Promises, and Beyond
Authors:
Yuzheng Hu,
Pingbang Hu,
Han Zhao,
Jiaqi W. Ma
Abstract:
How can we attribute the behaviors of machine learning models to their training data? While the classic influence function sheds light on the impact of individual samples, it often fails to capture the more complex and pronounced collective influence of a set of samples. To tackle this challenge, we study the Most Influential Subset Selection (MISS) problem, which aims to identify a subset of training samples with the greatest collective influence. We conduct a comprehensive analysis of the prevailing approaches in MISS, elucidating their strengths and weaknesses. Our findings reveal that influence-based greedy heuristics, a dominant class of algorithms in MISS, can provably fail even in linear regression. We delineate the failure modes, including the errors of influence function and the non-additive structure of the collective influence. Conversely, we demonstrate that an adaptive version of these heuristics which applies them iteratively, can effectively capture the interactions among samples and thus partially address the issues. Experiments on real-world datasets corroborate these theoretical findings, and further demonstrate that the merit of adaptivity can extend to more complex scenarios such as classification tasks and non-linear neural networks. We conclude our analysis by emphasizing the inherent trade-off between performance and computational efficiency, questioning the use of additive metrics such as the linear datamodeling score, and offering a range of discussions.
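The contrast between a one-shot influence-based greedy heuristic and its adaptive variant can be sketched for ordinary least squares. This is an illustrative sketch, not the authors' implementation: it assumes the target is the prediction at a single test point, uses the standard first-order influence estimate (leverage correction ignored), and all function names are hypothetical.

```python
import numpy as np


def influence_scores(X, y, x_test):
    """First-order estimate of the change in the prediction at x_test
    when each training sample is removed (leverage term ignored)."""
    gram_inv = np.linalg.inv(X.T @ X)
    theta = gram_inv @ X.T @ y
    residuals = y - X @ theta
    return -(X @ gram_inv @ x_test) * residuals


def greedy_subset(X, y, x_test, k):
    """One-shot greedy: score every sample once, keep the top k."""
    scores = influence_scores(X, y, x_test)
    return [int(i) for i in np.argsort(-scores)[:k]]


def adaptive_greedy_subset(X, y, x_test, k):
    """Adaptive greedy: pick one sample, drop it, re-score the rest, repeat."""
    remaining = list(range(len(y)))
    chosen = []
    for _ in range(k):
        scores = influence_scores(X[remaining], y[remaining], x_test)
        best = remaining[int(np.argmax(scores))]
        chosen.append(best)
        remaining.remove(best)
    return chosen


def actual_effect(X, y, x_test, subset):
    """Exact change in the prediction at x_test after refitting without subset."""
    keep = np.setdiff1d(np.arange(len(y)), subset)
    full = x_test @ np.linalg.lstsq(X, y, rcond=None)[0]
    reduced = x_test @ np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    return float(reduced - full)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=60)
x_test = rng.normal(size=3)

for name, select in [("greedy", greedy_subset), ("adaptive", adaptive_greedy_subset)]:
    subset = select(X, y, x_test, k=5)
    print(name, sorted(subset), actual_effect(X, y, x_test, subset))
```

The adaptive variant re-scores the remaining samples after each removal, which is what lets it account for interactions among samples; the price is k score computations instead of one.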
Submitted 25 September, 2024;
originally announced September 2024.
-
Triple Point Masking
Authors:
Jiaming Liu,
Linghe Kong,
Yue Wu,
Maoguo Gong,
Hao Li,
Qiguang Miao,
Wenping Ma,
Can Qin
Abstract:
Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders to achieve multi-mask learning for 3D point clouds. Specifically, we augment the baselines with two additional mask choices (i.e., medium mask and low mask) as our core insight is that the recovery process of an object can manifest in diverse ways. Previous high-masking schemes focus on capturing the global representation but lack the fine-grained recovery capability, so that the generated pre-trained weights tend to play a limited role in the fine-tuning process. With the support of the proposed TPM, available methods can exhibit more flexible and accurate completion capabilities, enabling the potential autoencoder in the pre-training stage to consider multiple representations of a single 3D object. In addition, an SVM-guided weight selection module is proposed to fill the encoder parameters for downstream networks with the optimal weight during the fine-tuning stage, maximizing linear accuracy and facilitating the acquisition of intricate representations for new objects. Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks. Our code and models are available at https://github.com/liujia99/TPM.
Submitted 15 October, 2024; v1 submitted 26 September, 2024;
originally announced September 2024.
-
Parse Trees Guided LLM Prompt Compression
Authors:
Wenhao Mao,
Chengbin Hou,
Tianyu Zhang,
Xinyu Lin,
Ke Tang,
Hairong Lv
Abstract:
Offering rich contexts to Large Language Models (LLMs) has been shown to boost performance in various tasks, but the resulting longer prompt increases the computational cost and might exceed the input limit of LLMs. Recently, some prompt compression methods have been suggested to shorten prompts by using language models to generate shorter prompts or by developing computational models to select important parts of the original prompt. The generative compression methods suffer from issues like hallucination, while the selective compression methods do not involve linguistic rules and overlook the global structure of the prompt. To this end, we propose a novel selective compression method called PartPrompt. It first obtains a parse tree for each sentence based on linguistic rules, and calculates local information entropy for each node in a parse tree. These local parse trees are then organized into a global tree according to the hierarchical structure, such as the dependency of sentences, paragraphs, and sections. After that, root-ward propagation and leaf-ward propagation are proposed to adjust node values over the global tree. Finally, a recursive algorithm is developed to prune the global tree based on the adjusted node values. The experiments show that PartPrompt achieves state-of-the-art performance across various datasets, metrics, compression ratios, and target LLMs for inference. The in-depth ablation studies confirm the effectiveness of the designs in PartPrompt, and additional experiments also demonstrate its superiority in terms of the coherence of compressed prompts and in the extremely long prompt scenario.
Submitted 23 September, 2024;
originally announced September 2024.
-
PathSeeker: Exploring LLM Security Vulnerabilities with a Reinforcement Learning-Based Jailbreak Approach
Authors:
Zhihao Lin,
Wei Ma,
Mingyi Zhou,
Yanjie Zhao,
Haoyu Wang,
Yang Liu,
Jun Wang,
Li Li
Abstract:
In recent years, Large Language Models (LLMs) have gained widespread use, raising concerns about their security. Traditional jailbreak attacks often rely on the model's internal information or have limitations when exploring the unsafe behavior of the victim model, reducing their general applicability. In this paper, we introduce PathSeeker, a novel black-box jailbreak method, which is inspired by the game of rats escaping a maze. We think that each LLM has its unique "security maze", and attackers attempt to find the exit by learning from the received feedback and their accumulated experience to compromise the target LLM's security defences. Our approach leverages multi-agent reinforcement learning, where smaller models collaborate to guide the main LLM in performing mutation operations to achieve the attack objectives. By progressively modifying inputs based on the model's feedback, our system induces richer, harmful responses. During our manual attempts to perform jailbreak attacks, we found that the vocabulary of the target model's responses gradually became richer and eventually produced harmful responses. Based on this observation, we also introduce a reward mechanism that exploits the expansion of vocabulary richness in LLM responses to weaken security constraints. Our method outperforms five state-of-the-art attack techniques when tested across 13 commercial and open-source LLMs, achieving high attack success rates, especially against commercial models with strong safety alignment such as GPT-4o-mini, Claude-3.5, and GLM-4-air. This study aims to improve the understanding of LLM security vulnerabilities, and we hope that it can contribute to the development of more robust defenses.
Submitted 3 October, 2024; v1 submitted 21 September, 2024;
originally announced September 2024.
-
6D Movable Antenna Enhanced Interference Mitigation for Cellular-Connected UAV Communications
Authors:
Tianshi Ren,
Xianchao Zhang,
Lipeng Zhu,
Wenyan Ma,
Xiaozheng Gao,
Rui Zhang
Abstract:
Cellular-connected unmanned aerial vehicle (UAV) communications is an enabling technology to transmit control signaling or payload data for UAVs through cellular networks. Due to the line-of-sight (LoS) dominant air-to-ground channels, efficient interference mitigation is crucial to UAV communications, while the conventional fixed-position antenna (FPA) arrays have limited degrees of freedom (DoFs) to suppress the interference between the UAV and its non-associated co-channel base stations (BSs). To address this challenge, we propose in this letter a new approach by utilizing the six-dimensional movable antenna (6DMA) arrays to enhance the interference mitigation for the UAV. Specifically, we propose an efficient block coordinate descent (BCD) algorithm to iteratively optimize the antenna position vector (APV), array rotation vector (ARV), receive beamforming vector, and associated BS of the UAV to maximize its signal-to-interference-plus-noise ratio (SINR). Numerical results show that the proposed 6DMA enhanced cellular-connected UAV communication can significantly outperform that with the traditional FPA arrays and other benchmark schemes in terms of interference mitigation.
Submitted 20 September, 2024;
originally announced September 2024.
-
Multiple-models prediction for light neutron-rich isotopes cross section by $Q_g$ systematics in $^{40}$Ar projectile fragmentation reactions
Authors:
X. B. Wei,
H. L. Wei,
C. W. Ma,
C. Y. Qiao,
Y. F. Guo,
J. Pu,
K. X. Cheng,
Y. T. Wang,
Z. X. Wang,
T. R. Zhou,
D. Peng,
S. T. Wang,
S. W. Tang,
Y. H. Yu,
X. H. Zhang,
Y. Z. Sun,
S. Y. Jin,
G. L. Zhang,
X. Jiang,
Z. Y. Li,
Y. F. Xu,
F. H. Lu,
T. Q. Liu
Abstract:
Precise predictions for nuclei near drip lines are crucial for experiments at the new generation of rare isotope facilities. A multi-model investigation of the $Q_g$ systematics for fragment production cross sections, with $Q_g$ defined as the difference of mass excess (ME) between the projectile ($Z_{p}, A_{p}$) and the fragment ($Z_{f}, A_{f}$) nuclei, $Q_{g}=ME(Z_{p}, A_{p})-ME(Z_{f}, A_{f})$, has been performed to verify the model prediction abilities for light neutron-rich isotopes in measured $^{40}$Ar + $^9$Be projectile fragmentation reactions from 57$A$ MeV to 1$A$ GeV. The models used are the FRACS parametrizations and the newly developed Bayesian neural networks (BNN) model. The results show that FRACS, BNN, and $Q_g$ extrapolations are generally consistent, except for fragments near the nuclear mass of the projectile. Additionally, both measured data and model extrapolations provide evidence for a shell closure at $N=$ 16 in fluorine and neon, as well as the disappearance of the traditional magic number $N=$ 20 in neon, sodium and magnesium.
Submitted 14 September, 2024;
originally announced September 2024.
-
NeutralUniverseMachine: How Filaments and Dark Matter Halo Influence the Galaxy Cold Gas Content
Authors:
Wenlin Ma,
Hong Guo,
Michael G. Jones
Abstract:
Aims. To investigate the influence of distance to filaments and dark matter halos on galaxy cold gas content in the empirical model NeutralUniverseMachine (NUM) and the hydrodynamical simulation IllustrisTNG. Methods. We use DisPerSE to identify cosmic web structures and calculate the distance of galaxies to filaments for both observations and models. We show the results of the HI and H2 mass functions, HI- and H2-halo mass relations, HI- and H2-stellar mass relations for galaxies in the NUM model and IllustrisTNG with different distances to filaments and compare them with observational measurements. We also show the evolution of HI, H2 mass densities in different distance to filament bins. Results. We find that the role of filaments in affecting the HI gas is generally less significant compared to the halo environment. There is a weak trend in the observations at z = 0 that low-mass halos lying closer to filaments tend to have reduced HI masses. However, this trend reverses for massive halos with log(Mvir/Msun) > 12.5. This behavior is accurately reproduced in the NUM model due to the dependence of HI gas on the halo formation time, but it does not appear in IllustrisTNG. The influence of filaments on the HI gas becomes slightly weaker at higher redshifts and is only significant for galaxies residing in massive halos in the NUM model. Filaments have almost no impact on the H2-stellar mass relation in both models, confirming that H2 is primarily determined by the galaxy stellar mass and star formation rate.
Submitted 13 September, 2024;
originally announced September 2024.
-
Consensus-based Distributed Quantum Kernel Learning for Speech Recognition
Authors:
Kuan-Cheng Chen,
Wenxuan Ma,
Xiaotian Xu
Abstract:
This paper presents a Consensus-based Distributed Quantum Kernel Learning (CDQKL) framework aimed at improving speech recognition through distributed quantum computing. CDQKL addresses the challenges of scalability and data privacy in centralized quantum kernel learning. It does this by distributing computational tasks across quantum terminals, which are connected through classical channels. This approach enables the exchange of model parameters without sharing local training data, thereby maintaining data privacy and enhancing computational efficiency. Experimental evaluations on benchmark speech emotion recognition datasets demonstrate that CDQKL achieves competitive classification accuracy and scalability compared to centralized and local quantum kernel learning models. The distributed nature of CDQKL offers advantages in privacy preservation and computational efficiency, making it suitable for data-sensitive fields such as telecommunications, automotive, and finance. The findings suggest that CDQKL can effectively leverage distributed quantum computing for large-scale machine-learning tasks.
Submitted 9 September, 2024;
originally announced September 2024.
-
Adversarial Attacks on Data Attribution
Authors:
Xinhe Wang,
Pingbang Hu,
Junwei Deng,
Jiaqi W. Ma
Abstract:
Data attribution aims to quantify the contribution of individual training data points to the outputs of an AI model, which has been used to measure the value of training data and compensate data providers. Given the impact on financial decisions and compensation mechanisms, a critical question arises concerning the adversarial robustness of data attribution methods. However, there has been little to no systematic research addressing this issue. In this work, we aim to bridge this gap by detailing a threat model with clear assumptions about the adversary's goal and capabilities and proposing principled adversarial attack methods on data attribution. We present two methods, Shadow Attack and Outlier Attack, which generate manipulated datasets to inflate the compensation adversarially. The Shadow Attack leverages knowledge about the data distribution in the AI applications, and derives adversarial perturbations through "shadow training", a technique commonly used in membership inference attacks. In contrast, the Outlier Attack does not assume any knowledge about the data distribution and relies solely on black-box queries to the target model's predictions. It exploits an inductive bias present in many data attribution methods - outlier data points are more likely to be influential - and employs adversarial examples to generate manipulated datasets. Empirically, in image classification and text generation tasks, the Shadow Attack can inflate the data-attribution-based compensation by at least 200%, while the Outlier Attack achieves compensation inflation ranging from 185% to as much as 643%.
Submitted 4 October, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Incorporating external data for analyzing randomized clinical trials: A transfer learning approach
Authors:
Yujia Gu,
Hanzhong Liu,
Wei Ma
Abstract:
Randomized clinical trials are the gold standard for analyzing treatment effects, but high costs and ethical concerns can limit recruitment, potentially leading to invalid inferences. Incorporating external trial data with similar characteristics into the analysis using transfer learning appears promising for addressing these issues. In this paper, we present a formal framework for applying transfer learning to the analysis of clinical trials, considering three key perspectives: transfer algorithm, theoretical foundation, and inference method. For the algorithm, we adopt a parameter-based transfer learning approach to enhance the lasso-adjusted stratum-specific estimator developed for estimating treatment effects. A key component in constructing the transfer learning estimator is deriving the regression coefficient estimates within each stratum, accounting for the bias between source and target data. To provide a theoretical foundation, we derive the $l_1$ convergence rate for the estimated regression coefficients and establish the asymptotic normality of the transfer learning estimator. Our results show that when external trial data resembles current trial data, the sample size requirements can be reduced compared to using only the current trial data. Finally, we propose a consistent nonparametric variance estimator to facilitate inference. Numerical studies demonstrate the effectiveness and robustness of our proposed estimator across various scenarios.
Submitted 6 September, 2024;
originally announced September 2024.
-
Survey of Data-driven Newsvendor: Unified Analysis and Spectrum of Achievable Regrets
Authors:
Zhuoxin Chen,
Will Ma
Abstract:
In the Newsvendor problem, the goal is to guess the number that will be drawn from some distribution, with asymmetric consequences for guessing too high vs. too low. In the data-driven version, the distribution is unknown, and one must work with samples from the distribution. Data-driven Newsvendor has been studied under many variants: additive vs. multiplicative regret, high probability vs. expectation bounds, and different distribution classes. This paper studies all combinations of these variants, filling in many gaps in the literature and simplifying many proofs. In particular, we provide a unified analysis based on the notion of clustered distributions, which in conjunction with our new lower bounds, shows that the entire spectrum of regrets between $1/\sqrt{n}$ and $1/n$ can be possible.
Submitted 17 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
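As a rough illustration of the data-driven Newsvendor setting described in the abstract above, the sketch below uses sample average approximation (SAA), a standard baseline that orders the empirical quantile at the critical ratio. It is not taken from the paper; the function names and the cost parameters `underage_cost` and `overage_cost` are assumptions introduced only for this example.

```python
import math


def newsvendor_cost(order, demand, underage_cost, overage_cost):
    """Asymmetric cost of guessing `order` when the realized draw is `demand`."""
    too_low = max(demand - order, 0)   # guessed below the realized demand
    too_high = max(order - demand, 0)  # guessed above the realized demand
    return underage_cost * too_low + overage_cost * too_high


def saa_order_quantity(samples, underage_cost, overage_cost):
    """Sample average approximation: return the empirical quantile of the
    samples at the critical ratio underage / (underage + overage)."""
    if not samples:
        raise ValueError("need at least one demand sample")
    ratio = underage_cost / (underage_cost + overage_cost)
    ordered = sorted(samples)
    # Smallest sample whose empirical CDF reaches the critical ratio.
    k = max(1, math.ceil(ratio * len(ordered)))
    return ordered[k - 1]


if __name__ == "__main__":
    demand_samples = [10, 20, 30, 40]  # hypothetical observed demands
    q = saa_order_quantity(demand_samples, underage_cost=3, overage_cost=1)
    avg = sum(newsvendor_cost(q, d, 3, 1) for d in demand_samples) / len(demand_samples)
    print(q, avg)
```

Regret, in the sense used by the abstract, would compare the expected cost of such a data-driven order against the cost of the best order under the true, unknown distribution.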
-
Sketch: A Toolkit for Streamlining LLM Operations
Authors:
Xin Jiang,
Xiang Li,
Wenjia Ma,
Xuezhi Fang,
Yiqun Yao,
Naitong Yu,
Xuying Meng,
Peng Han,
Jing Li,
Aixin Sun,
Yequan Wang
Abstract:
Large language models (LLMs) represented by the GPT family have achieved remarkable success. The characteristics of LLMs lie in their ability to accommodate a wide range of tasks through a generative approach. However, the flexibility of their output format poses challenges in controlling and harnessing the model's outputs, thereby constraining the application of LLMs in various domains. In this work, we present Sketch, an innovative toolkit designed to streamline LLM operations across diverse fields. Sketch comprises the following components: (1) a suite of task description schemas and prompt templates encompassing various NLP tasks; (2) a user-friendly, interactive process for building structured output LLM services tailored to various NLP tasks; (3) an open-source dataset for output format control, along with tools for dataset construction; and (4) an open-source model based on LLaMA3-8B-Instruct that adeptly comprehends and adheres to output formatting instructions. We anticipate that this initiative will bring considerable convenience to LLM users, achieving the goal of "plug-and-play" for various applications. The components of Sketch will be progressively open-sourced at https://github.com/cofe-ai/Sketch.
Submitted 5 September, 2024;
originally announced September 2024.
-
Searching for MeV-scale Axion-like Particles and Dark Photons with PandaX-4T
Authors:
PandaX Collaboration,
Tao Li,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (76 additional authors not shown)
Abstract:
Axion-like particles (ALPs) and dark photons (DPs) are viable dark matter particle candidates. We have searched for possible ALP/DP signals in the PandaX-4T liquid xenon detector using 94.8 days of data. A binned likelihood fit is constructed to search for possible mono-energetic peaks induced by the absorption processes between ALPs/DPs and atomic electrons of xenon. A detailed temporal model of decays associated with xenon isotopes is introduced to constrain the number of background events. No signal excess over background expectations is observed, and we have established the most stringent exclusion limits for most ALP/DP masses ranging from 150 keV/$c^2$ to 1 MeV/$c^2$.
Submitted 1 September, 2024;
originally announced September 2024.
-
Joint Estimation and Prediction of City-wide Delivery Demand: A Large Language Model Empowered Graph-based Learning Approach
Authors:
Tong Nie,
Junlin He,
Yuewen Mei,
Guoyang Qin,
Guilong Li,
Jian Sun,
Wei Ma
Abstract:
The proliferation of e-commerce and urbanization has significantly intensified delivery operations in urban areas, boosting the volume and complexity of delivery demand. Data-driven predictive methods, especially those utilizing machine learning techniques, have emerged to handle these complexities in urban delivery demand management problems. One particularly pressing problem that has not yet been sufficiently studied is the joint estimation and prediction of city-wide delivery demand. To this end, we formulate this problem as a graph-based spatiotemporal learning task. First, a message-passing neural network model is formalized to capture the interaction between demand patterns of associated regions. Second, by exploiting recent advances in large language models, we extract general geospatial knowledge encodings from the unstructured locational data and integrate them into the demand predictor. Last, to encourage the cross-city transferability of the model, an inductive training scheme is developed in an end-to-end routine. Extensive empirical results on two real-world delivery datasets, including eight cities in China and the US, demonstrate that our model significantly outperforms state-of-the-art baselines in these challenging tasks.
Submitted 30 August, 2024;
originally announced August 2024.
-
Harmonic metrics of $\mathrm{SO}_{0}(n,n)$-Higgs bundles in the Hitchin section on non-compact hyperbolic surfaces
Authors:
Weihan Ma
Abstract:
Let $X$ be a Riemann surface. Using the canonical line bundle $K$ and some holomorphic differentials $\boldsymbol{q}$, Hitchin constructed the $G$-Higgs bundles in the Hitchin section for a split real form $G$ of a complex simple Lie group. We study the ${\mathrm{SO}_0(n,n)}$ case. In our work, we establish the existence of harmonic metrics for these Higgs bundles, which are compatible with the ${\mathrm{SO}_0(n,n)}$-structure for any non-compact hyperbolic Riemann surface. Moreover, these harmonic metrics also weakly dominate $h_X$ which is the natural diagonal harmonic metric induced by the unique complete Kähler hyperbolic metric $g_X$ on $X$. Assuming that these holomorphic differentials are all bounded with respect to the metric $g_X$, we are able to prove the uniqueness of such a harmonic metric.
Submitted 16 August, 2024;
originally announced August 2024.
-
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models
Authors:
Xiyu Liu,
Zhengxiao Liu,
Naibin Gu,
Zheng Lin,
Wanli Ma,
Ji Xiang,
Weiping Wang
Abstract:
The storage and recall of factual associations in auto-regressive transformer language models (LMs) have drawn a great deal of attention, inspiring knowledge editing by directly modifying the located model weights. Most editing works achieve knowledge editing under the guidance of existing interpretations of knowledge recall that mainly focus on subject knowledge. However, these interpretations are seriously flawed, neglecting relation information and leading to the over-generalizing problem for editing. In this work, we discover a novel relation-focused perspective to interpret the knowledge recall of transformer LMs during inference and apply it to knowledge editing to avoid over-generalizing. Experimental results on the dataset supplemented with a new R-Specificity criterion demonstrate that our editing approach significantly alleviates over-generalizing while remaining competitive on other criteria, breaking the domination of subject-focused editing for future research.
Submitted 27 August, 2024;
originally announced August 2024.
-
Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation
Authors:
Haozhe Lou,
Yurong Liu,
Yike Pan,
Yiran Geng,
Jianteng Chen,
Wenlong Ma,
Chenglong Li,
Lin Wang,
Hengzhen Feng,
Lu Shi,
Liyi Luo,
Yongliang Shi
Abstract:
Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. Existing methods lack a comprehensive solution to accurately reconstruct real-world objects with spatial representations and their associated physics attributes.
We propose a Real2Sim pipeline with a hybrid representation model that integrates mesh geometry, 3D Gaussian kernels, and physics attributes to enhance the digital asset representation of robotic arms.
This hybrid representation is implemented through a Gaussian-Mesh-Pixel binding technique, which establishes an isomorphic mapping between mesh vertices and Gaussian models. This enables a fully differentiable rendering pipeline that can be optimized through numerical solvers, achieves high-fidelity rendering via Gaussian Splatting, and facilitates physically plausible simulation of the robotic arm's interaction with its environment using mesh-based methods.
The code, full presentation, and datasets will be made publicly available at our website https://robostudioapp.com
Submitted 17 September, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Irrelevance of 1H composition to the superconductivity in the infinite-layer nickelates: judging from the MeV energy scale
Authors:
Jia-Cai Nie,
Xing-Yu Chen,
Yi Bian,
Xue-Yan Wang,
Ting-Na Shao,
Jing-Xin Gao,
Wei Mao,
Bing-Hui Ge,
Arnold Muller,
Jikun Chen
Abstract:
The discovery of superconductivity in the infinite-layer nickelates, as topotactically reduced from their respective perovskite precursors via co-annealing with CaH2, extends the understanding of superconductivity. Nevertheless, whether the incorporated 1H composition is critical to the infinite-layer superconductivity has recently aroused considerable debate, while the central challenge lies in the quantification of 1H, which is easily interfered with by the conventional electron or orbital associated processes. Herein, we demonstrate the irrelevance between the superconductivity in the infinite-layer nickelates and their incorporated 1H composition, assisted by nuclear reaction analysis (NRA) and heavy ion energy recoil detection analysis (HIERDA) based on the nuclear interactions at MeV energy scale. These approaches completely overwhelm the conventional interferences, such as ionization, activation and chemical bonds, and achieve the 1H quantification within superconducting La0.8Sr0.2NiO2 (or Nd0.8Sr0.2NiO2). A large diversity of 1H composition far beyond the previously expected critical dome was observed, while their TC were not changed significantly. Furthermore, the superconductivity was demonstrated to be achievable for La0.8Sr0.2NiO2 reduced by Al without any hydrogen associated process, while the superconducting properties of the CaH2 reduced La0.8Sr0.2NiO2 are rather stable after long term exposure in air, despite the high volatility of 1H within oxides. All these results indicate that the 1H incorporation composition is not critical to the superconductivity of the infinite-layer nickelates.
△ Less
Submitted 27 August, 2024;
originally announced August 2024.
-
Programmable Jumping-Droplet Condensation
Authors:
Shan Gao,
Jian Qu,
Dehui Wang,
Zhichun Liu,
Weigang Ma
Abstract:
Self-propelled droplet jumping during condensation has attractive prospects for energy harvesting, water collection and thermal management, but its real-life applications are greatly limited by the challenge of enabling sustainable control over the entire droplet lifecycle. Herein, we propose a programmable jumping-droplet condensation that evolves along an artificially designed pathway without external stimulations, where the droplets can uniformly form at specific sites, spontaneously migrate and coalesce with their neighboring droplets, and jump off effectively to continuously refresh the surface, significantly enhancing the heat transfer performance and durability of condensation. The programmable jumping-droplet condensation is achieved using a wedge-walled rhombus lattice structure surface inspired by the structures and functions of Namib desert beetle skin, shorebird beak and Setaria viridis leaf vein. This surface integrates wetting contrast patterns with dual-gradient hierarchical structures, providing persistent and multidimensional droplet rectifications and thus realizing sustainable control over the entire droplet lifecycle. Furthermore, we systematically investigate the morphology and behavior evolutions of droplets throughout their entire lifecycle, and fully elucidate the programmable control mechanisms of the lattice structure determined by its topology and wettability features. This work not only serves as a theoretical foundation and reference framework to realize a durable jumping-droplet condensation and achieve its performance ceiling in a controlled manner, but also promotes the design and fabrication of functional structured surfaces for droplet manipulation and delivery, self-cleaning and anti-fogging/icing.
Submitted 23 August, 2024;
originally announced August 2024.
-
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Authors:
Jinheng Xie,
Weijia Mao,
Zechen Bai,
David Junhao Zhang,
Weihao Wang,
Kevin Qinghong Lin,
Yuchao Gu,
Zhijie Chen,
Zhenheng Yang,
Mike Zheng Shou
Abstract:
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model. Code and models are released at https://github.com/showlab/Show-o.
Submitted 20 October, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning
Authors:
Junlin He,
Tong Nie,
Wei Ma
Abstract:
In the geospatial domain, universal representation models are significantly less prevalent than their extensive use in natural language processing and computer vision. This discrepancy arises primarily from the high costs associated with the input of existing representation models, which often require street views and mobility data. To address this, we develop a novel, training-free method that leverages large language models (LLMs) and auxiliary map data from OpenStreetMap to derive geolocation representations (LLMGeovec). LLMGeovec can represent the geographic semantics of city, country, and global scales, which acts as a generic enhancer for spatio-temporal learning. Specifically, by direct feature concatenation, we introduce a simple yet effective paradigm for enhancing multiple spatio-temporal tasks including geographic prediction (GP), long-term time series forecasting (LTSF), and graph-based spatio-temporal forecasting (GSTF). LLMGeovec can seamlessly integrate into a wide spectrum of spatio-temporal learning models, providing immediate enhancements. Experimental results demonstrate that LLMGeovec achieves global coverage and significantly boosts the performance of leading GP, LTSF, and GSTF models.
Submitted 22 August, 2024;
originally announced August 2024.
-
Research on the Construction of Maximum Distance Separable Codes via Arbitrary twisted Generalized Reed-Solomon Codes
Authors:
Chun'e Zhao,
Wenping Ma,
Tongjiang Yan,
Yuhua Sun
Abstract:
Maximum distance separable (MDS) codes have significant combinatorial and cryptographic applications due to their certain optimality. Generalized Reed-Solomon (GRS) codes are the most prominent MDS codes. Twisted generalized Reed-Solomon (TGRS) codes may not necessarily be MDS. It is meaningful to study the conditions under which TGRS codes are MDS. In this paper, we study a general class of TGRS (A-TGRS) codes which include all the known special ones. First, we obtain a new explicit expression of the inverse of the Vandermonde matrix. Based on this, we further derive an equivalent condition under which an A-TGRS code is MDS. According to this, the A-TGRS MDS codes include nearly all the known related results in the previous literature. More importantly, we also obtain many other classes of MDS TGRS codes with new parameter matrices. In addition, we present a new method to compute the inverse of the lower triangular Toeplitz matrix by a linear feedback shift register, which will be very useful in many research fields.
Submitted 21 August, 2024;
originally announced August 2024.
-
Sex chromosome evolution: The classical paradigm and so much beyond
Authors:
Paris Veltsos,
Sagar Shinde,
Wen-Juan Ma
Abstract:
Sex chromosomes have independently evolved in species with separate sexes in most lineages across the tree of life. However, the well-accepted canonical model of sex chromosome evolution is not universally supported. There is no single trajectory for sex chromosome formation and evolution across the tree of life, suggesting the underlying mechanisms and evolutionary forces are diverse and lineage specific. We review the diversity of sex chromosome systems, describe the canonical model of sex chromosome evolution, and summarize studies challenging various aspects of this model. They include evidence that many lineages experience frequent sex chromosome turnovers or maintain homomorphic sex chromosomes over long periods of time, suggesting sex chromosome degeneration is not inevitable. Sometimes the sex-limited Y/W chromosomes expand before they contract in size. Both transposable elements and gene gains could contribute to this size expansion, which further challenges gene loss being the hallmark of sex chromosome degeneration. Finally, empirical support for the role of sexually antagonistic selection as a driver of recombination suppression on sex chromosomes remains elusive. We summarize models that result in loss of recombination without invoking sexually antagonistic selection, which have not been empirically verified yet, and suggest future avenues for sex chromosome research.
Submitted 21 August, 2024;
originally announced August 2024.
-
Unsupervised Transfer Learning via Adversarial Contrastive Training
Authors:
Chenguang Duan,
Yuling Jiao,
Huazhen Lin,
Wensen Ma,
Jerry Zhijian Yang
Abstract:
Learning a data representation for downstream supervised learning tasks in an unlabeled scenario is both critical and challenging. In this paper, we propose a novel unsupervised transfer learning approach using adversarial contrastive training (ACT). Our experimental results demonstrate outstanding classification accuracy under both the fine-tuned linear probe and K-NN protocols across various datasets, showing competitiveness with existing state-of-the-art self-supervised learning methods. Moreover, we provide an end-to-end theoretical guarantee for downstream classification tasks in a misspecified, over-parameterized setting, highlighting how a large amount of unlabeled data contributes to prediction accuracy. Our theoretical findings suggest that the testing error of downstream tasks depends solely on the efficiency of the data augmentation used in ACT when the unlabeled sample size is sufficiently large. This offers a theoretical understanding of learning downstream tasks with a small sample size.
Submitted 16 August, 2024;
originally announced August 2024.
-
Enhanced Equivalent Circuit Model for High Current Discharge of Lithium-Ion Batteries with Application to Electric Vertical Takeoff and Landing Aircraft
Authors:
Alireza Goshtasbi,
Ruxiu Zhao,
Ruiting Wang,
Sangwoo Han,
Wenting Ma,
Jeremy Neubauer
Abstract:
Conventional battery equivalent circuit models (ECMs) have limited capability to predict performance at high discharge rates, where lithium-depleted regions may develop and cause a sudden exponential drop in the cell's terminal voltage. Accurate predictions of performance under such conditions are necessary for electric vertical takeoff and landing (eVTOL) aircraft applications, where high discharge currents can be required during fault scenarios and the inability to provide these currents can be safety-critical. To address this challenge, we utilize data-driven modeling methods to derive a parsimonious addition to a conventional ECM that can capture the observed rapid voltage drop with only one additional state. We also provide a detailed method for identifying the resulting model parameters, including an extensive characterization data set along with a well-regularized objective function formulation. The model is validated against a novel data set of over 150 flights encompassing a wide array of conditions for an eVTOL aircraft, using an application-specific and safety-relevant reserve duration metric for quantifying accuracy. The model is shown to predict the landing hover capability with an error mean and standard deviation of 2.9 and 6.2 seconds, respectively, demonstrating the model's ability to capture the cell voltage behavior under high discharge currents.
Submitted 15 August, 2024;
originally announced August 2024.
-
Exploring New Physics with PandaX-4T Low Energy Electronic Recoil Data
Authors:
PandaX Collaboration,
Xinning Zeng,
Zihao Bo,
Wei Chen,
Xun Chen,
Yunhua Chen,
Zhaokan Cheng,
Xiangyi Cui,
Yingjie Fan,
Deqing Fang,
Zhixing Gao,
Lisheng Geng,
Karl Giboni,
Xunan Guo,
Xuyuan Guo,
Zichao Guo,
Chencheng Han,
Ke Han,
Changda He,
Jinrong He,
Di Huang,
Houqi Huang,
Junting Huang,
Ruquan Hou,
Yu Hou,
Xiangdong Ji
, et al. (76 additional authors not shown)
Abstract:
New particles beyond the Standard Model of particle physics, such as axions, can be effectively searched for through their interactions with electrons. We use the large liquid xenon detector PandaX-4T to search for novel electronic recoil signals induced by solar axions, neutrinos with anomalous magnetic moment, axion-like particles, dark photons, and light fermionic dark matter. A detailed background model is established using the latest datasets, with an exposure of 1.54 $\rm tonne \cdot year$. No significant excess above the background has been observed, and we have obtained competitive constraints on axion couplings, neutrino magnetic moment, and fermionic dark matter interactions.
Submitted 14 August, 2024;
originally announced August 2024.