-
The $Λ_c^+\toηπ^+Λ$ reaction and the $Λa_0^+(980)$ and $π^+Λ(1670)$ contributions
Authors:
Man-Yu Duan,
Wen-Tao Lyu,
Chu-Wen Xiao,
En Wang,
Ju-Jun Xie,
Dian-Yong Chen
Abstract:
We study from the theoretical point of view the $Λ_c^+\to π^+ ηΛ$ reaction, recently measured by the Belle and BESIII Collaborations, where clear signals are observed for $a_0(980)$, $Λ(1670)$, and $Σ(1385)$ excitation. By considering the $a_0(980)$ and $Λ(1670)$ as dynamically generated resonances from the meson meson and meson baryon interaction, respectively, we are able to determine their relative production strength in the reaction, which is also tied to the strength of the $π^+ ηΛ$ tree level contribution. We observe that this latter strength is very big and there are large destructive interferences between the tree level and the rescattering terms where the $a_0(980)$ and $Λ(1670)$ are generated. The $Σ(1385)$ contribution is included by means of a free parameter, the only one of the theory, up to a global normalization, when one considers only external emission, and we observe that the spin flip part of this term, usually ignored in theoretical and experimental works, plays an important role determining the shape of the mass distributions. Internal emission is also considered and it is found to play a minor role.
Submitted 21 October, 2024;
originally announced October 2024.
-
ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
Authors:
Yulin Song,
Guorui Sang,
Jing Yu,
Chuangbai Xiao
Abstract:
Singing voice synthesis (SVS) system is expected to generate high-fidelity singing voice from given music scores (lyrics, duration and pitch). Recently, diffusion models have performed well in this field. However, sacrificing inference speed to exchange with high-quality sample generation limits its application scenarios. In order to obtain high quality synthetic singing voice more efficiently, we propose a singing voice synthesis method based on the consistency model, ConSinger, to achieve high-fidelity singing voice synthesis with minimal steps. The model is trained by applying consistency constraint and the generation quality is greatly improved at the expense of a small amount of inference speed. Our experiments show that ConSinger is highly competitive with the baseline model in terms of generation speed and quality. Audio samples are available at https://keylxiao.github.io/consinger.
Submitted 20 October, 2024;
originally announced October 2024.
-
TRIZ Method for Urban Building Energy Optimization: GWO-SARIMA-LSTM Forecasting model
Authors:
Shirong Zheng,
Shaobo Liu,
Zhenhong Zhang,
Dian Gu,
Chunqiu Xia,
Huadong Pang,
Enock Mintah Ampaw
Abstract:
With the advancement of global climate change and sustainable development goals, urban building energy consumption optimization and carbon emission reduction have become the focus of research. Traditional energy consumption prediction methods often lack accuracy and adaptability due to their inability to fully consider complex energy consumption patterns, especially in dealing with seasonal fluctuations and dynamic changes. This study proposes a hybrid deep learning model that combines TRIZ innovation theory with GWO, SARIMA and LSTM to improve the accuracy of building energy consumption prediction. TRIZ plays a key role in model design, providing innovative solutions to achieve an effective balance between energy efficiency, cost and comfort by systematically analyzing the contradictions in energy consumption optimization. GWO is used to optimize the parameters of the model to ensure that the model maintains high accuracy under different conditions. The SARIMA model focuses on capturing seasonal trends in the data, while the LSTM model handles short-term and long-term dependencies in the data, further improving the accuracy of the prediction. The main contribution of this research is the development of a robust model that leverages the strengths of TRIZ and advanced deep learning techniques, improving the accuracy of energy consumption predictions. Our experiments demonstrate a significant 15% reduction in prediction error compared to existing models. This innovative approach not only enhances urban energy management but also provides a new framework for optimizing energy use and reducing carbon emissions, contributing to sustainable development.
Submitted 20 October, 2024;
originally announced October 2024.
-
CAST: Corpus-Aware Self-similarity Enhanced Topic modelling
Authors:
Yanan Ma,
Chenghao Xiao,
Chenhan Yuan,
Sabine N van der Veer,
Lamiece Hassan,
Chenghua Lin,
Goran Nenadic
Abstract:
Topic modelling is a pivotal unsupervised machine learning technique for extracting valuable insights from large document collections. Existing neural topic modelling methods often encode contextual information of documents, while ignoring contextual details of candidate centroid words, leading to the inaccurate selection of topic words due to the contextualization gap. In parallel, it is found that functional words are frequently selected over topical words. To address these limitations, we introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method that builds upon candidate centroid word embeddings contextualized on the dataset, and a novel self-similarity-based method to filter out less meaningful tokens. Inspired by findings in contrastive learning that self-similarities of functional token embeddings in different contexts are much lower than topical tokens, we find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words. Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data. Experiments on news benchmark datasets and one Twitter dataset demonstrate the method's superiority in generating coherent, diverse topics, and handling noisy data, outperforming strong baselines.
Submitted 19 October, 2024;
originally announced October 2024.
-
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
Authors:
Chaodong Xiao,
Minghan Li,
Zhengqiang Zhang,
Deyu Meng,
Lei Zhang
Abstract:
Selective state space models (SSMs), such as Mamba, highly excel at capturing long-range dependencies in 1D sequential data, while their applications to 2D vision tasks still face challenges. Current visual SSMs often convert images into 1D sequences and employ various scanning patterns to incorporate local spatial dependencies. However, these methods are limited in effectively capturing the complex image spatial structures and the increased computational cost caused by the lengthened scanning paths. To address these limitations, we propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. Instead of relying solely on sequential state transitions, we introduce a structure-aware state fusion equation, which leverages dilated convolutions to capture image spatial structural dependencies, significantly enhancing the flow of visual contextual information. Spatial-Mamba proceeds in three stages: initial state computation in a unidirectional scan, spatial context acquisition through structure-aware state fusion, and final state computation using the observation equation. Our theoretical analysis shows that Spatial-Mamba unifies the original Mamba and linear attention under the same matrix multiplication framework, providing a deeper understanding of our method. Experimental results demonstrate that Spatial-Mamba, even with a single scan, attains or surpasses the state-of-the-art SSM-based models in image classification, detection and segmentation. Source codes and trained models can be found at https://github.com/EdwardChasel/Spatial-Mamba.
Submitted 19 October, 2024;
originally announced October 2024.
-
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Authors:
Qin Liu,
Fei Wang,
Chaowei Xiao,
Muhao Chen
Abstract:
Existing preference alignment is a one-size-fits-all alignment mechanism, where the part of the large language model (LLM) parametric knowledge with non-preferred features is uniformly blocked to all the users. However, this part of knowledge can be useful to advanced users whose expertise qualifies them to handle this information. The one-size-fits-all alignment mechanism undermines LLM's utility for these qualified users. To address this problem, we propose SudoLM, a framework that lets LLMs learn access control over specific parametric knowledge for users with different credentials via authorization alignment. SudoLM allows authorized users to unlock their access to all the parametric knowledge with an assigned SUDO key while blocking access to non-qualified users. Experiments on two application scenarios demonstrate that SudoLM effectively controls the user's access to the parametric knowledge and maintains its general utility.
Submitted 18 October, 2024;
originally announced October 2024.
-
SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity
Authors:
Chuang Yang,
Renhe Jiang,
Xiaohang Xu,
Chuan Xiao,
Kaoru Sezaki
Abstract:
Free-space trajectory similarity calculation, e.g., DTW, Hausdorff, and Frechet, often incur quadratic time complexity, thus learning-based methods have been proposed to accelerate the computation. The core idea is to train an encoder to transform trajectories into representation vectors and then compute vector similarity to approximate the ground truth. However, existing methods face dual challenges of effectiveness and efficiency: 1) they all utilize Euclidean distance to compute representation similarity, which leads to the severe curse of dimensionality issue -- reducing the distinguishability among representations and significantly affecting the accuracy of subsequent similarity search tasks; 2) most of them are trained in triplets manner and often necessitate additional information which downgrades the efficiency; 3) previous studies, while emphasizing the scalability in terms of efficiency, overlooked the deterioration of effectiveness when the dataset size grows. To cope with these issues, we propose a simple, yet accurate, fast, scalable model that only uses a single-layer vanilla transformer encoder as the feature extractor and employs tailored representation similarity functions to approximate various ground truth similarity measures. Extensive experiments demonstrate our model significantly mitigates the curse of dimensionality issue and outperforms the state-of-the-arts in effectiveness, efficiency, and scalability.
Submitted 18 October, 2024;
originally announced October 2024.
-
The Logarithmic Sobolev inequality on non-compact self-shrinkers
Authors:
Guofang Wang,
Chao Xia,
Xiqiang Zhang
Abstract:
In the paper we establish an optimal logarithmic Sobolev inequality for complete, non-compact, properly embedded self-shrinkers in the Euclidean space, which generalizes a recent result of Brendle \cite{Brendle22} for closed self-shrinkers. We first provide a proof for the logarithmic Sobolev inequality in the Euclidean space by using the Alexandrov-Bakelman-Pucci (ABP) method. Then we use this approach to show an optimal logarithmic Sobolev inequality for complete, non-compact, properly embedded self-shrinkers in the Euclidean space, which is a sharp version of the result of Ecker in \cite{Ecker}. The proof is a noncompact modification of Brendle's proof for closed submanifolds and has a big potential to provide new inequalities in noncompact manifolds.
Submitted 17 October, 2024;
originally announced October 2024.
-
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Authors:
Changcheng Xiao,
Qiong Cao,
Yujie Zhong,
Xiang Zhang,
Tao Wang,
Canqun Yang,
Long Lan
Abstract:
Referring multi-object tracking (RMOT) is an emerging cross-modal task that aims to locate an arbitrary number of target objects and maintain their identities referred by a language expression in a video. This intricate task involves the reasoning of linguistic and visual modalities, along with the temporal association of target objects. However, the seminal work employs only loose feature fusion and overlooks the utilization of long-term information on tracked objects. In this study, we introduce a compact Transformer-based method, termed TenRMOT. We conduct feature fusion at both encoding and decoding stages to fully exploit the advantages of Transformer architecture. Specifically, we incrementally perform cross-modal fusion layer-by-layer during the encoding phase. In the decoding phase, we utilize language-guided queries to probe memory features for accurate prediction of the desired objects. Moreover, we introduce a query update module that explicitly leverages temporal prior information of the tracked objects to enhance the consistency of their trajectories. In addition, we introduce a novel task called Referring Multi-Object Tracking and Segmentation (RMOTS) and construct a new dataset named Ref-KITTI Segmentation. Our dataset consists of 18 videos with 818 expressions, and each expression averages 10.7 masks, which poses a greater challenge compared to the typical single mask in most existing referring video segmentation datasets. TenRMOT demonstrates superior performance on both the referring multi-object tracking and the segmentation tasks.
Submitted 17 October, 2024;
originally announced October 2024.
-
Segment as You Wish -- Free-Form Language-Based Segmentation for Medical Images
Authors:
Longchao Da,
Rui Wang,
Xiaojian Xu,
Parminder Bhatia,
Taha Kass-Hout,
Hua Wei,
Cao Xiao
Abstract:
Medical imaging is crucial for diagnosing a patient's health condition, and accurate segmentation of these images is essential for isolating regions of interest to ensure precise diagnosis and treatment planning. Existing methods primarily rely on bounding boxes or point-based prompts, while few have explored text-related prompts, despite clinicians often describing their observations and instructions in natural language. To address this gap, we first propose a RAG-based free-form text prompt generator, that leverages the domain corpus to generate diverse and realistic descriptions. Then, we introduce FLanS, a novel medical image segmentation model that handles various free-form text prompts, including professional anatomy-informed queries, anatomy-agnostic position-driven queries, and anatomy-agnostic size-driven queries. Additionally, our model also incorporates a symmetry-aware canonicalization module to ensure consistent, accurate segmentations across varying scan orientations and reduce confusion between the anatomical position of an organ and its appearance in the scan. FLanS is trained on a large-scale dataset of over 100k medical images from 7 public datasets. Comprehensive experiments demonstrate the model's superior language understanding and segmentation precision, along with a deep comprehension of the relationship between them, outperforming SOTA baselines on both in-domain and out-of-domain datasets.
Submitted 2 October, 2024;
originally announced October 2024.
-
Anomalously Enhanced Diffusivity of Moiré Excitons via Manipulating the Interplay with Correlated Electrons
Authors:
Li Yan,
Lei Ma,
Yuze Meng,
Chengxin Xiao,
Bo Chen,
Qiran Wu,
Jingyuan Cui,
Qingrui Cao,
Rounak Banerjee,
Takashi Taniguchi,
Kenji Watanabe,
Seth Ariel Tongay,
Benjamin Hunt,
Yong-Tao Cui,
Wang Yao,
Su-Fei Shi
Abstract:
Semiconducting transitional metal dichalcogenides (TMDCs) moiré superlattice provides an exciting platform for manipulating excitons. The in-situ control of moiré potential confined exciton would usher in unprecedented functions of excitonic devices but remains challenging. Meanwhile, as a dipolar composite boson, interlayer exciton in the type-II aligned TMDC moiré superlattice strongly interacts with fermionic charge carriers. Here, we demonstrate active manipulation of the exciton diffusivity by tuning their interplay with correlated carriers in moiré potentials. At fractional fillings where carriers are known to form generalized Wigner crystals, we observed suppressed diffusivity of exciton. In contrast, in Fermi liquid states where carriers dynamically populate all moiré traps, the repulsive carrier-exciton interaction can effectively reduce the moiré potential confinement seen by the exciton, leading to enhanced diffusivity with the increase of the carrier density. Notably, the exciton diffusivity is enhanced by orders of magnitude near the Mott insulator state, and the enhancement is much more pronounced for the 0-degree than the 60-degree aligned WS2/WSe2 heterobilayer due to the more localized nature of interlayer excitons. Our study inspires further engineering and controlling exotic excitonic states in TMDC moiré superlattices for fascinating quantum phenomena and novel excitonic devices.
Submitted 15 October, 2024;
originally announced October 2024.
-
HumanFT: A Human-like Fingertip Multimodal Visuo-Tactile Sensor
Authors:
Yifan Wu,
Yuzhou Chen,
Zhengying Zhu,
Xuhao Qin,
Chenxi Xiao
Abstract:
Tactile sensors play a crucial role in enabling robots to interact effectively and safely with objects in everyday tasks. In particular, visuotactile sensors have seen increasing usage in two and three-fingered grippers due to their high-quality feedback. However, a significant gap remains in the development of sensors suitable for humanoid robots, especially five-fingered dexterous hands. One reason is because of the challenges in designing and manufacturing sensors that are compact in size. In this paper, we propose HumanFT, a multimodal visuotactile sensor that replicates the shape and functionality of a human fingertip. To bridge the gap between human and robotic tactile sensing, our sensor features real-time force measurements, high-frequency vibration detection, and overtemperature alerts. To achieve this, we developed a suite of fabrication techniques for a new type of elastomer optimized for force propagation and temperature sensing. Besides, our sensor integrates circuits capable of sensing pressure and vibration. These capabilities have been validated through experiments. The proposed design is simple and cost-effective to fabricate. We believe HumanFT can enhance humanoid robots' perception by capturing and interpreting multimodal tactile information.
Submitted 14 October, 2024;
originally announced October 2024.
-
Gaseous Scissor-mediated Electrochemical Exfoliation of Halogenated MXenes and its Boosting in Wear-Resisting Tribovoltaic Devices
Authors:
Qi Fan,
Minghua Chen,
Longyi Li,
Minghui Li,
Chuanxiao Xiao,
Tianci Zhao,
Long Pan,
Ningning Liang,
Qing Huang,
Laipan Zhu,
Michael Naguib,
Kun Liang
Abstract:
Two-dimensional transition metal carbides (MXenes), especially their few-layered nanosheets, have triggered burgeoning research attentions owing to their superiorities including extraordinary conductivity, accessible active surface, and adjustable processability. Molten salts etching route further achieves their controllable surface chemistry. However, the method encounters challenges in achieving few-layer structures due to more complex delamination behaviors. Herein, we present an efficient strategy to fabricate Cl- or Br-terminated MXene nanoflakes with few-layers, achieved by electrochemical intercalation of Li ions and concomitant solvent molecules in the electrolyte solution, with gaseous scissors (propylene molecules) to break up interlayer forces. By controlling cut-off voltages, the optimal protocol results in nanosheets with an ultrahigh yield (~93%) and preserved surface chemistry. The resultant MXenes dispersions were employed as lubricants to enhance tribovoltaic nanogenerators, where Ti3C2Br2 displayed superior electrical output. These findings facilitate the understanding of MXenes' intrinsic physical properties and enable the nanoengineering of advanced electronic devices.
Submitted 14 October, 2024;
originally announced October 2024.
-
BIPEFT: Budget-Guided Iterative Search for Parameter Efficient Fine-Tuning of Large Pretrained Language Models
Authors:
Aofei Chang,
Jiaqi Wang,
Han Liu,
Parminder Bhatia,
Cao Xiao,
Ting Wang,
Fenglong Ma
Abstract:
Parameter Efficient Fine-Tuning (PEFT) offers an efficient solution for fine-tuning large pretrained language models for downstream tasks. However, most PEFT strategies are manually designed, often resulting in suboptimal performance. Recent automatic PEFT approaches aim to address this but face challenges such as search space entanglement, inefficiency, and lack of integration between parameter budgets and search processes. To overcome these issues, we introduce a novel Budget-guided Iterative search strategy for automatic PEFT (BIPEFT), significantly enhancing search efficiency. BIPEFT employs a new iterative search strategy to disentangle the binary module and rank dimension search spaces. Additionally, we design early selection strategies based on parameter budgets, accelerating the learning process by gradually removing unimportant modules and fixing rank dimensions. Extensive experiments on public benchmarks demonstrate the superior performance of BIPEFT in achieving efficient and effective PEFT for downstream tasks with a low parameter budget.
Submitted 4 October, 2024;
originally announced October 2024.
-
RePD: Defending Jailbreak Attack through a Retrieval-based Prompt Decomposition Process
Authors:
Peiran Wang,
Xiaogeng Liu,
Chaowei Xiao
Abstract:
In this study, we introduce RePD, an innovative attack Retrieval-based Prompt Decomposition framework designed to mitigate the risk of jailbreak attacks on large language models (LLMs). Despite rigorous pretraining and finetuning focused on ethical alignment, LLMs are still susceptible to jailbreak exploits. RePD operates on a one-shot learning model, wherein it accesses a database of pre-collected jailbreak prompt templates to identify and decompose harmful inquiries embedded within user prompts. This process involves integrating the decomposition of the jailbreak prompt into the user's original query into a one-shot learning example to effectively teach the LLM to discern and separate malicious components. Consequently, the LLM is equipped to first neutralize any potentially harmful elements before addressing the user's prompt in a manner that aligns with its ethical guidelines. RePD is versatile and compatible with a variety of open-source LLMs acting as agents. Through comprehensive experimentation with both harmful and benign prompts, we have demonstrated the efficacy of our proposed RePD in enhancing the resilience of LLMs against jailbreak attacks, without compromising their performance in responding to typical user requests.
Submitted 11 October, 2024;
originally announced October 2024.
-
Complex Logical Query Answering by Calibrating Knowledge Graph Completion Models
Authors:
Changyi Xiao,
Yixin Cao
Abstract:
Complex logical query answering (CLQA) is a challenging task that involves finding answer entities for complex logical queries over incomplete knowledge graphs (KGs). Previous research has explored the use of pre-trained knowledge graph completion (KGC) models, which can predict the missing facts in KGs, to answer complex logical queries. However, KGC models are typically evaluated using ranking evaluation metrics, which may result in values of predictions of KGC models that are not well-calibrated. In this paper, we propose a method for calibrating KGC models, namely CKGC, which enables KGC models to adapt to answering complex logical queries. Notably, CKGC is lightweight and effective. The adaptation function is simple, allowing the model to quickly converge during the adaptation process. The core concept of CKGC is to map the values of predictions of KGC models to the range [0, 1], ensuring that values associated with true facts are close to 1, while values linked to false facts are close to 0. Through experiments on three benchmark datasets, we demonstrate that our proposed calibration method can significantly boost model performance in the CLQA task. Moreover, our approach can enhance the performance of CLQA while preserving the ranking evaluation metrics of KGC models. The code is available at https://github.com/changyi7231/CKGC.
Submitted 30 September, 2024;
originally announced October 2024.
-
Representations of non-finitely graded Heisenberg-Virasoro type Lie algebras
Authors:
Chunguang Xia,
Tianyu Ma,
Wei Wang,
Mingjing Zhang
Abstract:
We construct and study non-finitely graded Lie algebras $\mathcal{HV}(a,b;ε)$ related to Heisenberg-Virasoro type Lie algebras, where $a,b$ are complex numbers, and $ε= \pm 1$. Using combinatorial techniques, we completely classify the free $\mathcal{U}(\mathfrak h)$-modules of rank one over $\mathcal{HV}(a,b;ε)$. It turns out that these modules are more varied and complex than those over non-finitely graded Virasoro algebras, and in particular admit infinitely many free parameters if $b=1$ and $ε=-1$. Meanwhile, we also determine the simplicity and isomorphism classes of these modules.
Submitted 9 October, 2024;
originally announced October 2024.
-
Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs
Authors:
Cheng Gao,
Chaojun Xiao,
Zhenghao Liu,
Huimin Chen,
Zhiyuan Liu,
Maosong Sun
Abstract:
Legal case retrieval (LCR) aims to provide similar cases as references for a given fact description. This task is crucial for promoting consistent judgments in similar cases, effectively enhancing judicial fairness and improving work efficiency for judges. However, existing work faces two main challenges in real-world applications: first, it mainly focuses on case-to-case retrieval using lengthy queries, which does not match real-world scenarios; second, the limited data scale, with current datasets containing only hundreds of queries, is insufficient to satisfy the training requirements of existing data-hungry neural models. To address these issues, we introduce an automated method to construct synthetic query-candidate pairs and build the largest LCR dataset to date, LEAD, which is hundreds of times larger than existing datasets. This data construction method can provide ample training signals for LCR models. Experimental results demonstrate that model training with our constructed data can achieve state-of-the-art results on two widely-used LCR benchmarks. In addition, the construction method can also be applied to civil cases and achieves promising results. The data and code are available at https://github.com/thunlp/LEAD.
Submitted 9 October, 2024;
originally announced October 2024.
-
LeanAgent: Lifelong Learning for Formal Theorem Proving
Authors:
Adarsh Kumarappan,
Mo Tiwari,
Peiyang Song,
Robert Joseph George,
Chaowei Xiao,
Anima Anandkumar
Abstract:
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully proves 162 theorems previously unproved by humans across 23 diverse Lean repositories, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
Submitted 17 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
QERA: an Analytical Framework for Quantization Error Reconstruction
Authors:
Cheng Zhang,
Jeffrey T. H. Wong,
Can Xiao,
George A. Constantinides,
Yiren Zhao
Abstract:
The growing number of parameters and computational demands of large language models (LLMs) present significant challenges for their efficient deployment. Recently, there has been increasing interest in quantizing weights to extremely low precision while offsetting the resulting error with low-rank, high-precision error reconstruction terms. The combination of quantization and low-rank approximation is now popular in both adapter-based, parameter-efficient fine-tuning methods such as LoftQ and low-precision inference techniques including ZeroQuant-V2. Usually, the low-rank terms are calculated via the singular value decomposition (SVD) of the weight quantization error, minimizing the Frobenius and spectral norms of the weight approximation error. Recent methods like LQ-LoRA and LQER introduced hand-crafted heuristics to minimize errors in layer outputs (activations) rather than weights, resulting in improved quantization results. However, these heuristic methods lack an analytical solution to guide the design of quantization error reconstruction terms. In this paper, we revisit this problem and formulate an analytical framework, named Quantization Error Reconstruction Analysis (QERA), and offer a closed-form solution to the problem. We show QERA benefits both existing low-precision fine-tuning and inference methods -- QERA achieves a fine-tuned accuracy gain of $Δ_{\text{acc}}$ = 6.05% of 2-bit RoBERTa-base on GLUE compared to LoftQ; and obtains $Δ_{\text{acc}}$ = 2.97% higher post-training quantization accuracy of 4-bit Llama-3.1-70B on average than ZeroQuant-V2 and $Δ_{\text{ppl}}$ = - 0.28 lower perplexity on WikiText2 than LQER.
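The SVD-based baseline mentioned in this abstract (low-rank terms computed from the weight quantization error) can be sketched as follows. This is a minimal illustration of that generic baseline only, not of QERA's closed-form solution; the round-to-nearest quantizer, the function names, and the matrix sizes are all assumptions chosen for the example.

```python
import numpy as np

def quantize(w, bits=2):
    """Uniform round-to-nearest quantization over the matrix's value range.

    Hypothetical stand-in quantizer, chosen only for illustration.
    """
    levels = 2 ** bits - 1
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / levels
    return np.round((w - lo) / scale) * scale + lo

def low_rank_error_terms(w, w_q, rank):
    """Return factors A, B such that A @ B is the best rank-`rank`
    Frobenius-norm approximation of the quantization error W - W_q."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale the leading left singular vectors
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
w_q = quantize(w, bits=2)
a, b = low_rank_error_terms(w, w_q, rank=8)

err_plain = np.linalg.norm(w - w_q)              # error of quantization alone
err_fixed = np.linalg.norm(w - (w_q + a @ b))    # error after low-rank correction
```

Here the truncated SVD minimizes the error in the weights themselves; the abstract's point is that minimizing the error in layer outputs is a different objective, which this sketch does not address.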
Submitted 8 October, 2024;
originally announced October 2024.
-
The $X(4500)$ state considered as the mixture of hadronic molecule and diquark-antidiquark within effective field theory
Authors:
De-Shun Zhang,
Wei He,
Chu-Wen Xiao,
Zhi-Feng Sun
Abstract:
In the present work, we construct the Lagrangians including three-meson, meson-diquark-antidiquark vertices, such that the diquark-antidiquark component as well as the molecular component are introduced within the effective field theory. With the obtained effective potentials projecting to spin 0, 1 and 2, we solve the Bethe-Salpeter equation with the on-shell approximation, and find that $X(4500)$ can be explained as the mixture of components $D_{s}^{*+}D_{s}^{*-}$, ${A}_{cq}\bar{A}_{cq}$ and ${A}_{cs}\bar{A}_{cs}$ with $I^G(J^{PC})=0^+(0^{++})$. In addition, another two resonances with quantum numbers $I^G(J^{PC})=0^+(1^{++})$ and $I^G(J^{PC})=0^+(2^{++})$ are predicted.
Submitted 8 October, 2024;
originally announced October 2024.
-
Swift Sampler: Efficient Learning of Sampler by 10 Parameters
Authors:
Jiawei Yao,
Chuming Li,
Canran Xiao
Abstract:
Data selection is essential for training deep learning models. An effective data sampler assigns proper sampling probability for training data and helps the model converge to a good local minimum with high performance. Previous studies in data sampling are mainly based on heuristic rules or learning through a huge amount of time-consuming trials. In this paper, we propose an automatic swift sampler search algorithm, SS, to learn effective samplers efficiently. In particular, SS utilizes a novel formulation to map a sampler to a low-dimensional space of hyper-parameters and uses an approximated local minimum to quickly examine the quality of a sampler. Benefiting from its low computational expense, SS can be applied to large-scale data sets with high efficiency. Comprehensive experiments on various tasks demonstrate that SS-powered sampling can achieve clear improvements (e.g., 1.5% on ImageNet) and transfer among different neural networks. Project page: https://github.com/Alexander-Yao/Swift-Sampler.
△ Less
Submitted 7 October, 2024;
originally announced October 2024.
-
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs
Authors:
Xiaogeng Liu,
Peiran Li,
Edward Suh,
Yevgeniy Vorobeychik,
Zhuoqing Mao,
Somesh Jha,
Patrick McDaniel,
Huan Sun,
Bo Li,
Chaowei Xiao
Abstract:
In this paper, we propose AutoDAN-Turbo, a black-box jailbreak method that can automatically discover as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and use them for red-teaming. As a result, AutoDAN-Turbo can significantly outperform baseline methods, achieving a 74.3% higher average attack success rate on public benchmarks. Notably, AutoDAN-Turbo achieves an 88.5 attack success rate on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner. By integrating human-designed strategies, AutoDAN-Turbo can even achieve a higher attack success rate of 93.4 on GPT-4-1106-turbo.
Submitted 13 October, 2024; v1 submitted 3 October, 2024;
originally announced October 2024.
-
On the Rigour of Scientific Writing: Criteria, Analysis, and Insights
Authors:
Joseph James,
Chenghao Xiao,
Yucheng Li,
Chenghua Lin
Abstract:
Rigour is crucial for scientific research as it ensures the reproducibility and validity of results and findings. Despite its importance, little work exists on modelling rigour computationally, and there is a lack of analysis on whether these criteria can effectively signal or measure the rigour of scientific papers in practice. In this paper, we introduce a bottom-up, data-driven framework to automatically identify and define rigour criteria and assess their relevance in scientific writing. Our framework includes rigour keyword extraction, detailed rigour definition generation, and salient criteria identification. Furthermore, our framework is domain-agnostic and can be tailored to the evaluation of scientific rigour for different areas, accommodating the distinct salient criteria across fields. We conducted comprehensive experiments based on datasets collected from two high impact venues for Machine Learning and NLP (i.e., ICLR and ACL) to demonstrate the effectiveness of our framework in modelling rigour. In addition, we analyse linguistic patterns of rigour, revealing that framing certainty is crucial for enhancing the perception of scientific rigour, while suggestion certainty and probability uncertainty diminish it.
Submitted 7 October, 2024;
originally announced October 2024.
-
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Authors:
Xinyu Liu,
Runsong Zhao,
Pengcheng Huang,
Chunyang Xiao,
Bei Li,
Jingang Wang,
Tong Xiao,
Jingbo Zhu
Abstract:
Numerous recent works aim to extend the effective context length of language models, and various methods, tasks and benchmarks exist to measure a model's effective memorization length. However, through thorough investigation, we find limitations in existing evaluations of a model's memorization capability. We provide an extensive survey of these limitations in this work and propose a new method called the forgetting curve to measure the memorization capability of long-context models. We show that the forgetting curve has the advantages of being robust to the tested corpus and the experimental settings, of not relying on prompts, and of being applicable to any model size.
We apply our forgetting curve to a large variety of models involving both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM based models. We also examine the difference between our measurement and existing benchmarks as well as popular metrics for various models. Our code and results can be found at https://github.com/1azybug/ForgettingCurve.
Submitted 6 October, 2024;
originally announced October 2024.
-
Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval
Authors:
Pengcheng Jiang,
Cao Xiao,
Minhao Jiang,
Parminder Bhatia,
Taha Kass-Hout,
Jimeng Sun,
Jiawei Han
Abstract:
Large language models (LLMs) have demonstrated significant potential in clinical decision support. Yet LLMs still suffer from hallucinations and lack fine-grained contextual medical knowledge, limiting their high-stake healthcare applications such as clinical diagnosis. Traditional retrieval-augmented generation (RAG) methods attempt to address these limitations but frequently retrieve sparse or irrelevant information, undermining prediction accuracy. We introduce KARE, a novel framework that integrates knowledge graph (KG) community-level retrieval with LLM reasoning to enhance healthcare predictions. KARE constructs a comprehensive multi-source KG by integrating biomedical databases, clinical literature, and LLM-generated insights, and organizes it using hierarchical graph community detection and summarization for precise and contextually relevant information retrieval. Our key innovations include: (1) a dense medical knowledge structuring approach enabling accurate retrieval of relevant information; (2) a dynamic knowledge retrieval mechanism that enriches patient contexts with focused, multi-faceted medical insights; and (3) a reasoning-enhanced prediction framework that leverages these enriched contexts to produce both accurate and interpretable clinical predictions. Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions. In addition to its impressive prediction accuracy, our framework leverages the reasoning capabilities of LLMs, enhancing the trustworthiness of clinical predictions.
Submitted 6 October, 2024;
originally announced October 2024.
-
HaTT: Hadamard avoiding TT recompression
Authors:
Zhonghao Sun,
Jizu Huang,
Chuanfu Xiao,
Chao Yang
Abstract:
The Hadamard product of tensor train (TT) tensors is one of the most fundamental nonlinear operations in scientific computing and data analysis. Due to its tendency to significantly increase TT ranks, the Hadamard product presents a major computational challenge in TT tensor-based algorithms. Therefore, it is essential to develop recompression algorithms that mitigate the effects of this rank increase. Existing recompression algorithms require an explicit representation of the Hadamard product, resulting in high computational and storage complexity. In this work, we propose the Hadamard avoiding TT recompression (HaTT) algorithm. Leveraging the structure of the Hadamard product in TT tensors and its Hadamard product-free property, the overall complexity of the HaTT algorithm is significantly lower than that of existing TT recompression algorithms. This is validated through complexity analysis and several numerical experiments.
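To make the rank growth concrete, the sketch below builds the explicit TT representation of a Hadamard product, where each new core is the slice-wise Kronecker product of the two input cores, so the TT ranks multiply. This is the standard explicit construction that recompression schemes start from, not the HaTT algorithm itself; the core layout `(r_prev, n, r_next)` and all function names are assumptions made for this example.

```python
import numpy as np

def random_tt(dims, rank, rng):
    """Random TT cores of shape (r_prev, n, r_next) with boundary ranks 1."""
    ranks = [1] + [rank] * (len(dims) - 1) + [1]
    return [rng.standard_normal((ranks[k], n, ranks[k + 1]))
            for k, n in enumerate(dims)]

def tt_to_full(cores):
    """Contract TT cores into the full tensor (only viable for tiny examples)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]  # drop the two boundary ranks of size 1

def tt_hadamard(a_cores, b_cores):
    """Explicit Hadamard product: slice-wise Kronecker product of the cores."""
    out = []
    for a, b in zip(a_cores, b_cores):
        ra0, n, ra1 = a.shape
        rb0, _, rb1 = b.shape
        # c[(i,k), n, (j,l)] = a[i, n, j] * b[k, n, l]
        c = np.einsum("inj,knl->iknjl", a, b).reshape(ra0 * rb0, n, ra1 * rb1)
        out.append(c)
    return out

rng = np.random.default_rng(0)
a = random_tt([3, 4, 5], rank=2, rng=rng)
b = random_tt([3, 4, 5], rank=3, rng=rng)
c = tt_hadamard(a, b)

middle_core_shape = c[1].shape  # ranks 2 and 3 multiply to 6
```

Storing and then recompressing cores whose ranks are products of the input ranks is the cost that the abstract says existing recompression algorithms pay.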
Submitted 6 October, 2024;
originally announced October 2024.
-
Exploring the Benefit of Activation Sparsity in Pre-training
Authors:
Zhengyan Zhang,
Chaojun Xiao,
Qiujieli Qin,
Yankai Lin,
Zhiyuan Zeng,
Xu Han,
Zhiyuan Liu,
Ruobing Xie,
Maosong Sun,
Jie Zhou
Abstract:
Pre-trained Transformers inherently possess the characteristic of sparse activation, where only a small fraction of the neurons are activated for each token. While sparse activation has been explored through post-training methods, its potential in pre-training remains untapped. In this work, we first study how activation properties change during pre-training. Our examination reveals that Transformers exhibit sparse activation throughout the majority of the pre-training process while the activation correlation keeps evolving as training progresses. Leveraging this observation, we propose Switchable Sparse-Dense Learning (SSD). SSD adaptively switches between the Mixtures-of-Experts (MoE) based sparse training and the conventional dense training during the pre-training process, leveraging the efficiency of sparse training and avoiding the static activation correlation of sparse training. Compared to dense training, SSD achieves comparable performance with identical model size and reduces pre-training costs. Moreover, the models trained with SSD can be directly used as MoE models for sparse inference and achieve the same performance as dense models with up to $2\times$ faster inference speed. Codes are available at https://github.com/thunlp/moefication.
Submitted 4 October, 2024;
originally announced October 2024.
-
LLMs May Not Be Human-Level Players, But They Can Be Testers: Measuring Game Difficulty with LLM Agents
Authors:
Chang Xiao,
Brenda Z. Yang
Abstract:
Recent advances in Large Language Models (LLMs) have demonstrated their potential as autonomous agents across various tasks. One emerging application is the use of LLMs in playing games. In this work, we explore a practical problem for the gaming industry: Can LLMs be used to measure game difficulty? We propose a general game-testing framework using LLM agents and test it on two widely played strategy games: Wordle and Slay the Spire. Our results reveal an interesting finding: although LLMs may not perform as well as the average human player, their performance, when guided by simple, generic prompting techniques, shows a statistically significant and strong correlation with difficulty indicated by human players. This suggests that LLMs could serve as effective agents for measuring game difficulty during the development process. Based on our experiments, we also outline general principles and guidelines for incorporating LLMs into the game testing process.
Submitted 1 October, 2024;
originally announced October 2024.
-
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Authors:
Xiangyu Peng,
Congying Xia,
Xinyi Yang,
Caiming Xiong,
Chien-Sheng Wu,
Chen Xing
Abstract:
Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize that this is because their self-synthesized reasoning paths are too task-specific, lacking general task-agnostic reasoning guidance. To address this, we propose Reasoning Generalist via Self-Improvement (ReGenesis), a method to self-synthesize reasoning paths as post-training data by progressing from abstract to concrete. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance on all in-domain and OOD settings tested compared to existing methods. For six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post-training, ReGenesis delivers around 6.1% performance improvement. We also conduct in-depth analysis of our framework and show ReGenesis is effective across various LLMs and design choices.
Submitted 2 October, 2024;
originally announced October 2024.
-
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads
Authors:
Yuxiang Huang,
Binhang Yuan,
Xu Han,
Chaojun Xiao,
Zhiyuan Liu
Abstract:
Large language models (LLMs) have shown remarkable advances in supporting long-context comprehension and processing tasks. However, scaling the generation inference of LLMs to such long contexts incurs significant additional computation load, and demands a substantial GPU memory footprint to maintain the key-value (KV) cache of transformer-based LLMs. Existing KV cache compression methods, such as quantization, face memory bottlenecks as context length increases, while static-sized caches, such as eviction, suffer from inefficient policies. These limitations restrict deployment on consumer-grade devices like a single Nvidia 4090 GPU. To overcome this, we propose Locret, a framework for long-context LLM inference that introduces retaining heads to evaluate the causal importance of KV cache units, allowing for more accurate eviction within a fixed cache size. Locret is fine-tuned on top of the frozen backbone LLM using a minimal amount of data from standard long-context SFT datasets. During inference, we evict low-importance cache units along with a chunked prefill pattern, significantly reducing peak GPU memory usage. We conduct an extensive empirical study to evaluate Locret, where the experimental results show that Locret outperforms the recent competitive approaches, including InfLLM, Quantization, SirLLM, and MInference, in terms of memory efficiency and the quality of generated contents -- Locret achieves over a 20x and 8x KV cache compression ratio compared to the full KV cache for Phi-3-mini-128K and Llama-3.1-8B-instruct. Additionally, Locret can be combined with other methods, such as quantization and token merging. To our knowledge, Locret is the first framework capable of deploying Llama-3.1-8B or similar models on a single Nvidia 4090 GPU, enabling 128K long-context inference without compromising generation quality, and requiring little additional system optimizations.
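The combination of score-based eviction and chunked prefill described above can be illustrated with a toy loop: the context is processed chunk by chunk, and after each chunk only the highest-scoring cache units are kept within a fixed budget. This is a simplified sketch and not the Locret implementation; the importance scores here are made-up numbers standing in for whatever the trained retaining heads would output, and `evict_to_budget` and `chunked_prefill` are hypothetical names.

```python
import heapq

def evict_to_budget(cache, budget):
    """Keep the `budget` highest-scoring (score, position) units, in position order."""
    kept = heapq.nlargest(budget, cache, key=lambda unit: unit[0])
    return sorted(kept, key=lambda unit: unit[1])

def chunked_prefill(scores, chunk_size, budget):
    """Process per-token importance scores chunk by chunk, evicting after each chunk.

    Returns the final cache and the peak number of units held at any time.
    """
    cache, peak = [], 0
    for start in range(0, len(scores), chunk_size):
        chunk = scores[start:start + chunk_size]
        cache.extend((s, start + i) for i, s in enumerate(chunk))
        peak = max(peak, len(cache))
        cache = evict_to_budget(cache, budget)
    return cache, peak

# Made-up importance scores for an 8-token context (assumption for illustration).
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.95]
cache, peak = chunked_prefill(scores, chunk_size=2, budget=3)
kept_positions = [pos for _, pos in cache]
```

In this toy setting the cache never holds more than `budget + chunk_size` units at once, which is the sense in which chunked prefill with eviction bounds peak memory independently of the total context length.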
Submitted 2 October, 2024;
originally announced October 2024.
-
Possible signal of an exotic $I=1$, $J=2$ state in the $B \to D^{*-}D^+K^+$ reaction
Authors:
Wen-Tao Lyu,
Man-Yu Duan,
Chu-Wen Xiao,
En Wang,
Eulogio Oset
Abstract:
We study the $B \to D^{*-}D^+K^+$ reaction, showing that a peak in the $D^+K^+$ mass distribution around 2834 MeV reported by LHCb could be associated with a theoretical exotic state with that mass, a width of 19 MeV and $J^P=2^+$, stemming from the interaction of the $D^{*+}K^{*+}$ and $D^{*+}_s ρ^+$ channels, which is a partner of the $0^+$ $T_{c\bar{s}}(2900)$. We show that the data is compatible with this assumption, but also see that the mass distribution itself cannot discriminate between the spins $J=0$, $1$, $2$ of the state. Then we evaluate the moments of the angular mass distribution and show that they are very different for each of the spin assumptions, and that the moments coming from interference terms have larger strength at the resonant energy than the peaks seen in the angular integrated mass distribution. We make a call for the experimental determination of these magnitudes, which have already been used by LHCb in related decay reactions.
Submitted 2 October, 2024;
originally announced October 2024.
-
Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges
Authors:
Qin Liu,
Wenjie Mo,
Terry Tong,
Jiashu Xu,
Fei Wang,
Chaowei Xiao,
Muhao Chen
Abstract:
The advancement of Large Language Models (LLMs) has significantly impacted various domains, including Web search, healthcare, and software development. However, as these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks. By exploiting the potent memorization capacity of LLMs, adversaries can easily inject backdoors into LLMs by manipulating a small portion of training data, leading to malicious behaviors in downstream applications whenever the hidden backdoor is activated by the pre-defined triggers. Moreover, emerging learning paradigms like instruction tuning and reinforcement learning from human feedback (RLHF) exacerbate these risks as they rely heavily on crowdsourced data and human feedback, which are not fully controlled. In this paper, we present a comprehensive survey of emerging backdoor threats to LLMs that appear during LLM development or inference, and cover recent advancement in both defense and detection strategies for mitigating backdoor threats to LLMs. We also outline key challenges in addressing these threats, highlighting areas for future research.
Submitted 30 September, 2024;
originally announced September 2024.
-
Knowledge Graph Embedding by Normalizing Flows
Authors:
Changyi Xiao,
Xiangnan He,
Yixin Cao
Abstract:
A key to knowledge graph embedding (KGE) is to choose a proper representation space, e.g., point-wise Euclidean space and complex vector space. In this paper, we propose a unified perspective of embedding and introduce uncertainty into KGE from the view of group theory. Our model can incorporate existing models (i.e., generality), ensure the computation is tractable (i.e., efficiency) and enjoy the expressive power of complex random variables (i.e., expressiveness). The core idea is that we embed entities/relations as elements of a symmetric group, i.e., permutations of a set. Permutations of different sets can reflect different properties of embedding, and the group operation of symmetric groups is easy to compute. Specifically, we show that the embeddings of many existing models, point vectors, can be seen as elements of a symmetric group. To reflect uncertainty, we first embed entities/relations as permutations of a set of random variables. A permutation can transform a simple random variable into a complex random variable for greater expressiveness, called a normalizing flow. We then define scoring functions by measuring the similarity of two normalizing flows, namely NFE. We construct several instantiating models and prove that they are able to learn logical rules. Experimental results demonstrate the effectiveness of introducing uncertainty and our model. The code is available at https://github.com/changyi7231/NFE.
Submitted 30 September, 2024;
originally announced September 2024.
-
An Interactive Hands-Free Controller for a Riding Ballbot to Enable Simple Shared Control Tasks
Authors:
Chenzhang Xiao,
Seung Yun Song,
Yu Chen,
Mahshid Mansouri,
Joao Ramos,
William R. Norris,
Elizabeth T. Hsiao-Wecksler
Abstract:
Our team developed a riding ballbot (called PURE) that is dynamically stable, omnidirectional, and driven by lean-to-steer control. A hands-free admittance control scheme (HACS) was previously integrated to allow riders with different torso functions to control the robot's movements via torso leaning and twisting. Such an interface requires motor coordination skills and could result in collisions with obstacles due to low proficiency. Hence, a shared controller (SC) that limits the speed of PURE could be helpful to ensure the safety of riders. However, the self-balancing dynamics of PURE could result in a weak control authority of its motion, in which the torso motion of the rider could easily result in poor tracking of the command speed dictated by the shared controller. Thus, we proposed an interactive hands-free admittance control scheme (iHACS), which added two modules to HACS to improve the speed-tracking performance of PURE: a control gain personalization module and an interaction compensation module. Human riding tests of simple tasks, idle-keeping and speed-limiting, were conducted to compare the performance of HACS and iHACS. Two manual wheelchair users and two able-bodied individuals participated in this study. They were instructed to use "adversarial" torso motions that would tax the SC's ability to keep the ballbot idling or below a set speed. In the idle-keeping tasks, iHACS demonstrated minimal translational motion and low command speed tracking RMSE, even with significant torso lean angles. During the speed-limiting task with command speed saturated at 0.5 m/s, the system achieved an average maximum speed of 1.1 m/s with iHACS, compared with over 1.9 m/s with HACS. These results suggest that iHACS can enhance PURE's control authority over the rider, which enables PURE to provide physical interactions back to the rider and results in a collaborative rider-robot synergy.
Submitted 27 September, 2024;
originally announced September 2024.
-
System-Level Defense against Indirect Prompt Injection Attacks: An Information Flow Control Perspective
Authors:
Fangzhou Wu,
Ethan Cecchetti,
Chaowei Xiao
Abstract:
Large Language Model-based systems (LLM systems) are information and query processing systems that use LLMs to plan operations from natural-language prompts and feed the output of each successive step into the LLM to plan the next. This structure results in powerful tools that can process complex information from diverse sources but raises critical security concerns. Malicious information from any source may be processed by the LLM and can compromise the query processing, resulting in nearly arbitrary misbehavior. To tackle this problem, we present a system-level defense based on the principles of information flow control that we call an f-secure LLM system. An f-secure LLM system disaggregates the components of an LLM system into a context-aware pipeline with dynamically generated structured executable plans, and a security monitor filters out untrusted input into the planning process. This structure prevents compromise while maximizing flexibility. We provide formal models for both existing LLM systems and our f-secure LLM system, allowing analysis of critical security guarantees. We further evaluate case studies and benchmarks showing that f-secure LLM systems provide robust security while preserving functionality and efficiency. Our code is released at https://github.com/fzwark/Secure_LLM_System.
Submitted 10 October, 2024; v1 submitted 27 September, 2024;
originally announced September 2024.
-
Exploiting Physical Human-Robot Interaction to Provide a Unique Rolling Experience with a Riding Ballbot
Authors:
Chenzhang Xiao,
Seung Yun Song,
Yu Chen,
Mahshid Mansouri,
João Ramos,
Adam W. Bleakney,
William R. Norris,
Elizabeth T. Hsiao-Wecksler
Abstract:
This study introduces the development of hands-free control schemes for a riding ballbot, designed to allow riders including manual wheelchair users to control its movement through torso leaning and twisting. The hardware platform, Personal Unique Rolling Experience (PURE), utilizes a ballbot drivetrain, a dynamically stable mobile robot that uses a ball as its wheel to provide omnidirectional maneuverability. To accommodate users with varying torso motion functions, the hands-free control scheme should be adjustable based on the rider's torso function and personal preferences. Therefore, concepts of (a) impedance control and (b) admittance control were integrated into the control scheme. A duo-agent optimization framework was utilized to assess the efficiency of this rider-ballbot system for a safety-critical task: braking from 1.4 m/s. The candidate control schemes were further implemented in the physical robot hardware and validated with two experienced users, demonstrating the efficiency and robustness of the hands-free admittance control scheme (HACS). This interface, which utilized physical human-robot interaction (pHRI) as the input, resulted in lower braking effort and shorter braking distance and time. Subsequently, 12 novice participants (six able-bodied users and six manual wheelchair users) with different levels of torso motion capability were recruited to benchmark the braking performance with HACS. The indoor navigation capability of PURE was further demonstrated with these participants in courses simulating narrow hallways, tight turns, and navigation through static and dynamic obstacles. By exploiting pHRI, the proposed admittance-style control scheme provided effective control of the ballbot via torso motions. This interface enables PURE to provide a personal unique rolling experience to manual wheelchair users for safe and agile indoor navigation.
Submitted 27 September, 2024;
originally announced September 2024.
-
Identifying Bridges from Asymmetric Load-Bearing Structures in Tapped Granular Packings
Authors:
Chijin Zhou,
Shuyang Zhang,
Xueliang Dai,
Yixin Cao,
Ye Yuan,
Chengjie Xia,
Zhikun Zeng,
Yujie Wang
Abstract:
Using high-resolution x-ray tomography, we experimentally investigate the bridge structures in tapped granular packings composed of particles with varying friction coefficients. We find that gravity can induce subtle structural changes on the load-bearing contacts, allowing us to identify the correct load-bearing contacts based on structural information alone. Using these identified load-bearing contacts, we investigate the cooperative bridge structures which are mechanical backbones of the system. We characterize the geometric properties of these bridges and find that their cooperativity increases as the packing fraction decreases. The knowledge of bridges can enhance our understanding of the rheological properties of granular materials.
Submitted 26 September, 2024;
originally announced September 2024.
-
HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection
Authors:
Xuefeng Du,
Chaowei Xiao,
Yixuan Li
Abstract:
The surge in applications of large language models (LLMs) has prompted concerns about the generation of misleading or fabricated information, known as hallucinations. Therefore, detecting hallucinations has become critical to maintaining trust in LLM-generated content. A primary challenge in learning a truthfulness classifier is the lack of a large amount of labeled truthful and hallucinated data. To address the challenge, we introduce HaloScope, a novel learning framework that leverages the unlabeled LLM generations in the wild for hallucination detection. Such unlabeled data arises freely upon deploying LLMs in the open world, and consists of both truthful and hallucinated information. To harness the unlabeled data, we present an automated membership estimation score for distinguishing between truthful and untruthful generations within unlabeled mixture data, thereby enabling the training of a binary truthfulness classifier on top. Importantly, our framework does not require extra data collection and human annotations, offering strong flexibility and practicality for real-world applications. Extensive experiments show that HaloScope can achieve superior hallucination detection performance, outperforming the competitive rivals by a significant margin. Code is available at https://github.com/deeplearningwisc/haloscope.
Submitted 25 September, 2024;
originally announced September 2024.
-
Efficient Collision Detection Framework for Enhancing Collision-Free Robot Motion
Authors:
Xiankun Zhu,
Yucheng Xin,
Shoujie Li,
Houde Liu,
Chongkun Xia,
Bin Liang
Abstract:
Fast and efficient collision detection is essential for motion generation in robotics. In this paper, we propose an efficient collision detection framework based on the Signed Distance Field (SDF) of robots, seamlessly integrated with a self-collision detection module. Firstly, we decompose the robot's SDF using forward kinematics and leverage multiple extremely lightweight networks in parallel to efficiently approximate the SDF. Moreover, we introduce support vector machines to integrate the self-collision detection module into the framework, which we refer to as the SDF-SC framework. Using statistical features, our approach unifies the representation of collision distance for both SDF and self-collision detection. During this process, we maintain and utilize the differentiable properties of the framework to optimize collision-free robot trajectories. Finally, we develop a reactive motion controller based on our framework, enabling real-time avoidance of multiple dynamic obstacles. While maintaining high accuracy, our framework achieves inference speeds up to five times faster than previous methods. Experimental results on the Franka robotic arm demonstrate the effectiveness of our approach.
Submitted 23 September, 2024;
originally announced September 2024.
-
Like a Martial Arts Dodge: Safe Expeditious Whole-Body Control of Mobile Manipulators for Collision Avoidance
Authors:
Bingjie Chen,
Houde Liu,
Chongkun Xia,
Liang Han,
Xueqian Wang,
Bin Liang
Abstract:
In the control task of mobile manipulators (MMs), achieving efficient and agile obstacle avoidance in dynamic environments is challenging. In this letter, we present a safe expeditious whole-body (SEWB) control for MMs that ensures freedom from both external and internal collisions. SEWB is constructed as a two-layer optimization structure. Firstly, control barrier functions (CBFs) are employed for an MM to establish initial safety constraints. Moreover, to resolve the pseudo-equilibrium problem of CBFs and improve avoidance agility, we propose a novel sub-optimization called adaptive cyclic inequality (ACI). ACI considers obstacle positions, velocities, and predefined directions to generate directional constraints. Then, we combine CBF and ACI to decompose safety constraints alongside an equality constraint for expectation control. Considering all these constraints, we formulate a quadratic programming (QP) problem as our primary optimization. In the QP cost function, we account for the motion accuracy differences between the base and manipulator, as well as obstacle influences, to achieve optimized motion. We validate the effectiveness of our SEWB control in avoiding collision and reaching target points through simulations and real-world experiments, particularly in challenging scenarios that involve fast-moving obstacles. SEWB has been proven to achieve whole-body collision-free motion and improve avoidance agility, similar to a "martial arts dodge".
Submitted 23 September, 2024;
originally announced September 2024.
-
CushionCatch: Compliant Catching Mechanism for Mobile Manipulators via Combined Optimization and Learning
Authors:
Bingjie Chen,
Keyu Fan,
Houde Liu,
Chongkun Xia,
Liang Han,
Bin Liang
Abstract:
This paper presents a framework to achieve compliant catching with cushioning mechanism (CCCM) for mobile manipulators. First, we introduce a two-level motion optimization scheme, comprising a high-level capture planner and a low-level joint planner. The low-level joint planner consists of two distinct components: a Pre-Catching (PRC) planner and a Post-Catching (POC) planner. Next, we propose a network that leverages the strengths of LSTM for temporal dependencies and positional encoding for spatial context (P-LSTM). P-LSTM is designed to effectively learn compliant control strategies from human demonstrations. To account for structural differences between humans and robots, safety constraints are incorporated into the POC planner to avoid potential collisions. We validate the CCCM framework through both simulated and real-world ball-catching scenarios, achieving a success rate of 98.70% in simulation, 92.59% in real-world tests, and a 33.2% reduction in impact torques.
Submitted 23 September, 2024;
originally announced September 2024.
-
More Effective LLM Compressed Tokens with Uniformly Spread Position Identifiers and Compression Loss
Authors:
Runsong Zhao,
Pengcheng Huang,
Xinyu Liu,
Chunyang Xiao,
Tong Xiao,
Jingbo Zhu
Abstract:
Compressing Transformer inputs into compressed tokens allows running LLMs with improved speed and cost efficiency. Based on the compression method ICAE, we carefully examine the position identifier choices for compressed tokens and also propose a new compression loss. We demonstrate empirically that our proposed methods achieve significantly higher compression ratios (15x compared to 4x for ICAE), while being able to attain comparable reconstruction performance.
Submitted 27 September, 2024; v1 submitted 22 September, 2024;
originally announced September 2024.
-
Unlocking Memorization in Large Language Models with Dynamic Soft Prompting
Authors:
Zhepeng Wang,
Runxue Bao,
Yawen Wu,
Jackson Taylor,
Cao Xiao,
Feng Zheng,
Weiwen Jiang,
Shangqian Gao,
Yanfu Zhang
Abstract:
Pretrained large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation. However, LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement. Accurate measurement of this memorization is essential to evaluate and mitigate these potential risks. However, previous attempts to characterize memorization are constrained by either using prefixes only or by prepending a constant soft prompt to the prefixes, which cannot react to changes in input. To address this challenge, we propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts. Our approach involves training a transformer-based generator to produce soft prompts that adapt to changes in input, thereby enabling more accurate extraction of memorized data. Our method not only addresses the limitations of previous methods but also demonstrates superior performance in diverse experimental settings compared to state-of-the-art techniques. In particular, our method can achieve the maximum relative improvement of 112.75% and 32.26% over the vanilla baseline in terms of discoverable memorization rate for the text generation task and code generation task respectively.
Submitted 20 September, 2024;
originally announced September 2024.
-
Mixed phases of compact star matter in a unified mean-field approach
Authors:
Cheng-Jun Xia,
Toshiki Maruyama,
Nobutoshi Yasutake,
Toshitaka Tatsumi
Abstract:
Based on an extended NJL model that treats baryons as clusters of quarks, we investigate the properties and microscopic structures of mixed phases for various types of first-order phase transitions in a unified manner, where the model parameters are fixed by reproducing nuclear matter properties and the binding energies of finite nuclei. In particular, based on the Thomas-Fermi approximation, we investigate the mixed phases that arise from the liquid-gas phase transition of nuclear matter, the chiral phase transition, and the deconfinement phase transition in dense stellar matter, adopting spherical and cylindrical approximations for the Wigner-Seitz cells. It is found that the geometrical structures do not emerge for the chiral phase transition, while the droplet, rod, slab, tube, and bubble phases emerge sequentially as density increases for the liquid-gas and deconfinement phase transitions. Additional attractive interactions between strange quark matter and hyperons are observed as the deconfinement phase transition is entangled with the chiral phase transition of $s$ quarks. The results obtained here should be useful to understand the properties and structures of dense stellar matter throughout compact stars and in particular the matter state in the core regions. Meanwhile, more extensive investigations in a three-dimensional geometry with large box sizes are necessary for our future study.
Submitted 19 September, 2024;
originally announced September 2024.
-
MuxHand: A Cable-driven Dexterous Robotic Hand Using Time-division Multiplexing Motors
Authors:
Jianle Xu,
Shoujie Li,
Hong Luo,
Houde Liu,
Xueqian Wang,
Wenbo Ding,
Chongkun Xia
Abstract:
The robotic dexterous hand is responsible for both grasping and dexterous manipulation. The number of motors directly influences both the dexterity and the cost of such systems. In this paper, we present MuxHand, a robotic hand that employs a time-division multiplexing motor (TDMM) mechanism. This system allows 9 cables to be independently controlled by just 4 motors, significantly reducing cost while maintaining high dexterity. To enhance stability and smoothness during grasping and manipulation tasks, we have integrated magnetic joints into the three 3D-printed fingers. These joints offer superior impact resistance and self-resetting capabilities. We conduct a series of experiments to evaluate the grasping and manipulation performance of MuxHand. The results demonstrate that the TDMM mechanism can precisely control each cable connected to the finger joints, enabling robust grasping and dexterous manipulation. Furthermore, the fingertip load capacity reached 1.0 kg, and the magnetic joints effectively absorbed impact and corrected misalignments without damage.
Submitted 19 September, 2024;
originally announced September 2024.
-
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
Authors:
Zeyi Liao,
Lingbo Mo,
Chejian Xu,
Mintong Kang,
Jiawei Zhang,
Chaowei Xiao,
Yuan Tian,
Bo Li,
Huan Sun
Abstract:
Generalist web agents have demonstrated remarkable potential in autonomously completing a wide range of tasks on real websites, significantly boosting human productivity. However, web tasks, such as booking flights, usually involve users' PII, which may be exposed to potential privacy risks if web agents accidentally interact with compromised websites, a scenario that remains largely unexplored in the literature. In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a realistic threat model for attacks on the website, where we consider two adversarial targets: stealing users' specific PII or the entire user request. Then, we propose a novel attack method, termed Environmental Injection Attack (EIA). EIA injects malicious content designed to adapt well to environments where the agents operate, and our work instantiates EIA specifically for privacy scenarios in web environments. We collect 177 action steps that involve diverse PII categories on realistic websites from Mind2Web, and conduct experiments using one of the most capable generalist web agent frameworks to date. The results demonstrate that EIA achieves up to 70% ASR in stealing specific PII and 16% ASR for the full user request. Additionally, by assessing the stealthiness and experimenting with a defensive system prompt, we indicate that EIA is hard to detect and mitigate. Notably, attacks that are not well adapted for a webpage can be detected via human inspection, leading to our discussion about the trade-off between security and autonomy. However, extra attackers' efforts can make EIA seamlessly adapted, rendering such supervision ineffective. Thus, we further discuss the defenses at the pre- and post-deployment stages of the websites without relying on human supervision and call for more advanced defense strategies.
Submitted 3 October, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Intrinsic Dynamic Generation of Spin Polarization by Time-Varying Electric Field
Authors:
Xukun Feng,
Jin Cao,
Zhi-Fan Zhang,
Lay Kee Ang,
Shen Lai,
Hua Jiang,
Cong Xiao,
Shengyuan A. Yang
Abstract:
Electric control of spin in insulators is desired for low-consumption and ultrafast spintronics, but the underlying mechanism remains largely unexplored. Here, we propose an intrinsic effect of dynamic spin generation driven by time-varying electric field. In the intraband response regime, it can be nicely formulated as a Berry curvature effect and leads to two phenomena that are forbidden in the $dc$ limit: linear spin generation in nonmagnetic insulators and intrinsic Néel spin-orbit torque in $\mathcal{PT}$-symmetric antiferromagnetic insulators. These phenomena are driven by the time derivative of field rather than the field itself, and have a quantum origin in the first-order dynamic anomalous spin polarizability. Combined with first-principles calculations, we predict sizable effects driven by terahertz field in nonmagnetic monolayer Bi and in antiferromagnetic even-layer MnBi$_2$Te$_4$, which can be detected in experiment.
Submitted 15 September, 2024;
originally announced September 2024.
-
DrawingSpinUp: 3D Animation from Single Character Drawings
Authors:
Jie Zhou,
Chufeng Xiao,
Miu-Ling Lam,
Hongbo Fu
Abstract:
Animating various character drawings is an engaging visual content creation task. Given a single character drawing, existing animation methods are limited to flat 2D motions and thus lack 3D effects. An alternative solution is to reconstruct a 3D model from a character drawing as a proxy and then retarget 3D motion data onto it. However, the existing image-to-3D methods do not work well for amateur character drawings in terms of appearance and geometry. We observe that the contour lines, commonly existing in character drawings, would introduce significant ambiguity in texture synthesis due to their view-dependence. Additionally, thin regions represented by single-line contours are difficult to reconstruct (e.g., slim limbs of a stick figure) due to their delicate structures. To address these issues, we propose a novel system, DrawingSpinUp, to produce plausible 3D animations and breathe life into character drawings, allowing them to freely spin up, leap, and even perform a hip-hop dance. For appearance improvement, we adopt a removal-then-restoration strategy to first remove the view-dependent contour lines and then render them back after retargeting the reconstructed character. For geometry refinement, we develop a skeleton-based thinning deformation algorithm to refine the slim structures represented by the single-line contours. The experimental evaluations and a perceptual user study show that our proposed method outperforms the existing 2D and 3D animation methods and generates high-quality 3D animations from a single character drawing. Please refer to our project page (https://lordliang.github.io/DrawingSpinUp) for the code and generated animations.
Submitted 13 September, 2024;
originally announced September 2024.
-
RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision
Authors:
Shuo Wang,
Chunlong Xia,
Feng Lv,
Yifeng Shi
Abstract:
RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from the framework design and the Hungarian matching. However, compared to dense supervision detectors like the YOLO series, the Hungarian matching provides much sparser supervision, leading to insufficient model training and making it difficult to achieve optimal results. To address these issues, we propose a hierarchical dense positive supervision method based on RT-DETR, named RT-DETRv3. Firstly, we introduce a CNN-based auxiliary branch that provides dense supervision and collaborates with the original decoder to enhance the encoder feature representation. Secondly, to address insufficient decoder training, we propose a novel learning strategy involving self-attention perturbation. This strategy diversifies label assignment for positive samples across multiple query groups, thereby enriching positive supervision. Additionally, we introduce a shared-weight decoder branch for dense positive supervision to ensure more high-quality queries matching each ground truth. Notably, all aforementioned modules are training-only. We conduct extensive experiments to demonstrate the effectiveness of our approach on COCO val2017. RT-DETRv3 significantly outperforms existing real-time detectors, including the RT-DETR series and the YOLO series. For example, RT-DETRv3-R18 achieves 48.1% AP (+1.6%/+1.4%) compared to RT-DETR-R18/RT-DETRv2-R18 while maintaining the same latency. Meanwhile, it requires only half the epochs to attain comparable performance. Furthermore, RT-DETRv3-R101 can attain an impressive 54.6% AP, outperforming YOLOv10-X. Code will be released soon.
Submitted 12 September, 2024;
originally announced September 2024.