TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Authors: Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang

Abstract: Inference-time alignment enhances the performance of large language models without requiring additional training or fine-tuning but presents challenges due to balancing computational efficiency with high-quality output. Best-of-N (BoN) sampling, as a simple yet powerful approach, generates multiple responses and selects the best one, achieving improved performance but with a high computational cost. We propose TreeBoN, a novel framework that integrates a speculative tree-search strategy into Best-of-N (BoN) Sampling. TreeBoN maintains a set of parent nodes, iteratively branching and pruning low-quality responses, thereby reducing computational overhead while maintaining high output quality. Our approach also leverages token-level rewards from Direct Preference Optimization (DPO) to guide tree expansion and prune low-quality paths. We evaluate TreeBoN using AlpacaFarm, UltraFeedback, GSM8K, HH-RLHF, and TutorEval datasets, demonstrating consistent improvements. Specifically, TreeBoN achieves a 65% win rate at maximum lengths of 192 and 384 tokens, outperforming standard BoN with the same computational cost. Furthermore, TreeBoN achieves around a 60% win rate across longer responses, showcasing its scalability and alignment efficacy.
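
The plain Best-of-N baseline that the abstract compares against (generate several responses, keep the highest-scoring one) can be sketched as follows. This is a minimal illustration only, not the TreeBoN method: the `generate` and `reward` callables and the `best_of_n` name are hypothetical placeholders for a language-model sampler and a reward model.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],       # hypothetical sampler: prompt -> response
    reward: Callable[[str, str], float],  # hypothetical scorer: (prompt, response) -> score
    n: int = 4,
) -> str:
    """Sample n candidate responses and return the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw n independent candidates from the sampler.
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    # Keep the candidate the reward function scores highest.
    return max(candidates, key=lambda response: reward(prompt, response))
```

The cost grows linearly with `n`, since every candidate is generated in full before scoring; this is the overhead the abstract says TreeBoN reduces by pruning low-quality partial responses early.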

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.16020 [pdf, other]

START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation

Authors: Jintao Guo, Lei Qi, Yinghuan Shi, Yang Gao

Abstract: Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains. Existing DG methods primarily rely on convolutional neural networks (CNNs), which inherently learn texture biases due to their limited receptive fields, making them prone to overfitting source domains. While some works have introduced transformer-based methods (ViTs) for DG to leverage the global receptive field, these methods incur high computational costs due to the quadratic complexity of self-attention. Recently, advanced state space models (SSMs), represented by Mamba, have shown promising results in supervised learning tasks by achieving linear complexity in sequence length during training and fast RNN-like computation during inference. Inspired by this, we investigate the generalization ability of the Mamba model under domain shifts and find that input-dependent matrices within SSMs could accumulate and amplify domain-specific features, thus hindering model generalization. To address this issue, we propose a novel SSM-based architecture with saliency-based token-aware transformation (namely START), which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs. Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains. Extensive experiments on five benchmarks demonstrate that START outperforms existing SOTA DG methods with efficient linear complexity. Our code is available at https://github.com/lingeringlight/START.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Accepted by NeurIPS2024. The code is available at https://github.com/lingeringlight/START

arXiv:2410.15847 [pdf, other]

Random Token Fusion for Multi-View Medical Diagnosis

Authors: Jingyu Guo, Christos Matsoukas, Fredrik Strand, Kevin Smith

Abstract: In multi-view medical diagnosis, deep learning-based models often fuse information from different imaging perspectives to improve diagnostic performance. However, existing approaches are prone to overfitting and rely heavily on view-specific features, which can lead to trivial solutions. In this work, we introduce Random Token Fusion (RTF), a novel technique designed to enhance multi-view medical image analysis using vision transformers. By integrating randomness into the feature fusion process during training, RTF addresses the issue of overfitting and enhances the robustness and accuracy of diagnostic models without incurring any additional cost at inference. We validate our approach on standard mammography and chest X-ray benchmark datasets. Through extensive experiments, we demonstrate that RTF consistently improves the performance of existing fusion methods, paving the way for a new generation of multi-view medical foundation models.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Originally published at the NeurIPS 2024 Workshop on Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond (AIM-FM)

arXiv:2410.14196 [pdf]

doi 10.1021/acs.nanolett.4c01542

Quantum-Confined Tunable Ferromagnetism on the Surface of a van der Waals Antiferromagnet NaCrTe2

Authors: Yidian Li, Xian Du, Junjie Wang, Runzhe Xu, Wenxuan Zhao, Kaiyi Zhai, Jieyi Liu, Houke Chen, Yiheng Yang, Nicolas C. Plumb, Sailong Ju, Ming Shi, Zhongkai Liu, Jiangang Guo, Xiaolong Chen, Yulin Chen, Lexian Yang

Abstract: The surface of three-dimensional materials provides an ideal and versatile platform to explore quantum-confined physics. Here, we systematically investigate the electronic structure of Na-intercalated CrTe2, a van der Waals antiferromagnet, using angle-resolved photoemission spectroscopy and ab-initio calculations. The measured band structure deviates from the calculation of bulk NaCrTe2 but agrees with that of ferromagnetic monolayer CrTe2. Consistently, we observe an unexpected exchange splitting of the band dispersions, persisting well above the Néel temperature of bulk NaCrTe2. We argue that NaCrTe2 features a quantum-confined 2D ferromagnetic state in the topmost surface layer due to strong ferromagnetic correlation in the CrTe2 layer. Moreover, the exchange splitting and the critical temperature can be controlled by surface doping of alkali-metal atoms, suggesting a feasible tunability of the surface ferromagnetism. Our work not only presents a simple platform to explore tunable 2D ferromagnetism but also provides important insights into the quantum-confined low-dimensional magnetic states.

Submitted 18 October, 2024; originally announced October 2024.

Journal ref: Nano Lett. 24, 9832-9838 (2024)

arXiv:2410.13808 [pdf, other]

De-mark: Watermark Removal in Large Language Models

Authors: Ruibo Chen, Yihan Wu, Junfeng Guo, Heng Huang

Abstract: Watermarking techniques offer a promising way to identify machine-generated content via embedding covert information into the contents generated from language models (LMs). However, the robustness of the watermarking schemes has not been well explored. In this paper, we present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark and identifying the red-green list within the n-gram watermark. Experiments on popular LMs, such as Llama3 and ChatGPT, demonstrate the efficiency and effectiveness of De-mark in watermark removal and exploitation tasks.