subscribe to arXiv mailings

SMART: Self-learning Meta-strategy Agent for Reasoning Tasks

Authors: Rongxing Liu, Kumar Shridhar, Manish Prajapat, Patrick Xia, Mrinmaya Sachan

Abstract: Tasks requiring deductive reasoning, especially those involving multiple steps, often demand adaptive strategies such as intermediate generation of rationales or programs, as no single approach is universally optimal. While Language Models (LMs) can enhance their outputs through iterative self-refinement and strategy adjustments, they frequently fail to apply the most effective strategy in their f… ▽ More Tasks requiring deductive reasoning, especially those involving multiple steps, often demand adaptive strategies such as intermediate generation of rationales or programs, as no single approach is universally optimal. While Language Models (LMs) can enhance their outputs through iterative self-refinement and strategy adjustments, they frequently fail to apply the most effective strategy in their first attempt. This inefficiency raises the question: Can LMs learn to select the optimal strategy in the first attempt, without a need for refinement? To address this challenge, we introduce SMART (Self-learning Meta-strategy Agent for Reasoning Tasks), a novel framework that enables LMs to autonomously learn and select the most effective strategies for various reasoning tasks. We model the strategy selection process as a Markov Decision Process and leverage reinforcement learning-driven continuous self-improvement to allow the model to find the suitable strategy to solve a given task. Unlike traditional self-refinement methods that rely on multiple inference passes or external feedback, SMART allows an LM to internalize the outcomes of its own reasoning processes and adjust its strategy accordingly, aiming for correct solutions on the first attempt. Our experiments across various reasoning datasets and with different model architectures demonstrate that SMART significantly enhances the ability of models to choose optimal strategies without external guidance (+15 points on the GSM8K dataset). By achieving higher accuracy with a single inference pass, SMART not only improves performance but also reduces computational costs for refinement-based strategies, paving the way for more efficient and intelligent reasoning in LMs. △ Less

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.16049 [pdf, other]

Dirty-Waters: Detecting Software Supply Chain Smells

Authors: Raphina Liu, Sofia Bobadilla, Benoit Baudry, Martin Monperrus

Abstract: Using open-source dependencies is essential in modern software development. However, this practice implies significant trust in third-party code, while there is little support for developers to assess this trust. As a consequence, attacks have been increasingly occurring through third-party dependencies. These are called software supply chain attacks. In this paper, we target the problem of projec… ▽ More Using open-source dependencies is essential in modern software development. However, this practice implies significant trust in third-party code, while there is little support for developers to assess this trust. As a consequence, attacks have been increasingly occurring through third-party dependencies. These are called software supply chain attacks. In this paper, we target the problem of projects that use dependencies while unaware of the potential risks posed by their software supply chain. We define the novel concept of software supply chain smell and present Dirty-Waters, a novel tool for detecting software supply chain smells. We evaluate Dirty-Waters on three JavaScript projects across nine versions and demonstrate the prevalence of all proposed software supply chain smells. Not only are there smells in all projects, but there are many of them, which immediately reveal potential risks and provide clear indicators for developers to act on the security of their supply chain. △ Less

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.14660 [pdf, other]

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

Authors: Shengjie Sun, Runze Liu, Jiafei Lyu, Jing-Wen Yang, Liangpeng Zhang, Xiu Li

Abstract: Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or repetitive RL training. To address these issues, we propose CARD, a LLM-driven Reward Design framework that iteratively generates and improves reward function code.… ▽ More Large Language Models (LLMs) have shown significant potential in designing reward functions for Reinforcement Learning (RL) tasks. However, obtaining high-quality reward code often involves human intervention, numerous LLM queries, or repetitive RL training. To address these issues, we propose CARD, a LLM-driven Reward Design framework that iteratively generates and improves reward function code. Specifically, CARD includes a Coder that generates and verifies the code, while a Evaluator provides dynamic feedback to guide the Coder in improving the code, eliminating the need for human feedback. In addition to process feedback and trajectory feedback, we introduce Trajectory Preference Evaluation (TPE), which evaluates the current reward function based on trajectory preferences. If the code fails the TPE, the Evaluator provides preference feedback, avoiding RL training at every iteration and making the reward function better aligned with the task objective. Empirical results on Meta-World and ManiSkill2 demonstrate that our method achieves an effective balance between task performance and token efficiency, outperforming or matching the baselines across all tasks. On 10 out of 12 tasks, CARD shows better or comparable performance to policies trained with expert-designed rewards, and our method even surpasses the oracle on 3 tasks. △ Less

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14642 [pdf, ps, other]

Joint Space-Time Adaptive Processing and Beamforming Design for Cell-Free ISAC Systems

Authors: Rang Liu, Ming Li, Qian Liu

Abstract: In this paper, we explore cooperative sensing and communication within cell-free integrated sensing and communication (ISAC) systems. Specifically, multiple transmit access points (APs) collaboratively serve multiple communication users while simultaneously illuminating a potential target, with a separate sensing AP dedicated to collecting echo signals for target detection. To improve the performa… ▽ More In this paper, we explore cooperative sensing and communication within cell-free integrated sensing and communication (ISAC) systems. Specifically, multiple transmit access points (APs) collaboratively serve multiple communication users while simultaneously illuminating a potential target, with a separate sensing AP dedicated to collecting echo signals for target detection. To improve the performance of identifying a moving target in the presence of strong interference originating from transmit APs, we employ the space-time adaptive processing (STAP) technique and jointly optimize the transmit/receive beamforming. Our goal is to maximize the radar output signal-to-interference-plus-noise ratio (SINR), subject to the communication SINR requirements and the power budget. An efficient alternating algorithm is developed to solve the resulting non-convex optimization problem. Simulations demonstrate significant performance improvements in target detection and validate the advantages of the proposed joint STAP and beamforming design for cell-free ISAC systems. △ Less

Submitted 18 October, 2024; originally announced October 2024.

Comments: 5 pages, 2 figures, submitted to IEEE conference

arXiv:2410.14250 [pdf, other]

Vision-Language Navigation with Energy-Based Policy

Authors: Rui Liu, Wenguan Wang, Yi Yang

Abstract: Vision-language navigation (VLN) requires an agent to execute actions following human instructions. Existing VLN models are optimized through expert demonstrations by supervised behavioural cloning or incorporating manual reward engineering. While straightforward, these efforts overlook the accumulation of errors in the Markov decision process, and struggle to match the distribution of the expert… ▽ More Vision-language navigation (VLN) requires an agent to execute actions following human instructions. Existing VLN models are optimized through expert demonstrations by supervised behavioural cloning or incorporating manual reward engineering. While straightforward, these efforts overlook the accumulation of errors in the Markov decision process, and struggle to match the distribution of the expert policy. Going beyond this, we propose an Energy-based Navigation Policy (ENP) to model the joint state-action distribution using an energy-based model. At each step, low energy values correspond to the state-action pairs that the expert is most likely to perform, and vice versa. Theoretically, the optimization objective is equivalent to minimizing the forward divergence between the occupancy measure of the expert and ours. Consequently, ENP learns to globally align with the expert policy by maximizing the likelihood of the actions and modeling the dynamics of the navigation states in a collaborative manner. With a variety of VLN architectures, ENP achieves promising performances on R2R, REVERIE, RxR, and R2R-CE, unleashing the power of existing VLN models. △ Less

Submitted 18 October, 2024; originally announced October 2024.

arXiv:2410.14101 [pdf, other]

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Authors: Shuwei He, Rui Liu, Haizhou Li

Abstract: Visual Text-to-Speech (VTTS) aims to take the spatial environmental image as the prompt to synthesize the reverberation speech for the spoken content. Previous research focused on the RGB modality for global environmental modeling, overlooking the potential of multi-source spatial knowledge like depth, speaker position, and environmental semantics. To address the issues, we propose a novel multi-s… ▽ More Visual Text-to-Speech (VTTS) aims to take the spatial environmental image as the prompt to synthesize the reverberation speech for the spoken content. Previous research focused on the RGB modality for global environmental modeling, overlooking the potential of multi-source spatial knowledge like depth, speaker position, and environmental semantics. To address the issues, we propose a novel multi-source spatial knowledge understanding scheme for immersive VTTS, termed MS$^2$KU-VTTS. Specifically, we first prioritize RGB image as the dominant source and consider depth image, speaker position knowledge from object detection, and semantic captions from image understanding LLM as supplementary sources. Afterwards, we propose a serial interaction mechanism to deeply engage with both dominant and supplementary sources. The resulting multi-source knowledge is dynamically integrated based on their contributions.This enriched interaction and integration of multi-source spatial knowledge guides the speech generation model, enhancing the immersive spatial speech experience.Experimental results demonstrate that the MS$^2$KU-VTTS surpasses existing baselines in generating immersive speech. Demos and code are available at: https://github.com/MS2KU-VTTS/MS2KU-VTTS. △ Less