-
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following
Authors:
Yun He,
Di Jin,
Chaoqi Wang,
Chloe Bi,
Karishma Mandyam,
Hejia Zhang,
Chen Zhu,
Ning Li,
Tengyu Xu,
Hongjiang Lv,
Shruti Bhosale,
Chenguang Zhu,
Karthik Abinav Sankararaman,
Eryk Helenowski,
Melanie Kambadur,
Aditya Tayade,
Hao Ma,
Han Fang,
Sinong Wang
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including instruction following, which is crucial for aligning model outputs with user expectations. However, evaluating LLMs' ability to follow instructions remains challenging due to the complexity and subjectivity of human language. Current benchmarks primarily focus on single-turn, monolingual instructions…
▽ More
Large Language Models (LLMs) have demonstrated impressive capabilities in various tasks, including instruction following, which is crucial for aligning model outputs with user expectations. However, evaluating LLMs' ability to follow instructions remains challenging due to the complexity and subjectivity of human language. Current benchmarks primarily focus on single-turn, monolingual instructions, which do not adequately reflect the complexities of real-world applications that require handling multi-turn and multilingual interactions. To address this gap, we introduce Multi-IF, a new benchmark designed to assess LLMs' proficiency in following multi-turn and multilingual instructions. Multi-IF, which utilizes a hybrid framework combining LLM and human annotators, expands upon the IFEval by incorporating multi-turn sequences and translating the English prompts into another 7 languages, resulting in a dataset of 4,501 multilingual conversations, where each has three turns. Our evaluation of 14 state-of-the-art LLMs on Multi-IF reveals that it presents a significantly more challenging task than existing benchmarks. All the models tested showed a higher rate of failure in executing instructions correctly with each additional turn. For example, o1-preview drops from 0.877 at the first turn to 0.707 at the third turn in terms of average accuracy over all languages. Moreover, languages with non-Latin scripts (Hindi, Russian, and Chinese) generally exhibit higher error rates, suggesting potential limitations in the models' multilingual capabilities. We release Multi-IF prompts and the evaluation code base to encourage further research in this critical area.
△ Less
Submitted 20 October, 2024;
originally announced October 2024.
-
A First Look at Package-to-Group Mechanism: An Empirical Study of the Linux Distributions
Authors:
Dongming Jin,
Nianyu Li,
Kai Yang,
Minghui Zhou,
Zhi Jin
Abstract:
Reusing third-party software packages is a common practice in software development. As the scale and complexity of open-source software (OSS) projects continue to grow (e.g., Linux distributions), the number of reused third-party packages has significantly increased. Therefore, maintaining effective package management is critical for developing and evolving OSS projects. To achieve this, a package…
▽ More
Reusing third-party software packages is a common practice in software development. As the scale and complexity of open-source software (OSS) projects continue to grow (e.g., Linux distributions), the number of reused third-party packages has significantly increased. Therefore, maintaining effective package management is critical for developing and evolving OSS projects. To achieve this, a package-to-group mechanism (P2G) is employed to enable unified installation, uninstallation, and updates of multiple packages at once. To better understand this mechanism, this paper takes Linux distributions as a case study and presents an empirical study focusing on its application trends, evolutionary patterns, group quality, and developer tendencies. By analyzing 11,746 groups and 193,548 packages from 89 versions of 5 popular Linux distributions and conducting questionnaire surveys with Linux practitioners and researchers, we derive several key insights. Our findings show that P2G is increasingly being adopted, particularly in popular Linux distributions. P2G follows six evolutionary patterns (\eg splitting and merging groups). Interestingly, packages no longer managed through P2G are more likely to remain in Linux distributions rather than being directly removed. To assess the effectiveness of P2G, we propose a metric called {\sc GValue} to evaluate the quality of groups and identify issues such as inadequate group descriptions and insufficient group sizes. We also summarize five types of packages that tend to adopt P2G, including graphical desktops, networks, etc. To the best of our knowledge, this is the first study focusing on the P2G mechanisms. We expect our study can assist in the efficient management of packages and reduce the burden on practitioners in rapidly growing Linux distributions and other open-source software projects.
△ Less
Submitted 13 October, 2024;
originally announced October 2024.
-
Low-Rank Continual Pyramid Vision Transformer: Incrementally Segment Whole-Body Organs in CT with Light-Weighted Adaptation
Authors:
Vince Zhu,
Zhanghexuan Ji,
Dazhou Guo,
Puyang Wang,
Yingda Xia,
Le Lu,
Xianghua Ye,
Wei Zhu,
Dakai Jin
Abstract:
Deep segmentation networks achieve high performance when trained on specific datasets. However, in clinical practice, it is often desirable that pretrained segmentation models can be dynamically extended to enable segmenting new organs without access to previous training datasets or without training from scratch. This would ensure a much more efficient model development and deployment paradigm acc…
▽ More
Deep segmentation networks achieve high performance when trained on specific datasets. However, in clinical practice, it is often desirable that pretrained segmentation models can be dynamically extended to enable segmenting new organs without access to previous training datasets or without training from scratch. This would ensure a much more efficient model development and deployment paradigm accounting for the patient privacy and data storage issues. This clinically preferred process can be viewed as a continual semantic segmentation (CSS) problem. Previous CSS works would either experience catastrophic forgetting or lead to unaffordable memory costs as models expand. In this work, we propose a new continual whole-body organ segmentation model with light-weighted low-rank adaptation (LoRA). We first train and freeze a pyramid vision transformer (PVT) base segmentation model on the initial task, then continually add light-weighted trainable LoRA parameters to the frozen model for each new learning task. Through a holistically exploration of the architecture modification, we identify three most important layers (i.e., patch-embedding, multi-head attention and feed forward layers) that are critical in adapting to the new segmentation tasks, while retaining the majority of the pretrained parameters fixed. Our proposed model continually segments new organs without catastrophic forgetting and meanwhile maintaining a low parameter increasing rate. Continually trained and tested on four datasets covering different body parts of a total of 121 organs, results show that our model achieves high segmentation accuracy, closely reaching the PVT and nnUNet upper bounds, and significantly outperforms other regularization-based CSS methods. When comparing to the leading architecture-based CSS method, our model has a substantial lower parameter increasing rate while achieving comparable performance.
△ Less
Submitted 6 October, 2024;
originally announced October 2024.
-
From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues
Authors:
Yu Li,
Devamanyu Hazarika,
Di Jin,
Julia Hirschberg,
Yang Liu
Abstract:
Self-anthropomorphism in robots manifests itself through their display of human-like characteristics in dialogue, such as expressing preferences and emotions. Our study systematically analyzes self-anthropomorphic expression within various dialogue datasets, outlining the contrasts between self-anthropomorphic and non-self-anthropomorphic responses in dialogue systems. We show significant differen…
▽ More
Self-anthropomorphism in robots manifests itself through their display of human-like characteristics in dialogue, such as expressing preferences and emotions. Our study systematically analyzes self-anthropomorphic expression within various dialogue datasets, outlining the contrasts between self-anthropomorphic and non-self-anthropomorphic responses in dialogue systems. We show significant differences in these two types of responses and propose transitioning from one type to the other. We also introduce Pix2Persona, a novel dataset aimed at developing ethical and engaging AI systems in various embodiments. This dataset preserves the original dialogues from existing corpora and enhances them with paired responses: self-anthropomorphic and non-self-anthropomorphic for each original bot response. Our work not only uncovers a new category of bot responses that were previously under-explored but also lays the groundwork for future studies about dynamically adjusting self-anthropomorphism levels in AI systems to align with ethical standards and user expectations.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
SAG: Style-Aligned Article Generation via Model Collaboration
Authors:
Chenning Xu,
Fangxun Shu,
Dian Jin,
Jinghao Wei,
Hao Jiang
Abstract:
Large language models (LLMs) have increased the demand for personalized and stylish content generation. However, closed-source models like GPT-4 present limitations in optimization opportunities, while the substantial training costs and inflexibility of open-source alternatives, such as Qwen-72B, pose considerable challenges. Conversely, small language models (SLMs) struggle with understanding com…
▽ More
Large language models (LLMs) have increased the demand for personalized and stylish content generation. However, closed-source models like GPT-4 present limitations in optimization opportunities, while the substantial training costs and inflexibility of open-source alternatives, such as Qwen-72B, pose considerable challenges. Conversely, small language models (SLMs) struggle with understanding complex instructions and transferring learned capabilities to new contexts, often exhibiting more pronounced limitations. In this paper, we present a novel collaborative training framework that leverages the strengths of both LLMs and SLMs for style article generation, surpassing the performance of either model alone. We freeze the LLMs to harness their robust instruction-following capabilities and subsequently apply supervised fine-tuning on the SLM using style-specific data. Additionally, we introduce a self-improvement method to enhance style consistency. Our new benchmark, NoteBench, thoroughly evaluates style-aligned generation. Extensive experiments show that our approach achieves state-of-the-art performance, with improvements of 0.78 in ROUGE-L and 0.55 in BLEU-4 scores compared to GPT-4, while maintaining a low hallucination rate regarding factual and faithfulness.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
The Perfect Blend: Redefining RLHF with Mixture of Judges
Authors:
Tengyu Xu,
Eryk Helenowski,
Karthik Abinav Sankararaman,
Di Jin,
Kaiyan Peng,
Eric Han,
Shaoliang Nie,
Chen Zhu,
Hejia Zhang,
Wenxuan Zhou,
Zhouhao Zeng,
Yun He,
Karishma Mandyam,
Arya Talabzadeh,
Madian Khabsa,
Gabriel Cohen,
Yuandong Tian,
Hao Ma,
Sinong Wang,
Han Fang
Abstract:
Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the wei…
▽ More
Reinforcement learning from human feedback (RLHF) has become the leading approach for fine-tuning large language models (LLM). However, RLHF has limitations in multi-task learning (MTL) due to challenges of reward hacking and extreme multi-objective optimization (i.e., trade-off of multiple and/or sometimes conflicting objectives). Applying RLHF for MTL currently requires careful tuning of the weights for reward model and data combinations. This is often done via human intuition and does not generalize. In this work, we introduce a novel post-training paradigm which we called Constrained Generative Policy Optimization (CGPO). The core of CGPO is Mixture of Judges (MoJ) with cost-efficient constrained policy optimization with stratification, which can identify the perfect blend in RLHF in a principled manner. It shows strong empirical results with theoretical guarantees, does not require extensive hyper-parameter tuning, and is plug-and-play in common post-training pipelines. Together, this can detect and mitigate reward hacking behaviors while reaching a pareto-optimal point across an extremely large number of objectives.
Our empirical evaluations demonstrate that CGPO significantly outperforms standard RLHF algorithms like PPO and DPO across various tasks including general chat, STEM questions, instruction following, and coding. Specifically, CGPO shows improvements of 7.4% in AlpacaEval-2 (general chat), 12.5% in Arena-Hard (STEM & reasoning), and consistent gains in other domains like math and coding. Notably, PPO, while commonly used, is prone to severe reward hacking in popular coding benchmarks, which CGPO successfully addresses. This breakthrough in RLHF not only tackles reward hacking and extreme multi-objective optimization challenges but also advances the state-of-the-art in aligning general-purpose LLMs for diverse applications.
△ Less
Submitted 30 September, 2024;
originally announced September 2024.
-
Some Thoughts on Symbolic Transfer Entropy
Authors:
Dian Jin
Abstract:
Transfer entropy is used to establish a measure of causal relationships between two variables. Symbolic transfer entropy, as an estimation method for transfer entropy, is widely applied due to its robustness against non-stationarity. This paper investigates the embedding dimension parameter in symbolic transfer entropy and proposes optimization methods for high complexity in extreme cases with com…
▽ More
Transfer entropy is used to establish a measure of causal relationships between two variables. Symbolic transfer entropy, as an estimation method for transfer entropy, is widely applied due to its robustness against non-stationarity. This paper investigates the embedding dimension parameter in symbolic transfer entropy and proposes optimization methods for high complexity in extreme cases with complex data. Additionally, it offers some perspectives on estimation methods for transfer entropy.
△ Less
Submitted 23 September, 2024;
originally announced September 2024.
-
FS-MedSAM2: Exploring the Potential of SAM2 for Few-Shot Medical Image Segmentation without Fine-tuning
Authors:
Yunhao Bai,
Qinji Yu,
Boxiang Yun,
Dakai Jin,
Yingda Xia,
Yan Wang
Abstract:
The Segment Anything Model 2 (SAM2) has recently demonstrated exceptional performance in zero-shot prompt segmentation for natural images and videos. However, it faces significant challenges when applied to medical images. Since its release, many attempts have been made to adapt SAM2's segmentation capabilities to the medical imaging domain. These efforts typically involve using a substantial amou…
▽ More
The Segment Anything Model 2 (SAM2) has recently demonstrated exceptional performance in zero-shot prompt segmentation for natural images and videos. However, it faces significant challenges when applied to medical images. Since its release, many attempts have been made to adapt SAM2's segmentation capabilities to the medical imaging domain. These efforts typically involve using a substantial amount of labeled data to fine-tune the model's weights. In this paper, we explore SAM2 from a different perspective via making the full use of its trained memory attention module and its ability of processing mask prompts. We introduce FS-MedSAM2, a simple yet effective framework that enables SAM2 to achieve superior medical image segmentation in a few-shot setting, without the need for fine-tuning. Our framework outperforms the current state-of-the-arts on two publicly available medical image datasets. The code is available at https://github.com/DeepMed-Lab-ECNU/FS_MedSAM2.
△ Less
Submitted 6 September, 2024;
originally announced September 2024.
-
Large-scale Urban Facility Location Selection with Knowledge-informed Reinforcement Learning
Authors:
Hongyuan Su,
Yu Zheng,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
The facility location problem (FLP) is a classical combinatorial optimization challenge aimed at strategically laying out facilities to maximize their accessibility. In this paper, we propose a reinforcement learning method tailored to solve large-scale urban FLP, capable of producing near-optimal solutions at superfast inference speed. We distill the essential swap operation from local search, an…
▽ More
The facility location problem (FLP) is a classical combinatorial optimization challenge aimed at strategically laying out facilities to maximize their accessibility. In this paper, we propose a reinforcement learning method tailored to solve large-scale urban FLP, capable of producing near-optimal solutions at superfast inference speed. We distill the essential swap operation from local search, and simulate it by intelligently selecting edges on a graph of urban regions, guided by a knowledge-informed graph neural network, thus sidestepping the need for heavy computation of local search. Extensive experiments on four US cities with different geospatial conditions demonstrate that our approach can achieve comparable performance to commercial solvers with less than 5\% accessibility loss, while displaying up to 1000 times speedup. We deploy our model as an online geospatial application at https://huggingface.co/spaces/randommmm/MFLP.
△ Less
Submitted 6 September, 2024; v1 submitted 3 September, 2024;
originally announced September 2024.
-
OMR: Occlusion-Aware Memory-Based Refinement for Video Lane Detection
Authors:
Dongkwon Jin,
Chang-Su Kim
Abstract:
A novel algorithm for video lane detection is proposed in this paper. First, we extract a feature map for a current frame and detect a latent mask for obstacles occluding lanes. Then, we enhance the feature map by developing an occlusion-aware memory-based refinement (OMR) module. It takes the obstacle mask and feature map from the current frame, previous output, and memory information as input, a…
▽ More
A novel algorithm for video lane detection is proposed in this paper. First, we extract a feature map for a current frame and detect a latent mask for obstacles occluding lanes. Then, we enhance the feature map by developing an occlusion-aware memory-based refinement (OMR) module. It takes the obstacle mask and feature map from the current frame, previous output, and memory information as input, and processes them recursively in a video. Moreover, we apply a novel data augmentation scheme for training the OMR module effectively. Experimental results show that the proposed algorithm outperforms existing techniques on video lane datasets. Our codes are available at https://github.com/dongkwonjin/OMR.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
CARE: A Clue-guided Assistant for CSRs to Read User Manuals
Authors:
Weihong Du,
Jia Liu,
Zujie Wen,
Dingnan Jin,
Hongru Liang,
Wenqiang Lei
Abstract:
It is time-saving to build a reading assistant for customer service representations (CSRs) when reading user manuals, especially information-rich ones. Current solutions don't fit the online custom service scenarios well due to the lack of attention to user questions and possible responses. Hence, we propose to develop a time-saving and careful reading assistant for CSRs, named CARE. It can help t…
▽ More
It is time-saving to build a reading assistant for customer service representations (CSRs) when reading user manuals, especially information-rich ones. Current solutions don't fit the online custom service scenarios well due to the lack of attention to user questions and possible responses. Hence, we propose to develop a time-saving and careful reading assistant for CSRs, named CARE. It can help the CSRs quickly find proper responses from the user manuals via explicit clue chains. Specifically, each of the clue chains is formed by inferring over the user manuals, starting from the question clue aligned with the user question and ending at a possible response. To overcome the shortage of supervised data, we adopt the self-supervised strategy for model learning. The offline experiment shows that CARE is efficient in automatically inferring accurate responses from the user manual. The online experiment further demonstrates the superiority of CARE to reduce CSRs' reading burden and keep high service quality, in particular with >35% decrease in time spent and keeping a >0.75 ICC score.
△ Less
Submitted 26 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
An Evaluation of Requirements Modeling for Cyber-Physical Systems via LLMs
Authors:
Dongming Jin,
Shengxin Zhao,
Zhi Jin,
Xiaohong Chen,
Chunhui Wang,
Zheng Fang,
Hongbin Xiao
Abstract:
Cyber-physical systems (CPSs) integrate cyber and physical components and enable them to interact with each other to meet user needs. The needs for CPSs span rich application domains such as healthcare and medicine, smart home, smart building, etc. This indicates that CPSs are all about solving real-world problems. With the increasing abundance of sensing devices and effectors, the problems wanted…
▽ More
Cyber-physical systems (CPSs) integrate cyber and physical components and enable them to interact with each other to meet user needs. The needs for CPSs span rich application domains such as healthcare and medicine, smart home, smart building, etc. This indicates that CPSs are all about solving real-world problems. With the increasing abundance of sensing devices and effectors, the problems wanted to solve with CPSs are becoming more and more complex. It is also becoming increasingly difficult to extract and express CPS requirements accurately. Problem frame approach aims to shape real-world problems by capturing the characteristics and interconnections of components, where the problem diagram is central to expressing the requirements. CPSs requirements are generally presented in domain-specific documents that are normally expressed in natural language. There is currently no effective way to extract problem diagrams from natural language documents. CPSs requirements extraction and modeling are generally done manually, which is time-consuming, labor-intensive, and error-prone. Large language models (LLMs) have shown excellent performance in natural language understanding. It can be interesting to explore the abilities of LLMs to understand domain-specific documents and identify modeling elements, which this paper is working on. To achieve this goal, we first formulate two tasks (i.e., entity recognition and interaction extraction) and propose a benchmark called CPSBench. Based on this benchmark, extensive experiments are conducted to evaluate the abilities and limitations of seven advanced LLMs. We find some interesting insights. Finally, we establish a taxonomy of LLMs hallucinations in CPSs requirements modeling using problem diagrams. These results will inspire research on the use of LLMs for automated CPSs requirements modeling.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
PateGail: A Privacy-Preserving Mobility Trajectory Generator with Imitation Learning
Authors:
Huandong Wang,
Changzheng Gao,
Yuchen Wu,
Depeng Jin,
Lina Yao,
Yong Li
Abstract:
Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitat…
▽ More
Generating human mobility trajectories is of great importance to solve the lack of large-scale trajectory data in numerous applications, which is caused by privacy concerns. However, existing mobility trajectory generation methods still require real-world human trajectories centrally collected as the training data, where there exists an inescapable risk of privacy leakage. To overcome this limitation, in this paper, we propose PateGail, a privacy-preserving imitation learning model to generate mobility trajectories, which utilizes the powerful generative adversary imitation learning model to simulate the decision-making process of humans. Further, in order to protect user privacy, we train this model collectively based on decentralized mobility data stored in user devices, where personal discriminators are trained locally to distinguish and reward the real and generated human trajectories. In the training process, only the generated trajectories and their rewards obtained based on personal discriminators are shared between the server and devices, whose privacy is further preserved by our proposed perturbation mechanisms with theoretical proof to satisfy differential privacy. Further, to better model the human decision-making process, we propose a novel aggregation mechanism of the rewards obtained from personal discriminators. We theoretically prove that under the reward obtained based on the aggregation mechanism, our proposed model maximizes the lower bound of the discounted total rewards of users. Extensive experiments show that the trajectories generated by our model are able to resemble real-world trajectories in terms of five key statistical metrics, outperforming state-of-the-art algorithms by over 48.03%. Furthermore, we demonstrate that the synthetic trajectories are able to efficiently support practical applications, including mobility prediction and location recommendation.
△ Less
Submitted 23 July, 2024;
originally announced July 2024.
-
Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces
Authors:
Dongnan Jin,
Yali Liu,
Qiuzhi Song,
Xunju Ma,
Yue Liu,
Dehao Wu
Abstract:
In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search i…
▽ More
In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search in multi-dimensional spaces with varying numbers of dimensions, this study proposes a new optimization algorithm-Dynamic Dimension Wrapping (DDW) algorithm. Firstly, by utilizing the Dynamic Time Warping (DTW) algorithm and Euclidean distance, a mapping relationship between different time series across dimensions is established, thus creating a fitness function suitable for dimensionally dynamic multi-dimensional space. Additionally, DDW introduces a novel, more efficient cross-dimensional search mechanism for dynamic multidimensional spaces. Finally, through comparative tests with 31 optimization algorithms in dynamic multidimensional space search, the results demonstrate that DDW exhibits outstanding search efficiency and provides search results closest to the actual optimal solution.
△ Less
Submitted 18 July, 2024; v1 submitted 16 July, 2024;
originally announced July 2024.
-
Voluminous Fur Stroking Experience through Interactive Visuo-Haptic Model in Virtual Reality
Authors:
Juro Hosoi,
Du Jin,
Yuki Ban,
Shin'ichi Warisawa
Abstract:
The tactile sensation of stroking soft fur, known for its comfort and emotional benefits, has numerous applications in virtual reality, animal-assisted therapy, and household products. Previous studies have primarily utilized actual fur to present a voluminous fur experience that poses challenges concerning versatility and flexibility. In this study, we develop a system that integrates a head-moun…
▽ More
The tactile sensation of stroking soft fur, known for its comfort and emotional benefits, has numerous applications in virtual reality, animal-assisted therapy, and household products. Previous studies have primarily utilized actual fur to present a voluminous fur experience that poses challenges concerning versatility and flexibility. In this study, we develop a system that integrates a head-mounted display with an ultrasound haptic display to provide visual and haptic feedback. Measurements taken using an artificial skin sheet reveal directional differences in tactile and visual responses to voluminous fur. Based on observations and measurements, we propose interactive models that dynamically adjust to hand movements, simulating fur-stroking sensations. Our experiments demonstrate that the proposed model using visual and haptic modalities significantly enhances the realism of a fur-stroking experience. Our findings suggest that the interactive visuo-haptic model offers a promising fur-stroking experience in virtual reality, potentially enhancing the user experience in therapeutic, entertainment, and retail applications.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking
Authors:
Jun Zhang,
Wenxuan Ao,
Junbo Yan,
Depeng Jin,
Yong Li
Abstract:
With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods relying on expert experience to simulation and learning-based decision and optimization methods. Learning-based optimization methods require extensive interactions with highly realistic microscopic traffic simulators. However, existing microscopic traffic simulators a…
▽ More
With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods relying on expert experience to simulation and learning-based decision and optimization methods. Learning-based optimization methods require extensive interactions with highly realistic microscopic traffic simulators. However, existing microscopic traffic simulators are inefficient in large-scale scenarios and thus fail to support the adoption of these methods in large-scale transportation system optimization scenarios. In addition, the optimization scenarios supported by existing simulators are limited, mainly focusing on the traffic signal control. To address these challenges, we propose the first open-source GPU-accelerated large-scale microscopic simulator for transportation system simulation and optimization. The simulator can iterate at 84.09Hz, which achieves 88.92 times computational acceleration in the large-scale scenario with 2,464,950 vehicles compared to the best baseline CityFlow. Besides, it achieves a more realistic average road speeds simulated on real datasets by adopting the IDM model as the car-following model and the randomized MOBIL model as the lane-changing model. Based on it, we implement a set of microscopic and macroscopic controllable objects and metrics provided by Python API to support typical transportation system optimization scenarios. We choose five representative scenarios and benchmark classical rule-based algorithms, reinforcement learning algorithms, and black-box optimization algorithms in four cities. These experiments effectively demonstrate the usability of the simulator for large-scale traffic system optimization. The code of the simulator is available at https://github.com/tsinghua-fib-lab/moss. We build an open-registration web platform available at https://moss.fiblab.net to support no-code trials.
△ Less
Submitted 2 October, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
MOSS: A Large-scale Open Microscopic Traffic Simulation System
Authors:
Jun Zhang,
Wenxuan Ao,
Junbo Yan,
Can Rong,
Depeng Jin,
Wei Wu,
Yong Li
Abstract:
In the research of Intelligent Transportation Systems (ITS), traffic simulation is a key procedure for the evaluation of new methods and optimization of strategies. However, existing traffic simulation systems face two challenges. First, how to balance simulation scale with realism is a dilemma. Second, it is hard to simulate realistic results, which requires realistic travel demand data and simul…
▽ More
In the research of Intelligent Transportation Systems (ITS), traffic simulation is a key procedure for the evaluation of new methods and optimization of strategies. However, existing traffic simulation systems face two challenges. First, how to balance simulation scale with realism is a dilemma. Second, it is hard to simulate realistic results, which requires realistic travel demand data and simulator. These problems limit computer-aided optimization of traffic management strategies for large-scale road networks and reduce the usability of traffic simulations in areas where real-world travel demand data are lacking. To address these problems, we design and implement MObility Simulation System (MOSS). MOSS adopts GPU acceleration to significantly improve the efficiency and scale of microscopic traffic simulation, which enables realistic and fast simulations for large-scale road networks. It provides realistic travel Origin-Destination (OD) matrices generation through a pre-trained generative neural network model based on publicly available data on a global scale, such as satellite imagery, to help researchers build meaningful travel demand data. It also provides a complete open toolchain to help users with road network construction, demand generation, simulation, and result analysis. The whole toolchain including the simulator can be accessed at https://moss.fiblab.net and the codes are open-source for community collaboration.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
Authors:
Tong Zhang,
Peixin Qin,
Yang Deng,
Chen Huang,
Wenqiang Lei,
Junhong Liu,
Dingnan Jin,
Hongru Liang,
Tat-Seng Chua
Abstract:
Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality…
▽ More
Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may result in overconfidence in LLMs and yield only marginal enhancements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of conflict resolution and inaccurate utilization of inherent knowledge. In this paper, CLAMBER presents a guidance and promotes further research on proactive and trustworthy LLMs. Our dataset is available at https://github.com/zt991211/CLAMBER
△ Less
Submitted 1 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents
Authors:
Yue Chen,
Chen Huang,
Yang Deng,
Wenqiang Lei,
Dingnan Jin,
Jia Liu,
Tat-Seng Chua
Abstract:
Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Attributing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still s…
▽ More
Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Attributing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still struggle to deliver promising performance on unseen domains, struggling to achieve effective domain transferability. We take the first step to investigate this issue and existing methods tend to produce one-size-fits-all strategies across diverse domains, limiting their search effectiveness. In response, we introduce a novel method, called Style, to achieve effective domain transferability. Our experimental results indicate that Style bears strong domain transferability, resulting in an average search performance improvement of ~10% on four unseen domains.
△ Less
Submitted 1 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
OpenGait: A Comprehensive Benchmark Study for Gait Recognition towards Better Practicality
Authors:
Chao Fan,
Saihui Hou,
Junhao Liang,
Chuanfu Shen,
Jingzhe Ma,
Dongyang Jin,
Yongzhen Huang,
Shiqi Yu
Abstract:
Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore,…
▽ More
Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore, the primary goal of this work is to present a comprehensive benchmark study aimed at improving practicality rather than solely focusing on enhancing performance. To this end, we first develop OpenGait, a flexible and efficient gait recognition platform. Using OpenGait as a foundation, we conduct in-depth ablation experiments to revisit recent developments in gait recognition. Surprisingly, we detect some imperfect parts of certain prior methods thereby resulting in several critical yet undiscovered insights. Inspired by these findings, we develop three structurally simple yet empirically powerful and practically robust baseline models, i.e., DeepGaitV2, SkeletonGait, and SkeletonGait++, respectively representing the appearance-based, model-based, and multi-modal methodology for gait pattern description. Beyond achieving SoTA performances, more importantly, our careful exploration sheds new light on the modeling experience of deep gait models, the representational capacity of typical gait modalities, and so on. We hope this work can inspire further research and application of gait recognition towards better practicality. The code is available at https://github.com/ShiqiYu/OpenGait.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Theoretical Analysis for Expectation-Maximization-Based Multi-Model 3D Registration
Authors:
David Jin,
Harry Zhang,
Kai Chang
Abstract:
We perform detailed theoretical analysis of an expectation-maximization-based algorithm recently proposed in for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite having shown superior empirical results, did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by es…
▽ More
We perform detailed theoretical analysis of an expectation-maximization-based algorithm recently proposed in for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite having shown superior empirical results, did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by establishing such conditions. In particular, the analysis revolves around the usage of probabilistic tail bounds that are developed and applied in various instances throughout the course. The problem studied in this project stands as another example, different from those seen in the course, in which tail-bounds help advance our algorithmic understanding in a probabilistic way. We provide self-contained background materials on 3D Registration
△ Less
Submitted 24 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
MARE: Multi-Agents Collaboration Framework for Requirements Engineering
Authors:
Dongming Jin,
Zhi Jin,
Xiaohong Chen,
Chunhui Wang
Abstract:
Requirements Engineering (RE) is a critical phase in the software development process that generates requirements specifications from stakeholders' needs. Recently, deep learning techniques have been successful in several RE tasks. However, obtaining high-quality requirements specifications requires collaboration across multiple tasks and roles. In this paper, we propose an innovative framework ca…
▽ More
Requirements Engineering (RE) is a critical phase in the software development process that generates requirements specifications from stakeholders' needs. Recently, deep learning techniques have been successful in several RE tasks. However, obtaining high-quality requirements specifications requires collaboration across multiple tasks and roles. In this paper, we propose an innovative framework called MARE, which leverages collaboration among large language models (LLMs) throughout the entire RE process. MARE divides the RE process into four tasks: elicitation, modeling, verification, and specification. Each task is conducted by engaging one or two specific agents and each agent can conduct several actions. MARE has five agents and nine actions. To facilitate collaboration between agents, MARE has designed a workspace for agents to upload their generated intermediate requirements artifacts and obtain the information they need. We conduct experiments on five public cases, one dataset, and four new cases created by this work. We compared MARE with three baselines using three widely used metrics for the generated requirements models. Experimental results show that MARE can generate more correct requirements models and outperform the state-of-the-art approaches by 15.4%. For the generated requirements specifications, we conduct a human evaluation in three aspects and provide insights about the quality
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Semantic Line Combination Detector
Authors:
Jinwon Ko,
Dongkwon Jin,
Chang-Su Kim
Abstract:
A novel algorithm, called semantic line combination detector (SLCD), to find an optimal combination of semantic lines is proposed in this paper. It processes all lines in each line combination at once to assess the overall harmony of the lines. First, we generate various line combinations from reliable lines. Second, we estimate the score of each line combination and determine the best one. Experi…
▽ More
A novel algorithm, called semantic line combination detector (SLCD), to find an optimal combination of semantic lines is proposed in this paper. It processes all lines in each line combination at once to assess the overall harmony of the lines. First, we generate various line combinations from reliable lines. Second, we estimate the score of each line combination and determine the best one. Experimental results demonstrate that the proposed SLCD outperforms existing semantic line detectors on various datasets. Moreover, it is shown that SLCD can be applied effectively to three vision tasks of vanishing point detection, symmetry axis detection, and composition-based image retrieval. Our codes are available at https://github.com/Jinwon-Ko/SLCD.
△ Less
Submitted 1 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Authors:
Qinji Yu,
Yirui Wang,
Ke Yan,
Haoshen Li,
Dazhou Guo,
Li Zhang,
Le Lu,
Na Shen,
Qifeng Wang,
Xiaowei Ding,
Xianghua Ye,
Dakai Jin
Abstract:
Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previou…
▽ More
Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previous automatic LN detection works typically yield limited recall and high false positives (FPs) due to adjacent anatomies with similar image intensities, shapes, or textures (vessels, muscles, esophagus, etc). In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance. By enhancing the 2D backbone with a multi-scale 2.5D feature fusion to incorporate 3D context explicitly, more importantly, we make two main contributions to improve the representation quality of LN queries. 1) Considering that LN boundaries are often unclear, an IoU prediction head and a location debiased query selection are proposed to select LN queries of higher localization accuracy as the decoder query's initialization. 2) To reduce FPs, query contrastive learning is employed to explicitly reinforce LN queries towards their best-matched ground-truth queries over unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) via combining seven LN datasets from different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly improves the performance of previous leading methods by > 4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using NIH DeepLesion benchmark, and our method achieves the top performance of 88.46% averaged recall across 0.5 to 4 FPs per image, compared with other leading reported results.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
Large Motion Model for Unified Multi-Modal Motion Generation
Authors:
Mingyuan Zhang,
Daisheng Jin,
Chenyang Gu,
Fangzhou Hong,
Zhongang Cai,
Jingfang Huang,
Chongzhi Zhang,
Xinying Guo,
Lei Yang,
Ying He,
Ziwei Liu
Abstract:
Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation t…
▽ More
Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model. A unified motion model is appealing since it can leverage a wide range of motion data to achieve broad generalization beyond a single task. However, it is also challenging due to the heterogeneous nature of substantially different motion data and tasks. LMM tackles these challenges from three principled aspects: 1) Data: We consolidate datasets with different modalities, formats and tasks into a comprehensive yet unified motion generation dataset, MotionVerse, comprising 10 tasks, 16 datasets, a total of 320k sequences, and 100 million frames. 2) Architecture: We design an articulated attention mechanism ArtAttention that incorporates body part-aware modeling into Diffusion Transformer backbone. 3) Pre-Training: We propose a novel pre-training strategy for LMM, which employs variable frame rates and masking forms, to better exploit knowledge from diverse training data. Extensive experiments demonstrate that our generalist LMM achieves competitive performance across various standard motion generation tasks over state-of-the-art specialist models. Notably, LMM exhibits strong generalization capabilities and emerging properties across many unseen tasks. Additionally, our ablation studies reveal valuable insights about training and scaling up large motion models for future research.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans
Authors:
Heng Guo,
Jianfeng Zhang,
Jiaxing Huang,
Tony C. W. Mok,
Dazhou Guo,
Ke Yan,
Le Lu,
Dakai Jin,
Minfeng Xu
Abstract:
Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaption in medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also requires an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM mod…
▽ More
Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaption in medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also requires an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM model has to separately handle hundreds of 2D slices. Although quite a few studies explore adapting SAM into medical image volumes, the efficiency of 2D adaption methods is unsatisfactory and 3D adaptation methods only capable of segmenting specific organs/tumors. In this work, we propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D. Instead of adapting SAM, we propose a 3D promptable segmentation model using a (nearly) fully labeled CT dataset. To train CT-SAM3D effectively, ensuring the model's accurate responses to higher-dimensional spatial prompts is crucial, and 3D patch-wise training is required due to GPU memory constraints. For this purpose, we propose two key technical developments: 1) a progressively and spatially aligned prompt encoding method to effectively encode click prompts in local 3D space; and 2) a cross-patch prompt learning scheme to capture more 3D spatial context, which is beneficial for reducing the editing workloads when interactively prompting on large organs. CT-SAM3D is trained and validated using a curated dataset of 1204 CT scans containing 107 whole-body anatomies, reporting significantly better quantitative performance against all previous SAM-derived models by a large margin with much fewer click prompts. Our model can handle segmenting unseen organ as well. Code, data, and our 3D interactive segmentation tool with quasi-real-time responses will be made publicly available.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Multi-role Consensus through LLMs Discussions for Vulnerability Detection
Authors:
Zhenyu Mao,
Jialong Li,
Dongming Jin,
Munan Li,
Kenji Tei
Abstract:
Recent advancements in large language models (LLMs) have highlighted the potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and teste…
▽ More
Recent advancements in large language models (LLMs) have highlighted the potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and testers. To this end, this paper introduces a multi-role approach to employ LLMs to act as different roles simulating a real-life code review process and engaging in discussions toward a consensus on the existence and classification of vulnerabilities in the code. Preliminary evaluation of this approach indicates a 13.48% increase in the precision rate, an 18.25% increase in the recall rate, and a 16.13% increase in the F1 score.
△ Less
Submitted 18 May, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
Authors:
Kaiyan Chang,
Kun Wang,
Nan Yang,
Ying Wang,
Dantong Jin,
Wenlong Zhu,
Zhirong Chen,
Cangyuan Li,
Hao Yan,
Yunhao Zhou,
Zhuoliang Zhao,
Yuan Cheng,
Yudong Pan,
Yiqi Liu,
Mengdi Wang,
Shengwen Liang,
Yinhe Han,
Huawei Li,
Xiaowei Li
Abstract:
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by L…
▽ More
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language with a predefined template. For Verilog repair, it uses predefined rules to generate the wrong verilog file and then pairs EDA Tool feedback with the right and wrong verilog file. For EDA Script generation, it uses existing LLM(GPT-3.5) to obtain the description of the Script. To evaluate the effectiveness of our data augmentation method, we finetune Llama2-13B and Llama2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% with the same benchmark. Our 13B model (ChipGPT-FT) has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script data.
△ Less
Submitted 10 July, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning
Authors:
Hongyuan Su,
Yu Zheng,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Social media platforms have become one of the main channels where people disseminate and acquire information, of which the reliability is severely threatened by rumors widespread in the network. Existing approaches such as suspending users or broadcasting real information to combat rumors are either with high cost or disturbing users. In this paper, we introduce a novel rumor mitigation paradigm,…
▽ More
Social media platforms have become one of the main channels where people disseminate and acquire information, of which the reliability is severely threatened by rumors widespread in the network. Existing approaches such as suspending users or broadcasting real information to combat rumors are either with high cost or disturbing users. In this paper, we introduce a novel rumor mitigation paradigm, where only a minimal set of links in the social network are intervened to decelerate the propagation of rumors, countering misinformation with low business cost and user awareness. A knowledge-informed agent embodying rumor propagation mechanisms is developed, which intervenes the social network with a graph neural network for capturing information flow in the social media platforms and a policy network for selecting links. Experiments on real social media platforms demonstrate that the proposed approach can effectively alleviate the influence of rumors, substantially reducing the affected populations by over 25%. Codes for this paper are released at https://github.com/tsinghua-fib-lab/DRL-Rumor-Mitigation.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
MetroGNN: Metro Network Expansion with Reinforcement Learning
Authors:
Hongyuan Su,
Yu Zheng,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Selecting urban regions for metro network expansion to meet maximal transportation demands is crucial for urban development, while computationally challenging to solve. The expansion process relies not only on complicated features like urban demographics and origin-destination (OD) flow but is also constrained by the existing metro network and urban geography. In this paper, we introduce a reinfor…
▽ More
Selecting urban regions for metro network expansion to meet maximal transportation demands is crucial for urban development, while computationally challenging to solve. The expansion process relies not only on complicated features like urban demographics and origin-destination (OD) flow but is also constrained by the existing metro network and urban geography. In this paper, we introduce a reinforcement learning framework to address a Markov decision process within an urban heterogeneous multi-graph. Our approach employs an attentive policy network that intelligently selects nodes based on information captured by a graph neural network. Experiments on real-world urban data demonstrate that our proposed methodology substantially improve the satisfied transportation demands by over 30\% when compared with state-of-the-art methods. Codes are published at https://github.com/tsinghua-fib-lab/MetroGNN.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
GMKF: Generalized Moment Kalman Filter for Polynomial Systems with Arbitrary Noise
Authors:
Sangli Teng,
Harry Zhang,
David Jin,
Ashkan Jasour,
Maani Ghaffari,
Luca Carlone
Abstract:
This paper develops a new filtering approach for state estimation in polynomial systems corrupted by arbitrary noise, which commonly arise in robotics. We first consider a batch setup where we perform state estimation using all data collected from the initial to the current time. We formulate the batch state estimation problem as a Polynomial Optimization Problem (POP) and relax the assumption of…
▽ More
This paper develops a new filtering approach for state estimation in polynomial systems corrupted by arbitrary noise, which commonly arise in robotics. We first consider a batch setup where we perform state estimation using all data collected from the initial to the current time. We formulate the batch state estimation problem as a Polynomial Optimization Problem (POP) and relax the assumption of Gaussian noise by specifying a finite number of moments of the noise. We solve the resulting POP using a moment relaxation and prove that under suitable conditions on the rank of the relaxation, (i) we can extract a provably optimal estimate from the moment relaxation, and (ii) we can obtain a belief representation from the dual (sum-of-squares) relaxation. We then turn our attention to the filtering setup and apply similar insights to develop a GMKF for recursive state estimation in polynomial systems with arbitrary noise. The GMKF formulates the prediction and update steps as POPs and solves them using moment relaxations, carrying over a possibly non-Gaussian belief. In the linear-Gaussian case, GMKF reduces to the standard Kalman Filter. We demonstrate that GMKF performs well under highly non-Gaussian noise and outperforms common alternatives, including the Extended and Unscented Kalman Filter, and their variants on matrix Lie group.
△ Less
Submitted 8 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting
Authors:
Ce Chi,
Xing Wang,
Kexin Yang,
Zhiyan Song,
Di Jin,
Lin Zhu,
Chao Deng,
Junlan Feng
Abstract:
Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent…
▽ More
Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent characteristic of MTS, carrying valuable information. Designing a model that incorporates merits of both channel-independent and channel-mixing structures is a key to further improvement of MTS forecasting, which poses a challenging conundrum. To address the problem, an injection method for global information into channel-independent Transformer, InjectTST, is proposed in this paper. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. A channel identifier, a global mixing module and a self-contextual attention module are devised in InjectTST. The channel identifier can help Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively concentrate on useful global information without robustness degradation, and channel mixing is achieved implicitly. Experiments indicate that InjectTST can achieve stable improvement compared with state-of-the-art models.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Authors:
Tony C. W. Mok,
Zi Li,
Yunhao Bai,
Jianpeng Zhang,
Wei Liu,
Yan-Jie Zhou,
Ke Yan,
Dakai Jin,
Yu Shi,
Xiaoli Yin,
Le Lu,
Ling Zhang
Abstract:
Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise,…
▽ More
Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise, while the latter is not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariance deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.
△ Less
Submitted 31 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Large Language Model for Participatory Urban Planning
Authors:
Zhilun Zhou,
Yuming Lin,
Depeng Jin,
Yong Li
Abstract:
Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the p…
▽ More
Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the participatory process easily. In this work, we introduce an LLM-based multi-agent collaboration framework for participatory urban planning, which can generate land-use plans for urban regions considering the diverse needs of residents. Specifically, we construct LLM agents to simulate a planner and thousands of residents with diverse profiles and backgrounds. We first ask the planner to carry out an initial land-use plan. To deal with the different facilities needs of residents, we initiate a discussion among the residents in each community about the plan, where residents provide feedback based on their profiles. Furthermore, to improve the efficiency of discussion, we adopt a fishbowl discussion mechanism, where part of the residents discuss and the rest of them act as listeners in each round. Finally, we let the planner modify the plan based on residents' feedback. We deploy our method on two real-world regions in Beijing. Experiments show that our method achieves state-of-the-art performance in residents satisfaction and inclusion metrics, and also outperforms human experts in terms of service accessibility and ecology metrics.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation
Authors:
Yuan Yuan,
Chenyang Shao,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Spatio-temporal modeling is foundational for smart city applications, yet it is often hindered by data scarcity in many cities and regions. To bridge this gap, we propose a novel generative pre-training framework, GPD, for spatio-temporal few-shot learning with urban knowledge transfer. Unlike conventional approaches that heavily rely on common feature extraction or intricate few-shot learning des…
▽ More
Spatio-temporal modeling is foundational for smart city applications, yet it is often hindered by data scarcity in many cities and regions. To bridge this gap, we propose a novel generative pre-training framework, GPD, for spatio-temporal few-shot learning with urban knowledge transfer. Unlike conventional approaches that heavily rely on common feature extraction or intricate few-shot learning designs, our solution takes a novel approach by performing generative pre-training on a collection of neural network parameters optimized with data from source cities. We recast spatio-temporal few-shot learning as pre-training a generative diffusion model, which generates tailored neural networks guided by prompts, allowing for adaptability to diverse data distributions and city-specific characteristics. GPD employs a Transformer-based denoising diffusion model, which is model-agnostic to integrate with powerful spatio-temporal neural networks. By addressing challenges arising from data gaps and the complexity of generalizing knowledge across cities, our framework consistently outperforms state-of-the-art baselines on multiple real-world datasets for tasks such as traffic speed prediction and crowd flow prediction. The implementation of our approach is available: https://github.com/tsinghua-fib-lab/GPD.
△ Less
Submitted 25 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction
Authors:
Yuan Yuan,
Jingtao Ding,
Jie Feng,
Depeng Jin,
Yong Li
Abstract:
Urban spatio-temporal prediction is crucial for informed decision-making, such as traffic management, resource optimization, and emergence response. Despite remarkable breakthroughs in pretrained natural language models that enable one model to handle diverse tasks, a universal solution for spatio-temporal prediction remains challenging Existing prediction approaches are typically tailored for spe…
▽ More
Urban spatio-temporal prediction is crucial for informed decision-making, such as traffic management, resource optimization, and emergence response. Despite remarkable breakthroughs in pretrained natural language models that enable one model to handle diverse tasks, a universal solution for spatio-temporal prediction remains challenging Existing prediction approaches are typically tailored for specific spatio-temporal scenarios, requiring task-specific model designs and extensive domain-specific training data. In this study, we introduce UniST, a universal model designed for general urban spatio-temporal prediction across a wide range of scenarios. Inspired by large language models, UniST achieves success through: (i) utilizing diverse spatio-temporal data from different scenarios, (ii) effective pre-training to capture complex spatio-temporal dynamics, (iii) knowledge-guided prompts to enhance generalization capabilities. These designs together unlock the potential of building a universal model for various scenarios Extensive experiments on more than 20 spatio-temporal scenarios demonstrate UniST's efficacy in advancing state-of-the-art performance, especially in few-shot and zero-shot prediction. The datasets and code implementation are released on https://github.com/tsinghua-fib-lab/UniST.
△ Less
Submitted 30 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Authors:
Zhiyang Xu,
Chao Feng,
Rulin Shao,
Trevor Ashby,
Ying Shen,
Di Jin,
Yu Cheng,
Qifan Wang,
Lifu Huang
Abstract:
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and…
▽ More
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and catastrophic forgetting. To address these challenges, we construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date, comprising 187 diverse tasks and 1,664,261 instances sourced from academic datasets, and each task is accompanied by an expert-written instruction. In addition, we propose a two-stage instruction tuning framework, in which VLMs are firstly finetuned on Vision-Flan and further tuned on GPT-4 synthesized data. We find this two-stage tuning framework significantly outperforms the traditional single-stage visual instruction tuning framework and achieves the state-of-the-art performance across a wide range of multi-modal evaluation benchmarks. Finally, we conduct in-depth analyses to understand visual instruction tuning and our findings reveal that: (1) GPT-4 synthesized data does not substantially enhance VLMs' capabilities but rather modulates the model's responses to human-preferred formats; (2) A minimal quantity (e.g., 1,000) of GPT-4 synthesized data can effectively align VLM responses with human-preference; (3) Visual instruction tuning mainly helps large-language models (LLMs) to understand visual features.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds
Authors:
David Jin,
Sushrut Karmalkar,
Harry Zhang,
Luca Carlone
Abstract:
We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D…
▽ More
We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Diffusion Model-based Probabilistic Downscaling for 180-year East Asian Climate Reconstruction
Authors:
Fenghua Ling,
Zeyu Lu,
Jing-Jia Luo,
Lei Bai,
Swadhin K. Behera,
Dachao Jin,
Baoxiang Pan,
Huidong Jiang,
Toshio Yamagata
Abstract:
As our planet is entering into the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this target. Traditional approaches, including computationally-demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. He…
▽ More
As our planet is entering into the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this target. Traditional approaches, including computationally-demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. Here, we address these limitations by introducing a diffusion probabilistic downscaling model (DPDM) into the meteorological field. This model can efficiently transform data from 1° to 0.1° resolution. Compared with deterministic downscaling schemes, it not only has more accurate local details, but also can generate a large number of ensemble members based on probability distribution sampling to evaluate the uncertainty of downscaling. Additionally, we apply the model to generate a 180-year dataset of monthly surface variables in East Asia, offering a more detailed perspective for understanding local scale climate change over the past centuries.
△ Less
Submitted 5 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
GOODAT: Towards Test-time Graph Out-of-Distribution Detection
Authors:
Luzhi Wang,
Dongxiao He,
He Zhang,
Yixin Liu,
Wenjie Wang,
Shirui Pan,
Di Jin,
Tat-Seng Chua
Abstract:
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD sa…
▽ More
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. Despite their effectiveness, these methods come with heavy training resources and costs, as they need to optimize the GNN-based models on training data. Moreover, their reliance on modifying the original GNNs and accessing training data further restricts their universality. To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. With a lightweight graph masker, GOODAT can learn informative subgraphs from test samples, enabling the capture of distinct graph patterns between OOD and ID samples. To optimize the graph masker, we meticulously design three unsupervised objective functions based on the graph information bottleneck principle, motivating the masker to capture compact yet informative subgraphs for OOD detection. Comprehensive evaluations confirm that our GOODAT method outperforms state-of-the-art benchmarks across a variety of real-world datasets. The code is available at Github: https://github.com/Ee1s/GOODAT
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Robust Geometry and Reflectance Disentanglement for 3D Face Reconstruction from Sparse-view Images
Authors:
Daisheng Jin,
Jiangbei Hu,
Baixin Xu,
Yuxin Dai,
Chen Qian,
Ying He
Abstract:
This paper presents a novel two-stage approach for reconstructing human faces from sparse-view images, a task made challenging by the unique geometry and complex skin reflectance of each individual. Our method focuses on decomposing key facial attributes, including geometry, diffuse reflectance, and specular reflectance, from ambient light. Initially, we create a general facial template from a div…
▽ More
This paper presents a novel two-stage approach for reconstructing human faces from sparse-view images, a task made challenging by the unique geometry and complex skin reflectance of each individual. Our method focuses on decomposing key facial attributes, including geometry, diffuse reflectance, and specular reflectance, from ambient light. Initially, we create a general facial template from a diverse collection of individual faces, capturing essential geometric and reflectance characteristics. Guided by this template, we refine each specific face model in the second stage, which further considers the interaction between geometry and reflectance, as well as the subsurface scattering effects on facial skin. Our method enables the reconstruction of high-quality facial representations from as few as three images, offering improved geometric accuracy and reflectance detail. Through comprehensive evaluations and comparisons, our method demonstrates superiority over existing techniques. Our method effectively disentangles geometry and reflectance components, leading to enhanced quality in synthesizing new views and opening up possibilities for applications such as relighting and reflectance editing. We will make the code publicly available.
△ Less
Submitted 10 December, 2023;
originally announced December 2023.
-
SAME++: A Self-supervised Anatomical eMbeddings Enhanced medical image registration framework using stable sampling and regularized transformation
Authors:
Lin Tian,
Zi Li,
Fengze Liu,
Xiaoyu Bai,
Jia Ge,
Le Lu,
Marc Niethammer,
Xianghua Ye,
Ke Yan,
Daikai Jin
Abstract:
Image registration is a fundamental medical image analysis task. Ideally, registration should focus on aligning semantically corresponding voxels, i.e., the same anatomical locations. However, existing methods often optimize similarity measures computed directly on intensities or on hand-crafted features, which lack anatomical semantic information. These similarity measures may lead to sub-optimal…
▽ More
Image registration is a fundamental medical image analysis task. Ideally, registration should focus on aligning semantically corresponding voxels, i.e., the same anatomical locations. However, existing methods often optimize similarity measures computed directly on intensities or on hand-crafted features, which lack anatomical semantic information. These similarity measures may lead to sub-optimal solutions where large deformations, complex anatomical differences, or cross-modality imagery exist. In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration building on top of a Self-supervised Anatomical eMbedding (SAM) algorithm, which is capable of computing dense anatomical correspondences between two images at the voxel level. We name our approach SAM-Enhanced registration (SAME++), which decomposes image registration into four steps: affine transformation, coarse deformation, deep non-parametric transformation, and instance optimization. Using SAM embeddings, we enhance these steps by finding more coherent correspondence and providing features with better semantic guidance. We extensively evaluated SAME++ using more than 50 labeled organs on three challenging inter-subject registration tasks of different body parts. As a complete registration framework, SAME++ markedly outperforms leading methods by $4.2\%$ - $8.2\%$ in terms of Dice score while being orders of magnitude faster than numerical optimization-based methods. Code is available at \url{https://github.com/alibaba-damo-academy/same}.
△ Less
Submitted 25 February, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Authors:
Di Jin,
Shikib Mehri,
Devamanyu Hazarika,
Aishwarya Padmakumar,
Sungjin Lee,
Yang Liu,
Mahdi Namazifar
Abstract:
Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which ma…
▽ More
Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest LLMs such as ChatGPT, BARD, and Vicuna, through critique and revision of those responses. For instance, through one iteration of revision of ChatGPT responses, the revised responses have 56.6% win rate over the original ones, and this win rate can be further improved to 65.9% after applying the revision for five iterations.
△ Less
Submitted 24 November, 2023;
originally announced November 2023.
-
SkeletonGait: Gait Recognition Using Skeleton Maps
Authors:
Chao Fan,
Jingzhe Ma,
Dongyang Jin,
Chuanfu Shen,
Shiqi Yu
Abstract:
The choice of the representations is essential for deep gait recognition methods. The binary silhouettes and skeletal coordinates are two dominant representations in recent literature, achieving remarkable advances in many scenarios. However, inherent challenges remain, in which silhouettes are not always guaranteed in unconstrained scenes, and structural cues have not been fully utilized from ske…
▽ More
The choice of the representations is essential for deep gait recognition methods. The binary silhouettes and skeletal coordinates are two dominant representations in recent literature, achieving remarkable advances in many scenarios. However, inherent challenges remain, in which silhouettes are not always guaranteed in unconstrained scenes, and structural cues have not been fully utilized from skeletons. In this paper, we introduce a novel skeletal gait representation named skeleton map, together with SkeletonGait, a skeleton-based method to exploit structural information from human skeleton maps. Specifically, the skeleton map represents the coordinates of human joints as a heatmap with Gaussian approximation, exhibiting a silhouette-like image devoid of exact body structure. Beyond achieving state-of-the-art performances over five popular gait datasets, more importantly, SkeletonGait uncovers novel insights about how important structural features are in describing gait and when they play a role. Furthermore, we propose a multi-branch architecture, named SkeletonGait++, to make use of complementary features from both skeletons and silhouettes. Experiments indicate that SkeletonGait++ outperforms existing state-of-the-art methods by a significant margin in various scenarios. For instance, it achieves an impressive rank-1 accuracy of over 85% on the challenging GREW dataset. All the source code is available at https://github.com/ShiqiYu/OpenGait.
△ Less
Submitted 18 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
How AI-driven Digital Twins Can Empower Mobile Networks
Authors:
Tong Li,
Fenyu Jiang,
Qiaohong Yu,
Wenzhen Huang,
Tao Jiang,
Depeng Jin
Abstract:
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which…
▽ More
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Inverse Learning with Extremely Sparse Feedback for Recommendation
Authors:
Guanyu Lin,
Chen Gao,
Yu Zheng,
Yinfeng Li,
Jianxin Chang,
Yanan Niu,
Yang Song,
Kun Gai,
Zhiheng Li,
Depeng Jin,
Yong Li
Abstract:
Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in the scenario of full-screen video viewing experiences like Tiktok and Reels, the click action is absent, resulting in unclear feedback f…
▽ More
Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in the scenario of full-screen video viewing experiences like Tiktok and Reels, the click action is absent, resulting in unclear feedback from users, hence introducing noises in modeling training. Existing approaches on de-noising recommendation mainly focus on positive instances while ignoring the noise in a large amount of sampled negative feedback. In this paper, we propose a meta-learning method to annotate the unlabeled data from loss and gradient perspectives, which considers the noises in both positive and negative instances. Specifically, we first propose an Inverse Dual Loss (IDL) to boost the true label learning and prevent the false label learning. Then we further propose an Inverse Gradient (IG) method to explore the correct updating gradient and adjust the updating based on meta-learning. Finally, we conduct extensive experiments on both benchmark and industrial datasets where our proposed method can significantly improve AUC by 9.25% against state-of-the-art methods. Further analysis verifies the proposed inverse learning framework is model-agnostic and can improve a variety of recommendation backbones. The source code, along with the best hyper-parameter settings, is available at this link: https://github.com/Guanyu-Lin/InverseLearning.
△ Less
Submitted 20 November, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Mixed Attention Network for Cross-domain Sequential Recommendation
Authors:
Guanyu Lin,
Chen Gao,
Yu Zheng,
Jianxin Chang,
Yanan Niu,
Yang Song,
Kun Gai,
Zhiheng Li,
Depeng Jin,
Yong Li,
Meng Wang
Abstract:
In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain…
▽ More
In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain sequential recommendation models such as PiNet and DASL have a common drawback relying heavily on overlapped users in different domains, which limits their usage in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract the domain-specific and cross-domain information. Firstly, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential pattern. Then we propose a mixed attention layer with item similarity attention, sequence-fusion attention, and group-prototype attention to capture the local/global item similarity, fuse the local/global item sequence, and extract the user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further study also illustrates that our proposed method and components are model-agnostic and effective, respectively. The code and data are available at https://github.com/Guanyu-Lin/MAN.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Optimal vintage factor analysis with deflation varimax
Authors:
Xin Bing,
Dian Jin,
Yuqian Zhang
Abstract:
Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. The most widely used vintage factor analysis is the Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity,…
▽ More
Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. The most widely used vintage factor analysis is the Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, little theoretical guarantee can be provided to date mainly because varimax rotation requires to solve a non-convex optimization over the set of orthogonal matrices.
In this paper, we propose a deflation varimax procedure that solves each row of an orthogonal matrix sequentially. In addition to its net computational gain and flexibility, we are able to fully establish theoretical guarantees for the proposed procedure in a broader context. Adopting this new deflation varimax as the second step after PCA, we further analyze this two step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix in the minimax optimal rate when the signal-to-noise-ratio (SNR) is moderate or large. In the low SNR regime, we offer possible improvement over using PCA and the deflation varimax when the additive noise under the factor model is structured. The modified procedure is shown to be minimax optimal in all SNR regimes. Our theory is valid for finite sample and allows the number of the latent factors to grow with the sample size as well as the ambient dimension to grow with, or even exceed, the sample size. Extensive simulation and real data analysis further corroborate our theoretical findings.
△ Less
Submitted 24 September, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Stance Detection with Collaborative Role-Infused LLM-Based Agents
Authors:
Xiaochong Lan,
Chen Gao,
Depeng Jin,
Yong Li
Abstract:
Stance detection automatically detects the stance in a text towards a target, vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social medi…
▽ More
Stance detection automatically detects the stance in a text towards a target, vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social media platforms. Second, stance detection requires advanced reasoning to infer authors' implicit viewpoints, as stance are often subtly embedded rather than overtly stated in the text. To address these challenges, we design a three-stage framework COLA (short for Collaborative rOle-infused LLM-based Agents) in which LLMs are designated distinct roles, creating a collaborative system where each role contributes uniquely. Initially, in the multidimensional text analysis stage, we configure the LLMs to act as a linguistic expert, a domain specialist, and a social media veteran to get a multifaceted analysis of texts, thus overcoming the first challenge. Next, in the reasoning-enhanced debating stage, for each potential stance, we designate a specific LLM-based agent to advocate for it, guiding the LLM to detect logical connections between text features and stance, tackling the second challenge. Finally, in the stance conclusion stage, a final decision maker agent consolidates prior insights to determine the stance. Our approach avoids extra annotated data and model training and is highly usable. We achieve state-of-the-art performance across multiple datasets. Ablation studies validate the effectiveness of each design role in handling stance detection. Further experiments have demonstrated the explainability and the versatility of our approach. Our approach excels in usability, accuracy, effectiveness, explainability and versatility, highlighting its value.
△ Less
Submitted 16 April, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Collaborative Distributed Machine Learning
Authors:
David Jin,
Niclas Kannengießer,
Sascha Rank,
Ali Sunyaev
Abstract:
Various collaborative distributed machine learning (CDML) systems, including federated learning systems and swarm learning systems, with different key traits were developed to leverage resources for development and use of machine learning (ML) models in a confidentiality-preserving way. To meet use case requirements, suitable CDML systems need to be selected. However, comparison between CDML syste…
▽ More
Various collaborative distributed machine learning (CDML) systems, including federated learning systems and swarm learning systems, with different key traits were developed to leverage resources for development and use of machine learning (ML) models in a confidentiality-preserving way. To meet use case requirements, suitable CDML systems need to be selected. However, comparison between CDML systems regarding their suitability for use cases is often difficult. This work presents a CDML system conceptualization and CDML archetypes to support comparison of CDML systems and introduce scientific and practical audiences to the principal functioning and key traits of CDML systems.
△ Less
Submitted 21 March, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.