subscribe to arXiv mailings

A Ducted Fan UAV for Safe Aerial Grabbing and Transfer of Multiple Loads Using Electromagnets

Abstract: In recent years, research on aerial grasping, manipulation, and transportation of objects has garnered significant attention. These tasks often require UAVs to operate safely close to environments or objects and to efficiently grasp payloads. However, current widely adopted flying platforms pose safety hazards: unprotected high-speed rotating propellers can cause harm to the surroundings. Addition… ▽ More In recent years, research on aerial grasping, manipulation, and transportation of objects has garnered significant attention. These tasks often require UAVs to operate safely close to environments or objects and to efficiently grasp payloads. However, current widely adopted flying platforms pose safety hazards: unprotected high-speed rotating propellers can cause harm to the surroundings. Additionally, the space for carrying payloads on the fuselage is limited, and the restricted position of the payload also hinders efficient grasping. To address these issues, this paper presents a coaxial ducted fan UAV which is equipped with electromagnets mounted externally on the fuselage, enabling safe grasping and transfer of multiple loads in midair without complex additional actuators. It also has the capability to achieve direct human-UAV cargo transfer in the air. The forces acting on the loads during magnetic attachment and their influencing factors were analyzed. An ADRC controller is utilized to counteract disturbances during grasping and achieve attitude control. Finally, flight tests are conducted to verify the UAV's ability to directly grasp multiple loads from human hands in flight while maintaining attitude tracking. △ Less

Submitted 24 September, 2024; originally announced September 2024.

Comments: 8pages, 13figures,accepted by IROS2024 This work has been submitted to the IEEE for possible publication

arXiv:2409.03155 [pdf, other]

Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models

Authors: Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, Lizhen Cui

Abstract: Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical… ▽ More Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical touchstone for the integration. This task requires LLMs to answer natural language questions by retrieving relevant triples from knowledge graphs. However, existing methods face two significant challenges: \textit{excessively long reasoning paths distracting from the answer generation}, and \textit{false-positive relations hindering the path refinement}. In this paper, we propose an iterative interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism, allowing LLMs to perform answer trying after each reasoning step, thereby mitigating the impact of lengthy reasoning paths. On the other hand, DoG utilizes a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations. This debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7\% and 9.1\% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, the integration experiments with various LLMs on the mentioned datasets highlight the flexibility of DoG. Code is available at \url{https://github.com/reml-group/DoG}. △ Less

Submitted 4 September, 2024; originally announced September 2024.

Comments: 12 pages

ACM Class: I.2.4

arXiv:2404.12020 [pdf, other]

Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

Authors: Jie Ma, Min Hu, Pinghui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du

Abstract: Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackl… ▽ More Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackle these challenges, firstly, we propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and subsequently introducing distribution shifts to split questions. The former leads to a large, diverse test space, while the latter results in a comprehensive robustness evaluation on rare, frequent, and overall questions. Secondly, we propose a robust architecture that utilizes a multifaceted cycle collaborative debiasing strategy to overcome bias learning. Experimental results show that this architecture achieves state-of-the-art performance on MUSIC-AVQA-R, notably obtaining a significant improvement of 9.32%. Extensive ablation experiments are conducted on the two datasets mentioned to analyze the component effectiveness within the debiasing strategy. Additionally, we highlight the limited robustness of existing multi-modal QA methods through the evaluation on our dataset. We also conduct experiments combining various baselines with our proposed strategy on two datasets to verify its plug-and-play capability. Our dataset and code are available at https://github.com/reml-group/MUSIC-AVQA-R. △ Less

Submitted 21 October, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted by NeurIPS 2024

ACM Class: I.2.10

arXiv:2402.01253 [pdf, other]

RimiRec: Modeling Refined Multi-interest in Hierarchical Structure for Recommendation

Authors: Haolei Pei, Yuanyuan Xu, Yangping Zhu, Yuan Nie

Abstract: Industrial recommender systems usually consist of the retrieval stage and the ranking stage, to handle the billion-scale of users and items. The retrieval stage retrieves candidate items relevant to user interests for recommendations and has attracted much attention. Frequently, a user shows refined multi-interests in a hierarchical structure. For example, a user likes Conan and Kuroba Kaito, whic… ▽ More Industrial recommender systems usually consist of the retrieval stage and the ranking stage, to handle the billion-scale of users and items. The retrieval stage retrieves candidate items relevant to user interests for recommendations and has attracted much attention. Frequently, a user shows refined multi-interests in a hierarchical structure. For example, a user likes Conan and Kuroba Kaito, which are the roles in hierarchical structure "Animation, Japanese Animation, Detective Conan". However, most existing methods ignore this hierarchical nature, and simply average the fine-grained interest information. Therefore, we propose a novel two-stage approach to explicitly modeling refined multi-interest in a hierarchical structure for recommendation. In the first hierarchical multi-interest mining stage, the hierarchical clustering and transformer-based model adaptively generate circles or sub-circles that users are interested in. In the second stage, the partition of retrieval space allows the EBR models to deal only with items within each circle and accurately capture users' refined interests. Experimental results show that the proposed approach achieves state-of-the-art performance. Our framework has also been deployed at Lofter. △ Less

Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 4 pages, 4 figures

arXiv:2311.16522 [pdf]

Dynamic Fault Characteristics Evaluation in Power Grid

Authors: Hao Pei, Si Lin, Chuanfu Li, Che Wang, Haoming Chen, Sizhe Li

Abstract: To enhance the intelligence degree in operation and maintenance, a novel method for fault detection in power grids is proposed. The proposed GNN-based approach first identifies fault nodes through a specialized feature extraction method coupled with a knowledge graph. By incorporating temporal data, the method leverages the status of nodes from preceding and subsequent time periods to help current… ▽ More To enhance the intelligence degree in operation and maintenance, a novel method for fault detection in power grids is proposed. The proposed GNN-based approach first identifies fault nodes through a specialized feature extraction method coupled with a knowledge graph. By incorporating temporal data, the method leverages the status of nodes from preceding and subsequent time periods to help current fault detection. To validate the effectiveness of the node features, a correlation analysis of the output features from each node was conducted. The results from experiments show that this method can accurately locate fault nodes in simulation scenarios with a remarkable accuracy. Additionally, the graph neural network based feature modeling allows for a qualitative examination of how faults spread across nodes, which provides valuable insights for analyzing fault nodes. △ Less

Submitted 27 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.11225 [pdf, other]

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

Authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song

Abstract: Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be ea… ▽ More Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard. △ Less

Submitted 24 November, 2023; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: Accepted by NDSS Symposium 2024

arXiv:2308.07723 [pdf, other]

Extended Preintegration for Relative State Estimation of Leader-Follower Platform

Authors: Ruican Xia, Hailong Pei

Abstract: Relative state estimation using exteroceptive sensors suffers from limitations of the field of view (FOV) and false detection, that the proprioceptive sensor (IMU) data are usually engaged to compensate. Recently ego-motion constraint obtained by Inertial measurement unit (IMU) preintegration has been extensively used in simultaneous localization and mapping (SLAM) to alleviate the computation bur… ▽ More Relative state estimation using exteroceptive sensors suffers from limitations of the field of view (FOV) and false detection, that the proprioceptive sensor (IMU) data are usually engaged to compensate. Recently ego-motion constraint obtained by Inertial measurement unit (IMU) preintegration has been extensively used in simultaneous localization and mapping (SLAM) to alleviate the computation burden. This paper introduces an extended preintegration incorporating the IMU preintegration of two platforms to formulate the motion constraint of relative state. One merit of this analytic constraint is that it can be seamlessly integrated into the unified graph optimization framework to implement the relative state estimation in a high-performance real-time tracking thread, another point is a full smoother design with this precise constraint to optimize the 3D coordinate and refine the state for the refinement thread. We compare extensively in simulations the proposed algorithms with two existing approaches to confirm our outperformance. In the real virtual reality (VR) application design with the proposed estimator, we properly realize the visual tracking of the six degrees of freedom (6DoF) controller suitable for almost all scenarios, including the challenging environment with missing features, light mutation, dynamic scenes, etc. The demo video is at https://www.youtube.com/watch?v=0idb9Ls2iAM. For the benefit of the community, we make the source code public. △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.11471 [pdf, other]

Robust Visual Question Answering: Datasets, Methods, and Future Challenges

Authors: Jie Ma, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Hongbin Pei, Junzhou Zhao

Abstract: Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usual… ▽ More Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance the VQA robustness, respectively. This paper provides the first comprehensive survey focused on this emerging fashion. Specifically, we first provide an overview of the development process of datasets from in-distribution and out-of-distribution perspectives. Then, we examine the evaluation metrics employed by these datasets. Thirdly, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Furthermore, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss the key areas for future research from various viewpoints. △ Less

Submitted 18 February, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE TPAMI

ACM Class: I.2.10

arXiv:2306.11698 [pdf, other]

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

Authors: Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li

Abstract: Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To thi… ▽ More Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance -- where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives -- including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/ ; our dataset can be previewed at https://huggingface.co/datasets/AI-Secure/DecodingTrust ; a concise version of this work is at https://openreview.net/pdf?id=kaHpo8OZw2 . △ Less

Submitted 26 February, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 Outstanding Paper (Datasets and Benchmarks Track)

arXiv:2306.00381 [pdf, other]

Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion

Authors: Hengzhi Pei, Jinman Zhao, Leonard Lausen, Sheng Zha, George Karypis

Abstract: Pretrained code language models have enabled great progress towards program synthesis. However, common approaches only consider in-file local context and thus miss information and constraints imposed by other parts of the codebase and its external dependencies. Existing code completion benchmarks also lack such context. To resolve these restrictions we curate a new dataset of permissively licensed… ▽ More Pretrained code language models have enabled great progress towards program synthesis. However, common approaches only consider in-file local context and thus miss information and constraints imposed by other parts of the codebase and its external dependencies. Existing code completion benchmarks also lack such context. To resolve these restrictions we curate a new dataset of permissively licensed Python packages that includes full projects and their dependencies and provide tools to extract non-local information with the help of program analyzers. We then focus on the task of function call argument completion which requires predicting the arguments to function calls. We show that existing code completion models do not yield good results on our completion task. To better solve this task, we query a program analyzer for information relevant to a given function call, and consider ways to provide the analyzer results to different code completion models during inference and training. Our experiments show that providing access to the function implementation and function usages greatly improves the argument completion performance. Our ablation study provides further insights on how different types of information available from the program analyzer and different ways of incorporating the information affect the model performance. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: 12 pages. Accepted to AAAI 2023

ACM Class: I.2.2; I.2.7

arXiv:2304.12205 [pdf, other]

doi 10.1109/TIV.2023.3331024

Synthetic Datasets for Autonomous Driving: A Survey

Authors: Zhihang Song, Zimin He, Xingyu Li, Qiming Ma, Ruibo Ming, Zhiqi Mao, Huaxin Pei, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

Abstract: Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and chan… ▽ More Autonomous driving techniques have been flourishing in recent years while thirsting for huge amounts of high-quality data. However, it is difficult for real-world datasets to keep up with the pace of changing requirements due to their expensive and time-consuming experimental and labeling costs. Therefore, more and more researchers are turning to synthetic datasets to easily generate rich and changeable data as an effective complement to the real world and to improve the performance of algorithms. In this paper, we summarize the evolution of synthetic dataset generation methods and review the work to date in synthetic datasets related to single and multi-task categories for to autonomous driving study. We also discuss the role that synthetic dataset plays the evaluation, gap test, and positive effect in autonomous driving related algorithm testing, especially on trustworthiness and safety aspects. Finally, we discuss general trends and possible development directions. To the best of our knowledge, this is the first survey focusing on the application of synthetic datasets in autonomous driving. This survey also raises awareness of the problems of real-world deployment of autonomous driving technology and provides researchers with a possible solution. △ Less

Submitted 27 February, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: 19 pages, 5 figures

Journal ref: in IEEE Transactions on Intelligent Vehicles, vol. 9, no. 1, pp. 1847-1864, Jan. 2024

arXiv:2304.06966 [pdf]

Self-Supervised Learning based Depth Estimation from Monocular Images

Authors: Mayank Poddar, Akash Mishra, Mohit Kewlani, Haoyang Pei

Abstract: Depth Estimation has wide reaching applications in the field of Computer vision such as target tracking, augmented reality, and self-driving cars. The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input. The traditional depth estimation methods are based on depth cues and used concepts like epipolar geometry. With the evolution of Convolutional N… ▽ More Depth Estimation has wide reaching applications in the field of Computer vision such as target tracking, augmented reality, and self-driving cars. The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input. The traditional depth estimation methods are based on depth cues and used concepts like epipolar geometry. With the evolution of Convolutional Neural Networks, depth estimation has undergone tremendous strides. In this project, our aim is to explore possible extensions to existing SoTA Deep Learning based Depth Estimation Models and to see whether performance metrics could be further improved. In a broader sense, we are looking at the possibility of implementing Pose Estimation, Efficient Sub-Pixel Convolution Interpolation, Semantic Segmentation Estimation techniques to further enhance our proposed architecture and to provide fine-grained and more globally coherent depth map predictions. We also plan to do away with camera intrinsic parameters during training and apply weather augmentations to further generalize our model. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2211.03252 [pdf, other]

Zero-Shot Classification by Logical Reasoning on Natural Language Explanations

Authors: Chi Han, Hengzhi Pei, Xinya Du, Heng Ji

Abstract: Humans can classify data of an unseen category by reasoning on its language explanations. This ability is owing to the compositional nature of language: we can combine previously seen attributes to describe the new category. For example, we might describe a sage thrasher as "it has a slim straight relatively short bill, yellow eyes and a long tail", so that others can use their knowledge of attrib… ▽ More Humans can classify data of an unseen category by reasoning on its language explanations. This ability is owing to the compositional nature of language: we can combine previously seen attributes to describe the new category. For example, we might describe a sage thrasher as "it has a slim straight relatively short bill, yellow eyes and a long tail", so that others can use their knowledge of attributes "slim straight relatively short bill", "yellow eyes" and "long tail" to recognize a sage thrasher. Inspired by this observation, in this work we tackle zero-shot classification task by logically parsing and reasoning on natural language expla-nations. To this end, we propose the framework CLORE (Classification by LOgical Reasoning on Explanations). While previous methods usually regard textual information as implicit features, CLORE parses explanations into logical structures and then explicitly reasons along thess structures on the input to produce a classification score. Experimental results on explanation-based zero-shot classification benchmarks demonstrate that CLORE is superior to baselines, which we further show mainly comes from higher scores on tasks requiring more logical reasoning. We also demonstrate that our framework can be extended to zero-shot classification on visual modality. Alongside classification decisions, CLORE can provide the logical parsing and reasoning process as a clear form of rationale. Through empirical analysis we demonstrate that CLORE is also less affected by linguistic biases than baselines. △ Less

Submitted 25 May, 2023; v1 submitted 6 November, 2022; originally announced November 2022.

Comments: 9 pages, 8 figures. Accepted in the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023) Findings

arXiv:2209.08237 [pdf, other]

Understanding the Impact of Image Quality and Distance of Objects to Object Detection Performance

Authors: Yu Hao, Haoyang Pei, Yixuan Lyu, Zhongzheng Yuan, John-Ross Rizzo, Yao Wang, Yi Fang

Abstract: Deep learning has made great strides for object detection in images. The detection accuracy and computational cost of object detection depend on the spatial resolution of an image, which may be constrained by both the camera and storage considerations. Compression is often achieved by reducing either spatial or amplitude resolution or, at times, both, both of which have well-known effects on perfo… ▽ More Deep learning has made great strides for object detection in images. The detection accuracy and computational cost of object detection depend on the spatial resolution of an image, which may be constrained by both the camera and storage considerations. Compression is often achieved by reducing either spatial or amplitude resolution or, at times, both, both of which have well-known effects on performance. Detection accuracy also depends on the distance of the object of interest from the camera. Our work examines the impact of spatial and amplitude resolution, as well as object distance, on object detection accuracy and computational cost. We develop a resolution-adaptive variant of YOLOv5 (RA-YOLO), which varies the number of scales in the feature pyramid and detection head based on the spatial resolution of the input image. To train and evaluate this new method, we created a dataset of images with diverse spatial and amplitude resolutions by combining images from the TJU and Eurocity datasets and generating different resolutions by applying spatial resizing and compression. We first show that RA-YOLO achieves a good trade-off between detection accuracy and inference time over a large range of spatial resolutions. We then evaluate the impact of spatial and amplitude resolutions on object detection accuracy using the proposed RA-YOLO model. We demonstrate that the optimal spatial resolution that leads to the highest detection accuracy depends on the 'tolerated' image size. We further assess the impact of the distance of an object to the camera on the detection accuracy and show that higher spatial resolution enables a greater detection range. These results provide important guidelines for choosing the image spatial resolution and compression settings predicated on available bandwidth, storage, desired inference time, and/or desired detection range, in practical applications. △ Less

Submitted 17 September, 2022; originally announced September 2022.

arXiv:2206.12840 [pdf, other]

Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

Authors: Yezhen Wang, Tong Che, Bo Li, Kaitao Song, Hengzhi Pei, Yoshua Bengio, Dongsheng Li

Abstract: Autoregressive generative models are commonly used, especially for those tasks involving sequential data. They have, however, been plagued by a slew of inherent flaws due to the intrinsic characteristics of chain-style conditional modeling (e.g., exposure bias or lack of long-range coherence), severely limiting their ability to model distributions properly. In this paper, we propose a unique metho… ▽ More Autoregressive generative models are commonly used, especially for those tasks involving sequential data. They have, however, been plagued by a slew of inherent flaws due to the intrinsic characteristics of chain-style conditional modeling (e.g., exposure bias or lack of long-range coherence), severely limiting their ability to model distributions properly. In this paper, we propose a unique method termed E-ARM for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we are allowed to make the autoregressive model itself be an energy-based model for measuring the likelihood of input without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and increase temporal coherence for autoregressive generative models. Extensive empirical results, covering benchmarks like language modeling, neural machine translation, and image generation, demonstrate the effectiveness of the proposed approach. △ Less

Submitted 26 June, 2022; originally announced June 2022.

Comments: Preprint version

arXiv:2203.02677 [pdf]

Tightly Coupled Optimization-based GPS-Visual-Inertial Odometry with Online Calibration and Initialization

Authors: Shihao Han, Feiyang Deng, Tao Li, Hailong Pei

Abstract: In this paper, we present a tightly coupled optimization-based GPS-Visual-Inertial odometry system to solve the trajectory drift of the visual-inertial odometry especially over long-term runs. Visual reprojection residuals, IMU residuals, and GPS measurement residuals are jointly minimized within a local bundle adjustment, in which we apply GPS measurements and IMU preintegration used for the IMU… ▽ More In this paper, we present a tightly coupled optimization-based GPS-Visual-Inertial odometry system to solve the trajectory drift of the visual-inertial odometry especially over long-term runs. Visual reprojection residuals, IMU residuals, and GPS measurement residuals are jointly minimized within a local bundle adjustment, in which we apply GPS measurements and IMU preintegration used for the IMU residuals to formulate a novel GPS residual. To improve the efficiency and robustness of the system, we propose a fast reference frames initialization method and an online calibration method for GPS-IMU extrinsic and time offset. In addition, we further test the performance and convergence of our online calibration method. Experimental results on EuRoC datasets show that our method consistently outperforms other tightly coupled and loosely coupled approaches. Meanwhile, this system has been validated on KAIST datasets, which proves that our system can work well in the case of visual or GPS failure. △ Less

Submitted 5 March, 2022; originally announced March 2022.

Comments: 7 pages, 10 figures

arXiv:2112.13194 [pdf, other]

doi 10.1109/ACCESS.2022.3157876

Network-Aware 5G Edge Computing for Object Detection: Augmenting Wearables to "See" More, Farther and Faster

Authors: Zhongzheng Yuan, Tommy Azzino, Yu Hao, Yixuan Lyu, Haoyang Pei, Alain Boldini, Marco Mezzavilla, Mahya Beheshti, Maurizio Porfiri, Todd Hudson, William Seiple, Yi Fang, Sundeep Rangan, Yao Wang, J. R. Rizzo

Abstract: Advanced wearable devices are increasingly incorporating high-resolution multi-camera systems. As state-of-the-art neural networks for processing the resulting image data are computationally demanding, there has been growing interest in leveraging fifth generation (5G) wireless connectivity and mobile edge computing for offloading this processing to the cloud. To assess this possibility, this pape… ▽ More Advanced wearable devices are increasingly incorporating high-resolution multi-camera systems. As state-of-the-art neural networks for processing the resulting image data are computationally demanding, there has been growing interest in leveraging fifth generation (5G) wireless connectivity and mobile edge computing for offloading this processing to the cloud. To assess this possibility, this paper presents a detailed simulation and evaluation of 5G wireless offloading for object detection within a powerful, new smart wearable called VIS4ION, for the Blind-and-Visually Impaired (BVI). The current VIS4ION system is an instrumented book-bag with high-resolution cameras, vision processing and haptic and audio feedback. The paper considers uploading the camera data to a mobile edge cloud to perform real-time object detection and transmitting the detection results back to the wearable. To determine the video requirements, the paper evaluates the impact of video bit rate and resolution on object detection accuracy and range. A new street scene dataset with labeled objects relevant to BVI navigation is leveraged for analysis. The vision evaluation is combined with a detailed full-stack wireless network simulation to determine the distribution of throughputs and delays with real navigation paths and ray-tracing from new high-resolution 3D models in an urban environment. For comparison, the wireless simulation considers both a standard 4G-Long Term Evolution (LTE) carrier and high-rate 5G millimeter-wave (mmWave) carrier. The work thus provides a thorough and realistic assessment of edge computing with mmWave connectivity in an application with both high bandwidth and low latency requirements. △ Less

Submitted 15 April, 2022; v1 submitted 25 December, 2021; originally announced December 2021.

Comments: Published in: IEEE Access ( Volume: 10)

arXiv:2111.08386 [pdf, other]

Towards Generating Real-World Time Series Data

Authors: Hengzhi Pei, Kan Ren, Yuqing Yang, Chang Liu, Tao Qin, Dongsheng Li

Abstract: Time series data generation has drawn increasing attention in recent years. Several generative adversarial network (GAN) based methods have been proposed to tackle the problem usually with the assumption that the targeted time series data are well-formatted and complete. However, real-world time series (RTS) data are far away from this utopia, e.g., long sequences with variable lengths and informa… ▽ More Time series data generation has drawn increasing attention in recent years. Several generative adversarial network (GAN) based methods have been proposed to tackle the problem usually with the assumption that the targeted time series data are well-formatted and complete. However, real-world time series (RTS) data are far away from this utopia, e.g., long sequences with variable lengths and informative missing data raise intractable challenges for designing powerful generation algorithms. In this paper, we propose a novel generative framework for RTS data - RTSGAN to tackle the aforementioned challenges. RTSGAN first learns an encoder-decoder module which provides a mapping between a time series instance and a fixed-dimension latent vector and then learns a generation module to generate vectors in the same latent space. By combining the generator and the decoder, RTSGAN is able to generate RTS which respect the original feature distributions and the temporal dynamics. To generate time series with missing values, we further equip RTSGAN with an observation embedding layer and a decide-and-generate decoder to better utilize the informative missing patterns. Experiments on the four RTS datasets show that the proposed framework outperforms the previous generation methods in terms of synthetic data utility for downstream classification and prediction tasks. △ Less

Submitted 16 November, 2021; originally announced November 2021.

Comments: Accepted in 21th IEEE International Conference on Data Mining (ICDM 2021). Code is available at https://seqml.github.io/rtsgan

arXiv:2011.14211 [pdf, other]

Curvature Regularization to Prevent Distortion in Graph Embedding

Authors: Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Chunxu Zhang, Bo Yang

Abstract: Recent research on graph embedding has achieved success in various applications. Most graph embedding methods preserve the proximity in a graph into a manifold in an embedding space. We argue an important but neglected problem about this proximity-preserving strategy: Graph topology patterns, while preserved well into an embedding manifold by preserving proximity, may distort in the ambient embedd… ▽ More Recent research on graph embedding has achieved success in various applications. Most graph embedding methods preserve the proximity in a graph into a manifold in an embedding space. We argue an important but neglected problem about this proximity-preserving strategy: Graph topology patterns, while preserved well into an embedding manifold by preserving proximity, may distort in the ambient embedding Euclidean space, and hence to detect them becomes difficult for machine learning models. To address the problem, we propose curvature regularization, to enforce flatness for embedding manifolds, thereby preventing the distortion. We present a novel angle-based sectional curvature, termed ABS curvature, and accordingly three kinds of curvature regularization to induce flat embedding manifolds during graph embedding. We integrate curvature regularization into five popular proximity-preserving embedding methods, and empirical results in two applications show significant improvements on a wide range of open graph datasets. △ Less

Submitted 28 November, 2020; originally announced November 2020.

Comments: Published as a conference paper at NeurIPS 2020

arXiv:2003.00120 [pdf, other]

Improving Certified Robustness via Statistical Learning with Logical Reasoning

Authors: Zhuolin Yang, Zhikuan Zhao, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlas, Ji Liu, Heng Guo, Ce Zhang, Bo Li

Abstract: Intensive algorithmic efforts have been made to enable the rapid improvements of certificated robustness for complex ML models recently. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper, we propose to integrate statistical ML models wit… ▽ More Intensive algorithmic efforts have been made to enable the rapid improvements of certificated robustness for complex ML models recently. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper, we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN, so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state-of-the-arts. △ Less

Submitted 12 April, 2023; v1 submitted 28 February, 2020; originally announced March 2020.

Comments: Accepted by 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2002.05780 [pdf, other]

Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States

Authors: Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li

Abstract: Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a nove… ▽ More Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which is a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity -- the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty -- the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over standard RL-based PM method and other baselines. △ Less

Submitted 9 February, 2020; originally announced February 2020.

Comments: AAAI 2020

arXiv:2002.05287 [pdf, other]

Geom-GCN: Geometric Graph Convolutional Networks

Authors: Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, Bo Yang

Abstract: Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative… ▽ More Message-passing neural networks (MPNNs) have been successfully applied to representation learning on graphs in a variety of real-world applications. However, two fundamental weaknesses of MPNNs' aggregators limit their ability to represent graph-structured data: losing the structural information of nodes in neighborhoods and lacking the ability to capture long-range dependencies in disassortative graphs. Few studies have noticed the weaknesses from different perspectives. From the observations on classical neural network and network geometry, we propose a novel geometric aggregation scheme for graph neural networks to overcome the two weaknesses. The behind basic idea is the aggregation on a graph can benefit from a continuous space underlying the graph. The proposed aggregation scheme is permutation-invariant and consists of three modules, node embedding, structural neighborhood, and bi-level aggregation. We also present an implementation of the scheme in graph convolutional networks, termed Geom-GCN (Geometric Graph Convolutional Networks), to perform transductive learning on graphs. Experimental results show the proposed Geom-GCN achieved state-of-the-art performance on a wide range of open datasets of graphs. Code is available at https://github.com/graphdml-uiuc-jlu/geom-gcn. △ Less

Submitted 13 February, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

Comments: Published as a conference paper at ICLR 2020

arXiv:1912.10375 [pdf, other]

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

Authors: Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li

Abstract: Adversarial attacks against natural language processing systems, which perform seemingly innocuous modifications to inputs, can induce arbitrary mistakes to the target models. Though raised great concerns, such adversarial attacks can be leveraged to estimate the robustness of NLP models. Compared with the adversarial example generation in continuous data domain (e.g., image), generating adversari… ▽ More Adversarial attacks against natural language processing systems, which perform seemingly innocuous modifications to inputs, can induce arbitrary mistakes to the target models. Though raised great concerns, such adversarial attacks can be leveraged to estimate the robustness of NLP models. Compared with the adversarial example generation in continuous data domain (e.g., image), generating adversarial text that preserves the original meaning is challenging since the text space is discrete and non-differentiable. To handle these challenges, we propose a target-controllable adversarial attack framework T3, which is applicable to a range of NLP tasks. In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation. A novel tree-based decoder is then applied to regularize the syntactic correctness of the generated text and manipulate it on either sentence (T3(Sent)) or word (T3(Word)) level. We consider two most representative NLP tasks: sentiment analysis and question answering (QA). Extensive experimental results and human studies show that T3 generated adversarial texts can successfully manipulate the NLP models to output the targeted incorrect answer without misleading the human. Moreover, we show that the generated adversarial texts have high transferability which enables the black-box attacks in practice. Our work sheds light on an effective and general way to examine the robustness of NLP models. Our code is publicly available at https://github.com/AI-secure/T3/. △ Less

Submitted 5 October, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

Comments: Accepted to EMNLP 2020 as a long paper. 17 pages, 4 figures

arXiv:1911.07135 [pdf, other]

The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

Authors: Yuheng Zhang, Ruoxi Jia, Hengzhi Pei, Wenxiao Wang, Bo Li, Dawn Song

Abstract: This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data. Since its first introduction, such attacks have raised serious concerns given that training data usually contain privacy-sensitive information. Thus far, successful model-inversion attacks have only been demonstrated on simple models, such as linear regression and logi… ▽ More This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data. Since its first introduction, such attacks have raised serious concerns given that training data usually contain privacy-sensitive information. Thus far, successful model-inversion attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even the ones with simple architectures, have failed to produce convincing results. We present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates. Rather than reconstructing private training data from scratch, we leverage partial public information, which can be very generic, to learn a distributional prior via generative adversarial networks (GANs) and use it to guide the inversion process. Moreover, we theoretically prove that a model's predictive power and its vulnerability to inversion attacks are indeed two sides of the same coin---highly predictive models are able to establish a strong correlation between features and labels, which coincides exactly with what an adversary exploits to mount the attacks. Our extensive experiments demonstrate that the proposed attack improves identification accuracy over the existing work by about 75\% for reconstructing face images from a state-of-the-art face recognition classifier. We also show that differential privacy, in its canonical form, is of little avail to defend against our attacks. △ Less

Submitted 17 April, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

arXiv:1906.12035 [pdf, other]

A Concise Model for Multi-Criteria Chinese Word Segmentation with Transformer Encoder

Authors: Xipeng Qiu, Hengzhi Pei, Hang Yan, Xuanjing Huang

Abstract: Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks, which are learned together under the multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS, which… ▽ More Multi-criteria Chinese word segmentation (MCCWS) aims to exploit the relations among the multiple heterogeneous segmentation criteria and further improve the performance of each single criterion. Previous work usually regards MCCWS as different tasks, which are learned together under the multi-task learning framework. In this paper, we propose a concise but effective unified model for MCCWS, which is fully-shared for all the criteria. By leveraging the powerful ability of the Transformer encoder, the proposed unified model can segment Chinese text according to a unique criterion-token indicating the output criterion. Besides, the proposed unified model can segment both simplified and traditional Chinese and has an excellent transfer capability. Experiments on eight datasets with different criteria show that our model outperforms our single-criterion baseline model and other multi-criteria models. Source codes of this paper are available on Github https://github.com/acphile/MCCWS. △ Less

Submitted 5 October, 2020; v1 submitted 28 June, 2019; originally announced June 2019.

Comments: Findings of EMNLP 2020

arXiv:1712.00328 [pdf, other]

Group Sparse Bayesian Learning for Active Surveillance on Epidemic Dynamics

Authors: Hongbin Pei, Bo Yang, Jiming Liu, Lei Dong

Abstract: Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes, such as infectious disease spread and information propagation. This task is intractable, especially when surveillance resources are very limited. To address the challenge, we study the problem of active surveillance, i.e., how to identify a small portion of system components as sentinels to effect… ▽ More Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes, such as infectious disease spread and information propagation. This task is intractable, especially when surveillance resources are very limited. To address the challenge, we study the problem of active surveillance, i.e., how to identify a small portion of system components as sentinels to effect monitoring, such that the epidemic dynamics of an entire system can be readily predicted from the partial data collected by such sentinels. We propose a novel measure, the gamma value, to identify the sentinels by modeling a sentinel network with row sparsity structure. We design a flexible group sparse Bayesian learning algorithm to mine the sentinel network suitable for handling both linear and non-linear dynamical systems by using the expectation maximization method and variational approximation. The efficacy of the proposed algorithm is theoretically analyzed and empirically validated using both synthetic and real-world data. △ Less

Submitted 21 November, 2017; originally announced December 2017.

arXiv:1603.06780 [pdf, other]

Early Warning of Human Crowds Based on Query Data from Baidu Map: Analysis Based on Shanghai Stampede

Authors: Jingbo Zhou, Hongbin Pei, Haishan Wu

Abstract: Without sufficient preparation and on-site management, the mass scale unexpected huge human crowd is a serious threat to public safety. A recent impressive tragedy is the 2014 Shanghai Stampede, where 36 people were killed and 49 were injured in celebration of the New Year's Eve on December 31th 2014 in the Shanghai Bund. Due to the innately stochastic and complicated individual movement, it is no… ▽ More Without sufficient preparation and on-site management, the mass scale unexpected huge human crowd is a serious threat to public safety. A recent impressive tragedy is the 2014 Shanghai Stampede, where 36 people were killed and 49 were injured in celebration of the New Year's Eve on December 31th 2014 in the Shanghai Bund. Due to the innately stochastic and complicated individual movement, it is not easy to predict collective gatherings, which potentially leads to crowd events. In this paper, with leveraging the big data generated on Baidu map, we propose a novel approach to early warning such potential crowd disasters, which has profound public benefits. An insightful observation is that, with the prevalence and convenience of mobile map service, users usually search on the Baidu map to plan a routine. Therefore, aggregating users' query data on Baidu map can obtain priori and indication information for estimating future human population in a specific area ahead of time. Our careful analysis and deep investigation on the Baidu map data on various events also demonstrates a strong correlation pattern between the number of map query and the number of positioning users in an area. Based on such observation, we propose a decision method utilizing query data on Baidu map to invoke warnings for potential crowd events about 1-3 hours in advance. Then we also construct a machine learning model with heterogeneous data (such as query data and mobile positioning data) to quantitatively measure the risk of the potential crowd disasters. We evaluate the effectiveness of our methods on the data of Baidu map. △ Less

Submitted 22 March, 2016; originally announced March 2016.

arXiv:1602.06818 [pdf, ps, other]

doi 10.1080/01431161.2016.1249302

Graph Regularized Low Rank Representation for Aerosol Optical Depth Retrieval

Authors: Yubao Sun, Renlong Hang, Qingshan Liu, Fuping Zhu, Hucheng Pei

Abstract: In this paper, we propose a novel data-driven regression model for aerosol optical depth (AOD) retrieval. First, we adopt a low rank representation (LRR) model to learn a powerful representation of the spectral response. Then, graph regularization is incorporated into the LRR model to capture the local structure information and the nonlinear property of the remote-sensing data. Since it is easy to… ▽ More In this paper, we propose a novel data-driven regression model for aerosol optical depth (AOD) retrieval. First, we adopt a low rank representation (LRR) model to learn a powerful representation of the spectral response. Then, graph regularization is incorporated into the LRR model to capture the local structure information and the nonlinear property of the remote-sensing data. Since it is easy to acquire the rich satellite-retrieval results, we use them as a baseline to construct the graph. Finally, the learned feature representation is feeded into support vector machine (SVM) to retrieve AOD. Experiments are conducted on two widely used data sets acquired by different sensors, and the experimental results show that the proposed method can achieve superior performance compared to the physical models and other state-of-the-art empirical models. △ Less

Submitted 7 March, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

Comments: 16 pages, 6 figures

Showing 1–28 of 28 results for author: Pei, H