96 results for au:Zeng_K in:cs

Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation
Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li
Oct 17 2024 cs.CV arXiv:2410.12562v1

@misc{2410.12562, author = {Yao Shen and Ziwei Wei and Chunmeng Liu and Shuming Wei and Qi Zhao and Kaiyang Zeng and Guangyao Li}, title = {{A}daptive {P}rompt {L}earning with {SAM} for {F}ew-shot {S}canning {P}robe {M}icroscope {I}mage {S}egmentation}, year = {2024}, eprint = {2410.12562}, note = {arXiv:2410.12562v1} }
The Segment Anything Model (SAM) has demonstrated strong performance in image segmentation of natural scene images. However, its effectiveness diminishes markedly when applied to specific scientific domains, such as Scanning Probe Microscope (SPM) images. This decline in accuracy can be attributed to the distinct data distribution and limited availability of the data inherent in the scientific images. On the other hand, the acquisition of adequate SPM datasets is time-intensive, laborious, and skill-dependent. To address these challenges, we propose an Adaptive Prompt Learning with SAM (APL-SAM) framework tailored for few-shot SPM image segmentation. Our approach incorporates two key innovations to enhance SAM: 1) An Adaptive Prompt Learning module leverages few-shot embeddings derived from a limited support set to adaptively learn central representatives, which serve as visual prompts. This innovation eliminates the need for time-consuming online user interactions for providing prompts, such as exhaustively marking points and bounding boxes slice by slice; 2) A multi-source, multi-level mask decoder specifically designed for few-shot SPM image segmentation is introduced, which can effectively capture the correspondence between the support and query images. To facilitate comprehensive training and evaluation, we introduce a new dataset, SPM-Seg, curated for SPM image segmentation. Extensive experiments on this dataset reveal that the proposed APL-SAM framework significantly outperforms the original SAM, achieving over a 30% improvement in terms of Dice Similarity Coefficient with only one-shot guidance. Moreover, APL-SAM surpasses state-of-the-art few-shot segmentation methods and even fully supervised approaches in performance. Code and dataset used in this study will be made available upon acceptance.
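The abstract above reports its gains in terms of the Dice Similarity Coefficient. As a reminder of what that metric measures, here is a minimal Python sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), on flat binary masks. The function name and the convention for two empty masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])` compares a two-pixel prediction against a one-pixel target that overlaps it in one pixel, giving 2·1 / (2 + 1).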
FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning
Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani
Sep 26 2024 cs.RO cs.CV cs.LG arXiv:2409.16578v2

@misc{2409.16578, author = {Jiaheng Hu and Rose Hendrix and Ali Farhadi and Aniruddha Kembhavi and Roberto Martin-Martin and Peter Stone and Kuo-Hao Zeng and Kiana Ehsani}, title = {{FL}a{R}e: {A}chieving {M}asterful and {A}daptive {R}obot {P}olicies with {L}arge-{S}cale {R}einforcement {L}earning {F}ine-{T}uning}, year = {2024}, eprint = {2409.16578}, note = {arXiv:2409.16578v2} }
In recent years, the Robotics field has initiated several efforts toward building generalist robot policies through large-scale multi-task Behavior Cloning. However, direct deployments of these policies have led to unsatisfactory performance, where the policy struggles with unseen states and tasks. How can we break through the performance plateau of these models and elevate their capabilities to new heights? In this paper, we propose FLaRe, a large-scale Reinforcement Learning fine-tuning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques. Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance both on previously demonstrated and on entirely novel tasks and embodiments. Specifically, on a set of long-horizon mobile manipulation tasks, FLaRe achieves an average success rate of 79.5% in unseen environments, with absolute improvements of +23.6% in simulation and +30.7% on real robots over prior SoTA methods. By utilizing only sparse rewards, our approach can enable generalizing to new capabilities beyond the pretraining data with minimal human effort. Moreover, we demonstrate rapid adaptation to new embodiments and behaviors with less than a day of fine-tuning. Videos can be found on the project website at https://robot-flare.github.io/
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Matt Deitke, Christopher Clark, Sangho Lee, Rohun Tripathi, Yue Yang, Jae Sung Park, Mohammadreza Salehi, Niklas Muennighoff, Kyle Lo, Luca Soldaini, Jiasen Lu, Taira Anderson, Erin Bransom, Kiana Ehsani, Huong Ngo, YenSung Chen, Ajay Patel, Mark Yatskar, Chris Callison-Burch, Andrew Head, et al (31)
Sep 26 2024 cs.CV cs.CL cs.LG arXiv:2409.17146v1

@misc{2409.17146, author = {Matt Deitke and Christopher Clark and Sangho Lee and Rohun Tripathi and Yue Yang and Jae Sung Park and Mohammadreza Salehi and Niklas Muennighoff and Kyle Lo and Luca Soldaini and Jiasen Lu and Taira Anderson and Erin Bransom and Kiana Ehsani and Huong Ngo and YenSung Chen and Ajay Patel and Mark Yatskar and Chris Callison-Burch and Andrew Head and Rose Hendrix and Favyen Bastani and Eli VanderBilt and Nathan Lambert and Yvonne Chou and Arnavi Chheda and Jenna Sparks and Sam Skjonsberg and Michael Schmitz and Aaron Sarnat and Byron Bischoff and Pete Walsh and Chris Newell and Piper Wolters and Tanmay Gupta and Kuo-Hao Zeng and Jon Borchardt and Dirk Groeneveld and Jen Dumas and Crystal Nam and Sophie Lebrecht and Caitlin Wittlif and Carissa Schoenick and Oscar Michel and Ranjay Krishna and Luca Weihs and Noah A.~Smith and Hannaneh Hajishirzi and Ross Girshick and Ali Farhadi and Aniruddha Kembhavi}, title = {{M}olmo and {P}ix{M}o: {O}pen {W}eights and {O}pen {D}ata for {S}tate-of-the-{A}rt {M}ultimodal {M}odels}, year = {2024}, eprint = {2409.17146}, note = {arXiv:2409.17146v1} }
Today's most advanced multimodal models remain proprietary. The strongest open-weight models rely heavily on synthetic data from proprietary VLMs to achieve good performance, effectively distilling these closed models into open ones. As a result, the community is still missing foundational knowledge about how to build performant VLMs from scratch. We present Molmo, a new family of VLMs that are state-of-the-art in their class of openness. Our key innovation is a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions. To enable a wide array of user interactions, we also introduce a diverse dataset mixture for fine-tuning that includes in-the-wild Q&A and innovative 2D pointing data. The success of our approach relies on careful choices for the model architecture details, a well-tuned training pipeline, and, most critically, the quality of our newly collected datasets, all of which will be released. The best-in-class 72B model within the Molmo family not only outperforms others in the class of open weight and data models but also compares favorably against proprietary systems like GPT-4o, Claude 3.5, and Gemini 1.5 on both academic benchmarks and human evaluation. We will be releasing all of our model weights, captioning and fine-tuning data, and source code in the near future. Select model weights, inference code, and demo are available at https://molmo.allenai.org.
Natias: Neuron Attribution based Transferable Image Adversarial Steganography
Zexin Fan, Kejiang Chen, Kai Zeng, Jiansong Zhang, Weiming Zhang, Nenghai Yu
Sep 10 2024 cs.CV cs.CR arXiv:2409.04968v1

@misc{2409.04968, author = {Zexin Fan and Kejiang Chen and Kai Zeng and Jiansong Zhang and Weiming Zhang and Nenghai Yu}, title = {{N}atias: {N}euron {A}ttribution based {T}ransferable {I}mage {A}dversarial {S}teganography}, year = {2024}, eprint = {2409.04968}, doi = {10.1109/TIFS.2024.3421893}, note = {arXiv:2409.04968v1} }
Image steganography is a technique to conceal secret messages within digital images. Steganalysis, on the contrary, aims to detect the presence of secret messages within images. Recently, deep-learning-based steganalysis methods have achieved excellent detection performance. As a countermeasure, adversarial steganography has garnered considerable attention due to its ability to effectively deceive deep-learning-based steganalysis. However, steganalysts often employ unknown steganalytic models for detection. Therefore, the ability of adversarial steganography to deceive non-target steganalytic models, known as transferability, becomes especially important. Nevertheless, existing adversarial steganographic methods do not consider how to enhance transferability. To address this issue, we propose a novel adversarial steganographic scheme named Natias. Specifically, we first attribute the output of a steganalytic model to each neuron in the target middle layer to identify critical features. Next, we corrupt these critical features that may be adopted by diverse steganalytic models. Consequently, it can promote the transferability of adversarial steganography. Our proposed method can be seamlessly integrated with existing adversarial steganography frameworks. Thorough experimental analyses affirm that our proposed technique possesses improved transferability when contrasted with former approaches, and it attains heightened security in retraining scenarios.
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang, Ruijie Yu, Kaipeng Zeng, Ding Li, Feng Zhu, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Jul 23 2024 cs.AI cs.LG physics.chem-ph arXiv:2407.15141v1

@misc{2407.15141, author = {Yu Zhang and Ruijie Yu and Kaipeng Zeng and Ding Li and Feng Zhu and Xiaokang Yang and Yaohui Jin and Yanyan Xu}, title = {{T}ext-{A}ugmented {M}ultimodal {LLM}s for {C}hemical {R}eaction {C}ondition {R}ecommendation}, year = {2024}, eprint = {2407.15141}, note = {arXiv:2407.15141v1} }
High-throughput reaction condition (RC) screening is fundamental to chemical synthesis. However, current RC screening suffers from laborious and costly trial-and-error workflows. Traditional computer-aided synthesis planning (CASP) tools fail to find suitable RCs due to data sparsity and inadequate reaction representations. Nowadays, large language models (LLMs) are capable of tackling chemistry-related problems, such as molecule design and chemical logic Q&A tasks. However, LLMs have not yet achieved accurate predictions of chemical reaction conditions. Here, we present MM-RCR, a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and textual corpus for chemical reaction recommendation (RCR). To train MM-RCR, we construct 1.2 million pairwise Q&A instruction datasets. Our experimental results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets and exhibits strong generalization capabilities on out-of-domain (OOD) and High-Throughput Experimentation (HTE) datasets. MM-RCR has the potential to accelerate high-throughput condition screening in chemical synthesis.
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking
Amy Xin, Yunjia Qi, Zijun Yao, Fangwei Zhu, Kaisheng Zeng, Xu Bin, Lei Hou, Juanzi Li
Jul 08 2024 cs.CL arXiv:2407.04020v2

@misc{2407.04020, author = {Amy Xin and Yunjia Qi and Zijun Yao and Fangwei Zhu and Kaisheng Zeng and Xu Bin and Lei Hou and Juanzi Li}, title = {{LLMAEL}: {L}arge {L}anguage {M}odels are {G}ood {C}ontext {A}ugmenters for {E}ntity {L}inking}, year = {2024}, eprint = {2407.04020}, note = {arXiv:2407.04020v2} }
Entity Linking (EL) models are well-trained at mapping mentions to their corresponding entities according to a given context. However, EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, large language models (LLMs) are more robust at interpreting uncommon mentions. Yet, due to a lack of specialized training, LLMs struggle to generate correct entity IDs. Furthermore, training an LLM to perform EL is cost-intensive. Building upon these insights, we introduce LLM-Augmented Entity Linking (LLMAEL), a plug-and-play approach to enhance entity linking through LLM data augmentation. We leverage LLMs as knowledgeable context augmenters, generating mention-centered descriptions as additional input, while preserving traditional EL models for task-specific processing. Experiments on 6 standard datasets show that the vanilla LLMAEL outperforms baseline EL models in most cases, while the fine-tuned LLMAEL sets new state-of-the-art results across all 6 benchmarks.
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Kuo-Hao Zeng, Zichen Zhang, Kiana Ehsani, Rose Hendrix, Jordi Salvador, Alvaro Herrasti, Ross Girshick, Aniruddha Kembhavi, Luca Weihs
Jul 01 2024 cs.RO cs.CV arXiv:2406.20083v1

@misc{2406.20083, author = {Kuo-Hao Zeng and Zichen Zhang and Kiana Ehsani and Rose Hendrix and Jordi Salvador and Alvaro Herrasti and Ross Girshick and Aniruddha Kembhavi and Luca Weihs}, title = {{P}oli{F}ormer: {S}caling {O}n-{P}olicy {RL} with {T}ransformers {R}esults in {M}asterful {N}avigators}, year = {2024}, eprint = {2406.20083}, note = {arXiv:2406.20083v1} }
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real-world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement. PoliFormer can also be trivially extended to a variety of downstream applications such as object tracking, multi-object navigation, and open-vocabulary navigation with no finetuning.
Take a Step Further: Understanding Page Spray in Linux Kernel Exploitation
Ziyi Guo, Dang K Le, Zhenpeng Lin, Kyle Zeng, Ruoyu Wang, Tiffany Bao, Yan Shoshitaishvili, Adam Doupé, Xinyu Xing
Jun 06 2024 cs.CR cs.SE arXiv:2406.02624v2

@misc{2406.02624, author = {Ziyi Guo and Dang K Le and Zhenpeng Lin and Kyle Zeng and Ruoyu Wang and Tiffany Bao and Yan Shoshitaishvili and Adam Doupé and Xinyu Xing}, title = {{T}ake a {S}tep {F}urther: {U}nderstanding {P}age {S}pray in {L}inux {K}ernel {E}xploitation}, year = {2024}, eprint = {2406.02624}, note = {arXiv:2406.02624v2} }
Recently, a novel method known as Page Spray has emerged, focusing on page-level exploitation for kernel vulnerabilities. Despite the advantages it offers in terms of exploitability, stability, and compatibility, comprehensive research on Page Spray remains scarce. Questions regarding its root causes, exploitation model, comparative benefits over other exploitation techniques, and possible mitigation strategies have largely remained unanswered. In this paper, we conduct a systematic investigation into Page Spray, providing an in-depth understanding of this exploitation technique. We introduce a comprehensive exploit model termed the \sys model, elucidating its fundamental principles. Additionally, we conduct a thorough analysis of the root causes underlying Page Spray occurrences within the Linux Kernel. We design an analyzer based on the Page Spray analysis model to identify Page Spray callsites. Subsequently, we evaluate the stability, exploitability, and compatibility of Page Spray through meticulously designed experiments. Finally, we propose mitigation principles for addressing Page Spray and introduce our own lightweight mitigation approach. This research aims to assist security researchers and developers in gaining insights into Page Spray, ultimately enhancing our collective understanding of this emerging exploitation technique and making improvements to the community.
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs
Haoyu Wang, Bei Liu, Hang Shao, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian
May 28 2024 cs.LG arXiv:2405.17233v2

@misc{2405.17233, author = {Haoyu Wang and Bei Liu and Hang Shao and Bo Xiao and Ke Zeng and Guanglu Wan and Yanmin Qian}, title = {{CLAQ}: {P}ushing the {L}imits of {L}ow-{B}it {P}ost-{T}raining {Q}uantization for {LLM}s}, year = {2024}, eprint = {2405.17233}, note = {arXiv:2405.17233v2} }
Parameter quantization for Large Language Models (LLMs) has attracted increasing attention recently for reducing memory costs and improving computational efficiency. Early approaches have been widely adopted. However, the existing methods suffer from poor performance in low-bit (such as 2 to 3 bits) scenarios. In this paper, we present a novel and effective Column-Level Adaptive weight Quantization (CLAQ) framework by introducing three different types of adaptive strategies for LLM quantization. Firstly, a K-Means clustering based algorithm is proposed that allows dynamic generation of quantization centroids for each column of a parameter matrix. Secondly, we design an outlier-guided adaptive precision search strategy which can dynamically assign varying bit-widths to different columns. Finally, a dynamic outlier reservation scheme is developed to retain some parameters in their original floating-point precision, in exchange for boosted model performance. Experiments on various mainstream open source LLMs including LLaMA-1, LLaMA-2 and Yi demonstrate that our methods achieve the state-of-the-art results across different bit settings, especially in extremely low-bit scenarios. Code is available at https://github.com/fayuge/CLAQ.
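The first strategy in the abstract above, K-Means centroids generated per column of a weight matrix, can be illustrated with a small pure-Python sketch. Everything below is an assumption made for illustration (function names, evenly spaced initialisation, a fixed number of Lloyd iterations, one bit-width shared by all columns); it is not the CLAQ implementation and omits the adaptive precision search and outlier reservation steps.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[float]:
    """Plain Lloyd iterations on scalars; returns k centroids."""
    lo, hi = min(values), max(values)
    # Initialise centroids evenly across the value range (an arbitrary choice).
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        buckets: List[List[float]] = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            buckets[nearest].append(v)
        # Keep the previous centroid when a bucket ends up empty.
        centroids = [
            sum(b) / len(b) if b else centroids[i] for i, b in enumerate(buckets)
        ]
    return centroids


def quantize_columns(
    matrix: List[List[float]], bits: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Fit one codebook of 2**bits centroids per column; return codebooks and codes."""
    k = 2 ** bits
    n_rows, n_cols = len(matrix), len(matrix[0])
    codebooks: List[List[float]] = []
    codes = [[0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        centroids = kmeans_1d(column, k)
        codebooks.append(centroids)
        for i, v in enumerate(column):
            # Store the index of the nearest centroid instead of the float.
            codes[i][j] = min(range(k), key=lambda c: abs(v - centroids[c]))
    return codebooks, codes


def dequantize(codebooks: List[List[float]], codes: List[List[int]]) -> List[List[float]]:
    """Look each code up in its own column's codebook."""
    return [[codebooks[j][c] for j, c in enumerate(row)] for row in codes]
```

Because each column gets its own codebook, a column with a narrow value range and a column with a wide one are both represented with the same number of bits but very different centroid spacing.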
Swipe2Pair: Secure and Fast In-Band Wireless Device Pairing
Yaqi He, Kai Zeng, Long Jiao, Brian L. Mark, Khaled N. Khasawneh
May 07 2024 cs.CR cs.NI arXiv:2405.03045v1

@misc{2405.03045, author = {Yaqi He and Kai Zeng and Long Jiao and Brian L.~Mark and Khaled N.~Khasawneh}, title = {{S}wipe2{P}air: {S}ecure and {F}ast {I}n-{B}and {W}ireless {D}evice {P}airing}, year = {2024}, eprint = {2405.03045}, doi = {10.1145/3643833.3656127}, note = {arXiv:2405.03045v1} }
Wireless device pairing is a critical security mechanism to bootstrap the secure communication between two devices without a pre-shared secret. It has been widely used in many Internet of Things (IoT) applications, such as smart-home and smart-health. Most existing device pairing mechanisms are based on out-of-band channels, e.g., extra sensors or hardware, to validate the proximity of pairing devices. However, out-of-band channels are not universal across all wireless devices, so such a scheme is limited to certain application scenarios or conditions. On the other hand, in-band channel-based device pairing seeks universal applicability by only relying on wireless interfaces. Existing in-band channel-based pairing schemes either require multiple antennas separated by a good distance on one pairing device, which is not feasible in certain scenarios, or require users to repeat multiple sweeps, which is not optimal in terms of usability. Therefore, an in-band wireless device pairing scheme providing high security while maintaining high usability (simple pairing process and minimal user intervention) is highly desired. In this work, we propose an easy-to-use mutual authentication device pairing scheme, named Swipe2Pair, based on the proximity of pairing devices and randomization of wireless transmission power. We conduct extensive security analysis and collect considerable experimental data under various settings across different environments. Experimental results show that Swipe2Pair achieves high security and usability. It only takes less than one second to complete the pairing process with a simple swipe of one device in front of the other.
MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model
Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang
Apr 23 2024 cs.CV cs.MM cs.RO eess.IV arXiv:2404.12794v2

@misc{2404.12794, author = {Kang Zeng and Hao Shi and Jiacheng Lin and Siyu Li and Jintao Cheng and Kaiwei Wang and Zhiyong Li and Kailun Yang}, title = {{M}amba{MOS}: {L}i{DAR}-based 3{D} {M}oving {O}bject {S}egmentation with {M}otion-aware {S}tate {S}pace {M}odel}, year = {2024}, eprint = {2404.12794}, note = {arXiv:2404.12794v2} }
LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model, termed MambaMOS. Firstly, we develop a novel embedding module, the Time Clue Bootstrapping Embedding (TCBE), to enhance the coupling of temporal and spatial information in point clouds and alleviate the issue of overlooked temporal clues. Secondly, we introduce the Motion-aware State Space Model (MSSM) to endow the model with the capacity to understand the temporal correlations of the same object across different time steps. Specifically, MSSM emphasizes the motion states of the same object at different time steps through two distinct temporal modeling and correlation steps. We utilize an improved state space model to represent these motion differences, significantly modeling the motion states. Finally, extensive experiments on the SemanticKITTI-MOS and KITTI-Road benchmarks demonstrate that the proposed MambaMOS achieves state-of-the-art performance. The source code is publicly available at https://github.com/Terminal-K/MambaMOS.
CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang
Apr 19 2024 cs.CL arXiv:2404.12242v1

@misc{2404.12242, author = {Mengna Zhu and Zijie Xu and Kaisheng Zeng and Kaiming Xiao and Mao Wang and Wenjun Ke and Hongbin Huang}, title = {{CMNEE}: {A} {L}arge-{S}cale {D}ocument-{L}evel {E}vent {E}xtraction {D}ataset based on {O}pen-{S}ource {C}hinese {M}ilitary {N}ews}, year = {2024}, eprint = {2404.12242}, note = {arXiv:2404.12242v1} }
Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall clearly short of those on other domain datasets, which demonstrates that event extraction for the military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu
Apr 09 2024 cs.CV cs.CR arXiv:2404.04956v3

@misc{2404.04956, author = {Zijin Yang and Kai Zeng and Kejiang Chen and Han Fang and Weiming Zhang and Nenghai Yu}, title = {{G}aussian {S}hading: {P}rovable {P}erformance-{L}ossless {I}mage {W}atermarking for {D}iffusion {M}odels}, year = {2024}, eprint = {2404.04956}, note = {arXiv:2404.04956v3} }
Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose Gaussian Shading, a diffusion model watermarking technique that is both performance-lossless and training-free, while serving the dual purpose of copyright protection and tracing of offending content. Our watermark embedding is free of model parameter modifications and thus is plug-and-play. We map the watermark to latent representations following a standard Gaussian distribution, which is indistinguishable from latent representations obtained from the non-watermarked diffusion model. Therefore we can achieve watermark embedding with lossless performance, for which we also provide theoretical proof. Furthermore, since the watermark is intricately linked with image semantics, it exhibits resilience to lossy processing and erasure attempts. The watermark can be extracted by Denoising Diffusion Implicit Models (DDIM) inversion and inverse sampling. We evaluate Gaussian Shading on multiple versions of Stable Diffusion, and the results demonstrate that Gaussian Shading not only is performance-lossless but also outperforms existing methods in terms of robustness.
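The abstract above states that the watermark is mapped to latents that follow a standard Gaussian distribution, but it does not spell out the mapping. The Python sketch below shows one generic way such a distribution-preserving mapping can work, and it is an assumption rather than the authors' scheme: each bit selects a half of the unit interval, a uniform draw picks a point inside that half, and the inverse normal CDF turns it into a latent value. If the bits themselves are uniformly random (for example after encryption), the resulting values are standard normal. The diffusion model, DDIM inversion, and any redundancy coding are omitted, and all function names are hypothetical.

```python
import random
from statistics import NormalDist
from typing import List

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bits_to_latent(bits: List[int], rng: random.Random) -> List[float]:
    """Map each bit to a Gaussian sample whose sign encodes the bit."""
    latent = []
    for b in bits:
        u = rng.random()  # uniform in [0.0, 1.0)
        # Bit 0 lands in (0, 0.5), bit 1 in (0.5, 1); the floor on u keeps
        # the probability strictly inside (0, 1), where inv_cdf is defined.
        p = (b + max(u, 1e-12)) / 2.0
        latent.append(_STD_NORMAL.inv_cdf(p))
    return latent


def latent_to_bits(latent: List[float]) -> List[int]:
    """Recover the bits from the sign of each latent value."""
    return [1 if z > 0.0 else 0 for z in latent]
```

In this toy setting the round trip is exact; in a real pipeline the latent would be perturbed by generation and inversion, which is why a practical scheme needs redundancy on top of the sign encoding.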
UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment
Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu
Apr 02 2024 physics.chem-ph cs.AI cs.LG q-bio.QM arXiv:2404.00044v2

@misc{2404.00044, author = {Kaipeng Zeng and Bo Yang and Xin Zhao and Yu Zhang and Fan Nie and Xiaokang Yang and Yaohui Jin and Yanyan Xu}, title = {{UA}lign: {P}ushing the {L}imit of {T}emplate-free {R}etrosynthesis {P}rediction with {U}nsupervised {SMILES} {A}lignment}, year = {2024}, eprint = {2404.00044}, note = {arXiv:2404.00044v2} }
Motivation: Retrosynthesis planning poses a formidable challenge in the organic chemical industry. Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science. Various deep learning-based methods have been proposed for this task in recent years, incorporating diverse levels of additional chemical knowledge dependency. Results: This paper introduces UAlign, a template-free graph-to-sequence pipeline for retrosynthesis prediction. By combining graph neural networks and Transformers, our method can more effectively leverage the inherent graph structure of molecules. Based on the fact that the majority of molecule structures remain unchanged during a chemical reaction, we propose a simple yet effective SMILES alignment technique to facilitate the reuse of unchanged structures for reactant generation. Extensive experiments show that our method substantially outperforms state-of-the-art template-free and semi-template-based approaches. Importantly, our template-free method achieves effectiveness comparable to, or even surpasses, established powerful template-based methods. Scientific contribution: We present a novel graph-to-sequence template-free retrosynthesis prediction pipeline that overcomes the limitations of Transformer-based methods in molecular representation learning and insufficient utilization of chemical information. We propose an unsupervised learning mechanism for establishing product-atom correspondence with reactant SMILES tokens, achieving even better results than supervised SMILES alignment methods. Extensive experiments demonstrate that UAlign significantly outperforms state-of-the-art template-free methods and rivals or surpasses template-based approaches, with up to 5% (top-5) and 5.4% (top-10) increased accuracy over the strongest baseline.
Distributed Swarm Learning for Edge Internet of Things
Yue Wang, Zhi Tian, FXin Fan, Zhipeng Cai, Cameron Nowzari, Kai Zeng
Apr 01 2024 cs.NI cs.AI cs.LG arXiv:2403.20188v1

@misc{2403.20188, author = {Yue Wang and Zhi Tian and FXin Fan and Zhipeng Cai and Cameron Nowzari and Kai Zeng}, title = {{D}istributed {S}warm {L}earning for {E}dge {I}nternet of {T}hings}, year = {2024}, eprint = {2403.20188}, note = {arXiv:2403.20188v1} }
The rapid growth of the Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at the wireless edge for collaborative machine learning tasks, ushering in a new era of edge learning. With a huge number of hardware-constrained IoT devices operating in resource-limited wireless networks, edge learning encounters substantial challenges, including communication and computation bottlenecks, device and data heterogeneity, security risks, privacy leakages, non-convex optimization, and complex wireless environments. To address these issues, this article explores a novel framework known as distributed swarm learning (DSL), which combines artificial intelligence and biological swarm intelligence in a holistic manner. By harnessing advanced signal processing and communications, DSL provides efficient solutions and robust tools for large-scale IoT at the edge of wireless networks.
Provably Secure Disambiguating Neural Linguistic Steganography
Yuang Qi, Kejiang Chen, Kai Zeng, Weiming Zhang, Nenghai Yu
Mar 27 2024 cs.CR cs.CL arXiv:2403.17524v1

@misc{2403.17524, author = {Yuang Qi and Kejiang Chen and Kai Zeng and Weiming Zhang and Nenghai Yu}, title = {{P}rovably {S}ecure {D}isambiguating {N}eural {L}inguistic {S}teganography}, year = {2024}, eprint = {2403.17524}, note = {arXiv:2403.17524v1} }
Recent research in provably secure neural linguistic steganography has overlooked a crucial aspect: the sender must detokenize stegotexts to avoid raising suspicion from the eavesdropper. The segmentation ambiguity problem, which arises when using language models based on subwords, leads to occasional decoding failures in all neural language steganography implementations based on these models. Current solutions to this issue involve altering the probability distribution of candidate words, rendering them incompatible with provably secure steganography. We propose a novel secure disambiguation method named SyncPool, which effectively addresses the segmentation ambiguity problem. We group all tokens with prefix relationships in the candidate pool before the steganographic embedding algorithm runs to eliminate uncertainty among ambiguous tokens. To enable the receiver to synchronize the sampling process of the sender, a shared cryptographically-secure pseudorandom number generator (CSPRNG) is deployed to select a token from the ambiguity pool. SyncPool does not change the size of the candidate pool or the distribution of tokens and thus is applicable to provably secure language steganography methods. We provide theoretical proofs and experimentally demonstrate the applicability of our solution to various languages and models, showing its potential to significantly improve the reliability and security of neural linguistic steganography systems.
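Two steps named in the abstract above lend themselves to a short illustration: grouping candidate tokens that stand in a prefix relationship, and using a shared pseudorandom source so that sender and receiver pick the same token from a group. The Python sketch below is a loose reading of those two sentences, not the SyncPool algorithm. The grouping rule (tokens sharing a common shortest prefix token form one pool), the HMAC-SHA256 counter construction used as a stand-in for the shared CSPRNG, and all function names are assumptions.

```python
import hashlib
import hmac
from typing import Dict, List


def group_by_prefix(candidates: Dict[str, float]) -> List[List[str]]:
    """Group tokens so that every token in a pool starts with the pool's first token."""
    pools: List[List[str]] = []
    # After sorting, all tokens that extend a given token follow it directly.
    for token in sorted(candidates):
        if pools and token.startswith(pools[-1][0]):
            pools[-1].append(token)
        else:
            pools.append([token])
    return pools


def shared_uniform(key: bytes, step: int) -> float:
    """Deterministic value in [0, 1) that both parties can derive from (key, step)."""
    digest = hmac.new(key, step.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64


def pick_from_pool(
    pool: List[str], candidates: Dict[str, float], key: bytes, step: int
) -> str:
    """Select one token from a pool, weighted by its probability, using the shared value."""
    total = sum(candidates[t] for t in pool)
    threshold = shared_uniform(key, step) * total
    running = 0.0
    for token in pool:
        running += candidates[token]
        if threshold < running:
            return token
    return pool[-1]  # guard against floating-point round-off
```

With candidates `{"a": 0.2, "ab": 0.1, "ac": 0.1, "b": 0.4, "ba": 0.2}` the grouping yields the pools `["a", "ab", "ac"]` and `["b", "ba"]`; an embedding algorithm would then operate on pools, and both sides would resolve the chosen pool to the same concrete token because they share the key and step counter.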
Learning or Self-aligning? Rethinking Instruction Fine-tuning
Mengjie Ren, Boxi Cao, Hongyu Lin, Cao Liu, Xianpei Han, Ke Zeng, Guanglu Wan, Xunliang Cai, Le Sun
Feb 29 2024 cs.CL arXiv:2402.18243v3

@misc{2402.18243, author = {Mengjie Ren and Boxi Cao and Hongyu Lin and Cao Liu and Xianpei Han and Ke Zeng and Guanglu Wan and Xunliang Cai and Le Sun}, title = {{L}earning or {S}elf-aligning? {R}ethinking {I}nstruction {F}ine-tuning}, year = {2024}, eprint = {2402.18243}, note = {arXiv:2402.18243v3} }
Instruction Fine-tuning (IFT) is a critical phase in building large language models (LLMs). Previous works mainly focus on IFT's role in the transfer of behavioral norms and the learning of additional world knowledge. However, the understanding of the underlying mechanisms of IFT remains significantly limited. In this paper, we design a knowledge intervention framework to decouple the potential underlying factors of IFT, thereby enabling individual analysis of different factors. Surprisingly, our experiments reveal that attempting to learn additional world knowledge through IFT often struggles to yield positive impacts and can even lead to markedly negative effects. Further, we discover that maintaining internal knowledge consistency before and after IFT is a critical factor for achieving successful IFT. Our findings reveal the underlying mechanisms of IFT and provide robust support for some very recent and potential future works.
Multi-Bit Distortion-Free Watermarking for Large Language Models
Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian Mark
Feb 28 2024 cs.CL cs.LG arXiv:2402.16578v1

@misc{2402.16578, author = {Massieh Kordi Boroujeny and Ya Jiang and Kai Zeng and Brian Mark}, title = {{M}ulti-{B}it {D}istortion-{F}ree {W}atermarking for {L}arge {L}anguage {M}odels}, year = {2024}, eprint = {2402.16578}, note = {arXiv:2402.16578v1} }
Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection. More recently, distortion-free watermarking methods were proposed that require a secret key to detect the watermark. The prior methods generally embed zero-bit watermarks that do not provide additional information beyond tagging a text as being AI-generated. We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark. We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
Event-level Knowledge Editing
Hao Peng, Xiaozhi Wang, Chunyang Li, Kaisheng Zeng, Jiangshan Duo, Yixin Cao, Lei Hou, Juanzi Li
Feb 21 2024 cs.CL cs.AI arXiv:2402.13093v2

@misc{2402.13093, author = {Hao Peng and Xiaozhi Wang and Chunyang Li and Kaisheng Zeng and Jiangshan Duo and Yixin Cao and Lei Hou and Juanzi Li}, title = {{E}vent-level {K}nowledge {E}diting}, year = {2024}, eprint = {2402.13093}, note = {arXiv:2402.13093v2} }
Knowledge editing aims at updating knowledge of large language models (LLMs) to prevent them from becoming outdated. Existing work edits LLMs at the level of factual knowledge triplets. However, natural knowledge updates in the real world come from the occurrences of new events rather than direct changes in factual triplets. In this paper, we propose a new task setting: event-level knowledge editing, which directly edits new events into LLMs and improves over conventional triplet-level editing on (1) Efficiency. A single event edit leads to updates in multiple entailed knowledge triplets. (2) Completeness. Beyond updating factual knowledge, event-level editing also requires considering the event influences and updating LLMs' knowledge about future trends. We construct a high-quality event-level editing benchmark ELKEN, consisting of 1,515 event edits, 6,449 questions about factual knowledge, and 10,150 questions about future tendencies. We systematically evaluate the performance of various knowledge editing methods and LLMs on this benchmark. We find that ELKEN poses significant challenges to existing knowledge editing approaches. Our codes and dataset are publicly released to facilitate further research.
MF-MOS: A Motion-Focused Model for Moving Object Segmentation
Jintao Cheng, Kang Zeng, Zhuoxu Huang, Xiaoyu Tang, Jin Wu, Chengxi Zhang, Xieyuanli Chen, Rui Fan
Jan 31 2024 cs.CV arXiv:2401.17023v1

@misc{2401.17023, author = {Jintao Cheng and Kang Zeng and Zhuoxu Huang and Xiaoyu Tang and Jin Wu and Chengxi Zhang and Xieyuanli Chen and Rui Fan}, title = {{MF}-{MOS}: {A} {M}otion-{F}ocused {M}odel for {M}oving {O}bject {S}egmentation}, year = {2024}, eprint = {2401.17023}, note = {arXiv:2401.17023v1} }
Moving object segmentation (MOS) provides a reliable solution for detecting traffic participants and thus is of great interest in the autonomous driving field. Dynamic capture is always critical in the MOS problem. Previous methods capture motion features from the range images directly. In contrast, we argue that the residual maps provide greater potential for motion information, while range images contain rich semantic guidance. Based on this intuition, we propose MF-MOS, a novel motion-focused model with a dual-branch structure for LiDAR moving object segmentation. We decouple the spatial-temporal information by capturing the motion from residual maps and generating semantic features from range images, which are used as movable object guidance for the motion branch. Our straightforward yet distinctive solution makes full use of both range images and residual maps, thus greatly improving the performance of the LiDAR-based MOS task. Remarkably, our MF-MOS achieved a leading IoU of 76.7% on the MOS leaderboard of the SemanticKITTI dataset upon submission, demonstrating the current state-of-the-art performance. The implementation of our MF-MOS has been released at https://github.com/SCNU-RISLAB/MF-MOS.
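For readers unfamiliar with the IoU metric quoted above, here is a minimal Python sketch of intersection over union for binary moving/static labels. It is a generic illustration and not the benchmark's evaluation code. The function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def iou(pred, target):
    """Intersection over union for two equal-length binary label sequences.

    pred and target hold 1 (moving) or 0 (static) per point or pixel.
    """
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

# One point is marked moving in both masks; three are marked in at least one.
score = iou([1, 1, 0, 0], [1, 0, 1, 0])
```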
MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation
Nianzu Yang, Kaipeng Zeng, Haotian Lu, Yexin Wu, Zexin Yuan, Danni Chen, Shengdian Jiang, Jiaxiang Wu, Yimin Wang, Junchi Yan
Jan 19 2024 q-bio.NC cs.LG cs.NE arXiv:2401.09500v3

@misc{2401.09500, author = {Nianzu Yang and Kaipeng Zeng and Haotian Lu and Yexin Wu and Zexin Yuan and Danni Chen and Shengdian Jiang and Jiaxiang Wu and Yimin Wang and Junchi Yan}, title = {{M}orph{G}rower: {A} {S}ynchronized {L}ayer-by-layer {G}rowing {A}pproach for {P}lausible {N}euronal {M}orphology {G}eneration}, year = {2024}, eprint = {2401.09500}, note = {arXiv:2401.09500v3} }
Neuronal morphology is essential for studying brain functioning and understanding neurodegenerative disorders. As acquiring real-world morphology data is expensive, computational approaches for morphology generation have been studied. Traditional methods heavily rely on expert-set rules and parameter tuning, making it difficult to generalize across different types of morphologies. Recently, MorphVAE was introduced as the sole learning-based method, but its generated morphologies lack plausibility, i.e., they do not appear realistic enough and most of the generated samples are topologically invalid. To fill this gap, this paper proposes MorphGrower, which mimics the natural growth mechanism of neurons for generation. Specifically, MorphGrower generates morphologies layer by layer, with each subsequent layer conditioned on the previously generated structure. During each layer generation, MorphGrower utilizes a pair of sibling branches as the basic generation block and generates branch pairs synchronously. This approach ensures topological validity and allows for fine-grained generation, thereby enhancing the realism of the final generated morphologies. Results on four real-world datasets demonstrate that MorphGrower outperforms MorphVAE by a notable margin. Importantly, the electrophysiological response simulation demonstrates the plausibility of our generated samples from a neuroscience perspective. Our code is available at https://github.com/Thinklab-SJTU/MorphGrower.
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs
Jan 17 2024 cs.CV arXiv:2401.07770v1

@misc{2401.07770, author = {Ram Ramrakhya and Aniruddha Kembhavi and Dhruv Batra and Zsolt Kira and Kuo-Hao Zeng and Luca Weihs}, title = {{S}eeing the {U}nseen: {V}isual {C}ommon {S}ense for {S}emantic {P}lacement}, year = {2024}, eprint = {2401.07770}, note = {arXiv:2401.07770v1} }
Computer vision tasks typically involve describing what is present in an image (e.g. classification, detection, segmentation, and captioning). We study a visual common sense task that requires understanding what is not present. Specifically, given an image (e.g. of a living room) and the name of an object ("cushion"), a vision system is asked to predict semantically-meaningful regions (masks or bounding boxes) in the image where that object could be placed or is likely to be placed by humans (e.g. on the sofa). We call this task Semantic Placement (SP) and believe that such common-sense visual understanding is critical for assistive robots (tidying a house) and AR devices (automatically rendering an object in the user's space). Studying the invisible is hard. Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps is straightforward for objects not present in the image. We overcome this challenge by operating in the opposite direction: we start with an image of an object in context from the web, and then remove that object from the image via inpainting. This automated pipeline converts unstructured web data into a dataset comprising pairs of images with/without the object. Using this, we collect a novel dataset, with ${\sim}1.3$M images across $9$ object categories, and train an SP prediction model called CLIP-UNet. CLIP-UNet outperforms existing VLMs and baselines that combine semantic priors with object detectors on real-world and simulated images. In our user studies, we find that the SP masks predicted by CLIP-UNet are favored $43.7\%$ and $31.3\%$ times when comparing against the $4$ SP baselines on real and simulated images. In addition, we demonstrate that leveraging SP mask predictions from CLIP-UNet enables downstream applications like building tidying robots in indoor environments.
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
Dec 06 2023 cs.RO cs.AI cs.CV arXiv:2312.02976v2

@misc{2312.02976, author = {Kiana Ehsani and Tanmay Gupta and Rose Hendrix and Jordi Salvador and Luca Weihs and Kuo-Hao Zeng and Kunal Pratap Singh and Yejin Kim and Winson Han and Alvaro Herrasti and Ranjay Krishna and Dustin Schwenk and Eli VanderBilt and Aniruddha Kembhavi}, title = {{SPOC}: {I}mitating {S}hortest {P}aths in {S}imulation {E}nables {E}ffective {N}avigation and {M}anipulation in the {R}eal {W}orld}, year = {2023}, eprint = {2312.02976}, note = {arXiv:2312.02976v2} }
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is often too slow and ineffective for long-horizon tasks. While IL with human supervision is effective, collecting human trajectories at scale is extremely expensive. In this work, we show that imitating shortest-path planners in simulation produces agents that, given a language instruction, can proficiently navigate, explore, and manipulate objects in both simulation and the real world using only RGB sensors (no depth map or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based, SPOC architecture, powerful visual encoders paired with extensive image augmentation, and the dramatic scale and diversity of our training data: millions of frames of shortest-path-expert trajectories collected inside approximately 200,000 procedurally generated houses containing 40,000 unique 3D assets. Our models, data, training code, and newly proposed 10-task benchmarking suite CHORES are available at https://spoc-robot.github.io.
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
Shuai Li, Kejiang Chen, Kunsheng Tang, Jie Zhang, Weiming Zhang, Nenghai Yu, Kai Zeng
Nov 17 2023 cs.CR arXiv:2311.09535v3

@misc{2311.09535, author = {Shuai Li and Kejiang Chen and Kunsheng Tang and Jie Zhang and Weiming Zhang and Nenghai Yu and Kai Zeng}, title = {{T}urning {Y}our {S}trength into {W}atermark: {W}atermarking {L}arge {L}anguage {M}odel via {K}nowledge {I}njection}, year = {2023}, eprint = {2311.09535}, note = {arXiv:2311.09535v3} }
Large language models (LLMs) have demonstrated outstanding performance, making them valuable digital assets with significant commercial potential. Unfortunately, the LLM and its API are susceptible to intellectual property theft. Watermarking is a classic solution for copyright verification. However, most recently emerging LLM watermarking methods focus on identifying AI-generated texts rather than watermarking the LLM itself. Only a few attempts are based on weight quantification and backdoor watermarking, which are not robust or covert enough, limiting their applicability in practice. To address this issue, we propose a novel watermarking method for LLMs based on knowledge injection and innovatively use knowledge as the watermark carrier. Specifically, in the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge, which is subsequently injected into the to-be-protected LLM. In the watermark extraction stage, questions related to the watermarked knowledge are designed for querying the suspect LLM and extracting the watermarks from its response. The experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
Xiaozhi Wang, Hao Peng, Yong Guan, Kaisheng Zeng, Jianhui Chen, Lei Hou, Xu Han, Yankai Lin, Zhiyuan Liu, Ruobing Xie, Jie Zhou, Juanzi Li
Nov 16 2023 cs.CL arXiv:2311.09105v2

@misc{2311.09105, author = {Xiaozhi Wang and Hao Peng and Yong Guan and Kaisheng Zeng and Jianhui Chen and Lei Hou and Xu Han and Yankai Lin and Zhiyuan Liu and Ruobing Xie and Jie Zhou and Juanzi Li}, title = {{MAVEN}-{A}rg: {C}ompleting the {P}uzzle of {A}ll-in-{O}ne {E}vent {U}nderstanding {D}ataset with {E}vent {A}rgument {A}nnotation}, year = {2023}, eprint = {2311.09105}, note = {arXiv:2311.09105v2} }
Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships. However, due to the annotation challenges brought by task complexity, a large-scale dataset covering the full process of event understanding has long been absent. In this paper, we introduce MAVEN-Arg, which augments MAVEN datasets with event argument annotations, making it the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained with laborious human annotation; (3) exhaustive annotation supporting all task variants of EAE, which annotates both entity and non-entity event arguments at the document level. Experiments indicate that MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary large language models (LLMs). Furthermore, to demonstrate the benefits of an all-in-one dataset, we preliminarily explore a potential application, future event prediction, with LLMs. MAVEN-Arg and codes can be obtained from https://github.com/THU-KEG/MAVEN-Argument.
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng, Xiaozhi Wang, Jianhui Chen, Weikai Li, Yunjia Qi, Zimu Wang, Zhili Wu, Kaisheng Zeng, Bin Xu, Lei Hou, Juanzi Li
Nov 16 2023 cs.CL cs.AI arXiv:2311.08993v1

@misc{2311.08993, author = {Hao Peng and Xiaozhi Wang and Jianhui Chen and Weikai Li and Yunjia Qi and Zimu Wang and Zhili Wu and Kaisheng Zeng and Bin Xu and Lei Hou and Juanzi Li}, title = {{W}hen does {I}n-context {L}earning {F}all {S}hort and {W}hy? {A} {S}tudy on {S}pecification-{H}eavy {T}asks}, year = {2023}, eprint = {2311.08993}, note = {arXiv:2311.08993v1} }
In-context learning (ICL) has become the default method for using large language models (LLMs), making the exploration of its limitations and understanding the underlying causes crucial. In this paper, we find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications, requiring several hours for ordinary humans to master, such as traditional information extraction tasks. The performance of ICL on these tasks mostly cannot reach half of the state-of-the-art results. To explore the reasons behind this failure, we conduct comprehensive experiments on 18 specification-heavy tasks with various LLMs and identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability. Furthermore, we demonstrate that through fine-tuning, LLMs can achieve decent performance on these tasks, indicating that the failure of ICL is not an inherent flaw of LLMs, but rather a drawback of existing alignment methods that renders LLMs incapable of handling complicated specification-heavy tasks via ICL. To substantiate this, we perform dedicated instruction tuning on LLMs for these tasks and observe a notable improvement. We hope the analyses in this paper could facilitate advancements in alignment methods enabling LLMs to meet more sophisticated human demands.
FireMatch: A Semi-Supervised Video Fire Detection Network Based on Consistency and Distribution Alignment
Qinghua Lin, Zuoyong Li, Kun Zeng, Haoyi Fan, Wei Li, Xiaoguang Zhou
Nov 10 2023 cs.CV cs.AI arXiv:2311.05168v1

@misc{2311.05168, author = {Qinghua Lin and Zuoyong Li and Kun Zeng and Haoyi Fan and Wei Li and Xiaoguang Zhou}, title = {{F}ire{M}atch: {A} {S}emi-{S}upervised {V}ideo {F}ire {D}etection {N}etwork {B}ased on {C}onsistency and {D}istribution {A}lignment}, year = {2023}, eprint = {2311.05168}, note = {arXiv:2311.05168v1} }
Deep learning techniques have greatly enhanced the performance of fire detection in videos. However, video-based fire detection models heavily rely on labeled data, and the process of data labeling is particularly costly and time-consuming, especially when dealing with videos. Considering the limited quantity of labeled video data, we propose a semi-supervised fire detection model called FireMatch, which is based on consistency regularization and adversarial distribution alignment. Specifically, we first combine consistency regularization with pseudo-labeling. For unlabeled data, we design video data augmentation to obtain corresponding weakly augmented and strongly augmented samples. The proposed model predicts weakly augmented samples and retains pseudo-labels above a threshold, while training on strongly augmented samples to predict these pseudo-labels for learning more robust feature representations. Secondly, we generate video cross-set augmented samples by adversarial distribution alignment to expand the training data and alleviate the decline in classification performance caused by insufficient labeled data. Finally, we introduce a fairness loss to help the model produce diverse predictions for input samples, thereby addressing the issue of high confidence with the non-fire class in fire classification scenarios. FireMatch achieved an accuracy of 76.92% and 91.81% on two real-world fire datasets, respectively. The experimental results demonstrate that the proposed method outperforms the current state-of-the-art semi-supervised classification methods.
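The thresholded pseudo-labeling step described in this abstract can be sketched in a few lines of plain Python. This is a generic consistency-regularization illustration and not the FireMatch code. The function names, the 0.95 threshold, and the choice to average the loss over the whole unlabeled batch are assumptions.

```python
import math

def select_pseudo_labels(weak_probs, threshold=0.95):
    """Keep (sample index, class index) pairs whose weak-view confidence
    reaches the threshold. weak_probs is a list of per-class probability lists."""
    selected = []
    for i, probs in enumerate(weak_probs):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected

def consistency_loss(strong_probs, pseudo_labels):
    """Cross-entropy of strong-view predictions against the retained
    pseudo-labels, averaged over the whole unlabeled batch (assumed)."""
    if not strong_probs:
        return 0.0
    total = 0.0
    for i, label in pseudo_labels:
        # Clamp to avoid log(0) when a prediction is exactly zero.
        total -= math.log(max(strong_probs[i][label], 1e-12))
    return total / len(strong_probs)

# Two unlabeled clips, classes ordered as [non-fire, fire] (hypothetical).
weak = [[0.97, 0.03], [0.60, 0.40]]
strong = [[0.50, 0.50], [0.90, 0.10]]
labels = select_pseudo_labels(weak)
loss = consistency_loss(strong, labels)
```

Only the first clip passes the threshold here, so the second clip contributes nothing to the loss even though it still counts toward the batch average.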
Selective Visual Representations Improve Convergence and Generalization for Embodied AI
Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
Nov 08 2023 cs.CV cs.AI arXiv:2311.04193v2

@misc{2311.04193, author = {Ainaz Eftekhar and Kuo-Hao Zeng and Jiafei Duan and Ali Farhadi and Ani Kembhavi and Ranjay Krishna}, title = {{S}elective {V}isual {R}epresentations {I}mprove {C}onvergence and {G}eneralization for {E}mbodied {AI}}, year = {2023}, eprint = {2311.04193}, note = {arXiv:2311.04193v2} }
Embodied AI models often employ off-the-shelf vision backbones like CLIP to encode their visual observations. Although such general-purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise within the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across 5 benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook are also able to generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and their representations retain task-relevant information like target object recognition while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io.
One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models
Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian
Oct 17 2023 cs.CL cs.AI arXiv:2310.09499v4

@misc{2310.09499, author = {Hang Shao and Bei Liu and Bo Xiao and Ke Zeng and Guanglu Wan and Yanmin Qian}, title = {{O}ne-{S}hot {S}ensitivity-{A}ware {M}ixed {S}parsity {P}runing for {L}arge {L}anguage {M}odels}, year = {2023}, eprint = {2310.09499}, note = {arXiv:2310.09499v4} }
Various Large Language Models (LLMs) from the Generative Pretrained Transformer (GPT) family have achieved outstanding performance in a wide range of text generation tasks. However, the enormous model sizes have hindered their practical use in real-world applications due to high inference latency. Therefore, improving the efficiency of LLMs through quantization, pruning, and other means has been a key issue in LLM studies. In this work, we propose a method based on Hessian sensitivity-aware mixed sparsity pruning to prune LLMs to at least 50% sparsity without the need for any retraining. It allocates sparsity adaptively based on sensitivity, allowing us to reduce pruning-induced error while maintaining the overall sparsity level. The advantages of the proposed method become even more pronounced when the sparsity is extremely high. Furthermore, our method is compatible with quantization, enabling further compression of LLMs. We have released the available code.
Mastering the Task of Open Information Extraction with Large Language Models and Consistent Reasoning Environment
Ji Qi, Kaixuan Ji, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Lei Hou, Juanzi Li, Bin Xu
Oct 17 2023 cs.CL arXiv:2310.10590v1

@misc{2310.10590, author = {Ji Qi and Kaixuan Ji and Xiaozhi Wang and Jifan Yu and Kaisheng Zeng and Lei Hou and Juanzi Li and Bin Xu}, title = {{M}astering the {T}ask of {O}pen {I}nformation {E}xtraction with {L}arge {L}anguage {M}odels and {C}onsistent {R}easoning {E}nvironment}, year = {2023}, eprint = {2310.10590}, note = {arXiv:2310.10590v1} }
Open Information Extraction (OIE) aims to extract objective structured knowledge from natural texts, which has attracted growing attention to build dedicated models with human experience. As large language models (LLMs) have exhibited remarkable in-context learning capabilities, a question arises as to whether the task of OIE can be effectively tackled with this paradigm. In this paper, we explore solving the OIE problem by constructing an appropriate reasoning environment for LLMs. Specifically, we first propose a method to effectively estimate the discrepancy of syntactic distribution between an LLM and test samples, which can serve as correlation evidence for preparing positive demonstrations. Upon the evidence, we introduce a simple yet effective mechanism to establish the reasoning environment for LLMs on specific tasks. Without bells and whistles, experimental results on the standard CaRB benchmark demonstrate that our $6$-shot approach outperforms the state-of-the-art supervised method, achieving a $55.3$ $F_1$ score. Further experiments on TACRED and ACE05 show that our method can naturally generalize to other information extraction tasks, resulting in improvements of $5.7$ and $6.8$ $F_1$ scores, respectively.
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, et al (272)
Oct 16 2023 cs.RO arXiv:2310.08864v8

@misc{2310.08864, author = {Open X-Embodiment Collaboration and Abby O'Neill and Abdul Rehman and Abhinav Gupta and Abhiram Maddukuri and Abhishek Gupta and Abhishek Padalkar and Abraham Lee and Acorn Pooley and Agrim Gupta and Ajay Mandlekar and Ajinkya Jain and Albert Tung and Alex Bewley and Alex Herzog and Alex Irpan and Alexander Khazatsky and Anant Rai and Anchit Gupta and Andrew Wang and Andrey Kolobov and Anikait Singh and Animesh Garg and Aniruddha Kembhavi and Annie Xie and Anthony Brohan and Antonin Raffin and Archit Sharma and Arefeh Yavary and Arhan Jain and Ashwin Balakrishna and Ayzaan Wahid and Ben Burgess-Limerick and Beomjoon Kim and Bernhard Schölkopf and Blake Wulfe and Brian Ichter and Cewu Lu and Charles Xu and Charlotte Le and Chelsea Finn and Chen Wang and Chenfeng Xu and Cheng Chi and Chenguang Huang and Christine Chan and Christopher Agia and Chuer Pan and Chuyuan Fu and Coline Devin and Danfei Xu and Daniel Morton and Danny Driess and Daphne Chen and Deepak Pathak and Dhruv Shah and Dieter Büchler and Dinesh Jayaraman and Dmitry Kalashnikov and Dorsa Sadigh and Edward Johns and Ethan Foster and Fangchen Liu and Federico Ceola and Fei Xia and Feiyu Zhao and Felipe Vieira Frujeri and Freek Stulp and Gaoyue Zhou and Gaurav S.~Sukhatme and Gautam Salhotra and Ge Yan and Gilbert Feng and Giulio Schiavi and Glen Berseth and Gregory Kahn and Guangwen Yang and Guanzhi Wang and Hao Su and Hao-Shu Fang and Haochen Shi and Henghui Bao and Heni Ben Amor and Henrik I Christensen and Hiroki Furuta and Homanga Bharadhwaj and Homer Walke and Hongjie Fang and Huy Ha and Igor Mordatch and Ilija Radosavovic and Isabel Leal and Jacky Liang and Jad Abou-Chakra and Jaehyung Kim and Jaimyn Drake and Jan Peters and Jan Schneider and Jasmine Hsu and Jay Vakil and Jeannette Bohg and Jeffrey Bingham and Jeffrey Wu and Jensen Gao and Jiaheng Hu and Jiajun Wu and Jialin Wu and Jiankai Sun and Jianlan Luo and Jiayuan Gu and Jie Tan and Jihoon Oh and Jimmy Wu and 
Jingpei Lu and Jingyun Yang and Jitendra Malik and João Silvério and Joey Hejna and Jonathan Booher and Jonathan Tompson and Jonathan Yang and Jordi Salvador and Joseph J.~Lim and Junhyek Han and Kaiyuan Wang and Kanishka Rao and Karl Pertsch and Karol Hausman and Keegan Go and Keerthana Gopalakrishnan and Ken Goldberg and Kendra Byrne and Kenneth Oslund and Kento Kawaharazuka and Kevin Black and Kevin Lin and Kevin Zhang and Kiana Ehsani and Kiran Lekkala and Kirsty Ellis and Krishan Rana and Krishnan Srinivasan and Kuan Fang and Kunal Pratap Singh and Kuo-Hao Zeng and Kyle Hatch and Kyle Hsu and Laurent Itti and Lawrence Yunliang Chen and Lerrel Pinto and Li Fei-Fei and Liam Tan and Linxi "Jim" Fan and Lionel Ott and Lisa Lee and Luca Weihs and Magnum Chen and Marion Lepert and Marius Memmel and Masayoshi Tomizuka and Masha Itkina and Mateo Guaman Castro and Max Spero and Maximilian Du and Michael Ahn and Michael C.~Yip and Mingtong Zhang and Mingyu Ding and Minho Heo and Mohan Kumar Srirama and Mohit Sharma and Moo Jin Kim and Naoaki Kanazawa and Nicklas Hansen and Nicolas Heess and Nikhil J Joshi and Niko Suenderhauf and Ning Liu and Norman Di Palo and Nur Muhammad Mahi Shafiullah and Oier Mees and Oliver Kroemer and Osbert Bastani and Pannag R Sanketi and Patrick "Tree" Miller and Patrick Yin and Paul Wohlhart and Peng Xu and Peter David Fagan and Peter Mitrano and Pierre Sermanet and Pieter Abbeel and Priya Sundaresan and Qiuyu Chen and Quan Vuong and Rafael Rafailov and Ran Tian and Ria Doshi and Roberto Martín-Martín and Rohan Baijal and Rosario Scalise and Rose Hendrix and Roy Lin and Runjia Qian and Ruohan Zhang and Russell Mendonca and Rutav Shah and Ryan Hoque and Ryan Julian and Samuel Bustamante and Sean Kirmani and Sergey Levine and Shan Lin and Sherry Moore and Shikhar Bahl and Shivin Dass and Shubham Sonawani and Shubham Tulsiani and Shuran Song and Sichun Xu and Siddhant Haldar and Siddharth Karamcheti and Simeon Adebola and Simon Guist and 
Soroush Nasiriany and Stefan Schaal and Stefan Welker and Stephen Tian and Subramanian Ramamoorthy and Sudeep Dasari and Suneel Belkhale and Sungjae Park and Suraj Nair and Suvir Mirchandani and Takayuki Osa and Tanmay Gupta and Tatsuya Harada and Tatsuya Matsushima and Ted Xiao and Thomas Kollar and Tianhe Yu and Tianli Ding and Todor Davchev and Tony Z.~Zhao and Travis Armstrong and Trevor Darrell and Trinity Chung and Vidhi Jain and Vikash Kumar and Vincent Vanhoucke and Wei Zhan and Wenxuan Zhou and Wolfram Burgard and Xi Chen and Xiangyu Chen and Xiaolong Wang and Xinghao Zhu and Xinyang Geng and Xiyuan Liu and Xu Liangwei and Xuanlin Li and Yansong Pang and Yao Lu and Yecheng Jason Ma and Yejin Kim and Yevgen Chebotar and Yifan Zhou and Yifeng Zhu and Yilin Wu and Ying Xu and Yixuan Wang and Yonatan Bisk and Yongqiang Dou and Yoonyoung Cho and Youngwoon Lee and Yuchen Cui and Yue Cao and Yueh-Hua Wu and Yujin Tang and Yuke Zhu and Yunchu Zhang and Yunfan Jiang and Yunshuang Li and Yunzhu Li and Yusuke Iwasawa and Yutaka Matsuo and Zehan Ma and Zhuo Xu and Zichen Jeff Cui and Zichen Zhang and Zipeng Fu and Zipeng Lin}, title = {{O}pen {X}-{E}mbodiment: {R}obotic {L}earning {D}atasets and {RT}-{X} {M}odels}, year = {2023}, eprint = {2310.08864}, note = {arXiv:2310.08864v8} }
Large, high-capacity models trained on diverse datasets have shown remarkable success at efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection
Yi Dai, Hao Lang, Kaisheng Zeng, Fei Huang, Yongbin Li
Oct 13 2023 cs.CL cs.CV arXiv:2310.08027v1

@misc{2310.08027, author = {Yi Dai and Hao Lang and Kaisheng Zeng and Fei Huang and Yongbin Li}, title = {{E}xploring {L}arge {L}anguage {M}odels for {M}ulti-{M}odal {O}ut-of-{D}istribution {D}etection}, year = {2023}, eprint = {2310.08027}, note = {arXiv:2310.08027v1} }
Out-of-distribution (OOD) detection is essential for reliable and trustworthy machine learning. Recent multi-modal OOD detection leverages textual information from in-distribution (ID) class names for visual OOD detection, yet it currently neglects the rich contextual information of ID classes. Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class. Indiscriminately using such knowledge causes catastrophic damage to OOD detection due to LLMs' hallucinations, as is observed by our analysis. In this paper, we propose to apply world knowledge to enhance OOD detection performance through selective generation from LLMs. Specifically, we introduce a consistency-based uncertainty calibration method to estimate the confidence score of each generation. We further extract visual objects from each image to fully capitalize on the aforementioned world knowledge. Extensive experiments demonstrate that our method consistently outperforms the state-of-the-art.
Advective Diffusion Transformers for Topological Generalization in Graph Learning
Qitian Wu, Chenxiao Yang, Kaipeng Zeng, Fan Nie, Michael Bronstein, Junchi Yan
Oct 11 2023 cs.LG cs.AI arXiv:2310.06417v1

@misc{2310.06417, author = {Qitian Wu and Chenxiao Yang and Kaipeng Zeng and Fan Nie and Michael Bronstein and Junchi Yan}, title = {{A}dvective {D}iffusion {T}ransformers for {T}opological {G}eneralization in {G}raph {L}earning}, year = {2023}, eprint = {2310.06417}, note = {arXiv:2310.06417v1} }
Graph diffusion equations are intimately related to graph neural networks (GNNs) and have recently attracted attention as a principled framework for analyzing GNN dynamics, formalizing their expressive power, and justifying architectural choices. One key open question in graph learning is the generalization capabilities of GNNs. A major limitation of current approaches hinges on the assumption that the graph topologies in the training and test sets come from the same distribution. In this paper, we take steps towards understanding the generalization of GNNs by exploring how graph diffusion equations extrapolate and generalize in the presence of varying graph topologies. We first show deficiencies in the generalization capability of existing models built upon local diffusion on graphs, stemming from the exponential sensitivity to topology variation. Our subsequent analysis reveals the promise of non-local diffusion, which advocates for feature propagation over fully-connected latent graphs, under the assumption of a specific data-generating condition. In addition to these findings, we propose a novel graph encoder backbone, Advective Diffusion Transformer (ADiT), inspired by advective graph diffusion equations that have a closed-form solution backed up with theoretical guarantees of desired generalization under topological distribution shifts. The new model, functioning as a versatile graph Transformer, demonstrates superior performance across a wide range of graph learning tasks.
A Task-oriented Dialog Model with Task-progressive and Policy-aware Pre-training
Lucen Zhong, Hengtong Lu, Caixia Yuan, Xiaojie Wang, Jiashen Sun, Ke Zeng, Guanglu Wan
Oct 03 2023 cs.CL arXiv:2310.00597v1

@misc{2310.00597, author = {Lucen Zhong and Hengtong Lu and Caixia Yuan and Xiaojie Wang and Jiashen Sun and Ke Zeng and Guanglu Wan}, title = {{A} {T}ask-oriented {D}ialog {M}odel with {T}ask-progressive and {P}olicy-aware {P}re-training}, year = {2023}, eprint = {2310.00597}, note = {arXiv:2310.00597v1} }
Pre-trained conversation models (PCMs) have achieved promising progress in recent years. However, existing PCMs for task-oriented dialog (TOD) are insufficient for capturing the sequential nature of the TOD-related tasks, as well as for learning dialog policy information. To alleviate these problems, this paper proposes a task-progressive PCM with two policy-aware pre-training tasks. The model is pre-trained through three stages where TOD-related tasks are progressively employed according to the task logic of the TOD system. A global policy consistency task is designed to capture the multi-turn dialog policy sequential relation, and an act-based contrastive learning task is designed to capture similarities among samples with the same dialog policy. Our model achieves better results on both MultiWOZ and In-Car end-to-end dialog modeling benchmarks with only 18% of the parameters and 25% of the pre-training data compared to the previous state-of-the-art PCM, GALAXY.
OmniEvent: A Comprehensive, Fair, and Easy-to-Use Toolkit for Event Understanding
Hao Peng, Xiaozhi Wang, Feng Yao, Zimu Wang, Chuzhao Zhu, Kaisheng Zeng, Lei Hou, Juanzi Li
Sep 27 2023 cs.CL cs.AI arXiv:2309.14258v1

@misc{2309.14258, author = {Hao Peng and Xiaozhi Wang and Feng Yao and Zimu Wang and Chuzhao Zhu and Kaisheng Zeng and Lei Hou and Juanzi Li}, title = {{O}mni{E}vent: {A} {C}omprehensive, {F}air, and {E}asy-to-{U}se {T}oolkit for {E}vent {U}nderstanding}, year = {2023}, eprint = {2309.14258}, note = {arXiv:2309.14258v1} }
Event understanding aims at understanding the content and relationship of events within texts, which covers multiple complicated information extraction tasks: event detection, event argument extraction, and event relation extraction. To facilitate related research and application, we present an event understanding toolkit OmniEvent, which features three desiderata: (1) Comprehensive. OmniEvent supports mainstream modeling paradigms of all the event understanding tasks and the processing of 15 widely-used English and Chinese datasets. (2) Fair. OmniEvent carefully handles the inconspicuous evaluation pitfalls reported in Peng et al. (2023), which ensures fair comparisons between different models. (3) Easy-to-use. OmniEvent is designed to be easily used by users with varying needs. We provide off-the-shelf models that can be directly deployed as web services. The modular framework also enables users to easily implement and evaluate new event understanding models with OmniEvent. The toolkit (https://github.com/THU-KEG/OmniEvent) is publicly released along with the demonstration website and video (https://omnievent.xlore.cn/).
Location Privacy and Spectrum Efficiency Enhancement in Spectrum Sharing Systems
Long Jiao, Yao Ge, Kai Zeng, B.C. Hilburn
Aug 29 2023 cs.NI arXiv:2308.13884v1

@misc{2308.13884, author = {Long Jiao and Yao Ge and Kai Zeng and B.C.~Hilburn}, title = {{L}ocation {P}rivacy and {S}pectrum {E}fficiency {E}nhancement in {S}pectrum {S}haring {S}ystems}, year = {2023}, eprint = {2308.13884}, note = {arXiv:2308.13884v1} }
In this work, we investigate the benefits of secondary user (SU) network beamforming on improving primary user (PU) location privacy in spectrum sharing systems, where the beamformer in the SU network is designed to suppress the aggregate interference to improve the location privacy of PUs. We consider two problems: improving SU network communication throughput subject to the specified PU location privacy requirements, and enhancing PU location privacy given the quality of service (QoS) requirements of SU networks. In the first problem, we provide an algorithm to achieve high data rates with the constrained PU location privacy level. Numerical results show that for a given PU location privacy requirement, the proposed scheme is able to interfere/exclude only a few SU nodes from the PU band and the network throughput can be greatly improved. In the second problem, to fully explore the potential of SU network beamforming for enhancing PU location privacy, we propose a two-step scheme to decouple the beamforming and privacy zone design so that the PU location privacy can be improved while satisfying the SU network throughput requirement. According to numerical evaluations, the proposed scheme can maintain/achieve higher PU location privacy than the benchmark beamforming schemes while satisfying a QoS requirement for the SU network.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu, Xiaozhi Wang, Shangqing Tu, Shulin Cao, Daniel Zhang-Li, Xin Lv, Hao Peng, Zijun Yao, Xiaohan Zhang, Hanming Li, Chunyang Li, Zheyuan Zhang, Yushi Bai, Yantao Liu, Amy Xin, Nianyi Lin, Kaifeng Yun, Linlu Gong, Jianhui Chen, Zhili Wu, et al (15)
Jun 16 2023 cs.CL arXiv:2306.09296v3

@misc{2306.09296, author = {Jifan Yu and Xiaozhi Wang and Shangqing Tu and Shulin Cao and Daniel Zhang-Li and Xin Lv and Hao Peng and Zijun Yao and Xiaohan Zhang and Hanming Li and Chunyang Li and Zheyuan Zhang and Yushi Bai and Yantao Liu and Amy Xin and Nianyi Lin and Kaifeng Yun and Linlu Gong and Jianhui Chen and Zhili Wu and Yunjia Qi and Weikai Li and Yong Guan and Kaisheng Zeng and Ji Qi and Hailong Jin and Jinxin Liu and Yu Gu and Yuan Yao and Ning Ding and Lei Hou and Zhiyuan Liu and Bin Xu and Jie Tang and Juanzi Li}, title = {{K}o{LA}: {C}arefully {B}enchmarking {W}orld {K}nowledge of {L}arge {L}anguage {M}odels}, year = {2023}, eprint = {2306.09296}, note = {arXiv:2306.09296v3} }
The unprecedented performance of large language models (LLMs) necessitates improvements in evaluations. Rather than merely exploring the breadth of LLM abilities, we believe meticulous and thoughtful designs are essential to thorough, unbiased, and applicable evaluations. Given the importance of world knowledge to LLMs, we construct a Knowledge-oriented LLM Assessment benchmark (KoLA), in which we carefully design three crucial factors: (1) For ability modeling, we mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering $19$ tasks. (2) For data, to ensure fair comparisons, we use both Wikipedia, a corpus prevalently pre-trained by LLMs, and continuously collected emerging corpora, aiming to evaluate the capacity to handle unseen data and evolving knowledge. (3) For evaluation criteria, we adopt a contrastive system, including overall standard scores for better numerical comparability across tasks and models and a unique self-contrast metric for automatically evaluating knowledge-creating ability. We evaluate $28$ open-source and commercial LLMs and obtain some intriguing findings. The KoLA dataset and open-participation leaderboard are publicly released at https://kola.xlore.cn and will be continuously updated to provide references for developing LLMs and knowledge-related systems.
The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation
Hao Peng, Xiaozhi Wang, Feng Yao, Kaisheng Zeng, Lei Hou, Juanzi Li, Zhiyuan Liu, Weixing Shen
Jun 13 2023 cs.CL cs.AI arXiv:2306.06918v2

@misc{2306.06918, author = {Hao Peng and Xiaozhi Wang and Feng Yao and Kaisheng Zeng and Lei Hou and Juanzi Li and Zhiyuan Liu and Weixing Shen}, title = {{T}he {D}evil is in the {D}etails: {O}n the {P}itfalls of {E}vent {E}xtraction {E}valuation}, year = {2023}, eprint = {2306.06918}, note = {arXiv:2306.06918v2} }
Event extraction (EE) is a crucial task aiming at extracting events from texts, which includes two subtasks: event detection (ED) and event argument extraction (EAE). In this paper, we check the reliability of EE evaluations and identify three major pitfalls: (1) The data preprocessing discrepancy makes the evaluation results on the same dataset not directly comparable, but the data preprocessing details are not widely noted and specified in papers. (2) The output space discrepancy of different model paradigms makes different-paradigm EE models lack grounds for comparison and also leads to unclear mapping issues between predictions and annotations. (3) The absence of pipeline evaluation in many EAE-only works makes them hard to compare directly with EE works and may not accurately reflect model performance in real-world pipeline scenarios. We demonstrate the significant influence of these pitfalls through comprehensive meta-analyses of recent papers and empirical experiments. To avoid these pitfalls, we suggest a series of remedies, including specifying data preprocessing, standardizing outputs, and providing pipeline evaluation results. To help implement these remedies, we develop a consistent evaluation framework OMNIEVENT, which can be obtained from https://github.com/THU-KEG/OmniEvent.
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai, Jiahao Ying, Yixin Cao, Xin Lv, Yuze He, Xiaozhi Wang, Jifan Yu, Kaisheng Zeng, Yijia Xiao, Haozhe Lyu, Jiayin Zhang, Juanzi Li, Lei Hou
Jun 08 2023 cs.CL cs.LG arXiv:2306.04181v2

@misc{2306.04181, author = {Yushi Bai and Jiahao Ying and Yixin Cao and Xin Lv and Yuze He and Xiaozhi Wang and Jifan Yu and Kaisheng Zeng and Yijia Xiao and Haozhe Lyu and Jiayin Zhang and Juanzi Li and Lei Hou}, title = {{B}enchmarking {F}oundation {M}odels with {L}anguage-{M}odel-as-an-{E}xaminer}, year = {2023}, eprint = {2306.04181}, note = {arXiv:2306.04181v2} }
Numerous benchmarks have been established to assess the performance of foundation models on open-ended question answering, which serves as a comprehensive test of a model's ability to understand and generate language in a manner similar to humans. Most of these works focus on proposing new datasets; however, we see two main issues within previous benchmarking pipelines, namely testing leakage and evaluation automation. In this paper, we propose a novel benchmarking framework, Language-Model-as-an-Examiner, where the LM serves as a knowledgeable examiner that formulates questions based on its knowledge and evaluates responses in a reference-free manner. Our framework allows for effortless extensibility as various LMs can be adopted as the examiner, and the questions can be constantly updated given more diverse trigger topics. For a more comprehensive and equitable evaluation, we devise three strategies: (1) We instruct the LM examiner to generate questions across a multitude of domains to probe for a broad acquisition, and raise follow-up questions to engage in a more in-depth assessment. (2) Upon evaluation, the examiner combines both scoring and ranking measurements, providing a reliable result as it aligns closely with human annotations. (3) We additionally propose a decentralized Peer-examination method to address the biases in a single examiner. Our data and benchmarking results are available at: http://lmexam.xlore.cn.
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction
Ji Qi, Chuchun Zhang, Xiaozhi Wang, Kaisheng Zeng, Jifan Yu, Jinxin Liu, Jiuding Sun, Yuxiang Chen, Lei Hou, Juanzi Li, Bin Xu
May 24 2023 cs.CL cs.AI arXiv:2305.13981v2

@misc{2305.13981, author = {Ji Qi and Chuchun Zhang and Xiaozhi Wang and Kaisheng Zeng and Jifan Yu and Jinxin Liu and Jiuding Sun and Yuxiang Chen and Lei Hou and Juanzi Li and Bin Xu}, title = {{P}reserving {K}nowledge {I}nvariance: {R}ethinking {R}obustness {E}valuation of {O}pen {I}nformation {E}xtraction}, year = {2023}, eprint = {2305.13981}, note = {arXiv:2305.13981v2} }
Robustness to distribution changes ensures that NLP models can be successfully applied in the real world, especially for information extraction tasks. However, most prior evaluation benchmarks have been devoted to validating pairwise matching correctness, ignoring the crucial measurement of robustness. In this paper, we present the first benchmark that simulates the evaluation of open information extraction models in the real world, where the syntactic and expressive distributions under the same knowledge meaning may drift variously. We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique that consists of sentences with structured knowledge of the same meaning but with different syntactic and expressive forms. By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques. We perform experiments on typical models published in the last decade as well as a popular large language model; the results show that the existing successful models exhibit a frustrating degradation, with a maximum drop of 23.43 F1 score. Our resources and code are available at https://github.com/qijimrc/ROBUST.
Quiver: Supporting GPUs for Low-Latency, High-Throughput GNN Serving with Workload Awareness
Zeyuan Tan, Xiulong Yuan, Congjie He, Man-Kit Sit, Guo Li, Xiaoze Liu, Baole Ai, Kai Zeng, Peter Pietzuch, Luo Mai
May 19 2023 cs.DC cs.AI cs.LG cs.OS arXiv:2305.10863v1

@misc{2305.10863, author = {Zeyuan Tan and Xiulong Yuan and Congjie He and Man-Kit Sit and Guo Li and Xiaoze Liu and Baole Ai and Kai Zeng and Peter Pietzuch and Luo Mai}, title = {{Q}uiver: {S}upporting {GPU}s for {L}ow-{L}atency, {H}igh-{T}hroughput {GNN} {S}erving with {W}orkload {A}wareness}, year = {2023}, eprint = {2305.10863}, note = {arXiv:2305.10863v1} }
Systems for serving inference requests on graph neural networks (GNN) must combine low latency with high throughput, but they face irregular computation due to skew in the number of sampled graph nodes and aggregated GNN features. This makes it challenging to exploit GPUs effectively: using GPUs to sample only a few graph nodes yields lower performance than CPU-based sampling; and aggregating many features exhibits high data movement costs between GPUs and CPUs. Therefore, current GNN serving systems use CPUs for graph sampling and feature aggregation, limiting throughput. We describe Quiver, a distributed GPU-based GNN serving system with low latency and high throughput. Quiver's key idea is to exploit workload metrics for predicting the irregular computation of GNN requests, and governing the use of GPUs for graph sampling and feature aggregation: (1) for graph sampling, Quiver calculates the probabilistic sampled graph size, a metric that predicts the degree of parallelism in graph sampling. Quiver uses this metric to assign sampling tasks to GPUs only when the performance gains surpass CPU-based sampling; and (2) for feature aggregation, Quiver relies on the feature access probability to decide which features to partition and replicate across a distributed GPU NUMA topology. We show that Quiver achieves up to 35 times lower latency with an 8 times higher throughput compared to state-of-the-art GNN approaches (DGL and PyG).
Autoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems
Kevin Zeng, Carlos E. Pérez De Jesús, Andrew J. Fox, Michael D. Graham
May 03 2023 cs.LG nlin.CD arXiv:2305.01090v3

@misc{2305.01090, author = {Kevin Zeng and Carlos E.~Pérez De Jesús and Andrew J.~Fox and Michael D.~Graham}, title = {{A}utoencoders for discovering manifold dimension and coordinates in data from complex dynamical systems}, year = {2023}, eprint = {2305.01090}, note = {arXiv:2305.01090v3} }
While many phenomena in physics and engineering are formally high-dimensional, their long-time dynamics often live on a lower-dimensional manifold. The present work introduces an autoencoder framework that combines implicit regularization with internal linear layers and $L_2$ regularization (weight decay) to automatically estimate the underlying dimensionality of a data set, produce an orthogonal manifold coordinate system, and provide the mapping functions between the ambient space and manifold space, allowing for out-of-sample projections. We validate our framework's ability to estimate the manifold dimension for a series of datasets from dynamical systems of varying complexities and compare to other state-of-the-art estimators. We analyze the training dynamics of the network to glean insight into the mechanism of low-rank learning and find that the implicit regularizing layers collectively compound the low-rank representation and even self-correct during training. Analysis of gradient descent dynamics for this architecture in the linear case reveals the role of the internal linear layers in leading to faster decay of a "collective weight variable" incorporating all layers, and the role of weight decay in breaking degeneracies and thus driving convergence along directions in which no decay would occur in its absence. We show that this framework can be naturally extended for applications of state-space modeling and forecasting by generating a data-driven dynamic model of a spatiotemporally chaotic partial differential equation using only the manifold coordinates. Finally, we demonstrate that our framework is robust to hyperparameter choices.
Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi
Apr 25 2023 cs.CV cs.AI cs.RO arXiv:2304.12289v1

@misc{2304.12289, author = {Kuo-Hao Zeng and Luca Weihs and Roozbeh Mottaghi and Ali Farhadi}, title = {{M}oving {F}orward by {M}oving {B}ackward: {E}mbedding {A}ction {I}mpact over {A}ction {S}emantics}, year = {2023}, eprint = {2304.12289}, note = {arXiv:2304.12289v1} }
A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting; an agent may encounter settings that dramatically alter the impact of actions: a move ahead action on a wet floor may send the agent twice as far as it expects, and using the same action with a broken wheel might transform the expected translation into a rotation. Instead of relying on the assumption that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. By combining these latent action embeddings with a novel, transformer-based policy head, we design an Action Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios.
Turbulence control in plane Couette flow using low-dimensional neural ODE-based models and deep reinforcement learning
Alec J. Linot, Kevin Zeng, Michael D. Graham
Jan 31 2023 physics.flu-dyn cs.LG arXiv:2301.12098v1

@misc{2301.12098, author = {Alec J.~Linot and Kevin Zeng and Michael D.~Graham}, title = {{T}urbulence control in plane {C}ouette flow using low-dimensional neural {ODE}-based models and deep reinforcement learning}, year = {2023}, eprint = {2301.12098}, note = {arXiv:2301.12098v1} }
The high dimensionality and complex dynamics of turbulent flows remain an obstacle to the discovery and implementation of control strategies. Deep reinforcement learning (RL) is a promising avenue for overcoming these obstacles, but requires a training phase in which the RL agent iteratively interacts with the flow environment to learn a control policy, which can be prohibitively expensive when the environment involves slow experiments or large-scale simulations. We overcome this challenge using a framework we call "DManD-RL" (data-driven manifold dynamics-RL), which generates a data-driven low-dimensional model of our system that we use for RL training. With this approach, we seek to minimize drag in a direct numerical simulation (DNS) of a turbulent minimal flow unit of plane Couette flow at Re=400 using two slot jets on one wall. We obtain, from DNS data with $\mathcal{O}(10^5)$ degrees of freedom, a 25-dimensional DManD model of the dynamics by combining an autoencoder and neural ordinary differential equation. Using this model as the environment, we train an RL control agent, yielding a 440-fold speedup over training on the DNS, with equivalent control performance. The agent learns a policy that laminarizes 84% of unseen DNS test trajectories within 900 time units, significantly outperforming classical opposition control (58%), despite the actuation authority being much more restricted. The agent often achieves laminarization through a counterintuitive strategy that drives the formation of two low-speed streaks, with a spanwise wavelength that is too small to be self-sustaining. The agent demonstrates the same performance when we limit observations to wall shear rate.
Khaos: The Impact of Inter-procedural Code Obfuscation on Binary Diffing Techniques
Peihua Zhang, Chenggang Wu, Mingfan Peng, Kai Zeng, Ding Yu, Yuanming Lai, Yan Kang, Wei Wang, Zhe Wang
Jan 30 2023 cs.CR arXiv:2301.11586v1

@misc{2301.11586, author = {Peihua Zhang and Chenggang Wu and Mingfan Peng and Kai Zeng and Ding Yu and Yuanming Lai and Yan Kang and Wei Wang and Zhe Wang}, title = {{K}haos: {T}he {I}mpact of {I}nter-procedural {C}ode {O}bfuscation on {B}inary {D}iffing {T}echniques}, year = {2023}, eprint = {2301.11586}, note = {arXiv:2301.11586v1} }
Software obfuscation techniques can prevent binary diffing techniques from locating vulnerable code by obfuscating the third-party code, to achieve the purpose of protecting embedded device software. With the rapid development of binary diffing techniques, they can achieve more and more accurate function matching and identification by extracting the features within the function. This makes existing software obfuscation techniques, which mainly focus on the intra-procedural code obfuscation, no longer effective. In this paper, we propose a new inter-procedural code obfuscation mechanism Khaos, which moves the code across functions to obfuscate the function by using compilation optimizations. Two obfuscation primitives are proposed to separate and aggregate the function, which are called fission and fusion respectively. A prototype of Khaos is implemented based on the LLVM compiler and evaluated on a large number of real-world programs including SPEC CPU 2006 & 2017, CoreUtils, JavaScript engines, etc. Experimental results show that Khaos outperforms existing code obfuscations and can significantly reduce the accuracy rates of five state-of-the-art binary diffing techniques (less than 19%) with lower runtime overhead (less than 7%).
Step out of KG: Knowledge Graph Completion via Knowledgeable Retrieval and Reading Comprehension
Xin Lv, Yankai Lin, Zijun Yao, Kaisheng Zeng, Jiajie Zhang, Lei Hou, Juanzi Li
Oct 13 2022 cs.CL arXiv:2210.05921v1

@misc{2210.05921, author = {Xin Lv and Yankai Lin and Zijun Yao and Kaisheng Zeng and Jiajie Zhang and Lei Hou and Juanzi Li}, title = {{S}tep out of {KG}: {K}nowledge {G}raph {C}ompletion via {K}nowledgeable {R}etrieval and {R}eading {C}omprehension}, year = {2022}, eprint = {2210.05921}, note = {arXiv:2210.05921v1} }
Knowledge graphs, as the cornerstone of many AI applications, usually face serious incompleteness problems. In recent years, there have been many efforts to study automatic knowledge graph completion (KGC), most of which use existing knowledge to infer new knowledge. However, in our experiments, we find that not all relations can be obtained by inference, which constrains the performance of existing models. To alleviate this problem, we propose a new model based on information retrieval and reading comprehension, namely IR4KGC. Specifically, we pre-train a knowledge-based information retrieval module that can retrieve documents related to the triples to be completed. Then, the retrieved documents are handed over to the reading comprehension module to generate the predicted answers. In experiments, we find that our model handles relations that cannot be inferred from existing knowledge well, and achieves good results on KGC datasets.
ConstGCN: Constrained Transmission-based Graph Convolutional Networks for Document-level Relation Extraction
Ji Qi, Bin Xu, Kaisheng Zeng, Jinxin Liu, Jifan Yu, Qi Gao, Juanzi Li, Lei Hou
Oct 11 2022 cs.CL arXiv:2210.03949v1

@misc{2210.03949, author = {Ji Qi and Bin Xu and Kaisheng Zeng and Jinxin Liu and Jifan Yu and Qi Gao and Juanzi Li and Lei Hou}, title = {{C}onst{GCN}: {C}onstrained {T}ransmission-based {G}raph {C}onvolutional {N}etworks for {D}ocument-level {R}elation {E}xtraction}, year = {2022}, eprint = {2210.03949}, note = {arXiv:2210.03949v1} }
Document-level relation extraction with graph neural networks faces a fundamental graph construction gap between training and inference: the golden graph structure is only available during training, which leads most methods to adopt heuristic or syntactic rules to construct a prior graph as a pseudo proxy. In this paper, we propose $\textbf{ConstGCN}$, a novel graph convolutional network which performs knowledge-based information propagation between entities along with all specific relation spaces without any prior graph construction. Specifically, it updates the entity representation by aggregating information from all other entities along with each relation space, thus modeling the relation-aware spatial information. To control the information flow passing through the indeterminate relation spaces, we propose to constrain the propagation using transmitting scores learned from the Noise Contrastive Estimation between fact triples. Experimental results show that our method outperforms the previous state-of-the-art (SOTA) approaches on the DocRE dataset.
Fine-Grained Modeling and Optimization for Intelligent Resource Management in Big Data Processing
Chenghao Lyu, Qi Fan, Fei Song, Arnab Sinha, Yanlei Diao, Wei Chen, Li Ma, Yihui Feng, Yaliang Li, Kai Zeng, Jingren Zhou
Jul 06 2022 cs.DB cs.DC arXiv:2207.02026v2

@misc{2207.02026, author = {Chenghao Lyu and Qi Fan and Fei Song and Arnab Sinha and Yanlei Diao and Wei Chen and Li Ma and Yihui Feng and Yaliang Li and Kai Zeng and Jingren Zhou}, title = {{F}ine-{G}rained {M}odeling and {O}ptimization for {I}ntelligent {R}esource {M}anagement in {B}ig {D}ata {P}rocessing}, year = {2022}, eprint = {2207.02026}, note = {arXiv:2207.02026v2} }
Big data processing at the production scale presents a highly complex environment for resource optimization (RO), a problem crucial for meeting performance goals and budgetary constraints of analytical users. The RO problem is challenging because it involves a set of decisions (the partition count, placement of parallel instances on machines, and resource allocation to each instance), requires multi-objective optimization (MOO), and is compounded by the scale and complexity of big data systems while having to meet stringent time constraints for scheduling. This paper presents a MaxCompute-based integrated system to support multi-objective resource optimization via fine-grained instance-level modeling and optimization. We propose a new architecture that breaks RO into a series of simpler problems, new fine-grained predictive models, and novel optimization methods that exploit these models to make effective instance-level recommendations in a hierarchical MOO framework. Evaluation using production workloads shows that our new RO system could reduce latency by 37-72% and cost by 43-78% at the same time, compared to the current optimizer and scheduler, while running in 0.02-0.23s.
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results
Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, et al (91)
May 12 2022 cs.CV eess.IV arXiv:2205.05675v1

@misc{2205.05675, author = {Yawei Li and Kai Zhang and Radu Timofte and Luc Van Gool and Fangyuan Kong and Mingxi Li and Songwei Liu and Zongcai Du and Ding Liu and Chenhui Zhou and Jingyi Chen and Qingrui Han and Zheyuan Li and Yingqi Liu and Xiangyu Chen and Haoming Cai and Yu Qiao and Chao Dong and Long Sun and Jinshan Pan and Yi Zhu and Zhikai Zong and Xiaoxiao Liu and Zheng Hui and Tao Yang and Peiran Ren and Xuansong Xie and Xian-Sheng Hua and Yanbo Wang and Xiaozhong Ji and Chuming Lin and Donghao Luo and Ying Tai and Chengjie Wang and Zhizhong Zhang and Yuan Xie and Shen Cheng and Ziwei Luo and Lei Yu and Zhihong Wen and Qi Wu and Youwei Li and Haoqiang Fan and Jian Sun and Shuaicheng Liu and Yuanfei Huang and Meiguang Jin and Hua Huang and Jing Liu and Xinjian Zhang and Yan Wang and Lingshun Long and Gen Li and Yuanfan Zhang and Zuowei Cao and Lei Sun and Panaetov Alexander and Yucong Wang and Minjie Cai and Li Wang and Lu Tian and Zheyuan Wang and Hongbing Ma and Jie Liu and Chao Chen and Yidong Cai and Jie Tang and Gangshan Wu and Weiran Wang and Shirui Huang and Honglei Lu and Huan Liu and Keyan Wang and Jun Chen and Shi Chen and Yuchun Miao and Zimo Huang and Lefei Zhang and Mustafa Ayazoğlu and Wei Xiong and Chengyi Xiong and Fei Wang and Hao Li and Ruimian Wen and Zhijing Yang and Wenbin Zou and Weixin Zheng and Tian Ye and Yuncheng Zhang and Xiangzhen Kong and Aditya Arora and Syed Waqas Zamir and Salman Khan and Munawar Hayat and Fahad Shahbaz Khan and Dandan Gao and Dengwen Zhou and Qian Ning and Jingzhu Tang and Han Huang and Yufei Wang and Zhangheng Peng and Haobo Li and Wenxue Guan and Shenghua Gong and Xin Li and Jun Liu and Wanjun Wang and Dengwen Zhou and Kun Zeng and Hanjiang Lin and Xinyu Chen and Jinsheng Fang}, title = {{NTIRE} 2022 {C}hallenge on {E}fficient {S}uper-{R}esolution: {M}ethods and {R}esults}, year = {2022}, eprint = {2205.05675}, note = {arXiv:2205.05675v1} }
PDF
This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranks of the teams were determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.
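The rank-sum rule described for the two sub-tracks can be sketched as follows: rank the teams on each metric separately (lower is better), add up each team's per-metric ranks, and order the teams by that total. This is a minimal sketch; the team names, metric values, and the function name `rank_sum_order` are hypothetical, and tie handling is not specified in the abstract, so ties here simply keep input order.

```python
def rank_sum_order(scores):
    """Order teams by the sum of their per-metric ranks.

    scores maps team -> {metric: value}; lower is better for every metric.
    Returns (teams ordered best to worst, {team: rank sum}).
    """
    metrics = sorted({m for values in scores.values() for m in values})
    totals = {team: 0 for team in scores}
    for metric in metrics:
        ordered = sorted(scores, key=lambda team: scores[team][metric])
        for position, team in enumerate(ordered, start=1):
            totals[team] += position
    final_order = sorted(totals, key=lambda team: totals[team])
    return final_order, totals


# Hypothetical sub-track-one style input: parameters (M) and FLOPs (G).
example = {
    "A": {"params": 0.20, "flops": 20.0},
    "B": {"params": 0.30, "flops": 25.0},
    "C": {"params": 0.25, "flops": 22.0},
}
order, totals = rank_sum_order(example)
print(order, totals)
```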
Data-driven control of spatiotemporal chaos with reduced-order neural ODE-based models and reinforcement learning
Kevin Zeng, Alec J. Linot, Michael D. Graham
May 03 2022 cs.LG nlin.CD arXiv:2205.00579v1

@misc{2205.00579, author = {Kevin Zeng and Alec J.~Linot and Michael D.~Graham}, title = {{D}ata-driven control of spatiotemporal chaos with reduced-order neural {ODE}-based models and reinforcement learning}, year = {2022}, eprint = {2205.00579}, note = {arXiv:2205.00579v1} }
PDF
Deep reinforcement learning (RL) is a data-driven method capable of discovering complex control strategies for high-dimensional systems, making it promising for flow control applications. In particular, the present work is motivated by the goal of reducing energy dissipation in turbulent flows, and the example considered is the spatiotemporally chaotic dynamics of the Kuramoto-Sivashinsky equation (KSE). A major challenge associated with RL is that substantial training data must be generated by repeatedly interacting with the target system, making it costly when the system is computationally or experimentally expensive. We mitigate this challenge in a data-driven manner by combining dimensionality reduction via an autoencoder with a neural ODE framework to obtain a low-dimensional dynamical model from just a limited data set. We substitute this data-driven reduced-order model (ROM) in place of the true system during RL training to efficiently estimate the optimal policy, which can then be deployed on the true system. For the KSE actuated with localized forcing ("jets") at four locations, we demonstrate that we are able to learn a ROM that accurately captures the actuated dynamics as well as the underlying natural dynamics just from snapshots of the KSE experiencing random actuations. Using this ROM and a control objective of minimizing dissipation and power cost, we extract a control policy from it using deep RL. We show that the ROM-based control strategy translates well to the true KSE and highlight that the RL agent discovers and stabilizes an underlying forced equilibrium solution of the KSE system. We show that this forced equilibrium captured in the ROM and discovered through RL is related to an existing known equilibrium solution of the natural KSE.
Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning
Pengxiang Yan, Ziyi Wu, Mengmeng Liu, Kun Zeng, Liang Lin, Guanbin Li
Mar 01 2022 cs.CV arXiv:2202.13170v1

@misc{2202.13170, author = {Pengxiang Yan and Ziyi Wu and Mengmeng Liu and Kun Zeng and Liang Lin and Guanbin Li}, title = {{U}nsupervised {D}omain {A}daptive {S}alient {O}bject {D}etection {T}hrough {U}ncertainty-{A}ware {P}seudo-{L}abel {L}earning}, year = {2022}, eprint = {2202.13170}, note = {arXiv:2202.13170v1} }
PDF
Recent advances in deep learning significantly boost the performance of salient object detection (SOD) at the expense of labeling larger-scale per-pixel annotations. To relieve the burden of labor-intensive labeling, deep unsupervised SOD methods have been proposed to exploit noisy labels generated by handcrafted saliency methods. However, it is still difficult to learn accurate saliency details from rough noisy labels. In this paper, we propose to learn saliency from synthetic but clean labels, which naturally has higher pixel-labeling quality without the effort of manual annotations. Specifically, we first construct a novel synthetic SOD dataset by a simple copy-paste strategy. Considering the large appearance differences between the synthetic and real-world scenarios, directly training with synthetic data will lead to performance degradation on real-world scenarios. To mitigate this problem, we propose a novel unsupervised domain adaptive SOD method to adapt between these two domains by uncertainty-aware self-training. Experimental results show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets, and is even comparable to fully-supervised ones.
Banyan: A Scoped Dataflow Engine for Graph Query Service
Li Su, Xiaoming Qin, Zichao Zhang, Rui Yang, Le Xu, Indranil Gupta, Wenyuan Yu, Kai Zeng, Jingren Zhou
Feb 28 2022 cs.DB cs.DC arXiv:2202.12530v1

@misc{2202.12530, author = {Li Su and Xiaoming Qin and Zichao Zhang and Rui Yang and Le Xu and Indranil Gupta and Wenyuan Yu and Kai Zeng and Jingren Zhou}, title = {{B}anyan: {A} {S}coped {D}ataflow {E}ngine for {G}raph {Q}uery {S}ervice}, year = {2022}, eprint = {2202.12530}, note = {arXiv:2202.12530v1} }
PDF
Graph query services (GQS) are widely used today to interactively answer graph traversal queries on large-scale graph data. Existing graph query engines focus largely on optimizing the latency of a single query. This ignores significant challenges posed by GQS, including fine-grained control and scheduling during query execution, as well as performance isolation and load balancing in various levels from across user to intra-query. To tackle these control and scheduling challenges, we propose a novel scoped dataflow for modeling graph traversal queries, which explicitly exposes concurrent execution and control of any subquery to the finest granularity. We implemented Banyan, an engine based on the scoped dataflow model for GQS. Banyan focuses on scaling up the performance on a single machine, and provides the ability to easily scale out. Extensive experiments on multiple benchmarks show that Banyan improves performance by up to three orders of magnitude over state-of-the-art graph query engines, while providing performance isolation and load balancing.
Molecule Generation for Drug Design: a Graph Learning Perspective
Nianzu Yang, Huaijin Wu, Kaipeng Zeng, Yang Li, Junchi Yan
Feb 21 2022 cs.LG cs.AI arXiv:2202.09212v2

@misc{2202.09212, author = {Nianzu Yang and Huaijin Wu and Kaipeng Zeng and Yang Li and Junchi Yan}, title = {{M}olecule {G}eneration for {D}rug {D}esign: a {G}raph {L}earning {P}erspective}, year = {2022}, eprint = {2202.09212}, note = {arXiv:2202.09212v2} }
PDF
Machine learning, particularly graph learning, is gaining increasing recognition for its transformative impact across various fields. One such promising application is in the realm of molecule design and discovery, notably within the pharmaceutical industry. Our survey offers a comprehensive overview of state-of-the-art methods in molecule design, particularly focusing on \emph{de novo} drug design, which incorporates (deep) graph learning techniques. We categorize these methods into three distinct groups: \emph{i}) \emph{all-at-once}, \emph{ii}) \emph{fragment-based}, and \emph{iii}) \emph{node-by-node}. Additionally, we introduce some key public datasets and outline the commonly used evaluation metrics for both the generation and optimization of molecules. In the end, we discuss the existing challenges in this field and suggest potential directions for future research.
Interactive Contrastive Learning for Self-supervised Entity Alignment
Kaisheng Zeng, Zhenhao Dong, Lei Hou, Yixin Cao, Minghao Hu, Jifan Yu, Xin Lv, Juanzi Li, Ling Feng
Jan 19 2022 cs.CL cs.AI arXiv:2201.06225v2

@misc{2201.06225, author = {Kaisheng Zeng and Zhenhao Dong and Lei Hou and Yixin Cao and Minghao Hu and Jifan Yu and Xin Lv and Juanzi Li and Ling Feng}, title = {{I}nteractive {C}ontrastive {L}earning for {S}elf-supervised {E}ntity {A}lignment}, year = {2022}, eprint = {2201.06225}, note = {arXiv:2201.06225v2} }
PDF
Self-supervised entity alignment (EA) aims to link equivalent entities across different knowledge graphs (KGs) without seed alignments. The current SOTA self-supervised EA method draws inspiration from contrastive learning, originally designed in computer vision based on instance discrimination and contrastive loss, and suffers from two shortcomings. Firstly, it puts unidirectional emphasis on pushing sampled negative entities far away rather than pulling positively aligned pairs close, as is done in the well-established supervised EA. Secondly, KGs contain rich side information (e.g., entity description), and how to effectively leverage that information has not been adequately investigated in self-supervised EA. In this paper, we propose an interactive contrastive learning model for self-supervised EA. The model encodes not only structures and semantics of entities (including entity name, entity description, and entity neighborhood), but also conducts cross-KG contrastive learning by building pseudo-aligned entity pairs. Experimental results show that our approach outperforms previous best self-supervised results by a large margin (over 9% average improvement) and performs on par with previous SOTA supervised counterparts, demonstrating the effectiveness of the interactive contrastive learning for self-supervised EA.
Performance Analysis and Power Allocation of Joint Communication and Sensing Towards Future Communication Networks
Meng Liu, Minglei Yang, Huifang Li, Kun Zeng, Zhaoming Zhang, Xiancheng Cheng, Arumugam Nallanathan, Derrick Wing Kwan Ng, Guangjian Wang
Jan 11 2022 cs.IT cs.SY eess.SY math.IT arXiv:2201.02972v1

@misc{2201.02972, author = {Meng Liu and Minglei Yang and Huifang Li and Kun Zeng and Zhaoming Zhang and Xiancheng Cheng and Arumugam Nallanathan and Derrick Wing Kwan Ng and Guangjian Wang}, title = {{P}erformance {A}nalysis and {P}ower {A}llocation of {J}oint {C}ommunication and {S}ensing {T}owards {F}uture {C}ommunication {N}etworks}, year = {2022}, eprint = {2201.02972}, note = {arXiv:2201.02972v1} }
PDF
To mitigate the radar and communication frequency overlapping caused by massive device access, we propose a novel joint communication and sensing (JCS) system in this paper, where a micro base station (MiBS) can realize target sensing and cooperative communication simultaneously. Concretely, the MiBS, as the sensing equipment, can also serve as a full-duplex (FD) decode-and-forward (DF) relay to assist the end-to-end communication. To further improve the spectrum utilization, non-orthogonal multiple access (NOMA) is adopted for the communication between the macro base station (MaBS) and the Internet-of-Things (IoT) devices. To facilitate the performance evaluation, the exact and asymptotic outage probabilities, ergodic rates, and sensing probability of the system are characterized. Subsequently, two optimal power allocation (OPA) problems, maximizing the received signal-to-interference-plus-noise ratio of the sensing signal and maximizing the sum rate for communication, are designed and solved by means of the Lagrangian method and function monotonicity. The simulation results demonstrate that: 1) the proposed JCS NOMA system can accomplish both communication enhancement and sensing function under the premise of the same power consumption as non-cooperative NOMA; 2) the proposed OPA schemes manifest superiorities over a random power allocation scheme.
Improving Dither Modulation based Robust Steganography by Overflow Suppression
Kai Zeng, Kejiang Chen, Yaofei Wang, Weiming Zhang, Nenghai Yu
Oct 19 2021 cs.CR arXiv:2110.08697v1

@misc{2110.08697, author = {Kai Zeng and Kejiang Chen and Yaofei Wang and Weiming Zhang and Nenghai Yu}, title = {{I}mproving {D}ither {M}odulation based {R}obust {S}teganography by {O}verflow {S}uppression}, year = {2021}, eprint = {2110.08697}, note = {arXiv:2110.08697v1} }
PDF
Nowadays, people are sharing their pictures on online social networks (OSNs), so OSNs are a good platform for steganography. But OSNs usually perform JPEG compression on the uploaded image, which will invalidate most of the existing steganography algorithms. Recently, some works try to design robust steganography which can resist JPEG compression, such as Dither Modulation-based robust Adaptive Steganography (DMAS) and Generalized dither Modulation-based robust Adaptive Steganography (GMAS). They relieve the problem that the receivers cannot extract the message correctly when the quality factor of channel JPEG compression is larger than that of cover images. However, they can realize only limited resistance to detection and compression due to robust domain selection. To overcome this problem, we meticulously explore three lossy operations in the JPEG recompression and discover that the key problem is spatial overflow. Then two preprocessing methods, Overall Scaling (OS) and Specific Truncation (ST), are presented to remove overflow before message embedding as well as generate a reference image. The reference image is employed as the guidance to build asymmetric distortion for removing overflow during embedding. Experimental results show that the proposed methods significantly surpass GMAS in terms of security and achieve comparable robustness.
Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, Bin Cui
Sep 14 2021 cs.DB cs.AI arXiv:2109.05877v3

@misc{2109.05877, author = {Yuxing Han and Ziniu Wu and Peizhi Wu and Rong Zhu and Jingyi Yang and Liang Wei Tan and Kai Zeng and Gao Cong and Yanzhao Qin and Andreas Pfadler and Zhengping Qian and Jingren Zhou and Jiangneng Li and Bin Cui}, title = {{C}ardinality {E}stimation in {DBMS}: {A} {C}omprehensive {B}enchmark {E}valuation}, year = {2021}, eprint = {2109.05877}, note = {arXiv:2109.05877v3} }
PDF
Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based) have been proposed with outstanding estimation accuracy and inference latency. However, there exists no study that systematically evaluates the quality of these methods and answers the fundamental problem: to what extent can these methods improve the performance of query optimizer in real-world settings, which is the ultimate goal of a CardEst method. In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset STATS and a diverse query workload STATS-CEB. We integrate several of the most representative CardEst methods into an open-source database system PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan quality, and other important aspects affecting their applicability, ranging from inference latency, model size, and training time, to update efficiency and accuracy. We obtain a number of key findings for the CardEst methods, under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the query plan quality generated by CardEst methods. Therefore, we propose a new metric P-Error to evaluate the performance of CardEst methods, which overcomes the limitation of Q-Error and is able to reflect the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
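For readers unfamiliar with the Q-Error mentioned above, a minimal sketch of its commonly used definition follows: the larger of estimate/actual and actual/estimate, so that over- and under-estimation are penalized symmetrically and a perfect estimate scores 1. The function name `q_error` and the flooring of both values at 1 (to avoid division by zero) are assumptions made for this sketch; the benchmark's own code may handle these details differently.

```python
def q_error(estimated, actual, floor=1.0):
    """Symmetric multiplicative error between an estimated and a true cardinality.

    Both values are clamped to `floor` (an assumption of this sketch) so
    that empty results do not cause a division by zero.
    """
    est = max(float(estimated), floor)
    act = max(float(actual), floor)
    return max(est / act, act / est)


print(q_error(100, 10))  # over-estimate by a factor of 10
print(q_error(10, 100))  # under-estimate by a factor of 10
print(q_error(50, 50))   # exact estimate
```

Because this value is computed per sub-plan query in isolation, a large error on a sub-plan that never influences plan choice counts the same as one that does, which is the limitation the abstract points out.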
A decreasing scaling transition scheme from Adam to SGD
Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
Jun 15 2021 cs.LG arXiv:2106.06749v2

@misc{2106.06749, author = {Kun Zeng and Jinlan Liu and Zhixia Jiang and Dongpo Xu}, title = {{A} decreasing scaling transition scheme from {A}dam to {SGD}}, year = {2021}, eprint = {2106.06749}, note = {arXiv:2106.06749v2} }
PDF
Adaptive gradient algorithm (AdaGrad) and its variants, such as RMSProp, Adam, AMSGrad, etc, have been widely used in deep learning. Although these algorithms are faster in the early phase of training, their generalization performance is often not as good as stochastic gradient descent (SGD). Hence, a trade-off method of transforming Adam to SGD after a certain iteration to gain the merits of both algorithms is theoretically and practically significant. To that end, we propose a decreasing scaling transition scheme to achieve a smooth and stable transition from Adam to SGD, which is called DSTAdam. The convergence of the proposed DSTAdam is also proved in an online convex setting. Finally, the effectiveness of the DSTAdam is verified on the CIFAR-10/100 datasets. Our implementation is available at: https://github.com/kunzeng/DSTAdam.
Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent
Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu
Jun 15 2021 cs.LG arXiv:2106.06753v1

@misc{2106.06753, author = {Kun Zeng and Jinlan Liu and Zhixia Jiang and Dongpo Xu}, title = {{S}caling transition from momentum stochastic gradient descent to plain stochastic gradient descent}, year = {2021}, eprint = {2106.06753}, note = {arXiv:2106.06753v1} }
PDF
The plain stochastic gradient descent and momentum stochastic gradient descent have extremely wide applications in deep learning due to their simple settings and low computational complexity. The momentum stochastic gradient descent uses the accumulated gradient as the updated direction of the current parameters, which has a faster training speed. Because the direction of the plain stochastic gradient descent has not been corrected by the accumulated gradient, it is the optimal direction for the parameters that currently need to be updated, and its update is more accurate. We combine the advantages of the momentum stochastic gradient descent with fast training speed and the plain stochastic gradient descent with high accuracy, and propose a scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent (TSGD) method. At the same time, a learning rate that decreases linearly with the iterations is used instead of a constant learning rate. The TSGD algorithm has a larger step size in the early stage to speed up the training, and training with a smaller step size in the later stage can steadily converge. Our experimental results show that the TSGD algorithm has faster training speed, higher accuracy and better stability. Our implementation is available at: https://github.com/kunzeng/TSGD.
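The linearly decreasing learning rate mentioned in the abstract can be illustrated with a short sketch. It covers only the schedule, not the TSGD update rule itself, and the function name `linear_lr` and the start and end values are hypothetical placeholders rather than settings taken from the paper or its repository.

```python
def linear_lr(step, total_steps, lr_start=0.1, lr_end=0.001):
    """Interpolate linearly from lr_start at step 0 to lr_end at total_steps.

    Steps outside [0, total_steps] are clamped to the nearest endpoint.
    """
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return lr_start + (lr_end - lr_start) * fraction


# Large steps early in training, small steps late.
for step in (0, 50, 100):
    print(step, linear_lr(step, total_steps=100))
```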
Real-Time Quantized Image Super-Resolution on Mobile NPUs, Mobile AI 2021 Challenge: Report
Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Andrew Lek, Mustafa Ayazoglu, Jie Liu, Zongcai Du, Jiaming Guo, Xueyi Zhou, Hao Jia, Youliang Yan, Zexin Zhang, Yixin Chen, Yunbo Peng, Yue Lin, Xindong Zhang, Hui Zeng, Kun Zeng, Peirong Li, et al (3)
May 18 2021 eess.IV cs.CV cs.LG arXiv:2105.07825v1

@misc{2105.07825, author = {Andrey Ignatov and Radu Timofte and Maurizio Denna and Abdel Younes and Andrew Lek and Mustafa Ayazoglu and Jie Liu and Zongcai Du and Jiaming Guo and Xueyi Zhou and Hao Jia and Youliang Yan and Zexin Zhang and Yixin Chen and Yunbo Peng and Yue Lin and Xindong Zhang and Hui Zeng and Kun Zeng and Peirong Li and Zhihuang Liu and Shiqi Xue and Shengpeng Wang}, title = {{R}eal-{T}ime {Q}uantized {I}mage {S}uper-{R}esolution on {M}obile {NPU}s, {M}obile {AI} 2021 {C}hallenge: {R}eport}, year = {2021}, eprint = {2105.07825}, note = {arXiv:2105.07825v1} }
PDF
Image super-resolution is one of the most popular computer vision problems with many important applications to mobile devices. While many solutions have been proposed for this task, they are usually not optimized even for common smartphone AI hardware, not to mention more constrained smart TV platforms that often support INT8 inference only. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop end-to-end deep learning-based image super-resolution solutions that can demonstrate real-time performance on mobile or edge NPUs. For this, the participants were provided with the DIV2K dataset and trained quantized models to do an efficient 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated NPU capable of accelerating quantized neural networks. The proposed solutions are fully compatible with all major mobile AI accelerators and are capable of reconstructing Full HD images under 40-60 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
A Unified Transferable Model for ML-Enhanced DBMS
Ziniu Wu, Pei Yu, Peilun Yang, Rong Zhu, Yuxing Han, Yaliang Li, Defu Lian, Kai Zeng, Jingren Zhou
May 07 2021 cs.DB cs.AI arXiv:2105.02418v3

@misc{2105.02418, author = {Ziniu Wu and Pei Yu and Peilun Yang and Rong Zhu and Yuxing Han and Yaliang Li and Defu Lian and Kai Zeng and Jingren Zhou}, title = {{A} {U}nified {T}ransferable {M}odel for {ML}-{E}nhanced {DBMS}}, year = {2021}, eprint = {2105.02418}, note = {arXiv:2105.02418v3} }
PDF
Recently, the database management system (DBMS) community has witnessed the power of machine learning (ML) solutions for DBMS tasks. Despite their promising performance, these existing solutions can hardly be considered satisfactory. First, these ML-based methods in DBMS are not effective enough because they are optimized on each specific task, and cannot explore or understand the intrinsic connections between tasks. Second, the training process has serious limitations that hinder their practicality, because they need to retrain the entire model from scratch for a new DB. Moreover, for each retraining, they require an excessive amount of training data, which is very expensive to acquire and unavailable for a new DB. We propose to explore the transferabilities of the ML methods both across tasks and across DBs to tackle these fundamental drawbacks. In this paper, we propose a unified model MTMLF that uses a multi-task training procedure to capture the transferable knowledge across tasks and a pre-train fine-tune procedure to distill the transferable meta knowledge across DBs. We believe this paradigm is more suitable for cloud DB service, and has the potential to revolutionize the way ML is used in DBMS. Furthermore, to demonstrate the predicting power and viability of MTMLF, we provide a concrete and very promising case study on query optimization tasks. Last but not least, we discuss several concrete research opportunities along this line of work.
Pushing it out of the Way: Interactive Visual Navigation
Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
Apr 30 2021 cs.CV cs.AI cs.RO arXiv:2104.14040v2

@misc{2104.14040, author = {Kuo-Hao Zeng and Luca Weihs and Ali Farhadi and Roozbeh Mottaghi}, title = {{P}ushing it out of the {W}ay: {I}nteractive {V}isual {N}avigation}, year = {2021}, eprint = {2104.14040}, note = {arXiv:2104.14040v2} }
PDF
We have observed significant progress in visual navigation for embodied agents. A common assumption in studying visual navigation is that the environments are static; this is a limiting assumption. Intelligent navigation may involve interacting with the environment beyond just moving forward/backward and turning left/right. Sometimes, the best way to navigate is to push something out of the way. In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals. To this end, we introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions. By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities. More specifically, we consider two downstream tasks in the physics-enabled, visually rich, AI2-THOR environment: (1) reaching a target while the path to the target is blocked, and (2) moving an object to a target location by pushing it. For both tasks, agents equipped with an NIE significantly outperform agents without the understanding of the effect of the actions, indicating the benefits of our approach.
Symmetry reduction for deep reinforcement learning active control of chaotic spatiotemporal dynamics
Kevin Zeng, Michael D. Graham
Apr 13 2021 cs.LG nlin.CD arXiv:2104.05437v1

@misc{2104.05437, author = {Kevin Zeng and Michael D.~Graham}, title = {{S}ymmetry reduction for deep reinforcement learning active control of chaotic spatiotemporal dynamics}, year = {2021}, eprint = {2104.05437}, note = {arXiv:2104.05437v1} }
PDF
Deep reinforcement learning (RL) is a data-driven, model-free method capable of discovering complex control strategies for macroscopic objectives in high-dimensional systems, making its application towards flow control promising. Many systems of flow control interest possess symmetries that, when neglected, can significantly inhibit the learning and performance of a naive deep RL approach. Using a test-bed consisting of the Kuramoto-Sivashinsky Equation (KSE), equally spaced actuators, and a goal of minimizing dissipation and power cost, we demonstrate that by moving the deep RL problem to a symmetry-reduced space, we can alleviate limitations inherent in the naive application of deep RL. We demonstrate that symmetry-reduced deep RL yields improved data efficiency as well as improved control policy efficacy compared to policies found by naive deep RL. Interestingly, the policy learned by the symmetry-aware control agent drives the system toward an equilibrium state of the forced KSE that is connected by continuation to an equilibrium of the unforced KSE, despite having been given no explicit information regarding its existence. That is, to achieve its goal, the RL algorithm discovers and stabilizes an equilibrium state of the system. Finally, we demonstrate that the symmetry-reduced control policy is robust to observation and actuation signal noise, as well as to system parameters it has not observed before.
When Face Recognition Meets Occlusion: A New Benchmark
Baojin Huang, Zhongyuan Wang, Guangcheng Wang, Kui Jiang, Kangli Zeng, Zhen Han, Xin Tian, Yuhong Yang
Mar 05 2021 cs.CV arXiv:2103.02805v1

@misc{2103.02805, author = {Baojin Huang and Zhongyuan Wang and Guangcheng Wang and Kui Jiang and Kangli Zeng and Zhen Han and Xin Tian and Yuhong Yang}, title = {{W}hen {F}ace {R}ecognition {M}eets {O}cclusion: {A} {N}ew {B}enchmark}, year = {2021}, eprint = {2103.02805}, note = {arXiv:2103.02805v1} }
PDF
The existing face recognition datasets usually lack occlusion samples, which hinders the development of face recognition. Especially during the COVID-19 coronavirus epidemic, wearing a mask has become an effective means of preventing the virus spread. Traditional CNN-based face recognition models trained on existing datasets are almost ineffective for heavy occlusion. To this end, we pioneer a simulated occlusion face recognition dataset. In particular, we first collect a variety of glasses and masks as occlusion, and randomly combine the occlusion attributes (occlusion objects, textures, and colors) to achieve a large number of more realistic occlusion types. We then cover them in the proper position of the face image with the normal occlusion habit. Furthermore, we reasonably combine original normal face images and occluded face images to form our final dataset, termed Webface-OCC. It covers 804,704 face images of 10,575 subjects, with diverse occlusion types to ensure its diversity and stability. Extensive experiments on public datasets show that the ArcFace retrained by our dataset significantly outperforms the state of the art. Webface-OCC is available at https://github.com/Baojin-Huang/Webface-OCC.
REPOSE: Distributed Top-k Trajectory Similarity Search with Local Reference Point Tries
Bolong Zheng, Lianggui Weng, Xi Zhao, Kai Zeng, Xiaofang Zhou, Christian S. Jensen
Jan 25 2021 cs.DB arXiv:2101.08929v2

@misc{2101.08929, author = {Bolong Zheng and Lianggui Weng and Xi Zhao and Kai Zeng and Xiaofang Zhou and Christian S.~Jensen}, title = {{REPOSE}: {D}istributed {T}op-k {T}rajectory {S}imilarity {S}earch with {L}ocal {R}eference {P}oint {T}ries}, year = {2021}, eprint = {2101.08929}, note = {arXiv:2101.08929v2} }
PDF
Trajectory similarity computation is a fundamental component in a variety of real-world applications, such as ridesharing, road planning, and transportation optimization. Recent advances in mobile devices have enabled an unprecedented increase in the amount of available trajectory data such that efficient query processing can no longer be supported by a single machine. As a result, means of performing distributed in-memory trajectory similarity search are called for. However, existing distributed proposals suffer from either computing resource waste or are unable to support the range of similarity measures that are being used. We propose a distributed in-memory management framework called REPOSE for processing top-k trajectory similarity queries on Spark. We develop a reference point trie (RP-Trie) index to organize trajectory data for local search. In addition, we design a novel heterogeneous global partitioning strategy to eliminate load imbalance in distributed settings. We report on extensive experiments with real-world data that offer insight into the performance of the solution, and show that the solution is capable of outperforming the state-of-the-art proposals.
A Pluggable Learned Index Method via Sampling and Gap Insertion
Yaliang Li, Daoyuan Chen, Bolin Ding, Kai Zeng, Jingren Zhou
Jan 05 2021 cs.DB cs.AI arXiv:2101.00808v1

@misc{2101.00808, author = {Yaliang Li and Daoyuan Chen and Bolin Ding and Kai Zeng and Jingren Zhou}, title = {{A} {P}luggable {L}earned {I}ndex {M}ethod via {S}ampling and {G}ap {I}nsertion}, year = {2021}, eprint = {2101.00808}, note = {arXiv:2101.00808v1} }
Database indexes facilitate data retrieval and benefit broad applications in real-world systems. Recently, a new family of indexes, named learned indexes, has been proposed to learn hidden yet useful data distribution and incorporate such information into the learning of indexes, which leads to promising performance improvements. However, the "learning" process of learned indexes is still under-explored. In this paper, we propose a formal machine learning based framework to quantify the index learning objective, and study two general and pluggable techniques to enhance the learning efficiency and learning effectiveness for learned indexes. With the guidance of the formal learning objective, we can efficiently learn an index by incorporating the proposed sampling technique, and learn a precise index with enhanced generalization ability brought by the proposed result-driven gap insertion technique. We conduct extensive experiments on real-world datasets and compare several indexing methods from the perspective of the index learning objective. The results show the ability of the proposed framework to help design suitable indexes for different scenarios. Further, we demonstrate the effectiveness of the proposed sampling technique, which achieves up to 78x construction speedup while maintaining non-degraded indexing performance. Finally, we show the gap insertion technique can enhance both the static and dynamic indexing performances of existing learned index methods with up to 1.59x query speedup. We will release our code and processed data for further study, which can enable more exploration of learned indexes from both the perspectives of machine learning and database.
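To make the idea of learning an index from a sample concrete, here is a minimal sketch under simplifying assumptions: a single linear model maps a key to its position in a sorted array, the model is fitted on a random sample of the keys, and an error bound measured over all keys keeps lookups correct. This is not the paper's method; gap insertion is not shown, and the class name `SampledLinearIndex` and its parameters are hypothetical.

```python
import bisect
import random


def fit_linear(xs, ys):
    """Least-squares fit y ~ a*x + b; constant model if all xs are equal."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var
    return a, mean_y - a * mean_x


class SampledLinearIndex:
    """Linear key-to-position model trained on a sample of a non-empty key set."""

    def __init__(self, keys, sample_rate=0.1, seed=0):
        self.keys = sorted(keys)
        n = len(self.keys)
        rng = random.Random(seed)
        k = min(n, max(2, int(n * sample_rate)))
        positions = sorted(rng.sample(range(n), k))
        # Train only on the sampled (key, position) pairs.
        self.a, self.b = fit_linear([self.keys[p] for p in positions], positions)
        # Measure the worst prediction error over all keys to bound the search.
        self.max_err = max(
            abs(self._predict(key) - pos) for pos, key in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.a * key + self.b))

    def lookup(self, key):
        """Return the position of key, or None if it is absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


if __name__ == "__main__":
    index = SampledLinearIndex(range(0, 3000, 3), sample_rate=0.05)
    print(index.lookup(300), index.lookup(301), index.max_err)
```

Training touches only the sampled keys, while the error bound still requires one pass over all keys; a smaller sample trades fitting cost for a potentially wider search window.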
BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
Ziniu Wu, Amir Shaikhha, Rong Zhu, Kai Zeng, Yuxing Han, Jingren Zhou
Jan 01 2021 cs.DB cs.LG arXiv:2012.14743v2

@misc{2012.14743, author = {Ziniu Wu and Amir Shaikhha and Rong Zhu and Kai Zeng and Yuxing Han and Jingren Zhou}, title = {{B}ayes{C}ard: {R}evitilizing {B}ayesian {F}rameworks for {C}ardinality {E}stimation}, year = {2021}, eprint = {2012.14743}, note = {arXiv:2012.14743v2} }
Cardinality estimation (CardEst) is an essential component in query optimizers and a fundamental problem in DBMS. A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment. However, no existing CardEst method can fulfill the three criteria at the same time. Traditional methods often have significant algorithm drawbacks such as large estimation errors. Recently proposed deep learning based methods largely improve the estimation accuracy, but their performance can be greatly affected by data and they are often difficult to deploy in a system. In this paper, we revitalize the Bayesian networks (BN) for CardEst by incorporating the techniques of probabilistic programming languages. We present BayesCard, the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcoming their drawbacks, i.e., low structure learning and inference efficiency. This makes BayesCard a perfect candidate for commercial DBMS deployment. Our experimental results on several single-table and multi-table benchmarks indicate BayesCard's superiority over existing state-of-the-art CardEst methods: BayesCard achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates. Meanwhile, BayesCard keeps stable performance when varying data with different settings. We also deploy BayesCard into PostgreSQL. On the IMDB benchmark workload, it improves the end-to-end query time by 13.3%, which is very close to the optimal result of 14.2% using an oracle of true cardinality.
FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation
Rong Zhu, Ziniu Wu, Yuxing Han, Kai Zeng, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Bin Cui
Nov 19 2020 cs.DB cs.AI arXiv:2011.09022v5

@misc{2011.09022, author = {Rong Zhu and Ziniu Wu and Yuxing Han and Kai Zeng and Andreas Pfadler and Zhengping Qian and Jingren Zhou and Bin Cui}, title = {{FLAT}: {F}ast, {L}ightweight and {A}ccurate {M}ethod for {C}ardinality {E}stimation}, year = {2020}, eprint = {2011.09022}, note = {arXiv:2011.09022v5} }
Query optimizers rely on accurate cardinality estimation (CardEst) to produce good execution plans. The core problem of CardEst is how to model the rich joint distribution of attributes in an accurate and compact manner. Despite decades of research, existing methods either oversimplify the models by using only independent factorization, which leads to inaccurate estimates, or overcomplicate them by lossless conditional factorization without any independence assumption, which results in slow probability computation. In this paper, we propose FLAT, a CardEst method that is simultaneously fast in probability computation, lightweight in model size and accurate in estimation quality. The key idea of FLAT is a novel unsupervised graphical model, called FSPN. It utilizes both independent and conditional factorization to adaptively model different levels of attribute correlations, and thus dovetails their advantages. FLAT supports efficient online probability computation in near linear time on the underlying FSPN model, provides effective offline model construction and enables incremental model updates. It can estimate cardinality for both single table queries and multi table join queries. Extensive experimental study demonstrates the superiority of FLAT over existing CardEst methods on well known IMDB benchmarks: FLAT achieves 1 to 5 orders of magnitude better accuracy, 1 to 3 orders of magnitude faster probability computation speed and 1 to 2 orders of magnitude lower storage cost. We also integrate FLAT into Postgres to perform an end to end test. It improves the query execution time by 12.9% on the benchmark workload, which is very close to the optimal result of 14.2% using the true cardinality.
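The contrast between independent and conditional factorization can be seen on a toy two-column table. The sketch below is not FLAT or FSPN; it only compares the estimate n·P(A)·P(B) with n·P(A)·P(B|A) for a conjunctive equality predicate on made-up, strongly correlated data. The function names and the table contents are assumptions chosen for illustration.

```python
def independent_estimate(rows, x, y):
    """Estimate |A = x AND B = y| assuming the two columns are independent."""
    n = len(rows)
    p_x = sum(1 for r in rows if r[0] == x) / n
    p_y = sum(1 for r in rows if r[1] == y) / n
    return n * p_x * p_y


def conditional_estimate(rows, x, y):
    """Estimate the same count as n * P(A = x) * P(B = y | A = x)."""
    n = len(rows)
    with_x = [r for r in rows if r[0] == x]
    if not with_x:
        return 0.0
    p_x = len(with_x) / n
    p_y_given_x = sum(1 for r in with_x if r[1] == y) / len(with_x)
    return n * p_x * p_y_given_x


def true_count(rows, x, y):
    return sum(1 for r in rows if r == (x, y))


if __name__ == "__main__":
    # Column B is strongly correlated with column A (illustrative data).
    rows = [("a", 1)] * 40 + [("a", 2)] * 10 + [("b", 1)] * 10 + [("b", 2)] * 40
    print("independent:", independent_estimate(rows, "a", 1))
    print("conditional:", conditional_estimate(rows, "a", 1))
    print("true count: ", true_count(rows, "a", 1))
```

With two columns the conditional form is exact but needs one conditional distribution per value of A; with many columns that cost grows quickly, which is the trade-off the abstract describes.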
FSPN: A New Class of Probabilistic Graphical Model
Ziniu Wu, Rong Zhu, Andreas Pfadler, Yuxing Han, Jiangneng Li, Zhengping Qian, Kai Zeng, Jingren Zhou
Nov 19 2020 cs.AI arXiv:2011.09020v2

@misc{2011.09020, author = {Ziniu Wu and Rong Zhu and Andreas Pfadler and Yuxing Han and Jiangneng Li and Zhengping Qian and Kai Zeng and Jingren Zhou}, title = {{FSPN}: {A} {N}ew {C}lass of {P}robabilistic {G}raphical {M}odel}, year = {2020}, eprint = {2011.09020}, note = {arXiv:2011.09020v2} }
We introduce factorize sum split product networks (FSPNs), a new class of probabilistic graphical models (PGMs). FSPNs are designed to overcome the drawbacks of existing PGMs in terms of estimation accuracy and inference efficiency. Specifically, Bayesian networks (BNs) have low inference speed, and the performance of tree structured sum product networks (SPNs) significantly degrades in the presence of highly correlated variables. FSPNs absorb their advantages by adaptively modeling the joint distribution of variables according to their dependence degree, so that one can simultaneously attain the two desirable goals: high estimation accuracy and fast inference speed. We present efficient probability inference and structure learning algorithms for FSPNs, along with a theoretical analysis and extensive evaluation evidence. Our experimental results on synthetic and benchmark datasets indicate the superiority of FSPN over other PGMs.
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso
Oct 13 2020 cs.AR cs.AI cs.IR cs.LG arXiv:2010.05894v2

@misc{2010.05894, author = {Wenqi Jiang and Zhenhao He and Shuai Zhang and Thomas B.~Preußer and Kai Zeng and Liang Feng and Jiansong Zhang and Tongxuan Liu and Yong Li and Jingren Zhou and Ce Zhang and Gustavo Alonso}, title = {{M}icro{R}ec: {E}fficient {R}ecommendation {I}nference by {H}ardware and {D}ata {S}tructure {S}olutions}, year = {2020}, eprint = {2010.05894}, note = {arXiv:2010.05894v2} }
Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board including the embedding lookup step as well as the complete inference process. Compared to the optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves 13.8-14.7x speedup on embedding lookup alone and 2.5-5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds for inferring a recommendation while MicroRec only takes microseconds, a significant advantage in real-time recommendation systems.
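One way a data structure change can reduce the number of embedding lookups is to merge two small tables into a single table holding every concatenated pair, so that two memory accesses become one. The abstract does not spell out the redesign, so the sketch below is an assumed illustration of that general idea rather than MicroRec's design; `merge_tables`, `lookup_separate`, and `lookup_merged` are hypothetical names.

```python
def merge_tables(table_a, table_b):
    """Precompute the concatenated embedding for every (i, j) index pair."""
    merged = [row_a + row_b for row_a in table_a for row_b in table_b]
    return merged, len(table_b)


def lookup_separate(table_a, table_b, i, j):
    """Baseline: two table accesses, then concatenation."""
    return table_a[i] + table_b[j]


def lookup_merged(merged, size_b, i, j):
    """Merged layout: a single access at a computed offset."""
    return merged[i * size_b + j]


if __name__ == "__main__":
    table_a = [[1.0], [2.0]]
    table_b = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]
    merged, size_b = merge_tables(table_a, table_b)
    print(lookup_separate(table_a, table_b, 1, 2))
    print(lookup_merged(merged, size_b, 1, 2))
    print(len(merged), "rows in the merged table")
```

The merged table has len(table_a) × len(table_b) rows, so this trade of memory for fewer accesses only pays off when the tables being merged are small.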