Showing 1–50 of 228 results for author: Du, L

  1. Symmetry Nonnegative Matrix Factorization Algorithm Based on Self-paced Learning

    Authors: Lei Wang, Liang Du, Peng Zhou, Peng Wu

    Abstract: A symmetric nonnegative matrix factorization algorithm based on self-paced learning was proposed to improve the clustering performance of the model. It could make the model better distinguish normal samples from abnormal samples in an error-driven way. A weight variable that could measure the degree of difficulty of all samples was assigned in this method, and the variable was constrained by adopt…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: in Chinese language

    Journal ref: Journal of Zhengzhou University (Natural Science Edition), 2022, 54(05), 43-48

  2. Multiple Kernel Clustering via Local Regression Integration

    Authors: Liang Du, Xin Ren, Haiying Zhang, Peng Zhou

    Abstract: Multiple kernel methods rarely consider the intrinsic manifold structure of multiple kernel data and estimate the consensus kernel matrix with a quadratic number of variables, which makes them vulnerable to noise and outliers within multiple candidate kernels. This paper first presents the clustering method via kernelized local regression (CKLR). It captures the local structure of kernel data and em…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: in Chinese language

    Journal ref: Computer Science, 2021, 48(08), 47-52

  3. Unsupervised feature selection algorithm framework based on neighborhood interval disturbance fusion

    Authors: Xiaolin Lv, Liang Du, Peng Zhou, Peng Wu

    Abstract: Feature selection technology is a key technology of data dimensionality reduction. Because of the lack of label information of collected data samples, unsupervised feature selection has attracted more attention. The universality and stability of many unsupervised feature selection algorithms are very low and greatly affected by the dataset structure. For this reason, many researchers have been keen…

    Submitted 20 October, 2024; originally announced October 2024.

    Comments: in Chinese language

    Journal ref: Journal of Nanjing University of Science and Technology, 2021, 45(04), 420-428

  4. arXiv:2410.07739  [pdf, other]

    cs.LG cs.CL

    SLIM: Let LLM Learn More and Forget Less with Soft LoRA and Identity Mixture

    Authors: Jiayi Han, Liang Du, Hongwei Du, Xiangguo Zhou, Yiwen Wu, Weibo Zheng, Donghong Han

    Abstract: Although many efforts have been made, it is still a challenge to balance the training budget, downstream performance, and the general capabilities of the LLMs in many applications. Training the whole model for downstream tasks is expensive, and could easily result in catastrophic forgetting. By introducing parameter-efficient fine-tuning (PEFT), the training cost could be reduced, but it still suf…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 11 pages, 6 figures, 4 tables

  5. arXiv:2410.05411  [pdf, other]

    cs.IR cs.HC

    Constructing and Masking Preference Profile with LLMs for Filtering Discomforting Recommendation

    Authors: Jiahao Liu, YiYang Shao, Peng Zhang, Dongsheng Li, Hansu Gu, Chao Chen, Longzhi Du, Tun Lu, Ning Gu

    Abstract: Personalized algorithms can inadvertently expose users to discomforting recommendations, potentially triggering negative consequences. The subjectivity of discomfort and the black-box nature of these algorithms make it challenging to effectively identify and filter such content. To address this, we first conducted a formative study to understand users' practices and expectations regarding discomfo…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 15 pages, under review

  6. arXiv:2410.00057  [pdf, other]

    cs.LG

    STTM: A New Approach Based Spatial-Temporal Transformer And Memory Network For Real-time Pressure Signal In On-demand Food Delivery

    Authors: Jiang Wang, Haibin Wei, Xiaowei Xu, Jiacheng Shi, Jian Nie, Longzhi Du, Taixu Jiang

    Abstract: On-demand Food Delivery (OFD) services have become very common around the world. For example, on the Ele.me platform, users place more than 15 million food orders every day. Predicting the Real-time Pressure Signal (RPS) is crucial for OFD services, as it is primarily used to measure the current status of pressure on the logistics system. When RPS rises, the pressure increases, and the platform ne…

    Submitted 29 September, 2024; originally announced October 2024.

  7. arXiv:2409.15820  [pdf, other]

    cs.LG cs.CL

    Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns

    Authors: Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Ting Liu, Bing Qin

    Abstract: LLMs' performance on complex tasks is still unsatisfactory. A key issue is that presently LLMs learn in a data-driven schema, while the instructions about these complex tasks are both scarce and hard to collect or construct. On the contrary, a prominent phenomenon is that LLMs can learn rather fast on simpler tasks with adequate prior knowledge captured during pretraining stage. Thus, if the prere…

    Submitted 18 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

    Comments: in review

  8. arXiv:2409.14739  [pdf, other]

    cs.ET eess.SY

    AmpAgent: An LLM-based Multi-Agent System for Multi-stage Amplifier Schematic Design from Literature for Process and Performance Porting

    Authors: Chengjie Liu, Weiyu Chen, Anlan Peng, Yuan Du, Li Du, Jun Yang

    Abstract: Multi-stage amplifiers are widely applied in analog circuits. However, their large number of components, complex transfer functions, and intricate pole-zero distributions necessitate extensive manpower for derivation and param sizing to ensure their stability. In order to achieve efficient derivation of the transfer function and simplify the difficulty of circuit design, we propose AmpAgent: a mul…

    Submitted 23 September, 2024; originally announced September 2024.

  9. arXiv:2409.14457  [pdf, other]

    cs.AI

    Large Model Agents: State-of-the-Art, Cooperation Paradigms, Security and Privacy, and Future Trends

    Authors: Yuntao Wang, Yanghe Pan, Quan Zhao, Yi Deng, Zhou Su, Linkang Du, Tom H. Luan

    Abstract: Large Model (LM) agents, powered by large foundation models such as GPT-4 and DALL-E 2, represent a significant step towards achieving Artificial General Intelligence (AGI). LM agents exhibit key characteristics of autonomy, embodiment, and connectivity, allowing them to operate across physical, virtual, and mixed-reality environments while interacting seamlessly with humans, other agents, and the…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: 35 pages, 23 figures, 9 tables

  10. arXiv:2409.07045  [pdf, other]

    cs.CL cs.AI

    Beyond IID: Optimizing Instruction Learning from the Perspective of Instruction Interaction and Dependency

    Authors: Hanyu Zhao, Li Du, Yiming Ju, Chengwei Wu, Tengfei Pan

    Abstract: With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality instructions. However, these works overlooked the joint interactions and dependencies between different categories of instructions, leading to subopti…

    Submitted 11 September, 2024; originally announced September 2024.

  11. arXiv:2408.14721  [pdf, other]

    cs.LG cs.AI cs.CL

    PAT: Pruning-Aware Tuning for Large Language Models

    Authors: Yijiang Liu, Huanrui Yang, Youxin Chen, Rongyu Zhang, Miao Wang, Yuan Du, Li Du

    Abstract: Large language models (LLMs) excel in language tasks, especially with supervised fine-tuning after pre-training. However, their substantial memory and computational requirements hinder practical applications. Structural pruning, which reduces less significant weight dimensions, is one solution. Yet, traditional post-hoc pruning often leads to significant performance loss, with limited recovery fro…

    Submitted 26 August, 2024; originally announced August 2024.

  12. arXiv:2408.12942  [pdf, other]

    cs.CL cs.AI

    Causal-Guided Active Learning for Debiasing Large Language Models

    Authors: Li Du, Zhouhao Sun, Xiao Ding, Yixuan Ma, Yang Zhao, Kaitao Qiu, Ting Liu, Bing Qin

    Abstract: Although achieving promising performance, recent analyses show that current generative large language models (LLMs) may still capture dataset biases and utilize them for generation, leading to poor generalizability and harmfulness of LLMs. However, due to the diversity of dataset biases and the over-optimization problem, previous prior-knowledge-based debiasing methods and fine-tuning-based debias…

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted to the ACL 2024 main conference and awarded Outstanding Paper

  13. arXiv:2408.11431  [pdf, other]

    cs.CL cs.AI

    Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

    Authors: Kai Xiong, Xiao Ding, Li Du, Jiahao Ying, Ting Liu, Bing Qin, Yixin Cao

    Abstract: Large Language Models (LLMs) are versatile and demonstrate impressive generalization ability by mining and learning information from extensive unlabeled text. However, they still exhibit reasoning mistakes, often stemming from knowledge deficiencies, which can affect their trustworthiness and reliability. Although users can provide diverse and comprehensive queries, obtaining sufficient and effect…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  14. arXiv:2408.09123  [pdf, other]

    cs.LG math.AT

    Dynamic Neural Dowker Network: Approximating Persistent Homology in Dynamic Directed Graphs

    Authors: Hao Li, Hao Jiang, Jiajun Fan, Dongsheng Ye, Liang Du

    Abstract: Persistent homology, a fundamental technique within Topological Data Analysis (TDA), captures structural and shape characteristics of graphs, yet encounters computational difficulties when applied to dynamic directed graphs. This paper introduces the Dynamic Neural Dowker Network (DNDN), a novel framework specifically designed to approximate the results of dynamic Dowker filtration, aiming to capt…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  15. arXiv:2408.07759  [pdf, other]

    cs.IR

    SWaT: Statistical Modeling of Video Watch Time through User Behavior Analysis

    Authors: Shentao Yang, Haichuan Yang, Linna Du, Adithya Ganesh, Bo Peng, Boying Liu, Serena Li, Ji Liu

    Abstract: The significance of estimating video watch time has been highlighted by the rising importance of (short) video recommendation, which has become a core product of mainstream social media platforms. Modeling video watch time, however, has been challenged by the complexity of user-video interaction, such as different user behavior modes in watching the recommended videos and varying watching probabil…

    Submitted 14 August, 2024; originally announced August 2024.

  16. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  17. arXiv:2408.00950  [pdf, other]

    cs.CV

    PrivateGaze: Preserving User Privacy in Black-box Mobile Gaze Tracking Services

    Authors: Lingyu Du, Jinyuan Jia, Xucong Zhang, Guohao Lan

    Abstract: Eye gaze contains rich information about human attention and cognitive processes. This capability makes the underlying technology, known as gaze tracking, a critical enabler for many ubiquitous applications and has triggered the development of easy-to-use gaze estimation services. Indeed, by utilizing the ubiquitous cameras on tablets and smartphones, users can readily access many gaze estimation…

    Submitted 1 August, 2024; originally announced August 2024.

  18. arXiv:2407.15488  [pdf, other]

    cs.CV

    DiffX: Guide Your Layout to Cross-Modal Generative Modeling

    Authors: Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

    Abstract: Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, such as chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-guid…

    Submitted 20 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  19. arXiv:2407.11033  [pdf, other]

    cs.LG cs.CL

    Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models

    Authors: Yuyan Chen, Qiang Fu, Ge Fan, Lun Du, Jian-Guang Lou, Shi Han, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: In recent years, pre-trained language models (PLMs) have swept into various fields of artificial intelligence and achieved great success. However, most PLMs, such as T5 and GPT3, have a huge number of parameters; fine-tuning them is often expensive and time-consuming, and storing them takes up a lot of space. Therefore, it is necessary to adopt a parameter-efficient approach to reduce parameters of P…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  20. arXiv:2407.04752  [pdf, other]

    cs.LG cs.CL cs.NE

    SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

    Authors: Xingrun Xing, Boyan Gao, Zheng Zhang, David A. Clifton, Shitao Xiao, Li Du, Guoqi Li, Jiajun Zhang

    Abstract: The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and computational resources, presenting considerable deployment challenges. In contrast, human brains, which contain approximately 86 billion biological n…

    Submitted 5 July, 2024; originally announced July 2024.

  21. arXiv:2407.03082  [pdf, other]

    cs.LG stat.ML

    Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations

    Authors: Yuling Zhang, Anpeng Wu, Kun Kuang, Liang Du, Zixun Sun, Zhi Wang

    Abstract: Heterogeneous treatment effect (HTE) estimation is vital for understanding the change of treatment effect across individuals or subgroups. Most existing HTE estimation methods focus on addressing selection bias induced by imbalanced distributions of confounders between treated and control units, but ignore distribution shifts across populations. Thereby, their applicability has been limited to the…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by ICDE'2024

  22. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  23. arXiv:2406.09008  [pdf, other]

    cs.CL

    LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models

    Authors: Xiaohao Yang, He Zhao, Dinh Phung, Wray Buntine, Lan Du

    Abstract: Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging. Existing evaluation methods are either less comparable across different models (e.g., perplexity) or focus on only one specific aspect of a model (e.g., topic quality or document representation quality) at a time, which is insufficient to reflect the ov…

    Submitted 13 June, 2024; originally announced June 2024.

  24. arXiv:2406.04680  [pdf, other]

    eess.IV cs.CV

    MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome

    Authors: Yixin Huang, Yiqi Jin, Ke Tao, Kaijian Xia, Jianfeng Gu, Lei Yu, Lan Du, Cunjian Chen

    Abstract: May-Thurner Syndrome (MTS), also known as iliac vein compression syndrome or Cockett's syndrome, is a condition potentially impacting over 20 percent of the population, leading to an increased risk of iliofemoral deep venous thrombosis. In this paper, we present a 3D-based deep learning approach called MTS-Net for diagnosing May-Thurner Syndrome using CT scans. To effectively capture the spatial-t…

    Submitted 7 June, 2024; originally announced June 2024.

  25. arXiv:2406.02088  [pdf, other]

    cs.AR

    Fast and Practical Strassen's Matrix Multiplication using FPGAs

    Authors: Afzal Ahmad, Linfeng Du, Wei Zhang

    Abstract: Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$ matrices. Strassen's algorithm improves this to $\mathcal{O}(n^{2.807})$, but its practicality is limited for small to medium matrix sizes due to the large num…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at 34th International Conference on Field-Programmable Logic and Applications (FPL 2024), 7 pages

    ACM Class: C.1.3

  26. arXiv:2406.00958  [pdf, other]

    cs.LG cs.CV

    Navigating Conflicting Views: Harnessing Trust for Learning

    Authors: Jueqing Lu, Lan Du, Wray Buntine, Myong Chol Jung, Joanna Dipnall, Belinda Gabbe

    Abstract: Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct inform…

    Submitted 2 June, 2024; originally announced June 2024.

  27. arXiv:2405.16486  [pdf, other]

    cs.CV cs.AI

    Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

    Authors: Rongyu Zhang, Aosong Cheng, Yulin Luo, Gaole Dai, Huanrui Yang, Jiaming Liu, Ran Xu, Li Du, Yuan Du, Yanbing Jiang, Shanghang Zhang

    Abstract: Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system'…

    Submitted 26 May, 2024; originally announced May 2024.

  28. arXiv:2405.16447  [pdf, other]

    cs.LG

    Fast Asymmetric Factorization for Large Scale Multiple Kernel Clustering

    Authors: Yan Chen, Liang Du, Lei Duan

    Abstract: Kernel methods are extensively employed for nonlinear data clustering, yet their effectiveness heavily relies on selecting suitable kernels and associated parameters, posing challenges in advance determination. In response, Multiple Kernel Clustering (MKC) has emerged as a solution, allowing the fusion of information from multiple base kernels for clustering. However, both early fusion and late fu…

    Submitted 26 May, 2024; originally announced May 2024.

  29. arXiv:2405.16091  [pdf, other]

    cs.CV

    Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs

    Authors: Myong Chol Jung, He Zhao, Joanna Dipnall, Belinda Gabbe, Lan Du

    Abstract: Prompt learning has been shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detectio…

    Submitted 25 May, 2024; originally announced May 2024.

  30. arXiv:2405.10630  [pdf, other]

    cs.CL cs.AI

    Medical Dialogue: A Survey of Categories, Methods, Evaluation and Challenges

    Authors: Xiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting Zhang

    Abstract: This paper surveys and organizes research works on medical dialog systems, which is an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has to date remained noticeably absent. As a result, an overview of the categories, methods, and evaluation of medical dial…

    Submitted 17 May, 2024; originally announced May 2024.

  31. arXiv:2404.11313  [pdf, other]

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  32. arXiv:2404.08985  [pdf, other]

    cs.LG cs.AI

    Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

    Authors: Yijiang Liu, Rongyu Zhang, Huanrui Yang, Kurt Keutzer, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation. However, the diversity of downstream tasks in multitask scenarios presents substantial adaptation challenges for LLMs. While traditional methods often succumb to knowledge confusion on thei…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  33. arXiv:2404.08564  [pdf, ps, other]

    cs.LG

    Federated Distillation: A Survey

    Authors: Lin Li, Jianping Gou, Baosheng Yu, Lan Du, Zhang Yi, Dacheng Tao

    Abstract: Federated Learning (FL) seeks to train a model collaboratively without sharing private training data from individual clients. Despite its promise, FL encounters challenges such as high communication costs for large-scale models and the necessity for uniform model architectures across all clients and the server. These challenges severely restrict the practical applications of FL. To address these l…

    Submitted 1 April, 2024; originally announced April 2024.

  34. Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS

    Authors: Afzal Ahmad, Linfeng Du, Zhiyao Xie, Wei Zhang

    Abstract: One of the primary challenges impeding the progress of Neural Architecture Search (NAS) is its extensive reliance on exorbitant computational resources. NAS benchmarks aim to simulate runs of NAS experiments at zero cost, remediating the need for extensive compute. However, existing NAS benchmarks use synthetic datasets and model proxies that make simplified assumptions about the characteristics o…

    Submitted 18 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted at Design Automation Conference DAC'24

  35. arXiv:2404.01677  [pdf, other]

    cs.AI cs.CL

    Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

    Authors: Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing

    Abstract: Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems suffer from theoretical incompleteness. As a result, they can only address a limited set of simp…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: LREC-Coling 2024

  36. arXiv:2403.18051  [pdf, other]

    cs.CL cs.AI

    Supervisory Prompt Training

    Authors: Jean Ghislain Billa, Min Oh, Liang Du

    Abstract: The performance of Large Language Models (LLMs) relies heavily on the quality of prompts, which are often manually engineered and task-specific, making them costly and non-scalable. We propose a novel approach, Supervisory Prompt Training (SPT). SPT automates the generation of highly effective prompts using a dual LLM system. In this system, one LLM, the generator, performs a task while the other,…

    Submitted 26 March, 2024; originally announced March 2024.

  37. arXiv:2402.11537  [pdf, other]

    cs.CL cs.AI

    Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

    Authors: Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin

    Abstract: Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major cate…

    Submitted 28 August, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings

  38. EmoWear: Exploring Emotional Teasers for Voice Message Interaction on Smartwatches

    Authors: Pengcheng An, Jiawen Zhu, Zibo Zhang, Yifei Yin, Qingyuan Ma, Che Yan, Linghao Du, Jian Zhao

    Abstract: Voice messages, by nature, prevent users from gauging the emotional tone without fully diving into the audio content. This hinders the shared emotional experience at the pre-retrieval stage. Research has scarcely explored "Emotional Teasers": pre-retrieval cues offering a glimpse into an awaiting message's emotional tone without disclosing its content. We introduce EmoWear, a smartwatch voice messaging…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: To appear at ACM CHI '24

  39. arXiv:2402.05359  [pdf, other]

    cs.AI cs.CL cs.LG

    An Examination on the Effectiveness of Divide-and-Conquer Prompting in Large Language Models

    Authors: Yizhou Zhang, Lun Du, Defu Cao, Qiang Fu, Yan Liu

    Abstract: Foundation models, such as Large Language Models (LLMs), have attracted a significant amount of interest due to their large number of applications. However, when handling tasks involving repetitive sub-tasks and/or deceptive contents, such as arithmetic calculation and article-level fake news detection, simple instructional prompts suffer from inaccurate responses. Existing works show that more comp…

    Submitted 2 July, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Preprint

  40. arXiv:2402.03741  [pdf, other]

    cs.LG cs.AI cs.CR

    SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

    Authors: Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

    Abstract: Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's v…

    Submitted 26 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  41. arXiv:2401.17862  [pdf, other]

    cs.CV

    Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

    Authors: Jianing Li, Xi Nan, Ming Lu, Li Du, Shanghang Zhang

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable vision-language capabilities, primarily due to the exceptional in-context understanding and multi-task learning strengths of large language models (LLMs). The advent of visual instruction tuning has further enhanced MLLMs' performance in vision-language understanding. However, while existing MLLMs adeptly recognize what…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 15 pages, version 1

    ACM Class: I.5.4; I.2.7

  42. arXiv:2401.07853  [pdf, other]

    cs.CV

    VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness

    Authors: Rongyu Zhang, Zefan Cai, Huanrui Yang, Zidong Liu, Denis Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Kurt Keutzer, Baobao Chang, Yuan Du, Li Du, Shanghang Zhang

    Abstract: Finetuning a pretrained vision model (PVM) is a common technique for learning downstream vision tasks. However, the conventional finetuning process with randomly sampled data points results in diminished training efficiency. To address this drawback, we propose a novel approach, Vision-language Collaborative Active Finetuning (VeCAF). With the emerging availability of labels and natural language a…

    Submitted 13 April, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 13 pages

  43. arXiv:2401.07525  [pdf, other]

    cs.CL cs.AI

    TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

    Authors: Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li

    Abstract: Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation. Recently, pretrained large language models have further enhanced the effectiveness by leveraging richer textual information in user profiles and job descriptions apart from user behavior features and job metadata. However, the general domain-o…

    Submitted 17 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024 camera ready. 5 pages, 1 figure, 3 tables

  44. arXiv:2401.07395  [pdf, other]

    cs.LG cs.AI

    Harnessing the Power of Beta Scoring in Deep Active Learning for Multi-Label Text Classification

    Authors: Wei Tan, Ngoc Dang Nguyen, Lan Du, Wray Buntine

    Abstract: Within the scope of natural language processing, the domain of multi-label text classification is uniquely challenging due to its expansive and uneven label distribution. The complexity deepens due to the demand for an extensive set of annotated data for training an advanced deep learning model, especially in specialized fields where the labeling task can be labor-intensive and often requires doma…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: 7 pages, AAAI 2024

  45. arXiv:2401.00010  [pdf, other]

    cs.SI cs.LG

    Professional Network Matters: Connections Empower Person-Job Fit

    Authors: Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li

    Abstract: Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions. While existing works leverage historical or contextual information, they often disregard a crucial aspect: job seekers' social relationships in professional networks. This paper emphasizes the importance of incorporating professional…

    Submitted 19 December, 2023; originally announced January 2024.

    Comments: Accepted at WSDM 2024

  46. arXiv:2312.17710  [pdf, other]

    cs.CL cs.LG

    Principled Gradient-based Markov Chain Monte Carlo for Text Generation

    Authors: Li Du, Afra Amini, Lucas Torroba Hennigen, Xinyan Velocity Yu, Jason Eisner, Holden Lee, Ryan Cotterell

    Abstract: Recent papers have demonstrated the possibility of energy-based text generation by adapting gradient-based sampling algorithms, a paradigm of MCMC algorithms that promises fast convergence. However, as we show in this paper, previous attempts on this approach to text generation all fail to sample correctly from the target language model distributions. To address this limitation, we consider the pr…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Preprint

  47. arXiv:2312.13671  [pdf, other]

    cs.CL cs.LG

    Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

    Authors: Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

    Abstract: Tabular data analysis is crucial in various fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL and TableQA, neglecting advanced analysis like forecasting and chart generation. To address this gap, we developed the Text2Analysis benchmark, incorporating advanced analysis tasks that go beyond the SQL-compatible ope…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI'2024

  48. Bayesian Estimate of Mean Proper Scores for Diversity-Enhanced Active Learning

    Authors: Wei Tan, Lan Du, Wray Buntine

    Abstract: The effectiveness of active learning largely depends on the sampling efficiency of the acquisition function. Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit in the same framework. We propose Bayesian Estimate of Mean Proper Scores (BEMPS) to estimate the increase in strictly proper scores such as log probability or n…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 16 pages, TPAMI. arXiv admin note: text overlap with arXiv:2110.14171

    Journal ref: TPAMI, 2023

  49. arXiv:2312.09039  [pdf, other]

    cs.CL cs.AI

    TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

    Authors: Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

    Abstract: Table reasoning tasks have shown remarkable progress with the development of large language models (LLMs), which involve interpreting and drawing conclusions from tabular data based on natural language (NL) questions. Existing solutions mainly tested on smaller tables face scalability issues and struggle with complex queries due to incomplete or dispersed data across different table sections. To a…

    Submitted 10 October, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by EMNLP 2024

  50. arXiv:2312.08937  [pdf, other]

    cs.LG

    BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials

    Authors: Xingrun Xing, Li Du, Xinyuan Wang, Xianlin Zeng, Yequan Wang, Zheng Zhang, Jiajun Zhang

    Abstract: Pretrained foundation models offer substantial benefits for a wide range of downstream tasks, which can be one of the most potential techniques to access artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Bina…

    Submitted 20 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(14): 16094-16102