Skip to main content

Showing 1–50 of 301 results for author: Chua, T

  1. arXiv:2410.14148  [pdf, other

    cs.CV cs.CL

    Fine-Grained Verifiers: Preference Modeling as Next-token Prediction in Vision-Language Alignment

    Authors: Chenhang Cui, An Zhang, Yiyang Zhou, Zhaorun Chen, Gelei Deng, Huaxiu Yao, Tat-Seng Chua

    Abstract: The recent advancements in large language models (LLMs) and pre-trained vision models have accelerated the development of vision-language large models (VLLMs), enhancing the interaction between visual and linguistic modalities. Despite their notable success across various domains, VLLMs face challenges in modality alignment, which can lead to issues like hallucinations and unsafe content generatio… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: 23 pages

  2. arXiv:2410.13373  [pdf, other

    cs.LG

    Addressing Heterogeneity and Heterophily in Graphs: A Heterogeneous Heterophilic Spectral Graph Neural Network

    Authors: Kangkang Lu, Yanhua Yu, Zhiyong Huang, Jia Li, Yuling Wang, Meiyu Liang, Xiting Qin, Yimeng Ren, Tat-Seng Chua, Xidian Wang

    Abstract: Graph Neural Networks (GNNs) have garnered significant scholarly attention for their powerful capabilities in modeling graph structures. Despite this, two primary challenges persist: heterogeneity and heterophily. Existing studies often address heterogeneous and heterophilic graphs separately, leaving a research gap in the understanding of heterogeneous heterophilic graphs-those that feature diver… ▽ More

    Submitted 17 October, 2024; originally announced October 2024.

  3. arXiv:2410.13117  [pdf, other

    cs.IR cs.AI

    Preference Diffusion for Recommendation

    Authors: Shuo Liu, An Zhang, Guoqing Hu, Hong Qian, Tat-seng Chua

    Abstract: Recommender systems predict personalized item rankings based on user preference distributions derived from historical behavior data. Recently, diffusion models (DMs) have gained attention in recommendation for their ability to model complex distributions, yet current DM-based recommenders often rely on traditional objectives like mean squared error (MSE) or recommendation objectives, which are not… ▽ More

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2410.11560  [pdf, other

    cs.CV

    PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning

    Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

    Abstract: Generalized zero-shot learning (GZSL) endeavors to identify the unseen categories using knowledge from the seen domain, necessitating the intrinsic interactions between the visual features and attribute semantic features. However, GZSL suffers from insufficient visual-semantic correspondences due to the attribute diversity and instance diversity. Attribute diversity refers to varying semantic gran… ▽ More

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted to TPAMI 2024. arXiv admin note: text overlap with arXiv:2303.15322

  5. arXiv:2410.05165  [pdf, other

    cs.IR cs.CL

    Efficient Inference for Large Language Model-based Generative Recommendation

    Authors: Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

    Abstract: Large Language Model (LLM)-based generative recommendation has achieved notable success, yet its practical deployment is costly particularly due to excessive inference latency caused by autoregressive decoding. For lossless LLM decoding acceleration, Speculative Decoding (SD) has emerged as a promising solution. However, applying SD to generative recommendation presents unique challenges due to th… ▽ More

    Submitted 8 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

  6. arXiv:2410.03739  [pdf, other

    cs.CL cs.AI

    Grammar Induction from Visual, Speech and Text

    Authors: Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-seng Chua

    Abstract: Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially serve complementary roles to each other. With such intuition, this work introduces a novel \emph{unsupervised visual-audio-text grammar induction} task (named \textbf{VAT-GI}), to induce the constituent grammar trees from parallel image… ▽ More

    Submitted 30 September, 2024; originally announced October 2024.

  7. arXiv:2410.02355  [pdf, other

    cs.CL cs.AI

    AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

    Authors: Junfeng Fang, Houcheng Jiang, Kun Wang, Yunshan Ma, Xiang Wang, Xiangnan He, Tat-seng Chua

    Abstract: Large language models (LLMs) often exhibit hallucinations due to incorrect or outdated knowledge. Hence, model editing methods have emerged to enable targeted knowledge updates. To achieve this, a prevailing paradigm is the locating-then-editing approach, which first locates influential parameters and then edits them by introducing a perturbation. While effective, current studies have demonstrated… ▽ More

    Submitted 21 October, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  8. arXiv:2409.19594  [pdf, other

    cs.CR cs.AI cs.SE

    MASKDROID: Robust Android Malware Detection with Masked Graph Representations

    Authors: Jingnan Zheng, Jiaohao Liu, An Zhang, Jun Zeng, Ziqi Yang, Zhenkai Liang, Tat-Seng Chua

    Abstract: Android malware attacks have posed a severe threat to mobile users, necessitating a significant demand for the automated detection system. Among the various tools employed in malware detection, graph representations (e.g., function call graphs) have played a pivotal role in characterizing the behaviors of Android apps. However, though achieving impressive performance in malware detection, current… ▽ More

    Submitted 29 September, 2024; originally announced September 2024.

    Journal ref: IEEE/ACM Automated Software Engineering Conference 2024

  9. arXiv:2409.14399  [pdf, other

    cs.CL cs.AI

    Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations

    Authors: Peixin Qin, Chen Huang, Yang Deng, Wenqiang Lei, Tat-Seng Chua

    Abstract: With the aid of large language models, current conversational recommender system (CRS) has gaining strong abilities to persuade users to accept recommended items. While these CRSs are highly persuasive, they can mislead users by incorporating incredible information in their explanations, ultimately damaging the long-term trust between users and the CRS. To address this, we propose a simple yet eff… ▽ More

    Submitted 7 October, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: Findings of EMNLP 2024. Our code is available at https://github.com/mumen798/PC-CRS

  10. arXiv:2409.14319  [pdf, other

    cs.CV cs.MM

    Scene-Text Grounding for Text-Based Video Question Answering

    Authors: Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua

    Abstract: Existing efforts in text-based video question answering (TextVideoQA) are criticized for their opaque decisionmaking and heavy reliance on scene-text recognition. In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scenetext recognition and promoting research towards in… ▽ More

    Submitted 22 September, 2024; originally announced September 2024.

  11. arXiv:2409.02828  [pdf, other

    cs.CV cs.MM

    ExpLLM: Towards Chain of Thought for Facial Expression Recognition

    Authors: Xing Lan, Jian Xue, Ji Qi, Dongmei Jiang, Ke Lu, Tat-Seng Chua

    Abstract: Facial expression recognition (FER) is a critical task in multimedia with significant implications across various domains. However, analyzing the causes of facial expressions is essential for accurately recognizing them. Current approaches, such as those based on facial action units (AUs), typically provide AU names and intensities but lack insight into the interactions and relationships between A… ▽ More

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: project page: https://starhiking.github.io/ExpLLM_Page/

  12. arXiv:2408.04388  [pdf, other

    cs.MM cs.AI cs.IR

    MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

    Authors: Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

    Abstract: We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models. Compared to using text or graph modalities, the investigation of utilizing images for temporal event forecasting has not been fully explored, especially in the era of large language models (LLMs). To bridge this gap, we are particularly interested in two key questions of: 1) why images… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    ACM Class: H.3.3

  13. arXiv:2408.04223  [pdf, other

    cs.CV cs.AI

    VideoQA in the Era of LLMs: An Empirical Study

    Authors: Junbin Xiao, Nanxin Huang, Hangyu Qin, Dongyang Li, Yicong Li, Fengbin Zhu, Zhulin Tao, Jianxing Yu, Liang Lin, Tat-Seng Chua, Angela Yao

    Abstract: Video Large Language Models (Video-LLMs) are flourishing and has advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays pivotal role in Video-LLM developing. This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA, aiming to elucidate their success and failure modes, and provide insights towards more human-like video underst… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Preprint. Under Review

  14. arXiv:2407.17274  [pdf, other

    cs.MM cs.AI cs.CV

    Revolutionizing Text-to-Image Retrieval as Autoregressive Token-to-Voken Generation

    Authors: Yongqi Li, Hongru Cai, Wenjie Wang, Leigang Qu, Yinwei Wei, Wenjie Li, Liqiang Nie, Tat-Seng Chua

    Abstract: Text-to-image retrieval is a fundamental task in multimedia processing, aiming to retrieve semantically relevant cross-modal content. Traditional studies have typically approached this task as a discriminative problem, matching the text and image via the cross-attention mechanism (one-tower framework) or in a common embedding space (two-tower framework). Recently, generative cross-modal retrieval… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Work in progress

  15. arXiv:2407.11712  [pdf, other

    cs.IR

    Harnessing Large Language Models for Multimodal Product Bundling

    Authors: Xiaohao Liu, Jie Wu, Zhulin Tao, Yunshan Ma, Yinwei Wei, Tat-seng Chua

    Abstract: Product bundling provides clients with a strategic combination of individual items. And it has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophisticated extractors for bundling, but remain limited by inferior semantic understanding, the restricted scope of knowledge, and an inability to handle… ▽ More

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: under review

  16. arXiv:2407.11638  [pdf, other

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.07544  [pdf, other

    cs.CV cs.AI

    Disentangling Masked Autoencoders for Unsupervised Domain Generalization

    Authors: An Zhang, Han Wang, Xiang Wang, Tat-Seng Chua

    Abstract: Domain Generalization (DG), designed to enhance out-of-distribution (OOD) generalization, is all about learning invariance against domain shifts utilizing sufficient supervision signals. Yet, the scarcity of such labeled data has led to the rise of unsupervised domain generalization (UDG) - a more important yet challenging task in that models are trained across diverse domains in an unsupervised m… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.05441  [pdf, other

    cs.IR cs.AI

    Language Representations Can be What Recommenders Need: Findings and Potentials

    Authors: Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, Tat-Seng Chua

    Abstract: Recent studies empirically indicate that language models (LMs) encode rich world knowledge beyond mere semantics, attracting significant attention across various fields. However, in the recommendation domain, it remains uncertain whether LMs implicitly encode user preference information. Contrary to prevailing understanding that LMs and traditional recommenders learn two distinct representation sp… ▽ More

    Submitted 2 October, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/LehengTHU/AlphaRec

  19. Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

    Authors: Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning , under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-tempo… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE TPAMI 2024

    Journal ref: [J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  20. arXiv:2406.14333  [pdf, other

    cs.IR cs.SD eess.AS

    LARP: Language Audio Relational Pre-training for Cold-Start Playlist Continuation

    Authors: Rebecca Salganik, Xiaohao Liu, Yunshan Ma, Jian Kang, Tat-Seng Chua

    Abstract: As online music consumption increasingly shifts towards playlist-based listening, the task of playlist continuation, in which an algorithm suggests songs to extend a playlist in a personalized and musically cohesive manner, has become vital to the success of music streaming. Currently, many existing playlist continuation approaches rely on collaborative filtering methods to perform recommendation.… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  21. arXiv:2406.12639  [pdf, other

    cs.CL cs.AI

    Ask-before-Plan: Proactive Language Agents for Real-World Planning

    Authors: Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

    Abstract: The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents… ▽ More

    Submitted 1 October, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by EMNLP 2024 Findings

  22. arXiv:2406.09215  [pdf, other

    cs.IR cs.AI

    On Softmax Direct Preference Optimization for Recommendation

    Authors: Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, Tat-Seng Chua

    Abstract: Recommender systems aim to predict personalized rankings based on user preference data. With the rise of Language Models (LMs), LM-based recommenders have been widely explored due to their extensive world knowledge and powerful reasoning abilities. Most of the LM-based recommenders convert historical interactions into language prompts, pairing with a positive item as the target response and fine-t… ▽ More

    Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  23. arXiv:2406.05925  [pdf, other

    cs.CL cs.AI

    Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    Authors: Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

    Abstract: Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona m… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 17 pages, 4 figures

  24. arXiv:2406.05814  [pdf, other

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Unified Text-to-Image Generation and Retrieval

    Authors: Leigang Qu, Haochuan Li, Tan Wang, Wenjie Wang, Yongqi Li, Liqiang Nie, Tat-Seng Chua

    Abstract: How humans can efficiently and effectively acquire images has always been a perennial question. A typical solution is text-to-image retrieval from an existing database given the text query; however, the limited database typically lacks creativity. By contrast, recent breakthroughs in text-to-image generation have made it possible to produce fancy and diverse visual content, but it faces challenges… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  25. arXiv:2406.05127  [pdf, other

    cs.CV

    Towards Semantic Equivalence of Tokenization in Multimodal LLM

    Authors: Shengqiong Wu, Hao Fei, Xiangtai Li, Jiayi Ji, Hanwang Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in processing vision-language tasks. One of the crux of MLLMs lies in vision tokenization, which involves efficiently transforming input visual signals into feature representations that are most beneficial for LLMs. However, existing vision tokenizers, essential for semantic alignment between vision and language, r… ▽ More

    Submitted 9 October, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Technical Report. The project page: https://chocowu.github.io/SeTok-web/

  26. arXiv:2406.03032  [pdf, other

    cs.CV

    Instructing Prompt-to-Prompt Generation for Zero-Shot Learning

    Authors: Man Liu, Huihui Bai, Feng Li, Chunjie Zhang, Yunchao Wei, Meng Wang, Tat-Seng Chua, Yao Zhao

    Abstract: Zero-shot learning (ZSL) aims to explore the semantic-visual interactions to discover comprehensive knowledge transferred from seen categories to classify unseen categories. Recently, prompt engineering has emerged in ZSL, demonstrating impressive potential as it enables the zero-shot transfer of diverse visual concepts to downstream tasks. However, these methods are still not well generalized to… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  27. arXiv:2406.02472  [pdf, other

    cs.CL

    Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

    Authors: Zhihan Zhang, Yixin Cao, Chenchen Ye, Yunshan Ma, Lizi Liao, Tat-Seng Chua

    Abstract: The digital landscape is rapidly evolving with an ever-increasing volume of online news, emphasizing the need for swift and precise analysis of complex events. We refer to the complex events composed of many news articles over an extended period as Temporal Complex Event (TCE). This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event c… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  28. arXiv:2405.20626  [pdf, other

    cs.IR cs.IT

    Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

    Authors: Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

    Abstract: Recommendation performance usually exhibits a long-tail distribution over users -- a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). As addressing this pr… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: TKDE 2023

  29. arXiv:2405.17220  [pdf, other

    cs.CL

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Authors: Tianyu Yu, Haoye Zhang, Yuan Yao, Yunkai Dang, Da Chen, Xiaoman Lu, Ganqu Cui, Taiwen He, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

    Abstract: Learning from feedback reduces the hallucination of multimodal large language models (MLLMs) by aligning them with human preferences. While traditional methods rely on labor-intensive and time-consuming manual labeling, recent approaches employing models as automatic labelers have shown promising results without human intervention. However, these methods heavily rely on costly proprietary models l… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Website: https://github.com/RLHF-V/RLAIF-V

  30. arXiv:2405.14225  [pdf, other

    q-bio.QM cs.CL cs.MM

    ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

    Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Sihang Li, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Molecule-text modeling, which aims to facilitate molecule-relevant tasks with a textual interface and textual knowledge, is an emerging research direction. Beyond single molecules, studying reaction-text modeling holds promise for helping the synthesis of new materials and drugs. However, previous works mostly neglect reaction-text modeling: they primarily focus on modeling individual molecule-tex… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings, 9 pages

  31. arXiv:2405.14125  [pdf, other

    cs.AI cs.CL

    ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

    Authors: Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua

    Abstract: Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hind… ▽ More

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  32. arXiv:2405.13820  [pdf, other

    cs.CL

    Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching

    Authors: Weixiang Zhao, Yulin Hu, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Tat-Seng Chua

    Abstract: Safety alignment of large language models (LLMs) has been gaining increasing attention. However, current safety-aligned LLMs suffer from the fragile and imbalanced safety mechanisms, which can still be induced to generate unsafe responses, exhibit over-safety by rejecting safe user inputs, and fail to preserve general utility after safety alignment. To this end, we propose a novel post safety alig… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 24 pages, 8 figures and 12 tables

  33. arXiv:2405.12564  [pdf, other

    q-bio.QM cs.CL cs.MM

    ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

    Authors: Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

    Abstract: Language Models (LMs) excel in understanding textual descriptions of proteins, as evident in biomedical question-answering tasks. However, their capability falters with raw protein data, such as amino acid sequences, due to a deficit in pretraining on such data. Conversely, Protein Language Models (PLMs) can understand and convert protein data into high-quality representations, but struggle to pro… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: ACL 2024, 9 pages

  34. arXiv:2405.12063  [pdf, other

    cs.CL

    CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models

    Authors: Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua

    Abstract: Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality… ▽ More

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024. Camera Ready. Our dataset is available at https://github.com/zt991211/CLAMBER

  35. arXiv:2405.12059  [pdf, other

    cs.CL

    STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents

    Authors: Yue Chen, Chen Huang, Yang Deng, Wenqiang Lei, Dingnan Jin, Jia Liu, Tat-Seng Chua

    Abstract: Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Attributing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still s… ▽ More

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024. Camera Ready

  36. arXiv:2405.10248  [pdf, other

    cs.HC cs.IR

    Co-Matching: Towards Human-Machine Collaborative Legal Case Matching

    Authors: Chen Huang, Xinwei Yang, Yang Deng, Wenqiang Lei, JianCheng Lv, Tat-Seng Chua

    Abstract: Recent efforts have aimed to improve AI machines in legal case matching by integrating legal domain knowledge. However, successful legal case matching requires the tacit knowledge of legal practitioners, which is difficult to verbalize and encode into machines. This emphasizes the crucial role of involving legal practitioners in high-stakes legal case matching. To address this, we propose a collab… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Draft V1: 23 pages, 7 figures

  37. arXiv:2405.07314  [pdf, other

    cs.IR

    Learnable Item Tokenization for Generative Recommendation

    Authors: Wenjie Wang, Honghui Bao, Xinyu Lin, Jizhi Zhang, Yongqi Li, Fuli Feng, See-Kiong Ng, Tat-Seng Chua

    Abstract: Utilizing powerful Large Language Models (LLMs) for generative recommendation has attracted much attention. Nevertheless, a crucial challenge is transforming recommendation data into the language space of LLMs through effective item tokenization. Current approaches, such as ID, textual, and codebook-based identifiers, exhibit shortcomings in encoding semantic information, incorporating collaborati… ▽ More

    Submitted 18 August, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted by CIKM 2024

  38. arXiv:2405.06211  [pdf, other

    cs.CL cs.AI cs.IR

    A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models

    Authors: Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li

    Abstract: As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, L… ▽ More

    Submitted 17 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: This is the long version of the corresponding survey paper accepted by KDD2024

  39. arXiv:2405.01926  [pdf, other

    cs.CV

    Auto-Encoding Morph-Tokens for Multimodal LLM

    Authors: Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

    Abstract: For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge. This is due to a conflicting objective: for comprehension, an MLLM needs to abstract the visuals; for generation, it needs to preserve the visuals as much as possible. Thus, the objective is a dilemma for visual-tokens. To resolve the conflict, we propose encoding… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  40. arXiv:2404.17826  [pdf, other

    cs.IR

    A Taxation Perspective for Fair Re-ranking

    Authors: Chen Xu, Xiaopeng Ye, Wenjie Wang, Liang Pang, Jun Xu, Tat-Seng Chua

    Abstract: Fair re-ranking aims to redistribute ranking slots among items more equitably to ensure responsibility and ethics. The exploration of redistribution problems has a long history in economics, offering valuable insights for conceptualizing fair re-ranking as a taxation process. Such a formulation provides us with a fresh perspective to re-examine fair re-ranking and inspire the development of new me… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted in SIGIR 2024

  41. arXiv:2404.16924  [pdf, other

    cs.IR cs.CL

    A Survey of Generative Search and Recommendation in the Era of Large Language Models

    Authors: Yongqi Li, Xinyu Lin, Wenjie Wang, Fuli Feng, Liang Pang, Wenjie Li, Liqiang Nie, Xiangnan He, Tat-Seng Chua

    Abstract: With the information explosion on the Web, search and recommendation are foundational infrastructures to satisfying users' information needs. As the two sides of the same coin, both revolve around the same core research problem, matching queries with documents or users with items. In the recent few decades, search and recommendation have experienced synchronous technological paradigm shifts, inclu… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  42. Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval

    Authors: Haokun Wen, Xuemeng Song, Xiaolin Chen, Yinwei Wei, Liqiang Nie, Tat-Seng Chua

    Abstract: Composed image retrieval (CIR) aims to retrieve the target image based on a multimodal query, i.e., a reference image paired with corresponding modification text. Recent CIR studies leverage vision-language pre-trained (VLP) methods as the feature extraction backbone, and perform nonlinear feature-level multimodal query fusion to retrieve the target image. Despite the promising performance, we arg… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: ACM SIGIR 2024

  43. arXiv:2404.12670  [pdf, other

    cs.IR cs.CL cs.HC

    Towards Human-centered Proactive Conversational Agents

    Authors: Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, Tat-Seng Chua

    Abstract: Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests. This perspectives paper highlights the importance of moving towards building human-centered PCAs that emphasize human needs and expectations, and that considers eth… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024 (Perspectives Track)

  44. arXiv:2404.11129  [pdf, other

    cs.CV

    Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales

    Authors: Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

    Abstract: The remarkable performance of Multimodal Large Language Models (MLLMs) has unequivocally demonstrated their proficient understanding capabilities in handling a wide array of visual tasks. Nevertheless, the opaque nature of their black-box reasoning processes persists as an enigma, rendering them uninterpretable and struggling with hallucination. Their ability to execute intricate compositional rea… ▽ More

    Submitted 5 August, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  45. arXiv:2404.03304  [pdf, other

    cs.CL cs.AI

    Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors

    Authors: Chen Huang, Peixin Qin, Yang Deng, Wenqiang Lei, Jiancheng Lv, Tat-Seng Chua

    Abstract: The conversational recommendation system (CRS) has been criticized regarding its user experience in real-world scenarios, despite recent significant progress achieved in academia. Existing evaluation protocols for CRS may prioritize system-centric factors such as effectiveness and fluency in conversation while neglecting user-centric aspects. Thus, we propose a new and inclusive evaluation protoco… ▽ More

    Submitted 6 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 33 pages, 18 tables, and 10 figures. Our code is available at https://github.com/huangzichun/Concept4CRS

  46. arXiv:2404.01735  [pdf, other

    cs.IR cs.MM

    CIRP: Cross-Item Relational Pre-training for Multimodal Product Bundling

    Authors: Yunshan Ma, Yingzhi He, Wenjun Zhong, Xiang Wang, Roger Zimmermann, Tat-Seng Chua

    Abstract: Product bundling has been a prevailing marketing strategy that is beneficial in the online shopping scenario. Effective product bundling methods depend on high-quality item representations, which need to capture both the individual items' semantics and cross-item relations. However, previous item representation learning methods, either feature fusion or graph learning, suffer from inadequate cross… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: arXiv preprint, 10 pages, 4 figures, 6 tables

    ACM Class: H.3.0

  47. arXiv:2403.14972  [pdf, other

    cs.AI cs.CL cs.MA cs.MM

    A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning

    Authors: Changmeng Zheng, Dayong Liang, Wengyu Zhang, Xiao-Yong Wei, Tat-Seng Chua, Qing Li

    Abstract: This paper presents a pilot study aimed at introducing multi-agent debate into multimodal reasoning. The study addresses two key challenges: the trivialization of opinions resulting from excessive summarization and the diversion of focus caused by distractor concepts introduced from images. These challenges stem from the inductive (bottom-up) nature of existing debating schemes. To address the iss… ▽ More

    Submitted 6 August, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM Multimedia 2024

  48. arXiv:2403.11703  [pdf, other

    cs.CV cs.AI

    LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

    Authors: Ruyi Xu, Yuan Yao, Zonghao Guo, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Maosong Sun, Gao Huang

    Abstract: Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding the visual world. Conventional LMMs process images in fixed sizes and limited resolutions, while recent explorations in this direction are limited in adaptivity, efficiency, and even correctness. In this work, we first take GPT-4V and LLaVA-1.5 as representative examples and expose systematic flaws rooted in t… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Preprint

  49. arXiv:2403.09972  [pdf, other

    cs.CL

    Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection

    Authors: Moxin Li, Wenjie Wang, Fuli Feng, Fengbin Zhu, Qifan Wang, Tat-Seng Chua

    Abstract: Self-detection for Large Language Models (LLMs) seeks to evaluate the trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the issue of output hallucination. However, existing self-detection approaches only retrospectively evaluate answers generated by LLM, typically leading to the over-trust in incorrectly generated answers. To tackle this limitation, we pro… ▽ More

    Submitted 27 September, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: EMNLP findings 2024

  50. arXiv:2403.06769  [pdf, other

    cs.CL

    Strength Lies in Differences! Improving Strategy Planning for Non-collaborative Dialogues via Diversified User Simulation

    Authors: Tong Zhang, Chen Huang, Yang Deng, Hongru Liang, Jia Liu, Zujie Wen, Wenqiang Lei, Tat-Seng Chua

    Abstract: We investigate non-collaborative dialogue agents, which are expected to engage in strategic conversations with diverse users, for securing a mutual agreement that leans favorably towards the system's objectives. This poses two main challenges for existing dialogue agents: 1) The inability to integrate user-specific characteristics into the strategic planning, and 2) The difficulty of training stra… ▽ More

    Submitted 22 September, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by EMNLP 2024 (Main)