Showing 1–50 of 64 results for author: Hui, B

  1. arXiv:2410.12621  [pdf, other]

    cs.CL cs.LG

    Weak-to-Strong Generalization beyond Accuracy: a Pilot Study in Safety, Toxicity, and Legal Reasoning

    Authors: Ruimeng Ye, Yang Xiao, Bo Hui

    Abstract: As large language models (LLMs) continue to advance, ensuring their alignment with human values becomes increasingly critical. Traditional alignment methods heavily rely on human feedback to fine-tune models. With the emergence of superhuman models whose outputs may surpass human understanding, evaluating and aligning these models using human judgments poses significant challenges. To address the…

    Submitted 16 October, 2024; originally announced October 2024.

  2. arXiv:2409.18980  [pdf, other]

    cs.CL cs.AI cs.CV

    IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web

    Authors: Hongcheng Guo, Wei Zhang, Junhao Chen, Yaonan Gu, Jian Yang, Junjia Du, Binyuan Hui, Tianyu Liu, Jianxin Ma, Chang Zhou, Zhoujun Li

    Abstract: Recent advancements in large multimodal models have led to significant strides in image comprehension capabilities. Despite these advancements, there is a lack of a robust benchmark specifically for assessing the Image-to-Web conversion proficiency of these large models. Primarily, it is essential to ensure the integrity of the web elements generated. These elements comprise visible and invisi…

    Submitted 14 September, 2024; originally announced September 2024.

  3. arXiv:2409.12186  [pdf, other]

    cs.CL

    Qwen2.5-Coder Technical Report

    Authors: Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, An Yang, Rui Men, Fei Huang, Xingzhang Ren, Xuancheng Ren, Jingren Zhou, Junyang Lin

    Abstract: In this report, we introduce the Qwen2.5-Coder series, a significant upgrade from its predecessor, CodeQwen1.5. This series includes two models: Qwen2.5-Coder-1.5B and Qwen2.5-Coder-7B. As a code-specific model, Qwen2.5-Coder is built upon the Qwen2.5 architecture and continues pretraining on a vast corpus of over 5.5 trillion tokens. Through meticulous data cleaning, scalable synthetic data genera…

    Submitted 18 September, 2024; originally announced September 2024.

  4. arXiv:2409.12122  [pdf, other]

    cs.CL cs.AI cs.LG

    Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement

    Authors: An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, Keming Lu, Mingfeng Xue, Runji Lin, Tianyu Liu, Xingzhang Ren, Zhenru Zhang

    Abstract: In this report, we present a series of math-specific large language models: Qwen2.5-Math and Qwen2.5-Math-Instruct-1.5B/7B/72B. The core innovation of the Qwen2.5 series lies in integrating the philosophy of self-improvement throughout the entire pipeline, from pre-training and post-training to inference: (1) During the pre-training phase, Qwen2-Math-Instruct is utilized to generate large-scale, h…

    Submitted 18 September, 2024; originally announced September 2024.

  5. arXiv:2409.02060  [pdf, other]

    cs.CL cs.AI cs.LG

    OLMoE: Open Mixture-of-Experts Language Models

    Authors: Niklas Muennighoff, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Jacob Morrison, Sewon Min, Weijia Shi, Pete Walsh, Oyvind Tafjord, Nathan Lambert, Yuling Gu, Shane Arora, Akshita Bhagia, Dustin Schwenk, David Wadden, Alexander Wettig, Binyuan Hui, Tim Dettmers, Douwe Kiela, Ali Farhadi, Noah A. Smith, Pang Wei Koh, Amanpreet Singh, Hannaneh Hajishirzi

    Abstract: We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token. We pretrain it on 5 trillion tokens and further adapt it to create OLMoE-1B-7B-Instruct. Our models outperform all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat an…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 61 pages (24 main), 36 figures, 14 tables

  6. arXiv:2408.03256  [pdf, other]

    cs.CL

    Synthesizing Text-to-SQL Data from Weak and Strong LLMs

    Authors: Jiaxi Yang, Binyuan Hui, Min Yang, Jian Yang, Junyang Lin, Chang Zhou

    Abstract: The capability gap between open-source and closed-source large language models (LLMs) remains a challenge in text-to-SQL tasks. In this paper, we introduce a synthetic data approach that combines data produced by larger, more powerful models (strong models) with error information data generated by smaller, not well-aligned models (weak models). The method not only enhances the domain generalizatio…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures, ACL 2024

  7. arXiv:2407.16741  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenHands: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenH…

    Submitted 4 October, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/All-Hands-AI/OpenHands

  8. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 10 September, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 26 pages, 1 figure

  9. arXiv:2406.15877  [pdf, other]

    cs.SE cs.AI cs.CL

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    Authors: Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, Ratnadira Widyasari, Imam Nur Bani Yusuf, Haolan Zhan, Junda He, Indraneil Paul, Simon Brunner, Chen Gong, Thong Hoang, Armel Randy Zebaze, Xiaoheng Hong, Wen-Ding Li, Jean Kaddour, Ming Xu, Zhihan Zhang, Prateek Yadav, Naman Jain, Alex Gu, Zhoujun Cheng, Jiawei Liu, Qian Liu, et al. (8 additional authors not shown)

    Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, with tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks o…

    Submitted 7 October, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: 44 pages, 14 figures, 7 tables, built with love by the BigCode community :)

  10. arXiv:2405.15232  [pdf, other]

    cs.CV cs.CL

    DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception

    Authors: Run Luo, Yunshui Li, Longze Chen, Wanwei He, Ting-En Lin, Ziqiang Liu, Lei Zhang, Zikai Song, Xiaobo Xia, Tongliang Liu, Min Yang, Binyuan Hui

    Abstract: The development of large language models (LLMs) has significantly advanced the emergence of large multimodal models (LMMs). While LMMs have achieved tremendous success by promoting the synergy between multimodal comprehension and creation, they often face challenges when confronted with out-of-distribution data, where they can hardly distinguish orientation, quantity, color, structure, etc. Thi…

    Submitted 29 September, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 25 pages. arXiv admin note: text overlap with arXiv:2401.10208 by other authors

  11. arXiv:2405.10621  [pdf, other]

    cs.LG cs.AI

    Historically Relevant Event Structuring for Temporal Knowledge Graph Reasoning

    Authors: Jinchuan Zhang, Bei Hui, Chong Mu, Ming Sun, Ling Tian

    Abstract: Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs, including capturing evolution of each recent snapshot or correlations among global historical facts. Despite significant accomplishments, these models…

    Submitted 17 May, 2024; originally announced May 2024.

  12. arXiv:2405.06823  [pdf, other]

    cs.CR cs.AI cs.LG

    PLeak: Prompt Leaking Attacks against Large Language Model Applications

    Authors: Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, Yinzhi Cao

    Abstract: Large Language Models (LLMs) enable a new ecosystem with many downstream applications, called LLM applications, with different natural language processing tasks. The functionality and performance of an LLM application highly depend on its system prompt, which instructs the backend LLM on what task to perform. Therefore, an LLM application developer often keeps a system prompt confidential to prote…

    Submitted 14 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in the Proceedings of The ACM Conference on Computer and Communications Security (CCS), 2024

  13. arXiv:2403.08604  [pdf, other]

    cs.CL cs.SE

    DevBench: A Comprehensive Benchmark for Software Development

    Authors: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

    Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of programming, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. To this end, we propo…

    Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Our data and code are available at https://github.com/open-compass/DevBench

  14. arXiv:2403.04861  [pdf, other]

    cs.LG cs.NE

    A Survey of Lottery Ticket Hypothesis

    Authors: Bohan Liu, Zijie Zhang, Peixiong He, Zhensen Wang, Yang Xiao, Ruimeng Ye, Yang Zhou, Wei-Shinn Ku, Bo Hui

    Abstract: The Lottery Ticket Hypothesis (LTH) states that a dense neural network model contains a highly sparse subnetwork (i.e., winning tickets) that can achieve even better performance than the original model when trained in isolation. While LTH has been proved both empirically and theoretically in many works, there are still some open issues, such as efficiency and scalability, to be addressed. Also, th…

    Submitted 12 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  15. arXiv:2403.01886  [pdf, other]

    cs.CL cs.AI

    FCDS: Fusing Constituency and Dependency Syntax into Document-Level Relation Extraction

    Authors: Xudong Zhu, Zhao Kang, Bei Hui

    Abstract: Document-level Relation Extraction (DocRE) aims to identify relation labels between entities within a single document. It requires handling several sentences and reasoning over them. State-of-the-art DocRE methods use a graph structure to connect entities across the document to capture dependency syntax information. However, this is insufficient to fully exploit the rich syntax information in the…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Appear in COLING 2024

  16. arXiv:2402.19173  [pdf, other]

    cs.SE cs.AI

    StarCoder 2 and The Stack v2: The Next Generation

    Authors: Anton Lozhkov, Raymond Li, Loubna Ben Allal, Federico Cassano, Joel Lamy-Poirier, Nouamane Tazi, Ao Tang, Dmytro Pykhtar, Jiawei Liu, Yuxiang Wei, Tianyang Liu, Max Tian, Denis Kocetkov, Arthur Zucker, Younes Belkada, Zijian Wang, Qian Liu, Dmitry Abulkhanov, Indraneil Paul, Zhuang Li, Wen-Ding Li, Megan Risdal, Jia Li, Jian Zhu, Terry Yue Zhuo, et al. (41 additional authors not shown)

    Abstract: The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digital commons of their source code archive. Alongside the SWH repositories spanning 619 programming languages, we carefully select other high-quality data…

    Submitted 29 February, 2024; originally announced February 2024.

  17. arXiv:2401.01076  [pdf, other]

    cs.CL

    DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever

    Authors: Zhichao Yin, Binyuan Hui, Min Yang, Fei Huang, Yongbin Li

    Abstract: Recently, substantial advancements in pre-trained vision-language models have greatly enhanced the capabilities of multi-modal dialog systems. These models have demonstrated significant improvements by fine-tuning on downstream tasks. However, the existing pre-trained models primarily focus on effectively capturing the alignment between vision and language modalities, often ignoring the intricate…

    Submitted 2 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

    Comments: ICASSP 2024

  18. arXiv:2312.10302  [pdf, other]

    cs.CL cs.AI

    One-Shot Learning as Instruction Data Prospector for Large Language Models

    Authors: Yunshui Li, Binyuan Hui, Xiaobo Xia, Jiaxi Yang, Min Yang, Lei Zhang, Shuzheng Si, Ling-Hao Chen, Junhao Liu, Tongliang Liu, Fei Huang, Yongbin Li

    Abstract: Contemporary practices in instruction tuning often hinge on enlarging data scaling without a clear strategy for ensuring data quality, inadvertently introducing noise that may compromise model performance. To address this challenge, we introduce Nuggets, a novel and efficient methodology that leverages one-shot learning to discern and select high-quality instruction data from extensive da…

    Submitted 3 June, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: ACL 2024

  19. Learning Multi-graph Structure for Temporal Knowledge Graph Reasoning

    Authors: Jinchuan Zhang, Bei Hui, Chong Mu, Ling Tian

    Abstract: Temporal Knowledge Graph (TKG) reasoning that forecasts future events based on historical snapshots distributed over timestamps is denoted as extrapolation and has gained significant attention. Owing to its extreme versatility and variation in spatial and temporal correlations, TKG reasoning presents a challenging task, demanding efficient capture of concurrent structures and evolutional interacti…

    Submitted 26 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  20. arXiv:2310.18823  [pdf, other]

    cs.LG

    Successfully Applying Lottery Ticket Hypothesis to Diffusion Model

    Authors: Chao Jiang, Bo Hui, Bohan Liu, Da Yan

    Abstract: Despite the success of diffusion models, the training and inference of diffusion models are notoriously expensive due to the long chain of the reverse process. In parallel, the Lottery Ticket Hypothesis (LTH) claims that there exist winning tickets (i.e., a properly pruned sub-network together with original weight initialization) that can achieve performance competitive to the original dense neura…

    Submitted 28 October, 2023; originally announced October 2023.

  21. arXiv:2310.07728  [pdf]

    cs.HC cs.AI

    AI Algorithm for the Generation of Three-Dimensional Accessibility Ramps in Grasshopper / Rhinoceros 7

    Authors: Antonio Li, Leila Yi, Brandon Yeo Pei Hui

    Abstract: Often overlooked as a component of urban development, accessibility infrastructure is undeniably crucial in daily life. Accessibility ramps are one of the most common types of accessibility infrastructure, and serve to benefit not only people with mobility impairments but also able-bodied third parties. While the necessity of accessibility ramps is acknowledged, actual implementation fails in light…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: 9 pages, 7 figures

  22. arXiv:2310.06830  [pdf, other]

    cs.CL

    Lemur: Harmonizing Natural Language and Code for Language Agents

    Authors: Yiheng Xu, Hongjin Su, Chen Xing, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu

    Abstract: We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls…

    Submitted 24 August, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Spotlight; https://github.com/OpenLemur/Lemur

  23. arXiv:2310.05163  [pdf, other]

    cs.CL

    An Investigation of LLMs' Inefficacy in Understanding Converse Relations

    Authors: Chengwen Qi, Bowen Li, Binyuan Hui, Bailin Wang, Jinyang Li, Jinwang Wu, Yuanjun Laili

    Abstract: Large Language Models (LLMs) have achieved remarkable success in many formal language oriented tasks, such as structural data-to-text and semantic parsing. However, current benchmarks mostly follow the data distribution of the pre-training data of LLMs. Therefore, a natural question arises: do LLMs really understand the structured semantics of formal languages? In this paper, we investigate this…

    Submitted 13 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023

  24. arXiv:2309.16609  [pdf, other]

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  25. arXiv:2309.07387  [pdf, other]

    cs.CL cs.CV

    VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

    Authors: Yunshui Li, Binyuan Hui, Zhaochao Yin, Wanwei He, Run Luo, Yuxing Long, Min Yang, Fei Huang, Yongbin Li

    Abstract: Visually-grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework poses a challenge in assessing the development of this field. To this end, we propose VDialogUE, a Visually-grounded Dialogue benchmark for…

    Submitted 13 September, 2023; originally announced September 2023.

  26. arXiv:2309.00013  [pdf, other]

    cs.CV

    Model Inversion Attack via Dynamic Memory Learning

    Authors: Gege Qi, YueFeng Chen, Xiaofeng Mao, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue

    Abstract: Model Inversion (MI) attacks aim to recover the private training data from the target model, which has raised security concerns about the deployment of DNNs in practice. Recent advances in generative adversarial models have rendered them particularly effective in MI attacks, primarily due to their ability to generate high-fidelity and perceptually realistic images that closely resemble the target…

    Submitted 23 August, 2023; originally announced September 2023.

  27. arXiv:2308.07124  [pdf, other]

    cs.CL cs.AI

    OctoPack: Instruction Tuning Code Large Language Models

    Authors: Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

    Abstract: Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthe…

    Submitted 18 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: 60 pages (9 main), 40 figures, 19 tables

  28. arXiv:2308.05696  [pdf, other]

    cs.CL

    A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment

    Authors: Yingxiu Zhao, Bowen Yu, Binyuan Hui, Haiyang Yu, Fei Huang, Yongbin Li, Nevin L. Zhang

    Abstract: Training large language models (LLMs) with open-domain instruction data has yielded remarkable success in aligning to end tasks and human preferences. Extensive research has highlighted the importance of the quality and diversity of instruction data. However, the impact of data complexity, as a crucial metric, remains relatively unexplored from three aspects: (1) where the sustainability of perform…

    Submitted 28 February, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

    Comments: LREC-Coling 2024

  29. arXiv:2307.06018  [pdf, other]

    cs.CL

    PolyLM: An Open Source Polyglot Large Language Model

    Authors: Xiangpeng Wei, Haoran Wei, Huan Lin, Tianhao Li, Pei Zhang, Xingzhang Ren, Mei Li, Yu Wan, Zhiwei Cao, Binbin Xie, Tianxiang Hu, Shangjie Li, Binyuan Hui, Bowen Yu, Dayiheng Liu, Baosong Yang, Fei Huang, Jun Xie

    Abstract: Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following natural language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, av…

    Submitted 12 July, 2023; originally announced July 2023.

  30. arXiv:2306.06872  [pdf, other]

    cs.CL

    History Semantic Graph Enhanced Conversational KBQA with Temporal Information Modeling

    Authors: Hao Sun, Yang Li, Liwei Deng, Bowen Li, Binyuan Hui, Binhua Li, Yunshi Lan, Yan Zhang, Yongbin Li

    Abstract: Context information modeling is an important task in conversational KBQA. However, existing methods usually assume the independence of utterances and model them in isolation. In this paper, we propose a History Semantic Graph Enhanced KBQA model (HSGE) that is able to effectively model long-range semantic dependencies in conversation history while maintaining low computational cost. The framework…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: Accepted to ACL 2023 Main Conference

  31. arXiv:2305.18212  [pdf, other]

    cs.IR cs.AI cs.CL cs.CV cs.LG cs.MM

    Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark

    Authors: Yuxing Long, Binyuan Hui, Caixia Yuan, Fei Huang, Yongbin Li, Xiaojie Wang

    Abstract: Existing multimodal task-oriented dialog data fails to demonstrate the diverse expressions of user subjective preferences and recommendation acts in the real-life shopping scenario. This paper introduces a new dataset SURE (Multimodal Recommendation Dialog with SUbjective PREference), which contains 12K shopping dialogs in complex store scenes. The data is built in two phases with human annotation…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  32. arXiv:2305.14839  [pdf, other]

    cs.CL cs.CV

    PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

    Authors: Yunshui Li, Binyuan Hui, ZhiChao Yin, Min Yang, Fei Huang, Yongbin Li

    Abstract: Perceiving multi-modal information and fulfilling dialogues with humans is a long-term goal of artificial intelligence. Pre-training is commonly regarded as an effective approach for multi-modal dialogue. However, due to the limited availability of multi-modal dialogue data, there is still scarce research on multi-modal dialogue pre-training. Yet another intriguing challenge emerges from the encom…

    Submitted 13 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  33. arXiv:2305.13016  [pdf, other]

    cs.CL

    Iterative Forward Tuning Boosts In-Context Learning in Language Models

    Authors: Jiaxi Yang, Binyuan Hui, Min Yang, Bailin Wang, Bowen Li, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Despite the advancements in in-context learning (ICL) for large language models (LLMs), current research centers on specific prompt engineering, such as demonstration selection, with the expectation that a single iteration of demonstrations processing can generalize effectively to a given test sample. However, this perspective overlooks the potential benefits derived from multiple iterations invol…

    Submitted 4 June, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 14 pages, 6 figures, ACL 2024

  34. arXiv:2305.12082  [pdf, other]

    cs.LG

    SneakyPrompt: Jailbreaking Text-to-image Generative Models

    Authors: Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, Yinzhi Cao

    Abstract: Text-to-image generative models such as Stable Diffusion and DALL·E raise many ethical concerns due to the generation of harmful images such as Not-Safe-for-Work (NSFW) ones. To address these ethical concerns, safety filters are often adopted to prevent the generation of NSFW images. In this work, we propose SneakyPrompt, the first automated attack framework, to jailbreak text-to-image gener…

    Submitted 10 November, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: To appear in the Proceedings of the IEEE Symposium on Security and Privacy (Oakland), 2024

  35. arXiv:2305.03237  [pdf, other]

    cs.CL cs.AI cs.LG

    Out-of-Domain Intent Detection Considering Multi-Turn Dialogue Contexts

    Authors: Hao Lang, Yinhe Zheng, Binyuan Hui, Fei Huang, Yongbin Li

    Abstract: Out-of-Domain (OOD) intent detection is vital for practical dialogue systems, and it usually requires considering multi-turn dialogue contexts. However, most previous OOD intent detection approaches are limited to single dialogue turns. In this paper, we introduce a context-aware OOD intent detection (Caro) framework to model multi-turn contexts in OOD intent detection tasks. Specifically, we foll…

    Submitted 23 February, 2024; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: COLING2024 Long Paper

  36. arXiv:2305.03111  [pdf, other]

    cs.CL

    Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs

    Authors: Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Rongyu Cao, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin C. C. Chang, Fei Huang, Reynold Cheng, Yongbin Li

    Abstract: Text-to-SQL parsing, which aims at converting natural language instructions into executable SQLs, has gained increasing attention in recent years. In particular, Codex and ChatGPT have shown impressive results in this task. However, most of the prevalent benchmarks, i.e., Spider, and WikiSQL, focus on database schema with few rows of database contents, leaving the gap between academic study and rea…

    Submitted 14 November, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  37. arXiv:2305.02190  [pdf, other]

    cs.LG cs.AI

    Rethinking Graph Lottery Tickets: Graph Sparsity Matters

    Authors: Bo Hui, Da Yan, Xiaolong Ma, Wei-Shinn Ku

    Abstract: Lottery Ticket Hypothesis (LTH) claims the existence of a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve competitive performance to the original dense network. A recent work, called UGS, extended LTH to prune graph neural networks (GNNs) for effectively accelerating GNN inference. UGS simultaneously prunes the graph adjacency matr…

    Submitted 3 May, 2023; originally announced May 2023.

    Comments: ICLR 2023

  38. arXiv:2303.15940  [pdf, other]

    cs.SD eess.AS

    TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized Perturbations

    Authors: Qi Gege, Yuefeng Chen, Xiaofeng Mao, Yao Zhu, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue

    Abstract: In a transfer-based attack against Automatic Speech Recognition (ASR) systems, attackers are unable to access the architecture and parameters of the target model. Existing attack methods are mostly investigated in voice assistant scenarios with restricted voice commands, prohibiting their applicability to more general ASR related applications. To tackle this challenge, we propose a novel contextuali…

    Submitted 28 March, 2023; originally announced March 2023.

  39. arXiv:2301.13808  [pdf, other]

    cs.CL

    Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning

    Authors: Yunhu Ye, Binyuan Hui, Min Yang, Binhua Li, Fei Huang, Yongbin Li

    Abstract: Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and structured tabular data. However, previous table-based reasoning solutions usually suffer from significant performance degradation on huge evidence (tables). In addition, most existing methods struggle to reason over…

    Submitted 27 April, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

    Comments: SIGIR 2023

  40. arXiv:2301.07507  [pdf, other]

    cs.CL cs.DB

    Graphix-T5: Mixing Pre-Trained Transformers with Graph-Aware Layers for Text-to-SQL Parsing

    Authors: Jinyang Li, Binyuan Hui, Reynold Cheng, Bowen Qin, Chenhao Ma, Nan Huo, Fei Huang, Wenyu Du, Luo Si, Yongbin Li

    Abstract: The task of text-to-SQL parsing, which aims at converting natural language questions into executable SQL queries, has garnered increasing attention in recent years, as it can assist end users in efficiently extracting vital information from databases without the need for technical background. One of the major challenges in text-to-SQL parsing is domain generalization, i.e., how to generalize well…

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAAI 2023 main conference (oral)

  41. arXiv:2301.01949  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG cs.MM

    SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

    Authors: Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang

    Abstract: Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from INcrement…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: AAAI 2023

  42. arXiv:2210.15025  [pdf, other]

    cs.CV cs.LG

    Addressing Heterogeneity in Federated Learning via Distributional Transformation

    Authors: Haolin Yuan, Bo Hui, Yuchen Yang, Philippe Burlina, Neil Zhenqiang Gong, Yinzhi Cao

    Abstract: Federated learning (FL) allows multiple clients to collaboratively train a deep learning model. One major challenge of FL is when data distribution is heterogeneous, i.e., differs from one client to another. Existing personalized FL algorithms are only applicable to narrow cases, e.g., one or two data classes per client, and therefore they do not satisfactorily address FL under varying levels of d…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: In the Proceedings of European Conference on Computer Vision (ECCV), 2022

  43. arXiv:2210.11888  [pdf, other]

    cs.CL

    STAR: SQL Guided Pre-Training for Context-dependent Text-to-SQL Parsing

    Authors: Zefeng Cai, Xiangyu Li, Binyuan Hui, Min Yang, Bowen Li, Binhua Li, Zheng Cao, Weijie Li, Fei Huang, Luo Si, Yongbin Li

    Abstract: In this paper, we propose a novel SQL guided pre-training framework STAR for context-dependent text-to-SQL parsing, which leverages contextual information to enrich natural language (NL) utterance and table schema representations for text-to-SQL conversations. Concretely, we propose two novel pre-training objectives which respectively explore the context-dependent interactions of NL utterances and…

    Submitted 27 October, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022

  44. arXiv:2209.06638  [pdf, other]

    cs.CL

    SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding

    Authors: Wanwei He, Yinpei Dai, Binyuan Hui, Min Yang, Zheng Cao, Jianbo Dong, Fei Huang, Luo Si, Yongbin Li

    Abstract: Pre-training methods with contrastive learning objectives have shown remarkable success in dialog understanding tasks. However, current contrastive learning solely considers the self-augmented dialog samples as positive samples and treats all other dialog samples as negative ones, which enforces dissimilar representations even for dialogs that are semantically related. In this paper, we propose SP…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: 17 pages, 6 figures. Accepted by COLING 2022

  45. arXiv:2209.06442  [pdf, other]

    cs.CL

    SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers

    Authors: Bowen Qin, Lihan Wang, Binyuan Hui, Bowen Li, Xiangpeng Wei, Binhua Li, Fei Huang, Luo Si, Min Yang, Yongbin Li

    Abstract: This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in the neural network based approaches (called SUN). From the data uncertainty perspective, it is indisputable that a single SQL can be learned from multiple semantically-equivalent questions. Different from previous methods that are limited to one-to-one mapping, we propose a data uncertainty…

    Submitted 28 October, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

    Comments: Accepted at COLING 2022

  46. Latent Heterogeneous Graph Network for Incomplete Multi-View Learning

    Authors: Pengfei Zhu, Xinjie Yao, Yu Wang, Meng Cao, Binyuan Hui, Shuai Zhao, Qinghua Hu

    Abstract: Multi-view learning has progressed rapidly in recent years. Although many previous studies assume that each instance appears in all views, it is common in real-world applications for instances to be missing from some views, resulting in incomplete multi-view data. To tackle this problem, we propose a novel Latent Heterogeneous Graph Network (LHGN) for incomplete multi-view learning, which aims to…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: 13 pages, 9 figures, IEEE Transactions on Multimedia

    Journal ref: IEEE Transactions on Multimedia, early access, February 2022

  47. arXiv:2208.13629  [pdf, other]

    cs.CL

    A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions

    Authors: Bowen Qin, Binyuan Hui, Lihan Wang, Min Yang, Jinyang Li, Binhua Li, Ruiying Geng, Rongyu Cao, Jian Sun, Luo Si, Fei Huang, Yongbin Li

    Abstract: Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by relational databases. Early text-to-SQL parsing systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interactio…

    Submitted 29 August, 2022; originally announced August 2022.

  48. arXiv:2206.14017  [pdf, other]

    cs.CL

    Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing

    Authors: Lihan Wang, Bowen Qin, Binyuan Hui, Bowen Li, Min Yang, Bailin Wang, Binhua Li, Fei Huang, Luo Si, Yongbin Li

    Abstract: The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing…

    Submitted 6 August, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Accepted at KDD 2022

  49. arXiv:2204.00747  [pdf, other]

    cs.AI

    RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques

    Authors: Bo Hui, Wenlu Wang, Jiao Yu, Zhitao Gong, Wei-Shinn Ku, Min-Te Sun, Hua Lu

    Abstract: People spend a significant amount of time in indoor spaces (e.g., office buildings, subway systems, etc.) in their daily lives. Therefore, it is important to develop efficient indoor spatial query algorithms for supporting various location-based applications. However, indoor spaces differ from outdoor spaces because users have to follow the indoor floor plan for their movements. In addition, posit…

    Submitted 25 May, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

  50. arXiv:2203.06958  [pdf, other]

    cs.CL

    S$^2$SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers

    Authors: Binyuan Hui, Ruiying Geng, Lihan Wang, Bowen Qin, Bowen Li, Jian Sun, Yongbin Li

    Abstract: The task of converting a natural language question into an executable SQL query, known as text-to-SQL, is an important branch of semantic parsing. The state-of-the-art graph-based encoder has been successfully used in this task but does not model the question syntax well. In this paper, we propose S$^2$SQL, injecting Syntax to question-Schema graph encoder for Text-to-SQL parsers, which effectivel…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: Accepted at ACL 2022 Findings