Showing 1–50 of 246 results for author: Song, K

  1. arXiv:2410.14961  [pdf, other]

    cs.LG cs.AI cs.SI

    LangGFM: A Large Language Model Alone Can be a Powerful Graph Foundation Model

    Authors: Tianqianjin Lin, Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Jun Lin, Weikang Yuan, Junjie Cao, Changlong Sun, Xiaozhong Liu

    Abstract: Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often inco…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: under review

  2. arXiv:2410.11841  [pdf, other]

    cs.IR cs.AI

    GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation

    Authors: Fei Tang, Yongliang Shen, Hang Zhang, Zeqi Tan, Wenqi Zhang, Guiyang Hou, Kaitao Song, Weiming Lu, Yueting Zhuang

    Abstract: Large language model-based explainable recommendation (LLM-based ER) systems show promise in generating human-like explanations for recommendations. However, they face challenges in modeling user-item collaborative preferences, personalizing explanations, and handling sparse user-item interactions. To address these issues, we propose GaVaMoE, a novel Gaussian-Variational Gated Mixture of Experts f…

    Submitted 15 October, 2024; originally announced October 2024.

  3. arXiv:2410.09556  [pdf, other]

    cs.CL

    A Speaker Turn-Aware Multi-Task Adversarial Network for Joint User Satisfaction Estimation and Sentiment Analysis

    Authors: Kaisong Song, Yangyang Kang, Jiawei Liu, Xurui Li, Changlong Sun, Xiaozhong Liu

    Abstract: User Satisfaction Estimation is an important task and increasingly being applied in goal-oriented dialogue systems to estimate whether the user is satisfied with the service. It is observed that whether the user's needs are met often triggers various sentiments, which can be pertinent to the successful estimation of user satisfaction, and vice versa. Thus, User Satisfaction Estimation (USE) and Se…

    Submitted 12 October, 2024; originally announced October 2024.

  4. arXiv:2410.09102  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy

    Authors: Tong Wu, Shujian Zhang, Kaiqiang Song, Silei Xu, Sanqiang Zhao, Ravi Agrawal, Sathish Reddy Indurthi, Chong Xiang, Prateek Mittal, Wenxuan Zhou

    Abstract: Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures treat all inputs equally, failing to distinguish between and prioritize various types of instructions, such as system messages, user prompts, and dat…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Preprint

  5. arXiv:2410.08394  [pdf, other]

    cs.LG q-fin.GN

    Identifying Money Laundering Subgraphs on the Blockchain

    Authors: Kiwhan Song, Mohamed Ali Dhraief, Muhua Xu, Locke Cai, Xuhao Chen, Arvind, Jie Chen

    Abstract: Anti-Money Laundering (AML) involves the identification of money laundering crimes in financial activities, such as cryptocurrency transactions. Recent studies advanced AML through the lens of graph-based machine learning, modeling the web of financial transactions as a graph and developing graph methods to identify suspicious activities. For instance, a recent effort on opensourcing datasets and…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: ICAIF 2024. Code is available at https://github.com/MITIBMxGraph/RevTrack

  6. arXiv:2410.04402  [pdf, other]

    cs.CV cs.GR

    Deformable NeRF using Recursively Subdivided Tetrahedra

    Authors: Zherui Qiu, Chenqu Ren, Kaiwen Song, Xiaoyi Zeng, Leyuan Yang, Juyong Zhang

    Abstract: While neural radiance fields (NeRF) have shown promise in novel view synthesis, their implicit representation limits explicit control over object manipulation. Existing research has proposed the integration of explicit geometric proxies to enable deformation. However, these methods face two primary challenges: firstly, the time-consuming and computationally demanding tetrahedralization process; an…

    Submitted 6 October, 2024; originally announced October 2024.

    Comments: Accepted by ACM Multimedia 2024. Project Page: https://ustc3dv.github.io/DeformRF/

  7. arXiv:2410.03782  [pdf, other]

    cs.LG cs.CV

    DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

    Authors: Changdae Oh, Yixuan Li, Kyungwoo Song, Sangdoo Yun, Dongyoon Han

    Abstract: Adapting a pre-trained foundation model on downstream tasks should ensure robustness against distribution shifts without the need to retrain the whole model. Although existing weight interpolation methods are simple yet effective, we argue their static nature limits downstream performance while achieving efficiency. In this work, we propose DaWin, a training-free dynamic weight interpolation metho…

    Submitted 3 October, 2024; originally announced October 2024.

  8. arXiv:2410.02507  [pdf, other]

    cs.AI cs.CL

    Can Large Language Models Grasp Legal Theories? Enhance Legal Reasoning with Insights from Multi-Agent Collaboration

    Authors: Weikang Yuan, Junjie Cao, Zhuoren Jiang, Yangyang Kang, Jun Lin, Kaisong Song, Tianqianjin Lin, Pengwei Yan, Changlong Sun, Xiaozhong Liu

    Abstract: Large Language Models (LLMs) could struggle to fully understand legal theories and perform complex legal reasoning tasks. In this study, we introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities. We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability (MALR). MA…

    Submitted 3 October, 2024; originally announced October 2024.

    ACM Class: I.2.7

  9. arXiv:2409.16767  [pdf]

    cs.LG

    Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Jiansheng Chen, Huimin Ma, Weiran Huang

    Abstract: In this paper, we utilize information-theoretic metrics like matrix entropy and mutual information to analyze supervised learning. We explore the information content of data representations and classification head weights and their information interplay during supervised training. Experiments show that matrix entropy cannot solely describe the interaction of the information content of data represe…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.03999

  10. arXiv:2409.11143  [pdf, other]

    cs.CL

    Semformer: Transformer Language Models with Semantic Planning

    Authors: Yongjing Yin, Junran Ding, Kai Song, Yue Zhang

    Abstract: Next-token prediction serves as the dominant component in current neural language models. During the training phase, the model employs teacher forcing, which predicts tokens based on all preceding ground truth tokens. However, this approach has been found to create shortcuts, utilizing the revealed prefix to spuriously fit future tokens, potentially compromising the accuracy of the next-token pred…

    Submitted 17 September, 2024; originally announced September 2024.

  11. arXiv:2409.10878  [pdf, other]

    cs.RO

    P2 Explore: Efficient Exploration in Unknown Clustered Environment with Floor Plan Prediction

    Authors: Kun Song, Gaoming Chen, Masayoshi Tomizuka, Wei Zhan, Zhenhua Xiong, Mingyu Ding

    Abstract: Robot exploration aims at constructing unknown environments and it is important to achieve it with shorter paths. Traditional methods focus on optimizing the visiting order based on current observations, which may lead to local-minimal results. Recently, by predicting the structure of the unseen environment, the exploration efficiency can be further improved. However, in a cluttered environment, d…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 7 pages, submitted to ICRA 2025

  12. arXiv:2408.13492  [pdf, other]

    cs.CV

    Online Continuous Generalized Category Discovery

    Authors: Keon-Hee Park, Hakyung Lee, Kyungwoo Song, Gyeong-Moon Park

    Abstract: With the advancement of deep neural networks in computer vision, artificial intelligence (AI) is widely employed in real-world applications. However, AI still faces limitations in mimicking high-level human capabilities, such as novel category discovery, for practical use. While some methods utilizing offline continual learning have been proposed for novel category discovery, they neglect the cont…

    Submitted 24 August, 2024; originally announced August 2024.

  13. arXiv:2408.10923  [pdf, other]

    cs.CL cs.AI

    LBC: Language-Based-Classifier for Out-Of-Variable Generalization

    Authors: Kangjun Noh, Baekryun Seong, Hoyoon Byun, Youngjun Choi, Sungjin Song, Kyungwoo Song

    Abstract: Large Language Models (LLMs) have great success in natural language processing tasks such as response generation. However, their use in tabular data has been limited due to their inferior performance compared to traditional machine learning models (TMLs) such as XGBoost. We find that the pre-trained knowledge of LLMs enables them to interpret new variables that appear in a test without additional…

    Submitted 23 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages, 7 figures, 4 tables

  14. arXiv:2408.10107  [pdf, other]

    cs.LG cs.AI stat.ML

    Perturb-and-Compare Approach for Detecting Out-of-Distribution Samples in Constrained Access Environments

    Authors: Heeyoung Lee, Hoyoon Byun, Changdae Oh, JinYeong Bak, Kyungwoo Song

    Abstract: Accessing machine learning models through remote APIs has been gaining prevalence following the recent trend of scaling up model parameters for increased performance. Even though these models exhibit remarkable ability, detecting out-of-distribution (OOD) samples remains a crucial safety concern for end users as these samples may induce unreliable outputs from the model. In this work, we propose a…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Artificial Intelligence (ECAI) 2024

  15. arXiv:2408.05917  [pdf]

    cs.CE cs.AI cs.LG

    Inverse design of Non-parameterized Ventilated Acoustic Resonator via Variational Autoencoder with Acoustic Response-encoded Latent Space

    Authors: Min Woo Cho, Seok Hyeon Hwang, Jun-Young Jang, Jin Yeong Song, Sun-kwang Hwang, Kyoung Je Cha, Dong Yong Park, Kyungjun Song, Sang Min Park

    Abstract: Ventilated acoustic resonator (VAR), a type of acoustic metamaterial, emerges as an alternative for sound attenuation in environments that require ventilation, owing to its excellent low-frequency attenuation performance and flexible shape adaptability. However, due to the non-linear acoustic responses of VARs, the VAR designs are generally obtained within a limited parametrized design space, and th…

    Submitted 12 August, 2024; originally announced August 2024.

  16. arXiv:2408.04638  [pdf, other]

    cs.CL cs.CY

    Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective

    Authors: Yiqun Zhang, Xiaocui Yang, Xingle Xu, Zeran Gao, Yijie Huang, Shiyi Mu, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song, Ge Yu

    Abstract: Affective Computing (AC), integrating computer science, psychology, and cognitive science knowledge, aims to enable machines to recognize, interpret, and simulate human emotions. To create more value, AC can be applied to diverse scenarios, including social media, finance, healthcare, education, etc. Affective Computing (AC) includes two mainstream tasks, i.e., Affective Understanding (AU) and Affe…

    Submitted 30 July, 2024; originally announced August 2024.

  17. arXiv:2408.04472  [pdf, other]

    cs.CL

    Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate

    Authors: Yiqun Zhang, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

    Abstract: Competitive debate is a complex task of computational argumentation. Large Language Models (LLMs) suffer from hallucinations and lack competitiveness in this field. To address these challenges, we introduce Agent for Debate (Agent4Debate), a dynamic multi-agent framework based on LLMs designed to enhance their capabilities in competitive debate. Drawing inspiration from human behavior in debate pr…

    Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 12 pages (including appendix), 7 figures

  18. arXiv:2407.20502  [pdf, other]

    cs.CV

    Restoring Real-World Degraded Events Improves Deblurring Quality

    Authors: Yeqing Shen, Shang Li, Kun Song

    Abstract: Due to its high speed and low latency, DVS is frequently employed in motion deblurring. Ideally, high-quality events would adeptly capture intricate motion information. However, real-world events are generally degraded, thereby introducing significant artifacts into the deblurred results. In response to this challenge, we model the degradation of events and propose RDNet to improve the quality of…

    Submitted 29 July, 2024; originally announced July 2024.

  19. arXiv:2407.17491  [pdf, other]

    cs.CV cs.LG

    Robust Adaptation of Foundation Models with Black-Box Visual Prompting

    Authors: Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song

    Abstract: With the surge of large-scale pre-trained models (PTMs), adapting these models to numerous downstream tasks becomes a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has grasped huge attention. While PETL methods show impressive performance, they commonly rely on two optimistic assumptions: 1) the entire parameters of a PTM are available, and 2) a suffic…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Extended work from the CVPR'23 paper: arxiv:2303.14773; This paper has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for possible publication

  20. arXiv:2407.02867  [pdf, other]

    cs.MM cs.CL

    Contrast then Memorize: Semantic Neighbor Retrieval-Enhanced Inductive Multimodal Knowledge Graph Completion

    Authors: Yu Zhao, Ying Zhang, Baohang Zhou, Xinying Qian, Kehui Song, Xiangrui Cai

    Abstract: A large number of studies have emerged for Multimodal Knowledge Graph Completion (MKGC) to predict the missing links in MKGs. However, fewer studies have been proposed to study the inductive MKGC (IMKGC) involving emerging entities unseen during training. Existing inductive approaches focus on learning textual entity representations, which neglect rich semantic information in visual modality. More…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by SIGIR 2024

  21. arXiv:2407.01853  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving Multilingual Instruction Finetuning via Linguistically Natural and Diverse Datasets

    Authors: Sathish Reddy Indurthi, Wenxuan Zhou, Shamil Chollampatt, Ravi Agrawal, Kaiqiang Song, Lingxiao Zhao, Chenguang Zhu

    Abstract: Advancements in Large Language Models (LLMs) have significantly enhanced instruction-following capabilities. However, most Instruction Fine-Tuning (IFT) datasets are predominantly in English, limiting model performance in other languages. Traditional methods for creating multilingual IFT datasets such as translating existing English IFT datasets or converting existing NLP datasets into IFT dataset…

    Submitted 1 July, 2024; originally announced July 2024.

  22. arXiv:2406.17862  [pdf, other]

    cs.LO

    ESBMC v7.6: Enhanced Model Checking of C++ Programs with Clang AST

    Authors: Xianzhiyu Li, Kunjian Song, Mikhail R. Gadelha, Franz Brauße, Rafael S. Menezes, Konstantin Korovin, Lucas C. Cordeiro

    Abstract: This paper presents Efficient SMT-Based Context-Bounded Model Checker (ESBMC) v7.6, an extended version based on previous work on ESBMC v7.3 by K. Song et al. The v7.3 introduced a new Clang-based C++ front-end to address the challenges posed by modern C++ programs. Although the new front-end has demonstrated significant potential in previous studies, it remains in the developmental stage and lack…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 27 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.05649

  23. arXiv:2406.15664  [pdf, other]

    stat.ML cs.LG

    Flat Posterior Does Matter For Bayesian Model Averaging

    Authors: Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song

    Abstract: Bayesian neural network (BNN) approximates the posterior distribution of model parameters and utilizes the posterior for prediction via Bayesian Model Averaging (BMA). The quality of the posterior approximation is critical for achieving accurate and robust predictions. It is known that flatness in the loss landscape is strongly associated with generalization performance, and it necessitates consid…

    Submitted 21 October, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  24. arXiv:2406.14228  [pdf, other]

    cs.AI

    EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

    Authors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang

    Abstract: The rise of powerful large language models (LLMs) has spurred a new trend in building LLM-based autonomous agents for solving complex tasks, especially multi-agent systems. Despite the remarkable progress, we notice that existing works are heavily dependent on human-designed frameworks, which greatly limits the functional scope and scalability of agent systems. How to automatically extend the spec…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in process

  25. arXiv:2406.12084  [pdf, other]

    cs.CL cs.AI

    When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives

    Authors: Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, Hassan Foroosh, Dong Yu, Fei Liu

    Abstract: Reasoning is most powerful when an LLM accurately aggregates relevant information. We examine the critical role of information aggregation in reasoning by requiring the LLM to analyze sports narratives. To succeed at this task, an LLM must infer points from actions, identify related entities, attribute points accurately to players and teams, and compile key statistics to draw conclusions. We condu…

    Submitted 4 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to Main conference of EMNLP 2024

  26. arXiv:2406.11827  [pdf, other]

    cs.CL cs.AI cs.LG

    WPO: Enhancing RLHF with Weighted Preference Optimization

    Authors: Wenxuan Zhou, Ravi Agrawal, Shujian Zhang, Sathish Reddy Indurthi, Sanqiang Zhao, Kaiqiang Song, Silei Xu, Chenguang Zhu

    Abstract: Reinforcement learning from human feedback (RLHF) is a promising solution to align large language models (LLMs) more closely with human values. Off-policy preference optimization, where the preference data is obtained from other models, is widely adopted due to its cost efficiency and scalability. However, off-policy preference optimization often suffers from a distributional gap between the polic…

    Submitted 3 October, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024

  27. arXiv:2406.09716  [pdf, ps, other]

    cs.CR cs.AI cs.DC cs.LG

    Speed-up of Data Analysis with Kernel Trick in Encrypted Domain

    Authors: Joon Soo Yoo, Baek Kyung Song, Tae Min Ahn, Ji Won Heo, Ji Won Yoon

    Abstract: Homomorphic encryption (HE) is pivotal for secure computation on encrypted data, crucial in privacy-preserving data analysis. However, efficiently processing high-dimensional data in HE, especially for machine learning and statistical (ML/STAT) algorithms, poses a challenge. In this paper, we present an effective acceleration method using the kernel method for HE schemes, enhancing time performanc…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Submitted as a preprint

  28. arXiv:2406.07572  [pdf, ps, other]

    cs.AI cs.CE cs.LG

    Domain-specific ReAct for physics-integrated iterative modeling: A case study of LLM agents for gas path analysis of gas turbines

    Authors: Tao Song, Yuwei Fan, Chenlong Feng, Keyu Song, Chao Liu, Dongxiang Jiang

    Abstract: This study explores the application of large language models (LLMs) with callable tools in energy and power engineering domain, focusing on gas path analysis of gas turbines. We developed a dual-agent tool-calling process to integrate expert knowledge, predefined tools, and LLM reasoning. We evaluated various LLMs, including LLama3, Qwen1.5 and GPT. Smaller models struggled with tool usage and par…

    Submitted 1 June, 2024; originally announced June 2024.

  29. arXiv:2406.07471  [pdf, other]

    cs.CV

    OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

    Authors: Ming Hu, Peng Xia, Lin Wang, Siyuan Yan, Feilong Tang, Zhongxing Xu, Yimin Luo, Kaimin Song, Jurgen Leitner, Xuelian Cheng, Jun Cheng, Chi Liu, Kaijing Zhou, Zongyuan Ge

    Abstract: Surgical scene perception via videos is critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of intelligent systems for surgical workflow analysis. Existing datasets face challenges such as small scale, lack of diversity in surgery and phase cate…

    Submitted 19 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ECCV 2024

  30. arXiv:2406.05613  [pdf, other]

    cs.RO

    Distributed Motion Control of Multiple Mobile Manipulator System with Disturbance and Communication Delay

    Authors: Wenhang Liu, Meng Ren, Kun Song, Michael Yu Wang, Zhenhua Xiong

    Abstract: In real-world object manipulation scenarios, multiple mobile manipulator systems may suffer from disturbances and asynchrony, leading to excessive interaction forces and causing object damage or emergency stops. This paper presents a novel distributed motion control approach aimed at reducing these unnecessary interaction forces. The control strategy only utilizes force information without the nee…

    Submitted 8 June, 2024; originally announced June 2024.

  31. arXiv:2406.05352  [pdf, other]

    cs.CV

    1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation

    Authors: Qingfeng Liu, Mostafa El-Khamy, Kee-Bong Song

    Abstract: The third Pixel-level Video Understanding in the Wild (PVUW CVPR 2024) challenge aims to advance the state of art in video understanding through benchmarking Video Panoptic Segmentation (VPS) and Video Semantic Segmentation (VSS) on challenging videos and scenes introduced in the large-scale Video Panoptic Segmentation in the Wild (VIPSeg) test set and the large-scale Video Scene Parsing in the Wi…

    Submitted 8 June, 2024; originally announced June 2024.

  32. arXiv:2406.04941  [pdf, ps, other]

    cs.CL

    TCMD: A Traditional Chinese Medicine QA Dataset for Evaluating Large Language Models

    Authors: Ping Yu, Kaitao Song, Fengchen He, Ming Chen, Jianfeng Lu

    Abstract: The recently unprecedented advancements in Large Language Models (LLMs) have propelled the medical community by establishing advanced medical-domain models. However, due to the limited collection of medical datasets, there are only a few comprehensive benchmarks available to gauge progress in this area. In this paper, we introduce a new medical question-answering (QA) dataset that contains massive…

    Submitted 7 June, 2024; originally announced June 2024.

  33. arXiv:2406.03999  [pdf, other]

    cs.LG cs.CV

    Unveiling the Dynamics of Information Interplay in Supervised Learning

    Authors: Kun Song, Zhiquan Tan, Bochao Zou, Huimin Ma, Weiran Huang

    Abstract: In this paper, we use matrix information theory as an analytical tool to analyze the dynamics of the information interplay between data representations and classification head vectors in the supervised learning process. Specifically, inspired by the theory of Neural Collapse, we introduce matrix mutual information ratio (MIR) and matrix entropy difference ratio (HDR) to assess the interactions of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by ICML 2024

  34. arXiv:2405.19119  [pdf, other]

    cs.LG

    Can Graph Learning Improve Planning in LLM-based Agents?

    Authors: Xixi Wu, Yifei Shen, Caihua Shan, Kaitao Song, Siwei Wang, Bohang Zhang, Jiarui Feng, Hong Cheng, Wei Chen, Yun Xiong, Dongsheng Li

    Abstract: Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests in natural language into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the d…

    Submitted 11 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by NeurIPS 2024

  35. arXiv:2405.11726  [pdf, other]

    cs.RO

    RHAML: Rendezvous-based Hierarchical Architecture for Mutual Localization

    Authors: Gaoming Chen, Kun Song, Xiang Xu, Wenhang Liu, Zhenhua Xiong

    Abstract: Mutual localization serves as the foundation for collaborative perception and task assignment in multi-robot systems. Effectively utilizing limited onboard sensors for mutual localization between marker-less robots is a worthwhile goal. However, due to inadequate consideration of large scale variations of the observed robot and localization refinement, previous work has shown limited accuracy when…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures, submitted to RA-L

  36. arXiv:2405.08345  [pdf, other]

    cs.RO

    Multi-Robot Rendezvous in Unknown Environment with Limited Communication

    Authors: Kun Song, Gaoming Chen, Wenhang Liu, Zhenhua Xiong

    Abstract: Rendezvous aims at gathering all robots at a specific location, which is an important collaborative behavior for multirobot systems. However, in an unknown environment, it is challenging to achieve rendezvous. Previous researches mainly focus on special scenarios where communication is not allowed and each robot executes a random searching strategy, which is highly time-consuming, especially in la…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Submit to RAL. 8 pages, 6 figures

  37. arXiv:2404.19205  [pdf, other]

    cs.CV cs.AI

    TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains

    Authors: Yoonsik Kim, Moonbin Yim, Ka Yeon Song

    Abstract: In this paper, we establish a benchmark for table visual question answering, referred to as the TableVQA-Bench, derived from pre-existing table question-answering (QA) and table structure recognition datasets. It is important to note that existing datasets have not incorporated images or QA pairs, which are two crucial components of TableVQA. As such, the primary objective of this paper is to obta…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Technical Report

  38. arXiv:2404.18252  [pdf, other]

    cs.CV

    Fisher Information Improved Training-Free Conditional Diffusion Model

    Authors: Kaiyu Song, Hanjiang Lai

    Abstract: Recently, the diffusion model with the training-free methods has succeeded in conditional image generation tasks. However, there is an efficiency problem because it requires calculating the gradient with high computational cost, and previous methods make strong assumptions to solve it, sacrificing generalization. In this work, we propose the Fisher information guided diffusion model (FIGD). Concre…

    Submitted 28 April, 2024; originally announced April 2024.

  39. arXiv:2404.09531  [pdf, other]

    cs.CV cs.GR

    Oblique-MERF: Revisiting and Improving MERF for Oblique Photography

    Authors: Xiaoyi Zeng, Kaiwen Song, Leyuan Yang, Bailin Deng, Juyong Zhang

    Abstract: Neural implicit fields have established a new paradigm for scene representation, with subsequent work achieving high-quality real-time rendering. However, reconstructing 3D scenes from oblique aerial photography presents unique challenges, such as varying spatial scale distributions and a constrained range of tilt angles, often resulting in high memory consumption and reduced rendering quality at…

    Submitted 15 April, 2024; originally announced April 2024.

  40. arXiv:2404.05674  [pdf, other]

    cs.CV

    MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

    Authors: Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang

    Abstract: In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source, Multimodal Large Language Model (MLLM), w…

    Submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.02117  [pdf, other]

    cs.CV

    Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners

    Authors: Keon-Hee Park, Kyungwoo Song, Gyeong-Moon Park

    Abstract: Few-Shot Class Incremental Learning (FSCIL) is a task that requires a model to learn new classes incrementally without forgetting when only a few samples for each class are given. FSCIL encounters two significant challenges: catastrophic forgetting and overfitting, and these challenges have driven prior studies to primarily rely on shallow models, such as ResNet-18. Even though their limited capac…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  42. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  43. arXiv:2404.01706  [pdf, other]

    cs.CL

    Polarity Calibration for Opinion Summarization

    Authors: Yuanyuan Lei, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Ruihong Huang, Dong Yu

    Abstract: Opinion summarization is the automatic generation of summaries from a variety of subjective information, such as product reviews or political opinions. The challenge of opinion summarization lies in presenting divergent or even conflicting opinions. We conduct an analysis of previous summarization models, which reveals their inclination to amplify the polarity bias, emphasizing the majority opinions…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024

  44. arXiv:2403.19833  [pdf, other]

    cs.NI cs.AI

    ChatTracer: Large Language Model Powered Real-time Bluetooth Device Tracking System

    Authors: Qijun Wang, Shichen Zhang, Kunzhe Song, Huacheng Zeng

    Abstract: Large language models (LLMs) have transformed the way we interact with cyber technologies. In this paper, we study the possibility of connecting LLMs with wireless sensor networks (WSN). A successful design will not only extend LLMs' knowledge landscape to the physical world but also revolutionize human interaction with WSN. To this end, we present ChatTracer, an LLM-powered real-time Bluetooth devi…

    Submitted 9 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  45. arXiv:2403.10558  [pdf, other]

    cs.CV cs.CR cs.LG

    Adaptive Hybrid Masking Strategy for Privacy-Preserving Face Recognition Against Model Inversion Attack

    Authors: Yinggui Wang, Yuanqing Huang, Jianshu Li, Le Yang, Kai Song, Lei Wang

    Abstract: The utilization of personal sensitive data in training face recognition (FR) models poses significant privacy concerns, as adversaries can employ model inversion attacks (MIA) to infer the original training data. Existing defense methods, such as data augmentation and differential privacy, have been employed to mitigate this issue. However, these methods often fail to strike an optimal balance bet…

    Submitted 23 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  46. arXiv:2403.09073  [pdf, other]

    cs.CL

    Revealing the Parallel Multilingual Learning within Large Language Models

    Authors: Yongyu Mu, Peinan Feng, Zhiquan Cao, Yuzhang Wu, Bei Li, Chenglong Wang, Tong Xiao, Kai Song, Tongran Liu, Chunliang Zhang, Jingbo Zhu

    Abstract: In this study, we reveal an in-context learning (ICL) capability of multilingual large language models (LLMs): by translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities. To test this capability, we design extensive experiments encompassing 8 typical datasets, 7 languages and 8 state-of-th…

    Submitted 8 October, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to EMNLP 2024

  47. arXiv:2403.04031  [pdf, other]

    cs.CL cs.AI

    Can Large Language Models do Analytical Reasoning?

    Authors: Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

    Abstract: This paper explores how cutting-edge Large Language Models perform analytical reasoning on sports. Our analytical reasoning tasks ask large language models to count how many points each team scores in a quarter of NBA and NFL games. Our major discoveries are twofold. Firstly, we find that among all the models we employed, GPT-4 stands out in effectiveness, followed by Claude-2.1,…

    Submitted 6 March, 2024; originally announced March 2024.

  48. arXiv:2403.03100  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

    Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

    Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering that speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di…

    Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

  49. arXiv:2402.14279  [pdf, other]

    cs.CL cs.AI

    Mitigating the Linguistic Gap with Phonemic Representations for Robust Cross-lingual Transfer

    Authors: Haeji Jung, Changdae Oh, Jooeon Kang, Jimin Sohn, Kyungwoo Song, Jinkyu Kim, David R. Mortensen

    Abstract: Approaches to improving multilingual language understanding often struggle with significant performance gaps between high-resource and low-resource languages. While there are efforts to align the languages in a single latent space to mitigate such gaps, how different input-level representations influence such gaps has not been investigated, particularly with phonemic inputs. We hypothesize that th…

    Submitted 11 October, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to the 4th Multilingual Representation Learning (MRL) Workshop (co-located with EMNLP 2024)

  50. arXiv:2402.10979  [pdf, other]

    cs.CL cs.AI

    SportsMetrics: Blending Text and Numerical Data to Understand Information Fusion in LLMs

    Authors: Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Hassan Foroosh, Dong Yu, Fei Liu

    Abstract: Large language models hold significant potential for integrating various data types, such as text documents and database records, for advanced analytics. However, blending text and numerical data presents substantial challenges. LLMs need to process and cross-reference entities and numbers, handle data inconsistencies and redundancies, and develop planning capabilities such as building a working m…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Long Paper