Skip to main content

Showing 1–50 of 264 results for author: Yun, S

  1. arXiv:2410.15876  [pdf, other

    cs.LG cs.AI cs.MA

    FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL

    Authors: Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun

    Abstract: Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or a… ▽ More

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: NeurIPS '24 Open-World Agents Workshop

  2. arXiv:2410.13621  [pdf, other

    cs.CV

    Enhanced Prompt-leveraged Weakly Supervised Cancer Segmentation based on Segment Anything

    Authors: Joonhyeon Song, Seohwan Yun, Seongho Yoon, Joohyeok Kim, Sangmin Lee

    Abstract: This work proposes a novel approach beyond supervised learning for effective pathological image analysis, addressing the challenge of limited robust labeled data. Pathological diagnosis of diseases like cancer has conventionally relied on the evaluation of morphological features by physicians and pathologists. However, recent advancements in compute-aided diagnosis (CAD) systems are gaining signif… ▽ More

    Submitted 21 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: 10 pages, 7 figures

  3. arXiv:2410.10870  [pdf, other

    cs.CL cs.AI cs.LG

    PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches

    Authors: Rana Muhammad Shahroz Khan, Pingzhi Li, Sukwon Yun, Zhenyu Wang, Shahriar Nirjon, Chau-Wai Wong, Tianlong Chen

    Abstract: As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance in domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved, i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up w… ▽ More

    Submitted 8 October, 2024; originally announced October 2024.

  4. arXiv:2410.10166  [pdf, other

    cs.LG cs.AI

    Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models

    Authors: Yongjin Yang, Sihyeon Kim, Hojung Jung, Sangmin Bae, SangMook Kim, Se-Young Yun, Kimin Lee

    Abstract: Fine-tuning text-to-image diffusion models with human feedback is an effective method for aligning model behavior with human intentions. However, this alignment process often suffers from slow convergence due to the large size and noise present in human feedback datasets. In this work, we propose FiFA, a novel automated data filtering algorithm designed to enhance the fine-tuning of diffusion mode… ▽ More

    Submitted 14 October, 2024; originally announced October 2024.

  5. arXiv:2410.08245  [pdf, other

    cs.LG cs.AI

    Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts

    Authors: Sukwon Yun, Inyoung Choi, Jie Peng, Yangfan Wu, Jingxuan Bao, Qiyiwen Zhang, Jiayi Xin, Qi Long, Tianlong Chen

    Abstract: Multimodal learning has gained increasing importance across various fields, offering the ability to integrate data from diverse sources such as images, text, and personalized records, which are frequently observed in medical domains. However, in scenarios where some modalities are missing, many existing frameworks struggle to accommodate arbitrary modality combinations, often relying heavily on a… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024 Spotlight

  6. arXiv:2410.05628  [pdf, other

    cs.AI

    Versatile Motion Langauge Models for Multi-Turn Interactive Agents

    Authors: Jeongeun Park, Sungjoon Choi, Sangdoo Yun

    Abstract: Recent advancements in large language models (LLMs) have greatly enhanced their ability to generate natural and contextually relevant text, making AI interactions more human-like. However, generating and understanding interactive human-like motion, where two individuals engage in coordinated movements, remains a challenge due to the complexity of modeling these coordinated interactions. Furthermor… ▽ More

    Submitted 14 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

    Comments: https://vim-motion-language.github.io/

  7. arXiv:2410.03782  [pdf, other

    cs.LG cs.CV

    DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation

    Authors: Changdae Oh, Yixuan Li, Kyungwoo Song, Sangdoo Yun, Dongyoon Han

    Abstract: Adapting a pre-trained foundation model on downstream tasks should ensure robustness against distribution shifts without the need to retrain the whole model. Although existing weight interpolation methods are simple yet effective, we argue their static nature limits downstream performance while achieving efficiency. In this work, we propose DaWin, a training-free dynamic weight interpolation metho… ▽ More

    Submitted 3 October, 2024; originally announced October 2024.

  8. arXiv:2410.02506  [pdf, other

    cs.MA cs.LG

    Cut the Crap: An Economical Communication Pipeline for LLM-based Multi-Agent Systems

    Authors: Guibin Zhang, Yanwei Yue, Zhixun Li, Sukwon Yun, Guancheng Wan, Kun Wang, Dawei Cheng, Jeffrey Xu Yu, Tianlong Chen

    Abstract: Recent advancements in large language model (LLM)-powered agents have shown that collective intelligence can significantly outperform individual capabilities, largely attributed to the meticulously designed inter-agent communication topologies. Though impressive in performance, existing multi-agent pipelines inherently introduce substantial token overhead, as well as increased economic costs, whic… ▽ More

    Submitted 3 October, 2024; originally announced October 2024.

  9. arXiv:2409.15889  [pdf, other

    cs.CV

    CAD: Memory Efficient Convolutional Adapter for Segment Anything

    Authors: Joohyeok Kim, Joonhyeon Song, Seohwan Yun, Seongho Yoon, Sangmin Lee

    Abstract: The Foundation model for image segmentation, Segment Anything (SAM), has been actively researched in various fields since its proposal. Various researches have been proposed to adapt SAM to specific domains, with one notable approach involving the addition and training of lightweight adapter modules. While adapter-based fine-tuning approaches have reported parameter efficiency and significant perf… ▽ More

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 14 pages

  10. arXiv:2409.09882  [pdf, other

    eess.SY cs.RO

    Safe Control of Quadruped in Varying Dynamics via Safety Index Adaptation

    Authors: Kai S. Yun, Rui Chen, Chase Dunaway, John M. Dolan, Changliu Liu

    Abstract: Varying dynamics pose a fundamental difficulty when deploying safe control laws in the real world. Safety Index Synthesis (SIS) deeply relies on the system dynamics and once the dynamics change, the previously synthesized safety index becomes invalid. In this work, we show the real-time efficacy of Safety Index Adaptation (SIA) in varying dynamics. SIA enables real-time adaptation to the changing… ▽ More

    Submitted 15 September, 2024; originally announced September 2024.

  11. arXiv:2409.07808  [pdf, other

    cs.LG

    FedHide: Federated Learning by Hiding in the Neighbors

    Authors: Hyunsin Park, Sungrack Yun

    Abstract: We propose a prototype-based federated learning method designed for embedding networks in classification or verification tasks. Our focus is on scenarios where each client has data from a single class. The main challenge is to develop an embedding network that can distinguish between different classes while adhering to privacy constraints. Sharing true class prototypes with the server or other cli… ▽ More

    Submitted 12 September, 2024; originally announced September 2024.

    Comments: ECCV 2024

  12. arXiv:2409.07787  [pdf, other

    cs.CL

    Stable Language Model Pre-training by Reducing Embedding Variability

    Authors: Woojin Chung, Jiwoo Hong, Na Min An, James Thorne, Se-Young Yun

    Abstract: Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability by calculating gradient variance at every step is impractical due to the significant computational costs. We explore Token Embedding Variability (TEV) as a simple and efficient proxy for assessing pre-training stability in language models with pre-layer normalization, given th… ▽ More

    Submitted 12 September, 2024; originally announced September 2024.

  13. arXiv:2409.01141  [pdf, other

    cs.AR cs.LG

    Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching

    Authors: Sungmin Yun, Kwanhee Kyung, Juhwan Cho, Jaewan Choi, Jongmin Kim, Byeongho Kim, Sukhan Lee, Kyomin Sohn, Jung Ho Ahn

    Abstract: Large language models (LLMs) have emerged due to their capability to generate high-quality content across diverse contexts. To reduce their explosively increasing demands for computing resources, a mixture of experts (MoE) has emerged. The MoE layer enables exploiting a huge number of parameters with less computation. Applying state-of-the-art continuous batching increases throughput; however, it… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 15 pages, 16 figures, accepted at MICRO 2024

  14. arXiv:2408.16218  [pdf, other

    cs.LG stat.ML

    Targeted Cause Discovery with Data-Driven Learning

    Authors: Jang-Hyun Kim, Claudia Skok Gibbs, Sangdoo Yun, Hyun Oh Song, Kyunghyun Cho

    Abstract: We propose a novel machine learning approach for inferring causal variables of a target variable from observations. Our goal is to identify both direct and indirect causes within a system, thereby efficiently regulating the target variable when the difficulty and cost of intervening on each causal variable vary. Our method employs a neural network trained to identify causality through supervised l… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: preprint

  15. arXiv:2408.13092  [pdf, other

    cs.LG

    Diffusion-based Episodes Augmentation for Offline Multi-Agent Reinforcement Learning

    Authors: Jihwan Oh, Sungnyun Kim, Gahee Kim, Sunghwan Kim, Se-Young Yun

    Abstract: Offline multi-agent reinforcement learning (MARL) is increasingly recognized as crucial for effectively deploying RL algorithms in environments where real-time interaction is impractical, risky, or costly. In the offline setting, learning from a static dataset of past interactions allows for the development of robust and safe policies without the need for live data collection, which can be fraught… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by SPIGM Workshop at ICML 2024 (Structured Probabilistic Inference & Generative Modeling)

  16. arXiv:2408.11063  [pdf, other

    cs.CL cs.AI cs.LG

    Tabular Transfer Learning via Prompting LLMs

    Authors: Jaehyun Nam, Woomin Song, Seong Hyeon Park, Jihoon Tack, Sukmin Yun, Jaehyung Kim, Kyu Hwan Oh, Jinwoo Shin

    Abstract: Learning with a limited number of labeled data is a central problem in real-world applications of machine learning, as it is often expensive to obtain annotations. To deal with the scarcity of labeled data, transfer learning is a conventional approach; it suggests to learn a transferable knowledge by training a neural network from multiple other sources. In this paper, we investigate transfer lear… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  17. arXiv:2408.09674  [pdf, other

    cs.CV

    Implicit Grid Convolution for Multi-Scale Image Super-Resolution

    Authors: Dongheon Lee, Seokju Yun, Youngmin Ro

    Abstract: Recently, Super-Resolution (SR) achieved significant performance improvement by employing neural networks. Most SR methods conventionally train a single model for each targeted scale, which increases redundancy in training and deployment in proportion to the number of scales targeted. This paper challenges this conventional fixed-scale approach. Our preliminary analysis reveals that, surprisingly,… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  18. arXiv:2408.07327  [pdf, other

    cs.LG cs.AI

    An Offline Meta Black-box Optimization Framework for Adaptive Design of Urban Traffic Light Management Systems

    Authors: Taeyoung Yun, Kanghoon Lee, Sujin Yun, Ilmyung Kim, Won-Woo Jung, Min-Cheol Kwon, Kyujin Choi, Yoohyeon Lee, Jinkyoo Park

    Abstract: Complex urban road networks with high vehicle occupancy frequently face severe traffic congestion. Designing an effective strategy for managing multiple traffic lights plays a crucial role in managing congestion. However, most current traffic light management systems rely on human-crafted decisions, which may not adapt well to diverse traffic patterns. In this paper, we delve into two pivotal desi… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures, 10 tables

  19. arXiv:2408.05337  [pdf, other

    cs.CV cs.AI

    VACoDe: Visual Augmented Contrastive Decoding

    Authors: Sihyeon Kim, Boryeong Cho, Sangmin Bae, Sumyeong Ahn, Se-Young Yun

    Abstract: Despite the astonishing performance of recent Large Vision-Language Models (LVLMs), these models often generate inaccurate responses. To address this issue, previous studies have focused on mitigating hallucinations by employing contrastive decoding (CD) with augmented images, which amplifies the contrast with the original image. However, these methods have limitations, including reliance on a sin… ▽ More

    Submitted 26 July, 2024; originally announced August 2024.

    Comments: 10 pages, 7 figures

    MSC Class: 68T01 ACM Class: I.2.0

  20. arXiv:2407.21035  [pdf, other

    cs.CV

    Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

    Authors: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

    Abstract: Recent advancements in text-to-image (T2I) models have greatly benefited from large-scale datasets, but they also pose significant risks due to the potential generation of unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, m… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Extended abstract accepted in GenLaw 2024 workshop @ ICML2024

  21. arXiv:2407.17857  [pdf, other

    cs.CV cs.AI

    Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network

    Authors: Sukwon Yun, Jie Peng, Alexandro E. Trevino, Chanyoung Park, Tianlong Chen

    Abstract: Recent advancements in graph-based approaches for multiplexed immunofluorescence (mIF) images have significantly propelled the field forward, offering deeper insights into patient-level phenotyping. However, current graph-based methodologies encounter two primary challenges: (1) Cellular Heterogeneity, where existing approaches fail to adequately address the inductive biases inherent in graphs, pa… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  22. arXiv:2407.13977  [pdf, other

    stat.ML cs.LG

    A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits

    Authors: Junghyun Lee, Se-Young Yun, Kwang-Sung Jun

    Abstract: We present a unified likelihood ratio-based confidence sequence (CS) for any (self-concordant) generalized linear models (GLMs) that is guaranteed to be convex and numerically tight. We show that this is on par or improves upon known CSs for various GLMs, including Gaussian, Bernoulli, and Poisson. In particular, for the first time, our CS for Bernoulli has a poly(S)-free radius where S is the nor… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 31 pages, 1 figure, 2 tables

  23. arXiv:2407.13078  [pdf, other

    cs.CV cs.AI

    Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism

    Authors: Sangyoun Lee, Juho Jung, Changdae Oh, Sunghee Yun

    Abstract: Temporal Action Localization (TAL) is a critical task in video analysis, identifying precise start and end times of actions. Existing methods like CNNs, RNNs, GCNs, and Transformers have limitations in capturing long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture leveraging the Selective State Space Model (S6). Our approach integrates th… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, Preprint

  24. arXiv:2407.08245  [pdf, other

    cs.LG cs.CV

    Feature Diversification and Adaptation for Federated Domain Generalization

    Authors: Seunghan Yang, Seokeon Choi, Hyunsin Park, Sungha Choi, Simyung Chang, Sungrack Yun

    Abstract: Federated learning, a distributed learning paradigm, utilizes multiple clients to build a robust global model. In real-world applications, local clients often operate within their limited domains, leading to a `domain shift' across clients. Privacy concerns limit each client's learning to its own domain data, which increase the risk of overfitting. Moreover, the process of aggregating models train… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  25. arXiv:2407.06123  [pdf, other

    cs.HC

    Investigating User Perceptions of Collaborative Agenda Setting in Virtual Health Counseling Session

    Authors: Mina Fallah, Farnaz Nouraei, Hye Sun Yun, Timothy Bickmore

    Abstract: Virtual health counselors offer the potential to provide users with information and counseling in complex areas such as disease management and health education. However, ensuring user engagement is challenging, particularly when the volume of information and length of counseling sessions increase. Agenda setting a clinical counseling technique where a patient and clinician collaboratively decide o… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  26. arXiv:2407.03563  [pdf, other

    eess.AS cs.CL cs.LG eess.IV

    Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

    Authors: Sungnyun Kim, Kangwook Jang, Sangmin Bae, Hoirin Kim, Se-Young Yun

    Abstract: Audio-visual speech recognition (AVSR) aims to transcribe human speech using both audio and video modalities. In practical environments with noise-corrupted audio, the role of video information becomes crucial. However, prior works have primarily focused on enhancing audio features in AVSR, overlooking the importance of video features. In this study, we strengthen the video features by learning th… ▽ More

    Submitted 14 October, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted at SLT 2024 Main Conference; Code is available at https://github.com/sungnyun/avsr-temporal-dynamics

  27. arXiv:2407.01639  [pdf, other

    cs.LG cs.SE

    ModelVerification.jl: a Comprehensive Toolbox for Formally Verifying Deep Neural Networks

    Authors: Tianhao Wei, Luca Marzari, Kai S. Yun, Hanjiang Hu, Peizhi Niu, Xusheng Luo, Changliu Liu

    Abstract: Deep Neural Networks (DNN) are crucial in approximating nonlinear functions across diverse applications, ranging from image classification to control. Verifying specific input-output properties can be a highly challenging task due to the lack of a single, self-contained framework that allows a complete range of verification types. To this end, we present \texttt{ModelVerification.jl (MV)}, the fir… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  28. arXiv:2407.01624  [pdf, other

    cs.LG cs.AI

    Guided Trajectory Generation with Diffusion Models for Offline Model-based Optimization

    Authors: Taeyoung Yun, Sujin Yun, Jaewoo Lee, Jinkyoo Park

    Abstract: Optimizing complex and high-dimensional black-box functions is ubiquitous in science and engineering fields. Unfortunately, the online evaluation of these functions is restricted due to time and safety constraints in most cases. In offline model-based optimization (MBO), we aim to find a design that maximizes the target function using only a pre-existing offline dataset. While prior methods consid… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 29 pages, 11 figures, 17 tables

  29. arXiv:2407.00693  [pdf, other

    cs.AI cs.CL cs.LG

    BAPO: Base-Anchored Preference Optimization for Overcoming Forgetting in Large Language Models Personalization

    Authors: Gihun Lee, Minchan Jeong, Yujin Kim, Hojung Jung, Jaehoon Oh, Sangmook Kim, Se-Young Yun

    Abstract: While learning to align Large Language Models (LLMs) with human preferences has shown remarkable success, aligning these models to meet the diverse user preferences presents further challenges in preserving previous knowledge. This paper examines the impact of personalized preference optimization on LLMs, revealing that the extent of knowledge loss varies significantly with preference heterogeneit… ▽ More

    Submitted 29 September, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

  30. arXiv:2406.20098  [pdf, other

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  31. arXiv:2406.18815  [pdf, other

    cs.LG

    MissionGNN: Hierarchical Multimodal GNN-based Weakly Supervised Video Anomaly Recognition with Mission-Specific Knowledge Graph Generation

    Authors: Sanggeon Yun, Ryozo Masukawa, Minhyoung Na, Mohsen Imani

    Abstract: In the context of escalating safety concerns across various domains, the tasks of Video Anomaly Detection (VAD) and Video Anomaly Recognition (VAR) have emerged as critically important for applications in intelligent surveillance, evidence investigation, violence alerting, etc. These tasks, aimed at identifying and classifying deviations from normal behavior in video data, face significant challen… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  32. arXiv:2406.16758  [pdf, other

    cs.CL

    Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

    Authors: Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun

    Abstract: Large language models (LLMs) have revolutionized natural language processing and broadened their applicability across diverse commercial applications. However, the deployment of these models is constrained by high inference time in multilingual settings. To mitigate this challenge, this paper explores a training recipe of an assistant model in speculative decoding, which are leveraged to draft and… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  33. arXiv:2406.02657  [pdf, other

    cs.CL cs.AI cs.LG

    Block Transformer: Global-to-Local Language Modeling for Fast Inference

    Authors: Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun

    Abstract: This paper presents the Block Transformer architecture which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks of self-attention. To apply self-attention, the key-value (KV) cache of all previous sequences must be retrieved from memory at every decoding step. Thereby, this KV cache IO becomes a significant bottleneck in batch inferenc… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 30 pages, 21 figures, 5 tables

  34. arXiv:2406.02355  [pdf, other

    cs.CV cs.AI cs.DC cs.LG

    FedDr+: Stabilizing Dot-regression with Global Feature Distillation for Federated Learning

    Authors: Seongyoon Kim, Minchan Jeong, Sungnyun Kim, Sungwoo Cho, Sumyeong Ahn, Se-Young Yun

    Abstract: Federated Learning (FL) has emerged as a pivotal framework for the development of effective global models (global FL) or personalized models (personalized FL) across clients with heterogeneous, non-iid data distribution. A key challenge in FL is client drift, where data heterogeneity impedes the aggregation of scattered knowledge. Recent studies have tackled the client drift issue by identifying s… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  35. arXiv:2406.02021  [pdf, other

    cs.CV cs.AI cs.LG

    MetaMixer Is All You Need

    Authors: Seokju Yun, Dongheon Lee, Youngmin Ro

    Abstract: Transformer, composed of self-attention and Feed-Forward Network, has revolutionized the landscape of network design across various vision tasks. FFN is a versatile operator seamlessly integrated into nearly all AI models to effectively harness rich representations. Recent works also show that FFN functions like key-value memories. Thus, akin to the query-key-value mechanism within self-attention,… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/ysj9909/FFNet

  36. arXiv:2405.19806  [pdf, other

    cs.LG

    Preference Alignment with Flow Matching

    Authors: Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Seyoung Yun

    Abstract: We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require fine-tuning pre-trained models, which presents challenges such as scalability, inefficiency, and the need for model modifications, especially with black-box APIs lik… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  37. arXiv:2405.18027  [pdf, other

    cs.CL

    TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models

    Authors: Jaewoo Ahn, Taehyun Lee, Junyoung Lim, Jin-Hwa Kim, Sangdoo Yun, Hwaran Lee, Gunhee Kim

    Abstract: While Large Language Models (LLMs) can serve as agents to simulate human behaviors (i.e., role-playing agents), we emphasize the importance of point-in-time role-playing. This situates characters at specific moments in the narrative progression for three main reasons: (i) enhancing users' narrative immersion, (ii) avoiding spoilers, and (iii) fostering engagement in fandom role-playing. To accurat… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: ACL 2024 Findings. Code and dataset are released at https://ahnjaewoo.github.io/timechara

  38. arXiv:2405.17995  [pdf, other

    cs.CV cs.AI cs.LG eess.IV

    DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture

    Authors: Shentong Mo, Sukmin Yun

    Abstract: The joint-embedding predictive architecture (JEPA) recently has shown impressive results in extracting visual representations from unlabeled imagery under a masking strategy. However, we reveal its disadvantages, notably its insufficient understanding of local semantics. This deficiency originates from masked modeling in the embedding space, resulting in a reduction of discriminative power and can… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  39. arXiv:2405.16907  [pdf, other

    cs.AI cs.LG

    GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

    Authors: Jaewoo Lee, Sujin Yun, Taeyoung Yun, Jinkyoo Park

    Abstract: Offline Reinforcement Learning (Offline RL) presents challenges of learning effective decision-making policies from static datasets without any online interactions. Data augmentation techniques, such as noise injection and data synthesizing, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the qualit… ▽ More

    Submitted 12 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted (Spotlight) to ICLR 2024 Workshop on Generative Models for Decision Making. Jaewoo Lee and Sujin Yun are equal contribution authors

  40. arXiv:2405.13396  [pdf, other

    cs.LG stat.ML

    Why In-Context Learning Transformers are Tabular Data Classifiers

    Authors: Felix den Breejen, Sangmin Bae, Stephen Cha, Se-Young Yun

    Abstract: The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. As synthetic data does not share features or labels with real-world data, the underlying mechanism that contributes to the success of this method remains unclear. This study provides an explanation by demonstrating that ICL-transformers acquire the ability to… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 9 pages main body, 22 pages total. Preprint under review

  41. arXiv:2405.07857  [pdf, other

    cs.CV cs.AI

    Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs

    Authors: Mingyu Kim, Jun-Seong Kim, Se-Young Yun, Jin-Hwa Kim

    Abstract: The multi-plane representation has been highlighted for its fast training and inference across static and dynamic neural radiance fields. This approach constructs relevant features via projection onto learnable grids and interpolating adjacent vertices. However, it has limitations in capturing low-frequency details and tends to overuse parameters for low-frequency features due to its bias toward f… ▽ More

    Submitted 5 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: ICML2024 ; Project page is accessible at https://mingyukim87.github.io/SynergyNeRF ; Code is available at https://github.com/MingyuKim87/SynergyNeRF

  42. arXiv:2405.04819  [pdf, other

    cs.CL cs.AI

    DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature

    Authors: Dawei Li, Shu Yang, Zhen Tan, Jae Young Baik, Sukwon Yun, Joseph Lee, Aaron Chacko, Bojian Hou, Duy Duong-Tran, Ying Ding, Huan Liu, Li Shen, Tianlong Chen

    Abstract: Recent advancements in large language models (LLMs) have achieved promising performances across various applications. Nonetheless, the ongoing challenge of integrating long-tail knowledge continues to impede the seamless adoption of LLMs in specialized domains. In this work, we introduce DALK, a.k.a. Dynamic Co-Augmentation of LLMs and KG, to address this limitation and demonstrate its ability on… ▽ More

    Submitted 17 October, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by EMNLP 2024 Findings; revise format problem

  43. arXiv:2405.04497  [pdf, other

    cs.HC

    Unveiling Disparities in Web Task Handling Between Human and Web Agent

    Authors: Kihoon Son, Jinhyeon Kwon, DaEun Choi, Tae Soo Kim, Young-Ho Kim, Sangdoo Yun, Juho Kim

    Abstract: With the advancement of Large-Language Models (LLMs) and Large Vision-Language Models (LVMs), agents have shown significant capabilities in various tasks, such as data analysis, gaming, or code generation. Recently, there has been a surge in research on web agents, capable of performing tasks within the web environment. However, the web poses unforeseeable scenarios, challenging the generalizabili… ▽ More

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  44. arXiv:2405.01686  [pdf, other

    cs.CL cs.AI

    Automatically Extracting Numerical Results from Randomized Controlled Trials with Large Language Models

    Authors: Hye Sun Yun, David Pogrebitskiy, Iain J. Marshall, Byron C. Wallace

    Abstract: Meta-analyses statistically aggregate the findings of different randomized controlled trials (RCTs) to assess treatment effectiveness. Because this yields robust estimates of treatment effectiveness, results from meta-analyses are considered the strongest form of evidence. However, rigorous evidence syntheses are time-consuming and labor-intensive, requiring manual extraction of data from individu… ▽ More

    Submitted 24 July, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: 25 pages, 7 figures, 6 tables, MLHC 2024

  45. arXiv:2405.01588  [pdf, other

    cs.CL cs.AI

    Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL

    Authors: Yongjin Yang, Sihyeon Kim, SangMook Kim, Gyubok Lee, Se-Young Yun, Edward Choi

    Abstract: Incorporating unanswerable questions into EHR QA systems is crucial for testing the trustworthiness of a system, as providing non-existent responses can mislead doctors in their diagnoses. The EHRSQL dataset stands out as a promising benchmark because it is the only dataset that incorporates unanswerable questions in the EHR QA system alongside practical questions. However, in this work, we identi… ▽ More

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: DPFM Workshop, ICLR 2024

  46. arXiv:2404.17507  [pdf, other

    cs.CV

    HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

    Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

    Abstract: In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our appr… ▽ More

    Submitted 16 July, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: ECCV 2024; 33pages, 4.5MB

  47. arXiv:2404.14202  [pdf, other

    cs.LG stat.ML

    An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

    Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

    Abstract: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative… ▽ More

    Submitted 13 October, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: NeurIPS 2024

  48. arXiv:2404.13949  [pdf, other

    cs.CV cs.RO

    PeLiCal: Targetless Extrinsic Calibration via Penetrating Lines for RGB-D Cameras with Limited Co-visibility

    Authors: Jaeho Shin, Seungsang Yun, Ayoung Kim

    Abstract: RGB-D cameras are crucial in robotic perception, given their ability to produce images augmented with depth data. However, their limited FOV often requires multiple cameras to cover a broader area. In multi-camera RGB-D setups, the goal is typically to reduce camera overlap, optimizing spatial coverage with as few cameras as possible. The extrinsic calibration of these systems introduces additiona… ▽ More

    Submitted 23 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.11848  [pdf, other

    cs.CV

    Partial Large Kernel CNNs for Efficient Super-Resolution

    Authors: Dongheon Lee, Seokju Yun, Youngmin Ro

    Abstract: Recently, in the super-resolution (SR) domain, transformers have outperformed CNNs with fewer FLOPs and fewer parameters since they can deal with long-range dependency and adaptively adjust weights based on instance. In this paper, we demonstrate that CNNs, although less focused on in the current SR domain, surpass Transformers in direct efficiency measures. By incorporating the advantages of Tran… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  50. arXiv:2404.11025  [pdf, other

    cs.CV

    NeuroHash: A Hyperdimensional Neuro-Symbolic Framework for Spatially-Aware Image Hashing and Retrieval

    Authors: Sanggeon Yun, Ryozo Masukawa, SungHeon Jeong, Mohsen Imani

    Abstract: Customizable image retrieval from large datasets remains a critical challenge, particularly when preserving spatial relationships within images. Traditional hashing methods, primarily based on deep learning, often fail to capture spatial information adequately and lack transparency. In this paper, we introduce NeuroHash, a novel neuro-symbolic framework leveraging Hyperdimensional Computing (HDC)… ▽ More

    Submitted 22 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.