Showing 1–50 of 356 results for author: Zou, J

  1. arXiv:2410.15778  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Reducing Hallucinations in Vision-Language Models via Latent Space Steering

    Authors: Sheng Liu, Haotian Ye, James Zou

    Abstract: Hallucination poses a challenge to the deployment of large vision-language models (LVLMs) in applications. Unlike in large language models (LLMs), hallucination in LVLMs often arises from misalignments between visual inputs and textual outputs. This paper investigates the underlying mechanisms of hallucination, focusing on the unique structure of LVLMs that distinguishes them from large language m…

    Submitted 21 October, 2024; originally announced October 2024.

    Comments: 21 pages

  2. arXiv:2410.13085  [pdf, other]

    cs.LG cs.CL cs.CV

    MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

    Authors: Peng Xia, Kangyu Zhu, Haoran Li, Tianze Wang, Weijia Shi, Sheng Wang, Linjun Zhang, James Zou, Huaxiu Yao

    Abstract: Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retriev…

    Submitted 16 October, 2024; originally announced October 2024.

  3. arXiv:2410.11087  [pdf, other]

    cs.CV

    Locality Alignment Improves Vision-Language Models

    Authors: Ian Covert, Tony Sun, James Zou, Tatsunori Hashimoto

    Abstract: Vision language models (VLMs) have seen growing adoption in recent years, but many still struggle with basic spatial reasoning errors. We hypothesize that this is due to VLMs adopting pre-trained vision backbones, specifically vision transformers (ViTs) trained with image-level supervision and minimal inductive biases. Such models may fail to encode the class contents at each position in the image…

    Submitted 14 October, 2024; originally announced October 2024.

  4. arXiv:2410.08474  [pdf, other]

    cs.CV cs.CL

    SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models

    Authors: Haotian Xia, Zhengbang Yang, Junbo Zou, Rhys Tracy, Yuqing Wang, Chi Lu, Christopher Lai, Yanjun He, Xun Shao, Zhuoqing Xie, Yuan-fang Wang, Weining Shen, Hanjie Chen

    Abstract: Multimodal Large Language Models (MLLMs) are advancing the ability to reason about complex sports scenarios by integrating textual and visual information. To comprehensively evaluate their capabilities, we introduce SPORTU, a benchmark designed to assess MLLMs across multi-level sports reasoning tasks. SPORTU comprises two key components: SPORTU-text, featuring 900 multiple-choice questions with h…

    Submitted 19 October, 2024; v1 submitted 10 October, 2024; originally announced October 2024.

  5. arXiv:2410.05495  [pdf, other]

    cs.CL

    Self-rationalization improves LLM as a fine-grained judge

    Authors: Prapti Trivedi, Aditya Gulati, Oliver Molenschot, Meghana Arakkal Rajeev, Rajkumar Ramamurthy, Keith Stevens, Tanveesh Singh Chaudhery, Jahnavi Jambholkar, James Zou, Nazneen Rajani

    Abstract: LLM-as-a-judge models have been used for evaluating both human and AI generated content, specifically by providing scores and rationales. Rationales, in addition to increasing transparency, help models learn to calibrate their judgments. Enhancing a model's rationale can therefore improve its calibration abilities and ultimately the ability to score content. We introduce Self-Rationalization, an ite…

    Submitted 7 October, 2024; originally announced October 2024.

  6. arXiv:2409.18968  [pdf, other]

    cs.CY cs.AI cs.LG

    Safety challenges of AI in medicine

    Authors: Xiaoye Wang, Nicole Xi Zhang, Hongyu He, Trang Nguyen, Kun-Hsing Yu, Hao Deng, Cynthia Brandt, Danielle S. Bitterman, Ling Pan, Ching-Yu Cheng, James Zou, Dianbo Liu

    Abstract: Recent advancements in artificial intelligence (AI), particularly in deep learning and large language models (LLMs), have accelerated their integration into medicine. However, these developments have also raised public concerns about the safe application of AI. In healthcare, these concerns are especially pertinent, as the ethical and secure deployment of AI is crucial for protecting patient healt…

    Submitted 11 September, 2024; originally announced September 2024.

  7. Privacy-Preserving Redaction of Diagnosis Data through Source Code Analysis

    Authors: Lixi Zhou, Lei Yu, Jia Zou, Hong Min

    Abstract: Protecting sensitive information in diagnostic data, such as logs, is a critical concern in the industrial software diagnosis and debugging process. While there are many tools developed to automatically redact the logs for identifying and removing sensitive information, they have severe limitations which can cause either over redaction and loss of critical diagnostic information (false positives),…

    Submitted 26 September, 2024; originally announced September 2024.

    Journal ref: Proceedings of the 35th International Conference on Scientific and Statistical Database Management (SSDBM 2023)

  8. arXiv:2409.15761  [pdf, other]

    cs.LG cs.AI

    TFG: Unified Training-Free Guidance for Diffusion Models

    Authors: Haotian Ye, Haowei Lin, Jiaqi Han, Minkai Xu, Sheng Liu, Yitao Liang, Jianzhu Ma, James Zou, Stefano Ermon

    Abstract: Given an unconditional diffusion model and a predictor for a target property of interest (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. Existing methods, though effective in various individual applications, often lack theoretical grounding and rigorous testing on extensive benchmarks. As a result, they c…

    Submitted 24 September, 2024; originally announced September 2024.

  9. arXiv:2409.00671  [pdf, other]

    cs.CE

    InvariantStock: Learning Invariant Features for Mastering the Shifting Market

    Authors: Haiyao Cao, Jinan Zou, Yuhang Liu, Zhen Zhang, Ehsan Abbasnejad, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: Accurately predicting stock returns is crucial for effective portfolio management. However, existing methods often overlook a fundamental issue in the market, namely, distribution shifts, making them less practical for predicting future markets or newly listed stocks. This study introduces a novel approach to address this challenge by focusing on the acquisition of invariant features across variou…

    Submitted 1 September, 2024; originally announced September 2024.

  10. arXiv:2408.17421  [pdf, other]

    eess.IV cs.CV

    Generative AI Enables Medical Image Segmentation in Ultra Low-Data Regimes

    Authors: Li Zhang, Basu Jindal, Ahmed Alaa, Robert Weinreb, David Wilson, Eran Segal, James Zou, Pengtao Xie

    Abstract: Semantic segmentation of medical images is pivotal in applications like disease diagnosis and treatment planning. While deep learning has excelled in automating this task, a major hurdle is the need for numerous annotated segmentation masks, which are resource-intensive to produce due to the required expertise and time. This scenario often leads to ultra low-data regimes, where annotated images ar…

    Submitted 30 August, 2024; originally announced August 2024.

  11. arXiv:2408.13498  [pdf, other]

    cs.LG

    Rethinking State Disentanglement in Causal Reinforcement Learning

    Authors: Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of al…

    Submitted 24 August, 2024; originally announced August 2024.

  12. arXiv:2408.10280  [pdf, other]

    cs.LG

    NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

    Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

    Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Work in progress, revisions ongoing

  13. arXiv:2408.10236  [pdf, other]

    eess.IV cs.CV

    AID-DTI: Accelerating High-fidelity Diffusion Tensor Imaging with Detail-preserving Model-based Deep Learning

    Authors: Wenxin Fan, Jian Cheng, Cheng Li, Jing Yang, Ruoyou Wu, Juan Zou, Shanshan Wang

    Abstract: Deep learning has shown great potential in accelerating diffusion tensor imaging (DTI). Nevertheless, existing methods tend to suffer from Rician noise and eddy current, leading to detail loss in reconstructing the DTI-derived parametric maps especially when sparsely sampled q-space data are used. To address this, this paper proposes a novel method, AID-DTI (Accelerating hIgh fi…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 12 pages, 3 figures, MICCAI 2024 Workshop on Computational Diffusion MRI. arXiv admin note: text overlap with arXiv:2401.01693, arXiv:2405.03159

  14. arXiv:2408.06150  [pdf, other]

    cs.CL physics.chem-ph q-bio.BM

    LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library

    Authors: Tianhao Yu, Cai Yao, Zhuorui Sun, Feng Shi, Lin Zhang, Kangjie Lyu, Xuan Bai, Andong Liu, Xicheng Zhang, Jiali Zou, Wenshou Wang, Chris Lai, Kai Wang

    Abstract: In this study, we generate and maintain a database of 10 million virtual lipids through METiS's in-house de novo lipid generation algorithms and lipid virtual screening techniques. These virtual lipids serve as a corpus for pre-training, lipid representation learning, and downstream task knowledge transfer, culminating in state-of-the-art LNP property prediction performance. We propose LipidBERT,…

    Submitted 19 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  15. arXiv:2408.04865  [pdf, other]

    cs.SD cs.MM eess.AS

    TEAdapter: Supply abundant guidance for controllable text-to-music generation

    Authors: Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He

    Abstract: Although current text-guided music generation technology can cope with simple creative scenarios, achieving fine-grained control over individual text-modality conditions remains challenging as user demands become more intricate. Accordingly, we introduce the TEAcher Adapter (TEAdapter), a compact plugin designed to guide the generation process with diverse control information provided by users. In…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ICME'24: IEEE International Conference on Multimedia and Expo

    Journal ref: 2024 IEEE International Conference on Multimedia and Expo (ICME 2024)

  16. arXiv:2408.02900  [pdf, other]

    cs.CV

    MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

    Authors: Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

    Abstract: This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine, covering over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases. These enriched annotations encompass both global textual information, such as disease/lesion type, modality, region-specific descriptions, and inter-regional relationships, as well as deta…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: The project page is at https://yunfeixie233.github.io/MedTrinity-25M

  17. arXiv:2408.01690  [pdf, other]

    cs.CV cs.AI cs.MM

    IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

    Authors: Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou

    Abstract: Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark…

    Submitted 3 September, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 40 pages

  18. arXiv:2408.01112  [pdf, other]

    cs.MA

    Agentic LLM Workflows for Generating Patient-Friendly Medical Reports

    Authors: Malavikha Sudarshan, Sophie Shih, Estella Yee, Alina Yang, John Zou, Cathy Chen, Quan Zhou, Leon Chen, Chinmay Singhal, George Shih

    Abstract: The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this is true for the above use case. We aim to minimize this step by propos…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures

  19. arXiv:2407.16907  [pdf]

    cs.CY cs.LG

    Research on Education Big Data for Students Academic Performance Analysis based on Machine Learning

    Authors: Chun Wang, Jiexiao Chen, Ziyang Xie, Jianke Zou

    Abstract: The application of the Internet in the field of education is becoming more and more popular, and a large amount of educational data is generated in the process. How to effectively use these data has always been a key issue in the field of educational data mining. In this work, a machine learning model based on Long Short-Term Memory Network (LSTM) was used to conduct an in-depth analysis of educat…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Education Big Data, Performance Analysis, Machine Learning, Long Short-Term Memory Network

  20. arXiv:2407.16900  [pdf, other]

    cs.LG cs.AI cs.CY

    Regulating AI Adaptation: An Analysis of AI Medical Device Updates

    Authors: Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou

    Abstract: While the pace of development of AI has rapidly progressed in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner…

    Submitted 22 June, 2024; originally announced July 2024.

    Journal ref: CHIL 2024

  21. arXiv:2407.14949  [pdf, other]

    q-bio.NC cs.CV cs.HC

    CoCoG-2: Controllable generation of visual stimuli for understanding human concept representation

    Authors: Chen Wei, Jiachen Zou, Dietmar Heinke, Quanying Liu

    Abstract: Humans interpret complex visual stimuli using abstract concepts that facilitate decision-making tasks such as food selection and risk avoidance. Similarity judgment tasks are effective for exploring these concepts. However, methods for controllable image generation in concept space are underdeveloped. In this study, we present a novel framework called CoCoG-2, which integrates generated visual sti…

    Submitted 20 July, 2024; originally announced July 2024.

  22. arXiv:2407.12781  [pdf, other]

    cs.CV

    VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

    Abstract: Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applications related to content creation, visual effects, and 3D vision. Recently, new methods demonstrate the ability to generate videos with controllable came…

    Submitted 20 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Project Page: https://snap-research.github.io/vd3d/

  23. arXiv:2407.09853  [pdf, other]

    cs.CV

    Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation

    Authors: Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong

    Abstract: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrate…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024, project: https://github.com/qingshi9974/ECCV2024-AdpatICMH

  24. arXiv:2407.08726  [pdf, other]

    cs.CV

    Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

    Authors: Cherie Ho, Jiaye Zou, Omar Alama, Sai Mitheran Jagadesh Kumar, Benjamin Chiang, Taneesh Gupta, Chen Wang, Nikhil Keetha, Katia Sycara, Sebastian Scherer

    Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more sca…

    Submitted 11 July, 2024; originally announced July 2024.

  25. arXiv:2407.02211  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning

    Authors: Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, Dongmei Zhang

    Abstract: Recent advances in fine-tuning large language models (LLMs) have greatly enhanced their usage in domain-specific tasks. Despite the success, fine-tuning continues to rely on repeated and lengthy prompts, which escalate computational expenses, require more resources, and lead to slower inference. In this paper, we present a novel approach, PromptIntern, which internalizes prompt knowledge during mo…

    Submitted 15 October, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by EMNLP 2024

  26. arXiv:2406.18950  [pdf, other]

    eess.IV cs.CV

    MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion

    Authors: Jing Zou, Lanqing Liu, Qi Chen, Shujun Wang, Zhanli Hu, Xiaohan Xing, Jing Qin

    Abstract: Multi-modal MRI offers valuable complementary information for diagnosis and treatment; however, its utility is limited by prolonged scanning times. To accelerate the acquisition process, a practical approach is to reconstruct images of the target modality, which requires longer scanning times, from under-sampled k-space data using the fully-sampled reference modality with shorter scanning times as…

    Submitted 7 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

  27. arXiv:2406.17675  [pdf, other]

    cs.CL

    Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

    Authors: Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun

    Abstract: Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable, inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this…

    Submitted 25 June, 2024; originally announced June 2024.

  28. arXiv:2406.15609  [pdf, other]

    physics.med-ph cs.AI

    Automated radiotherapy treatment planning guided by GPT-4Vision

    Authors: Sheng Liu, Oscar Pastor-Serrano, Yizheng Chen, Matthew Gopaulchan, Weixing Liang, Mark Buyyounouski, Erqi Pollom, Quynh-Thu Le, Michael Gensheimer, Peng Dong, Yong Yang, James Zou, Lei Xing

    Abstract: Radiotherapy treatment planning is a time-consuming and potentially subjective process that requires the iterative adjustment of model parameters to balance multiple conflicting objectives. Recent advancements in large foundation models offer promising avenues for addressing the challenges in planning and clinical decision-making. This study introduces GPT-RadPlan, a fully automated treatment plan…

    Submitted 1 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  29. arXiv:2406.11200  [pdf, other]

    cs.LG cs.CL

    AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval

    Authors: Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou

    Abstract: Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agen…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures, 6 tables

  30. arXiv:2406.07496  [pdf, other]

    cs.CL cs.AI cs.LG

    TextGrad: Automatic "Differentiation" via Text

    Authors: Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, James Zou

    Abstract: AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in their early days until backpropagation and automatic different…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 41 pages, 6 figures

  31. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  32. arXiv:2406.05649  [pdf, other]

    cs.CV cs.AI

    GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

    Authors: Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images. However, in our method, we introduce several important modifications that allow us to significantly enhance 3D reconstruction quali…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 19 pages, 17 figures. Project page: https://snap-research.github.io/GTR/

  33. arXiv:2406.04692  [pdf, other]

    cs.CL

    Mixture-of-Agents Enhances Large Language Model Capabilities

    Authors: Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou

    Abstract: Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) met…

    Submitted 7 June, 2024; originally announced June 2024.

  34. arXiv:2406.00977  [pdf, other]

    cs.CV cs.AI

    Dragonfly: Multi-Resolution Zoom-In Encoding Enhances Vision-Language Models

    Authors: Rahul Thapa, Kezhen Chen, Ian Covert, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Recent advances in vision-language models (VLMs) have demonstrated the advantages of processing images at higher resolutions and utilizing multi-crop features to preserve native resolution details. However, despite these improvements, existing vision transformers (ViTs) still struggle to capture fine-grained details from less prominent objects, charts, and embedded text, limiting their effectivene…

    Submitted 14 October, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  35. arXiv:2405.20456  [pdf, other]

    cs.LG

    Scaling Laws for the Value of Individual Data Points in Machine Learning

    Authors: Ian Covert, Wenlong Ji, Tatsunori Hashimoto, James Zou

    Abstract: Recent works have shown that machine learning models improve at a predictable rate with the total amount of training data, leading to scaling laws that describe the relationship between error and dataset size. These scaling laws can help design a model's training dataset, but they typically take an aggregate view of the data by only considering the dataset's size. We introduce a new perspective by…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: ICML 2024 camera-ready

  36. arXiv:2405.19716  [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  37. arXiv:2405.19665  [pdf]

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to achieve accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learni…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures, Conference on Decision and Control (CDC)

  38. arXiv:2405.19642  [pdf]

    cs.AI

    Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry

    Authors: Mengjie Gan, Penglong Lian, Zhiheng Su, Jiyang Zhang, Jialong Huang, Benhao Wang, Jianxiao Zou, Shicai Fan

    Abstract: Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, data statistical learning, and conventional deep learning techniques face constraints under these conditions due to their substantial data requirements and the necessity for transfer learning to accommodate new failure mode…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6 pages, 2 figures, 2 tables, 63rd IEEE Conference on Decision and Control

  39. arXiv:2405.18253  [pdf, other]

    cs.LG cs.GT

    Truthful Dataset Valuation by Pointwise Mutual Information

    Authors: Shuran Zheng, Yongchan Kwon, Xuan Qi, James Zou

    Abstract: A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to poten…

    Submitted 28 May, 2024; originally announced May 2024.

  40. arXiv:2405.17766  [pdf, other]

    cs.LG cs.AI eess.SP

    SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

    Authors: Rahul Thapa, Bryan He, Magnus Ruud Kjaer, Hyatt Moore, Gauri Ganjoo, Emmanuel Mignot, James Zou

    Abstract: Sleep is a complex physiological process evaluated through various modalities recording electrical brain, cardiac, and respiratory activities. We curate a large polysomnography dataset from over 14,000 participants comprising over 100,000 hours of multi-modal sleep recordings. Leveraging this extensive dataset, we developed SleepFM, the first multi-modal foundation model for sleep analysis. We sho…

    Submitted 27 May, 2024; originally announced May 2024.

  41. arXiv:2405.16148  [pdf, other]

    cs.LG

    Accelerating Transformers with Spectrum-Preserving Token Merging

    Authors: Hoai-Chau Tran, Duy M. H. Nguyen, Duy M. Nguyen, Trung-Tin Nguyen, Ngan Le, Pengtao Xie, Daniel Sonntag, James Y. Zou, Binh T. Nguyen, Mathias Niepert

    Abstract: Increasing the throughput of the Transformer architecture, a foundational component used in numerous state-of-the-art models for vision and language tasks (e.g., GPT, LLaVa), is an important problem in machine learning. One recent and effective strategy is to merge token representations within Transformer models, aiming to reduce computational and memory requirements while maintaining accuracy. Pr…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: Version 1

  42. arXiv:2405.10598  [pdf, other]

    cs.CV

    Learning Object-Centric Representation via Reverse Hierarchy Guidance

    Authors: Junhong Zou, Xiangyu Zhu, Zhaoxiang Zhang, Zhen Lei

    Abstract: Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes, which is crucial for interpretable visual comprehension and reasoning. Most existing OCL models adopt auto-encoding structures and learn to decompose visual scenes through specially designed inductive bias, which causes the model to miss small objects during reconstruction. Reverse hierar…

    Submitted 7 October, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  43. arXiv:2405.05540  [pdf]

    cs.NI cs.AR

    Shape-Optimized Electrooptic Beam Scanners: Experiment

    Authors: Jennifer C. Fang, M. J. Kawas, J. Zou, V. Gopalan, T. E. Schlesinger, Daniel D. Stancil

    Abstract: A new horn-shaped electrooptic scanner is described with significantly improved scanning sensitivity over rectangular-shaped devices. In the new device, the shape of the scanner is chosen to follow the trajectory of the beam. An example design is described that exhibits a factor of two larger scanning sensitivity than a rectangular device with comparable maximum scanning angle. Beam propagation si…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 3 pages, 3 figures. IEEE Photonics Technology Letters. Author Jennifer C. Fang is currently known as Jennifer Andreoli-Fang

    Journal ref: IEEE Photonics Technology Letters (Volume: 11, Issue: 1, January 1999)

  44. arXiv:2405.03875  [pdf, other]

    cs.LG stat.ML

    Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

    Authors: Jiachen T. Wang, Tianji Yang, James Zou, Yongchan Kwon, Ruoxi Jia

    Abstract: Data Shapley provides a principled approach to data valuation and plays a crucial role in data-centric machine learning (ML) research. Data selection is considered a standard application of Data Shapley. However, its data selection performance has been shown to be inconsistent across settings in the literature. This study aims to deepen our understanding of this phenomenon. We introduce a hypothesis te…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  45. arXiv:2405.03159  [pdf, other]

    cs.CV

    DeepMpMRI: Tensor-decomposition Regularized Learning for Fast and High-Fidelity Multi-Parametric Microstructural MR Imaging

    Authors: Wenxin Fan, Jian Cheng, Cheng Li, Xinrui Ma, Jing Yang, Juan Zou, Ruoyou Wu, Zan Chen, Yuanjing Feng, Hairong Zheng, Shanshan Wang

    Abstract: Deep learning has emerged as a promising approach for learning the nonlinear mapping between diffusion-weighted MR images and tissue parameters, which enables automatic and deep understanding of the brain microstructures. However, the efficiency and accuracy in the multi-parametric estimations are still limited since previous studies tend to estimate multi-parametric maps with dense sampling and i…

    Submitted 6 May, 2024; originally announced May 2024.

  46. arXiv:2404.17900  [pdf, other]

    cs.CV

    Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

    Authors: Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin

    Abstract: Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image re…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  47. arXiv:2404.17120  [pdf, other]

    cs.CL cs.AI cs.LG

    Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs

    Authors: Valeriia Cherepanova, James Zou

    Abstract: Large language models (LLMs) exhibit excellent ability to understand human languages, but do they also understand their own language that appears gibberish to us? In this work we delve into this question, aiming to uncover the mechanisms underlying such behavior in LLMs. We employ the Greedy Coordinate Gradient optimizer to craft prompts that compel LLMs to generate coherent responses from seeming…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  48. arXiv:2404.16482  [pdf, other]

    q-bio.NC cs.CV cs.HC

    CoCoG: Controllable Visual Stimuli Generation based on Human Concept Representations

    Authors: Chen Wei, Jiachen Zou, Dietmar Heinke, Quanying Liu

    Abstract: A central question for cognitive science is to understand how humans process visual objects, i.e., to uncover human low-dimensional concept representation space from high-dimensional visual stimuli. Generating visual stimuli with controlling concepts is the key. However, there are currently no generative models in AI to solve this problem. Here, we present the Concept based Controllable Generation…

    Submitted 25 April, 2024; originally announced April 2024.

  49. arXiv:2404.13207  [pdf, other]

    cs.IR cs.LG

    STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

    Authors: Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec

    Abstract: Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, many previous works studied textual and relational retrieval tasks as separate topics. To address the gap, we…

    Submitted 20 October, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: NeurIPS 2024 Track on Datasets and Benchmarks. 26 Pages, 6 Figures. Website: https://stark.stanford.edu/

  50. arXiv:2404.13016  [pdf, other]

    cs.CV cs.LG stat.ML

    Optimizing Calibration by Gaining Aware of Prediction Correctness

    Authors: Yuchi Liu, Lei Wang, Yuli Zou, James Zou, Liang Zheng

    Abstract: Model calibration aims to align confidence with prediction correctness. The Cross-Entropy (CE) loss is widely used for calibrator training, which enforces the model to increase confidence on the ground truth class. However, we find the CE loss has intrinsic limitations. For example, for a narrow misclassification, a calibrator trained by the CE loss often produces high confidence on the wrongly pr…

    Submitted 24 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.