Skip to main content

Showing 1–50 of 269 results for author: Han, L

  1. arXiv:2410.11623  [pdf, other

    cs.CV cs.AI cs.CL

    VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

    Authors: Sijie Cheng, Kechen Fang, Yangyang Yu, Sicheng Zhou, Bohao Li, Ye Tian, Tingguang Li, Lei Han, Yang Liu

    Abstract: Recent advancements in Multi-modal Large Language Models (MLLMs) have opened new avenues for applications in Embodied AI. Building on previous work, EgoThink, we introduce VidEgoThink, a comprehensive benchmark for evaluating egocentric video understanding capabilities. To bridge the gap between MLLMs and low-level control in Embodied AI, we design four key interrelated tasks: video question-answe… ▽ More

    Submitted 15 October, 2024; originally announced October 2024.

  2. arXiv:2410.08207  [pdf, other

    cs.CV cs.LG

    DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

    Authors: Xiaoxiao He, Ligong Han, Quan Dao, Song Wen, Minhao Bai, Di Liu, Han Zhang, Martin Renqiang Min, Felix Juefei-Xu, Chaowei Tan, Bo Liu, Kang Li, Hongdong Li, Junzhou Huang, Faez Ahmed, Akash Srivastava, Dimitris Metaxas

    Abstract: Discrete diffusion models have achieved success in tasks like image generation and masked language modeling but face limitations in controlled content editing. We introduce DICE (Discrete Inversion for Controllable Editing), the first approach to enable precise inversion for discrete diffusion models, including multinomial diffusion and masked generative models. By recording noise sequences and ma… ▽ More

    Submitted 10 October, 2024; originally announced October 2024.

  3. arXiv:2410.04581  [pdf, other

    cs.PL

    Efficient Linearizability Monitoring for Sets, Stacks, Queues and Priority Queues

    Authors: Lee Zheng Han, Umang Mathur

    Abstract: In this paper, we consider the problem of automatically monitoring linearizability. Here, one observes an execution of a concurrent program that interacts with a concurrent object and determines if the execution witnesses the violation of linearizability with respect to the sequential specification of the underlying data structure of the concurrent object. This problem has been extensively studied… ▽ More

    Submitted 6 October, 2024; originally announced October 2024.

  4. arXiv:2410.03533  [pdf, other

    cs.NE cs.AI q-bio.NC

    Multiscale fusion enhanced spiking neural network for invasive BCI neural signal decoding

    Authors: Yu Song, Liyuan Han, Bo Xu, Tielin Zhang

    Abstract: Brain-computer interfaces (BCIs) are an advanced fusion of neuroscience and artificial intelligence, requiring stable and long-term decoding of neural signals. Spiking Neural Networks (SNNs), with their neuronal dynamics and spike-based signal processing, are inherently well-suited for this task. This paper presents a novel approach utilizing a Multiscale Fusion enhanced Spiking Neural Network (MF… ▽ More

    Submitted 14 September, 2024; originally announced October 2024.

  5. arXiv:2410.01648  [pdf, other

    cs.CL

    DeIDClinic: A Multi-Layered Framework for De-identification of Clinical Free-text Data

    Authors: Angel Paul, Dhivin Shaji, Lifeng Han, Warren Del-Pinto, Goran Nenadic

    Abstract: De-identification is important in protecting patients' privacy for healthcare text analytics. The MASK framework is one of the best on the de-identification shared task organised by n2c2/i2b2 challenges. This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts, alongside traditional de-identification methods like dictionary… ▽ More

    Submitted 2 October, 2024; originally announced October 2024.

    Comments: ongoing work

  6. arXiv:2409.19467  [pdf, other

    cs.CL cs.AI

    INSIGHTBUDDY-AI: Medication Extraction and Entity Linking using Large Language Models and Ensemble Learning

    Authors: Pablo Romero, Lifeng Han, Goran Nenadic

    Abstract: Medication Extraction and Mining play an important role in healthcare NLP research due to its practical applications in hospital settings, such as their mapping into standard clinical knowledge bases (SNOMED-CT, BNF, etc.). In this work, we investigate state-of-the-art LLMs in text mining tasks on medications and their related attributes such as dosage, route, strength, and adverse effects. In add… ▽ More

    Submitted 28 September, 2024; originally announced September 2024.

    Comments: ongoing work, 24 pages

  7. arXiv:2409.18470  [pdf, other

    cs.LG

    Fairness without Sensitive Attributes via Knowledge Sharing

    Authors: Hongliang Ni, Lei Han, Tong Chen, Shazia Sadiq, Gianluca Demartini

    Abstract: While model fairness improvement has been explored previously, existing methods invariably rely on adjusting explicit sensitive attribute values in order to improve model fairness in downstream tasks. However, we observe a trend in which sensitive demographic information becomes inaccessible as public concerns around data privacy grow. In this paper, we propose a confidence-based hierarchical clas… ▽ More

    Submitted 27 September, 2024; originally announced September 2024.

  8. arXiv:2409.17011  [pdf, other

    cs.CL cs.DL

    LLM-CARD: Towards a Description and Landscape of Large Language Models

    Authors: Shengwei Tian, Lifeng Han, Erick Mendez Guzman, Goran Nenadic

    Abstract: With the rapid growth of the Natural Language Processing (NLP) field, a vast variety of Large Language Models (LLMs) continue to emerge for diverse NLP tasks. As an increasing number of papers are presented, researchers and developers face the challenge of information overload. Thus, it is particularly important to develop a system that can automatically extract and organise key information about… ▽ More

    Submitted 28 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: ongoing work, 16 pages

  9. arXiv:2409.14775  [pdf, other

    cs.RO

    Like a Martial Arts Dodge: Safe Expeditious Whole-Body Control of Mobile Manipulators for Collision Avoidance

    Authors: Bingjie Chen, Houde Liu, Chongkun Xia, Liang Han, Xueqian Wang, Bin Liang

    Abstract: In the control task of mobile manipulators(MM), achieving efficient and agile obstacle avoidance in dynamic environments is challenging. In this letter, we present a safe expeditious whole-body(SEWB) control for MMs that ensures both external and internal collision-free. SEWB is constructed by a two-layer optimization structure. Firstly, control barrier functions(CBFs) are employed for a MM to est… ▽ More

    Submitted 23 September, 2024; originally announced September 2024.

  10. arXiv:2409.14754  [pdf, other

    cs.RO

    CushionCatch: Compliant Catching Mechanism for Mobile Manipulators via Combined Optimization and Learning

    Authors: Bingjie Chen, Keyu Fan, Houde Liu, Chongkun Xia, Liang Han, Bin Liang

    Abstract: This paper presents a framework to achieve compliant catching with cushioning mechanism(CCCM) for mobile manipulators. First, we introduce a two-level motion optimization scheme, comprising a high-level capture planner and a low-level joint planner. The low-level joint planner consists of two distinct components: Pre-Catching (PRC) planner and Post-Catching (POC) planner. Next, we propose a networ… ▽ More

    Submitted 23 September, 2024; originally announced September 2024.

  11. arXiv:2409.09831  [pdf, other

    cs.CL cs.LG

    Generating Synthetic Free-text Medical Records with Low Re-identification Risk using Masked Language Modeling

    Authors: Samuel Belkadi, Libo Ren, Nicolo Micheletti, Lifeng Han, Goran Nenadic

    Abstract: In this paper, we present a system that generates synthetic free-text medical records, such as discharge summaries, admission notes and doctor correspondences, using Masked Language Modeling (MLM). Our system is designed to preserve the critical information of the records while introducing significant diversity and minimizing re-identification risk. The system incorporates a de-identification comp… ▽ More

    Submitted 17 September, 2024; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: Added references and rephrased some sentences

  12. arXiv:2409.09501  [pdf, other

    cs.CL cs.AI

    Synthetic4Health: Generating Annotated Synthetic Clinical Letters

    Authors: Libo Ren, Samuel Belkadi, Lifeng Han, Warren Del-Pinto, Goran Nenadic

    Abstract: Since clinical letters contain sensitive information, clinical-related datasets can not be widely applied in model training, medical research, and teaching. This work aims to generate reliable, various, and de-identified synthetic clinical letters. To achieve this goal, we explored different pre-trained language models (PLMs) for masking and generating text. After that, we worked on Bio\_ClinicalB… ▽ More

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: ongoing work, 48 pages

  13. arXiv:2409.07407  [pdf, other

    cs.CR cs.AI

    CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

    Authors: Zeqing Qin, Yiwei Wu, Lansheng Han

    Abstract: Large Language Models (LLMs) have shown great promise in vulnerability identification. As C/C++ comprises half of the Open-Source Software (OSS) vulnerabilities over the past decade and updates in OSS mainly occur through commits, enhancing LLMs' ability to identify C/C++ Vulnerability-Contributing Commits (VCCs) is essential. However, current studies primarily focus on further pre-training LLMs o… ▽ More

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 8 pages, 2 figures, conference

    MSC Class: 68M25

  14. arXiv:2409.07040  [pdf, other

    cs.CV eess.IV

    Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement

    Authors: Xianmin Chen, Peiliang Huang, Xiaoxu Feng, Dingwen Zhang, Longfei Han, Junwei Han

    Abstract: Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. Many deep learning-based methods have been developed to address this issue and have shown promising results in recent years. However, single-stage methods, which attempt to unify the complex mapping across both domains, leading to limited denoisin… ▽ More

    Submitted 11 September, 2024; originally announced September 2024.

  15. arXiv:2409.06887  [pdf, other

    eess.IV cs.CV

    Ordinal Learning: Longitudinal Attention Alignment Model for Predicting Time to Future Breast Cancer Events from Mammograms

    Authors: Xin Wang, Tao Tan, Yuan Gao, Eric Marcus, Luyi Han, Antonio Portaluri, Tianyu Zhang, Chunyao Lu, Xinglong Liang, Regina Beets-Tan, Jonas Teuwen, Ritse Mann

    Abstract: Precision breast cancer (BC) risk assessment is crucial for developing individualized screening and prevention. Despite the promising potential of recent mammogram (MG) based deep learning models in predicting BC risk, they mostly overlook the 'time-to-future-event' ordering among patients and exhibit limited explorations into how they track history changes in breast tissue, thereby limiting their… ▽ More

    Submitted 10 September, 2024; originally announced September 2024.

  16. arXiv:2409.01667  [pdf, other

    cs.CV

    VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning

    Authors: Muye Huang, Lingling Zhang, Lai Han, Wenjun Wu, Xinyu Zhang, Jun Liu

    Abstract: Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on the automatic interpretation and reasoning of data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hi… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

  17. arXiv:2409.01577  [pdf, other

    cs.CV

    EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

    Authors: Muye Huang, Lai Han, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

    Abstract: Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training meth… ▽ More

    Submitted 2 September, 2024; originally announced September 2024.

  18. arXiv:2408.13996  [pdf

    cs.NE

    Research Advances and New Paradigms for Biology-inspired Spiking Neural Networks

    Authors: Tianyu Zheng, Liyuan Han, Tielin Zhang

    Abstract: Spiking neural networks (SNNs) are gaining popularity in the computational simulation and artificial intelligence fields owing to their biological plausibility and computational efficiency. This paper explores the historical development of SNN and concludes that these two fields are intersecting and merging rapidly. Following the successful application of Dynamic Vision Sensors (DVS) and Dynamic A… ▽ More

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

  19. arXiv:2408.12413  [pdf, other

    q-bio.BM cs.AI

    Dynamic PDB: A New Dataset and a SE(3) Model Extension by Integrating Dynamic Behaviors and Physical Properties in Protein Structures

    Authors: Ce Liu, Jun Wang, Zhiqiang Cai, Yingxu Wang, Huizhen Kuang, Kaihui Cheng, Liwei Zhang, Qingkun Su, Yining Tang, Fenglei Cao, Limei Han, Siyu Zhu, Yuan Qi

    Abstract: Despite significant progress in static protein structure collection and prediction, the dynamic behavior of proteins, one of their most vital characteristics, has been largely overlooked in prior research. This oversight can be attributed to the limited availability, diversity, and heterogeneity of dynamic protein datasets. To address this gap, we propose to enhance existing prestigious static 3D… ▽ More

    Submitted 18 September, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

  20. arXiv:2408.09715  [pdf, other

    cs.AI cs.CV cs.LG eess.IV

    HYDEN: Hyperbolic Density Representations for Medical Images and Reports

    Authors: Zhi Qiao, Linbin Han, Xiantong Zhen, Jia-Hong Gao, Zhen Qian

    Abstract: In light of the inherent entailment relations between images and text, hyperbolic point vector embeddings, leveraging the hierarchical modeling advantages of hyperbolic space, have been utilized for visual semantic representation learning. However, point vector embedding approaches fail to address the issue of semantic uncertainty, where an image may have multiple interpretations, and text may ref… ▽ More

    Submitted 19 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  21. arXiv:2408.04449  [pdf, other

    cs.AI

    EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents

    Authors: Zihao Zhu, Bingzhe Wu, Zhengyou Zhang, Lei Han, Baoyuan Wu

    Abstract: Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction. The emergence of foundation models as the "brain" of EAI agents for high-level task planning has shown promising results. However, the deployment of these agents in physical environments presents significant safety challenges. For instance, a housekeeping robot lacking sufficient… ▽ More

    Submitted 15 October, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  22. arXiv:2408.03871  [pdf, other

    cs.CL cs.AI

    BeeManc at the PLABA Track of TAC-2023: Investigating LLMs and Controllable Attributes for Improving Biomedical Text Readability

    Authors: Zihao Li, Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Matthew Shardlow, Goran Nenadic

    Abstract: In this system report, we describe the models and methods we used for our participation in the PLABA2023 task on biomedical abstract simplification, part of the TAC 2023 tracks. The system outputs we submitted come from the following three categories: 1) domain fine-tuned T5-like models including Biomedical-T5 and Lay-SciFive; 2) fine-tuned BARTLarge model with controllable attributes (via tokens)… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: system report for PLABA-2023. arXiv admin note: substantial text overlap with arXiv:2309.13202

  23. arXiv:2407.13638  [pdf, other

    cs.CL cs.AI

    A Comparative Study on Automatic Coding of Medical Letters with Explainability

    Authors: Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic

    Abstract: This study aims to explore the implementation of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters with visualised explainability and light-weighted local computer settings. Currently in clinical settings, coding is a manual process that involves assigning codes to each condition, procedure, and medication in a patient's paperwork (e.g… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: working paper

  24. arXiv:2407.10377  [pdf

    eess.IV cs.AI cs.CV

    Enhanced Self-supervised Learning for Multi-modality MRI Segmentation and Classification: A Novel Approach Avoiding Model Collapse

    Authors: Linxuan Han, Sa Xiao, Zimeng Li, Haidong Li, Xiuchao Zhao, Fumin Guo, Yeqing Han, Xin Zhou

    Abstract: Multi-modality magnetic resonance imaging (MRI) can provide complementary information for computer-aided diagnosis. Traditional deep learning algorithms are suitable for identifying specific anatomical structures segmenting lesions and classifying diseases with magnetic resonance images. However, manual labels are limited due to high expense, which hinders further improvement of model accuracy. Se… ▽ More

    Submitted 17 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  25. arXiv:2407.09333  [pdf, other

    cs.CR

    A Method for Efficient Heterogeneous Parallel Compilation: A Cryptography Case Study

    Authors: Zhiyuan Tan, Liutong Han, Mingjie Xing, Yanjun Wu

    Abstract: In the era of diminishing returns from Moores Law, heterogeneous computing systems have emerged as a vital approach to enhance computational efficiency. This paper introduces a novel MLIR-based dialect, named hyper, designed to optimize data management and parallel computation across diverse hardware architectures. The hyper dialect abstracts the complexities of heterogeneous computing by providin… ▽ More

    Submitted 17 October, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  26. arXiv:2407.06846  [pdf, other

    cs.HC

    SilverCycling: Exploring the Impact of Bike-Based Locomotion on Spatial Orientation for Older Adults in VR

    Authors: Qiongyan Chen, Zhiqing Wu, Yucheng Liu, Lei Han, Zisu Li, Ge Lin Kan, Mingming Fan

    Abstract: Spatial orientation is essential for people to effectively navigate and interact with the environment in everyday life. With age-related cognitive decline, providing VR locomotion techniques with better spatial orientation performance for older adults becomes important. Such advancements not only make VR more accessible to older adults but also enable them to reap the potential health benefits of… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

  27. arXiv:2407.04285  [pdf, other

    cs.LG cs.AI

    Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

    Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

    Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  28. arXiv:2407.04206  [pdf, other

    math.NA cs.CE

    Computational Graph Representation of Equations System Constructors in Hierarchical Circuit Simulation

    Authors: Zichao Long, Lin Li, Lei Han, Xianglong Meng, Chongjun Ding, Ruiyan Li, Wu Jiang, Fuchen Ding, Jiaqing Yue, Zhichao Li, Yisheng Hu, Ding Li, Heng Liao

    Abstract: Equations system constructors of hierarchical circuits play a central role in device modeling, nonlinear equations solving, and circuit design automation. However, existing constructors present limitations in applications to different extents. For example, the costs of developing and reusing device models -- especially coarse-grained equivalent models of circuit modules -- remain high while parame… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  29. arXiv:2407.03084  [pdf, other

    cs.RO

    Applying Extended Object Tracking for Self-Localization of Roadside Radar Sensors

    Authors: Longfei Han, Qiuyu Xu, Klaus Kefferpütz, Gordon Elger, Jürgen Beyerer

    Abstract: Intelligent Transportation Systems (ITS) can benefit from roadside 4D mmWave radar sensors for large-scale traffic monitoring due to their weatherproof functionality, long sensing range and low manufacturing cost. However, the localization method using external measurement devices has limitations in urban environments. Furthermore, if the sensor mount exhibits changes due to environmental influenc… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  30. arXiv:2407.02911  [pdf, other

    eess.IV cs.CV

    Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

    Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  31. arXiv:2407.00718  [pdf, other

    eess.IV cs.CV

    ASPS: Augmented Segment Anything Model for Polyp Segmentation

    Authors: Huiqian Li, Dingwen Zhang, Jieru Yao, Longfei Han, Zhongyu Li, Junwei Han

    Abstract: Polyp segmentation plays a pivotal role in colorectal cancer diagnosis. Recently, the emergence of the Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation, leveraging its powerful pre-training capability on large-scale datasets. However, due to the domain gap between natural and endoscopy images, SAM encounters two limitations in achieving effective performan… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  32. arXiv:2406.19972  [pdf, other

    cs.RO

    HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

    Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu

    Abstract: Physical Human-Scene Interaction (HSI) plays a crucial role in numerous applications. However, existing HSI techniques are limited to specific object dynamics and privileged information, which prevents the development of more comprehensive applications. To address this limitation, we introduce HumanVLA for general object rearrangement directed by practical vision and language. A teacher-stud… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  33. arXiv:2406.14449  [pdf, other

    cs.AI

    APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

    Authors: Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

    Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  34. arXiv:2406.11675  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

    Authors: Yibin Wang, Haizhou Shi, Ligong Han, Dimitris Metaxas, Hao Wang

    Abstract: Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters le… ▽ More

    Submitted 27 September, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at NeurIPS 2024; work in progress

  35. arXiv:2405.21050  [pdf, other

    cs.CV cs.LG

    Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models

    Authors: Xinxi Zhang, Song Wen, Ligong Han, Felix Juefei-Xu, Akash Srivastava, Junzhou Huang, Hao Wang, Molei Tao, Dimitris N. Metaxas

    Abstract: Adapting large-scale pre-trained generative models in a parameter-efficient manner is gaining traction. Traditional methods like low rank adaptation achieve parameter efficiency by imposing constraints but may not be optimal for tasks requiring high representation capacity. We propose a novel spectrum-aware adaptation framework for generative models. Our method adjusts both singular values and the… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  36. arXiv:2405.18688  [pdf, other

    cs.LG cs.AI cs.CL

    Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

    Authors: Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia Cui, Ying Wen, Yaodong Yang, Bo Xu, Lei Han

    Abstract: Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  37. arXiv:2405.16969  [pdf, other

    cs.CL stat.AP

    The Multi-Range Theory of Translation Quality Measurement: MQM scoring models and Statistical Quality Control

    Authors: Arle Lommel, Serge Gladkoff, Alan Melby, Sue Ellen Wright, Ingemar Strandvik, Katerina Gasova, Angelika Vaasa, Andy Benzo, Romina Marazzato Sparano, Monica Foresi, Johani Innis, Lifeng Han, Goran Nenadic

    Abstract: The year 2024 marks the 10th anniversary of the Multidimensional Quality Metrics (MQM) framework for analytic translation quality evaluation. The MQM error typology has been widely used by practitioners in the translation and localization industry and has served as the basis for many derivative projects. The annual Conference on Machine Translation (WMT) shared tasks on both human and automatic tr… ▽ More

    Submitted 5 August, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by AMTA2024

  38. arXiv:2405.14660  [pdf, other

    cs.LG cs.AI cs.CL

    Implicit In-context Learning

    Authors: Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas

    Abstract: In-context Learning (ICL) empowers large language models (LLMs) to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is susceptible to the selection and order of demonstration examples. In this work, we introduce Implicit In-con… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  39. arXiv:2405.14020  [pdf, other

    cs.LG cs.AI

    Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases

    Authors: Ling Han, Hao Huang, Dustin Scheinost, Mary-Anne Hartley, María Rodríguez Martínez

    Abstract: Effective adaptation to distribution shifts in training data is pivotal for sustaining robustness in neural networks, especially when removing specific biases or outdated information, a process known as machine unlearning. Traditional approaches typically assume that data variations are random, which makes it difficult to adjust the model parameters accurately to remove patterns and characteristic… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  40. arXiv:2405.12630  [pdf, other

    cs.CL cs.AI

    Exploration of Masked and Causal Language Modelling for Text Generation

    Authors: Nicolo Micheletti, Samuel Belkadi, Lifeng Han, Goran Nenadic

    Abstract: Large Language Models (LLMs) have revolutionised the field of Natural Language Processing (NLP) and have achieved state-of-the-art performance in practically every task in this field. However, the prevalent approach used in text generation, Causal Language Modelling (CLM), which generates text sequentially from left to right, inherently limits the freedom of the model, which does not decide when a… ▽ More

    Submitted 8 August, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: working paper - under review

  41. TP3M: Transformer-based Pseudo 3D Image Matching with Reference Image

    Authors: Liming Han, Zhaoxiang Liu, Shiguo Lian

    Abstract: Image matching is still challenging in such scenes with large viewpoints or illumination changes or with low textures. In this paper, we propose a Transformer-based pseudo 3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image and matches to the 2D features extracted from the destination image by the coarse-to-fine 3D… ▽ More

    Submitted 11 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by ICRA 2024

    Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA), 3962-3968

  42. arXiv:2405.08172  [pdf, other

    cs.CL cs.AI

    CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation

    Authors: Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

    Abstract: This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus h… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: on-going work, 30 pages

  43. arXiv:2405.07430  [pdf, other

    cs.SE cs.CR

    Don't Chase Your Tail! Missing Key Aspects Augmentation in Textual Vulnerability Descriptions of Long-tail Software through Feature Inference

    Authors: Linyi Han, Shidong Pan, Zhenchang Xing, Jiamou Sun, Sofonias Yitagesu, Xiaowang Zhang, Zhiyong Feng

    Abstract: Augmenting missing key aspects in Textual Vulnerability Descriptions (TVDs) for software with a large user base (referred to as non-long-tail software) has greatly advanced vulnerability analysis and software security research. However, these methods often overlook software instances that have a limited user base (referred to as long-tail software) due to limited TVDs, variations in software featu… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  44. arXiv:2405.07233  [pdf, other

    cs.LG cs.AI physics.ao-ph

    OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

    Authors: Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang

    Abstract: Accurately reconstructing the global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystem. Existing expert-dominated numerical simulations fail to catch up with the dynamic variation caused by global warming and human activities. Besides, due to the high-cost data collection, the historical observations are severely sparse, leading to big challenge for precis… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  45. arXiv:2405.02760  [pdf, other

    cs.CE cs.SI

    GTFS2STN: Analyzing GTFS Transit Data by Generating Spatiotemporal Transit Network

    Authors: Diyi Liu, Jing Guo, Yangsong Gu, Meredith King, Lee D. Han, Candace Brakewood

    Abstract: The General Transit Feed Specification (GTFS) is an open standard format for recording transit information, utilized by thousands of transit agencies worldwide. This study introduces GTFS2STN, a novel tool that converts static GTFS transit networks into spatiotemporal networks, connecting bus stops across space and time. This transformation enables comprehensive analysis of transit system accessib… ▽ More

    Submitted 21 August, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures

  46. arXiv:2405.00066  [pdf, other

    cs.CR cs.AI

    Research and application of artificial intelligence based webshell detection model: A literature review

    Authors: Mingrui Ma, Lansheng Han, Chunjie Zhou

    Abstract: Webshell, as the "culprit" behind numerous network attacks, is one of the research hotspots in the field of cybersecurity. However, the complexity, stealthiness, and confusing nature of webshells pose significant challenges to the corresponding detection schemes. With the rise of Artificial Intelligence (AI) technology, researchers have started to apply different intelligent algorithms and neural… ▽ More

    Submitted 28 April, 2024; originally announced May 2024.

    Comments: 21 pages, 6 figures

  47. arXiv:2404.18047  [pdf, other

    cs.RO

    LIKO: LiDAR, Inertial, and Kinematic Odometry for Bipedal Robots

    Authors: Qingrui Zhao, Mingyuan Li, Yongliang Shi, Xuechao Chen, Zhangguo Yu, Lianqiang Han, Zhenyuan Fu, Jintao Zhang, Chao Li, Yuanxi Zhang, Qiang Huang

    Abstract: High-frequency and accurate state estimation is crucial for biped robots. This paper presents a tightly-coupled LiDAR-Inertial-Kinematic Odometry (LIKO) for biped robot state estimation based on an iterated extended Kalman filter. Beyond state estimation, the foot contact position is also modeled and estimated. This allows for both position and velocity updates from kinematic measurement. Addition… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

  48. arXiv:2404.17335  [pdf, other

    cs.CV cs.AI

    A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation

    Authors: Xin Zhang, Liangxiu Han, Tam Sobeih, Lianghao Han, Darren Dancey

    Abstract: Depth estimation is crucial for interpreting complex environments, especially in areas such as autonomous vehicle navigation and robotics. Nonetheless, obtaining accurate depth readings from event camera data remains a formidable challenge. Event cameras operate differently from traditional digital cameras, continuously capturing data and generating asynchronous binary spikes that encode time, loc… ▽ More

    Submitted 1 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages

  49. arXiv:2404.14197  [pdf, other

    cs.LG

    SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion

    Authors: Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan

    Abstract: Multivariate time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare. Recent studies have highlighted the advantages of channel independence to resist distribution drift but neglect channel correlations, limiting further enhancements. Several methods utilize mechanisms like attention or mixer to address this by capturing channel co… ▽ More

    Submitted 12 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  50. arXiv:2404.10642  [pdf, other

    cs.CL cs.LG

    Self-playing Adversarial Language Game Enhances LLM Reasoning

    Authors: Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du

    Abstract: We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's u… ▽ More

    Submitted 23 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Preprint