Showing 1–50 of 191 results for author: Li, E

  1. arXiv:2410.13187  [pdf, other]

    cs.CL cs.AI cs.SE

    aiXcoder-7B: A Lightweight and Effective Large Language Model for Code Completion

    Authors: Siyuan Jiang, Jia Li, He Zong, Huanyu Liu, Hao Zhu, Shukai Hu, Erlu Li, Jiazheng Ding, Yu Han, Wei Ning, Ge Li

    Abstract: Large Language Models (LLMs) have been widely used in code completion, and researchers are focusing on scaling up LLMs to improve their accuracy. However, larger LLMs will increase the response time of code completion and decrease the developers' productivity. In this paper, we propose a lightweight and effective LLM for code completion named aiXcoder-7B. Compared to existing LLMs, aiXcoder-7B ach…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: aiXcoder-7B is available at https://github.com/aixcoder-plugin/aiXcoder-7B/tree/main

  2. arXiv:2410.11488  [pdf, other]

    cs.LG cs.NE

    Advancing Training Efficiency of Deep Spiking Neural Networks through Rate-based Backpropagation

    Authors: Chengting Yu, Lei Liu, Gaoang Wang, Erping Li, Aili Wang

    Abstract: Recent insights have revealed that rate-coding is a primary form of information representation captured by surrogate-gradient-based Backpropagation Through Time (BPTT) in training deep Spiking Neural Networks (SNNs). Motivated by these findings, we propose rate-based backpropagation, a training strategy specifically designed to exploit rate-based representations to reduce the complexity of BPTT. O…

    Submitted 15 October, 2024; originally announced October 2024.

    Comments: Accepted by NeurIPS 2024

  3. arXiv:2410.07166  [pdf, other]

    cs.CL cs.AI cs.LG cs.RO

    Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

    Authors: Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu

    Abstract: We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because they are usually applied in different domains, for different purposes, and built based on different inputs and outputs. Furthermore, existing evalua…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: Accepted for oral presentation at NeurIPS 2024 in the Datasets and Benchmarks track

  4. arXiv:2410.06243  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Unsupervised Model Diagnosis

    Authors: Yinong Oliver Wang, Eileen Li, Jinqi Luo, Zhaoning Wang, Fernando De la Torre

    Abstract: Ensuring model explainability and robustness is essential for reliable deployment of deep vision systems. Current methods for evaluating robustness rely on collecting and annotating extensive test sets. While this is common practice, the process is labor-intensive and expensive with no guarantee of sufficient coverage across attributes of interest. Recently, model diagnosis frameworks have emerged…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 9 pages, 9 figures, 3 tables

  5. arXiv:2410.02678  [pdf, other]

    cs.CL cs.AI

    Distilling an End-to-End Voice Assistant Without Instruction Training Data

    Authors: William Held, Ella Li, Michael Ryan, Weiyan Shi, Yanzhe Zhang, Diyi Yang

    Abstract: Voice assistants, such as Siri and Google Assistant, typically model audio and text separately, resulting in lost speech information and increased complexity. Recent efforts to address this with end-to-end Speech Large Language Models (LLMs) trained with supervised finetuning (SFT) have led to models "forgetting" capabilities from text-only LLMs. Our work proposes an alternative paradigm for tr…

    Submitted 3 October, 2024; originally announced October 2024.

  6. arXiv:2410.01515  [pdf, other]

    cs.NI eess.SP

    Task-Oriented Edge-Assisted Cooperative Data Compression, Communications and Computing for UGV-Enhanced Warehouse Logistics

    Authors: Jiaming Yang, Zhen Meng, Xiangmin Xu, Kan Chen, Emma Liying Li, Philip Guodong G. Zhao

    Abstract: This paper explores the growing need for task-oriented communications in warehouse logistics, where traditional communication Key Performance Indicators (KPIs), such as latency, reliability, and throughput, often do not fully meet task requirements. As the complexity of data flow management in large-scale device networks increases, there is also a pressing need for innovative cross-system designs th…

    Submitted 9 October, 2024; v1 submitted 2 October, 2024; originally announced October 2024.

  7. arXiv:2409.20188  [pdf, other]

    cs.RO cs.SD eess.AS

    Active Listener: Continuous Generation of Listener's Head Motion Response in Dyadic Interactions

    Authors: Bishal Ghosh, Emma Li, Tanaya Guha

    Abstract: A key component of dyadic spoken interactions is the contextually relevant non-verbal gestures, such as head movements that reflect a listener's response to the interlocutor's speech. Although significant progress has been made in the context of generating co-speech gestures, generating listener's response has remained a challenge. We introduce the task of generating continuous head motion respons…

    Submitted 30 September, 2024; originally announced September 2024.

    Comments: 4+1 pages, 3 figures, 2 tables

  8. arXiv:2409.17659  [pdf, other]

    cs.AI

    Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning

    Authors: Siyi Lu, Lei He, Shengbo Eben Li, Yugong Luo, Jianqiang Wang, Keqiang Li

    Abstract: End-to-end autonomous driving offers a streamlined alternative to the traditional modular pipeline, integrating perception, prediction, and planning within a single framework. While Deep Reinforcement Learning (DRL) has recently gained traction in this domain, existing approaches often overlook the critical connection between feature extraction of DRL and perception. In this paper, we bridge this…

    Submitted 26 September, 2024; originally announced September 2024.

  9. arXiv:2409.13213  [pdf, other]

    cs.CR cs.LG

    MalMixer: Few-Shot Malware Classification with Retrieval-Augmented Semi-Supervised Learning

    Authors: Eric Li, Yifan Zhang, Yu Huang, Kevin Leach

    Abstract: Recent growth and proliferation of malware has tested practitioners' ability to promptly classify new samples according to malware families. In contrast to labor-intensive reverse engineering efforts, machine learning approaches have demonstrated increased speed and accuracy. However, most existing deep-learning malware family classifiers must be calibrated using a large number of samples that are…

    Submitted 20 September, 2024; originally announced September 2024.

  10. arXiv:2409.10899  [pdf, ps, other]

    cs.DM math.CO

    Conflict-free chromatic index of trees

    Authors: Shanshan Guo, Ethan Y. H. Li, Luyi Li, Ping Li

    Abstract: A graph $G$ is conflict-free $k$-edge-colorable if there exists an assignment of $k$ colors to $E(G)$ such that for every edge $e\in E(G)$, there is a color that is assigned to exactly one edge among the closed neighborhood of $e$. The smallest $k$ such that $G$ is conflict-free $k$-edge-colorable is called the conflict-free chromatic index of $G$, denoted $\chi'_{CF}(G)$. Dębski and Przybyło sho…

    Submitted 24 September, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

  11. arXiv:2409.05834  [pdf, other]

    cs.CV

    Vision-Driven 2D Supervised Fine-Tuning Framework for Bird's Eye View Perception

    Authors: Lei He, Qiaoyi Wang, Honglin Sun, Qing Xu, Bolin Gao, Shengbo Eben Li, Jianqiang Wang, Keqiang Li

    Abstract: Visual bird's eye view (BEV) perception, due to its excellent perceptual capabilities, is progressively replacing costly LiDAR-based perception systems, especially in the realm of urban intelligent driving. However, this type of perception still relies on LiDAR data to construct ground truth databases, a process that is both cumbersome and time-consuming. Moreover, most mass-produced autonomous dri…

    Submitted 9 September, 2024; originally announced September 2024.

  12. arXiv:2409.04992  [pdf, other]

    cs.AR cs.CL

    InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference

    Authors: Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang

    Abstract: The widespread of Large Language Models (LLMs) marks a significant milestone in generative AI. Nevertheless, the increasing context length and batch size in offline LLM inference escalate the memory requirement of the key-value (KV) cache, which imposes a huge burden on the GPU VRAM, especially for resource-constraint scenarios (e.g., edge computing and personal devices). Several cost-effective so…

    Submitted 8 September, 2024; originally announced September 2024.

  13. arXiv:2408.14000  [pdf, other]

    cs.RO

    Quantitative Representation of Scenario Difficulty for Autonomous Driving Based on Adversarial Policy Search

    Authors: Shuo Yang, Caojun Wang, Yuanjian Zhang, Yuming Yin, Yanjun Huang, Shengbo Eben Li, Hong Chen

    Abstract: Adversarial scenario generation is crucial for autonomous driving testing because it can efficiently simulate various challenging and complex traffic conditions. However, it is difficult to control current existing methods to generate desired scenarios, such as the ones with different conflict levels. Therefore, this paper proposes a data-driven quantitative method to represent scenario difficulty.…

    Submitted 25 August, 2024; originally announced August 2024.

  14. Flexible 3D Lane Detection by Hierarchical Shape Matching

    Authors: Zhihao Guan, Ruixin Liu, Zejian Yuan, Ao Liu, Kun Tang, Tong Zhou, Erlong Li, Chao Zheng, Shuqi Mei

    Abstract: As one of the basic while vital technologies for HD map construction, 3D lane detection is still an open problem due to varying visual conditions, complex typologies, and strict demands for precision. In this paper, an end-to-end flexible and hierarchical lane detector is proposed to precisely predict 3D lane lines from point clouds. Specifically, we design a hierarchical network predicting flexib…

    Submitted 13 August, 2024; originally announced August 2024.

  15. arXiv:2407.18901  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

    Authors: Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian

    Abstract: Autonomous agents that address day-to-day digital tasks (e.g., ordering groceries for a household), must not only operate multiple apps (e.g., notes, messaging, shopping app) via APIs, but also generate rich code with complex control flow in an iterative manner based on their interaction with the environment. However, existing benchmarks for tool use are inadequate, as they only cover tasks that r…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ACL'24 Camera Ready

  16. arXiv:2407.15083  [pdf, other]

    cs.LG

    Rocket Landing Control with Random Annealing Jump Start Reinforcement Learning

    Authors: Yuxuan Jiang, Yujie Yang, Zhiqian Lan, Guojian Zhan, Shengbo Eben Li, Qi Sun, Jian Ma, Tianwen Yu, Changwu Zhang

    Abstract: Rocket recycling is a crucial pursuit in aerospace technology, aimed at reducing costs and environmental impact in space exploration. The primary focus centers on rocket landing control, involving the guidance of a nonlinear underactuated rocket with limited fuel in real-time. This challenging task prompts the application of reinforcement learning (RL), yet goal-oriented nature of the problem pose…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Oral

  17. arXiv:2407.13480  [pdf]

    cs.RO cs.AI

    Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

    Authors: Qingfan Wang, Dongyang Xu, Gaoyuan Kuang, Chen Lv, Shengbo Eben Li, Bingbing Nie

    Abstract: Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in aut…

    Submitted 18 July, 2024; originally announced July 2024.

  18. arXiv:2407.13137  [pdf, other]

    cs.CV

    OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation

    Authors: Jian Sun, Yuqi Dai, Chi-Man Vong, Qing Xu, Shengbo Eben Li, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Bird's-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issu…

    Submitted 17 July, 2024; originally announced July 2024.

  19. arXiv:2407.12491  [pdf, other]

    cs.CV

    Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving

    Authors: Yuqi Dai, Jian Sun, Shengbo Eben Li, Qing Xu, Jianqiang Wang, Lei He, Keqiang Li

    Abstract: Perception is essential for autonomous driving systems. Recent approaches based on Bird's-eye-view (BEV) and deep learning have made significant progress. However, there exist challenging issues including lengthy development cycles, poor reusability, and complex sensor setups in the perception algorithm development process. To tackle the above challenges, this paper proposes a novel hierarchical BEV p…

    Submitted 25 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

  20. arXiv:2407.04230  [pdf, other]

    cs.CV

    A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Due to the selective absorption and scattering of light by diverse aquatic media, underwater images usually suffer from various visual degradations. Existing underwater image enhancement (UIE) approaches that combine underwater physical imaging models with neural networks often fail to accurately estimate imaging model parameters such as depth and veiling light, resulting in poor performance in ce…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  21. arXiv:2406.01623  [pdf, other]

    cs.SE cs.AI

    WebSuite: Systematically Evaluating Why Web Agents Fail

    Authors: Eric Li, Jim Waldo

    Abstract: We describe WebSuite, the first diagnostic benchmark for generalist web agents, designed to systematically evaluate why agents fail. Advances in AI have led to the rise of numerous web agents that autonomously operate a browser to complete tasks. However, most existing benchmarks focus on strictly measuring whether an agent can or cannot complete a task, without giving insight on why. In this pape…

    Submitted 31 May, 2024; originally announced June 2024.

  22. arXiv:2406.01380  [pdf, other]

    cs.CV stat.AP

    Convolutional Unscented Kalman Filter for Multi-Object Tracking with Outliers

    Authors: Shiqi Liu, Wenhan Cao, Chang Liu, Tianyi Zhang, Shengbo Eben Li

    Abstract: Multi-object tracking (MOT) is an essential technique for navigation in autonomous driving. In tracking-by-detection systems, biases, false positives, and misses, which are referred to as outliers, are inevitable due to complex traffic scenarios. Recent tracking methods are based on filtering algorithms that overlook these outliers, leading to reduced tracking accuracy or even loss of the objects…

    Submitted 15 September, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: IEEE Transactions on Intelligent Vehicles

  23. arXiv:2405.15177  [pdf, other]

    cs.LG cs.AI

    Diffusion Actor-Critic with Entropy Regulator

    Authors: Yinuo Wang, Likun Wang, Yuxuan Jiang, Wenjun Zou, Tong Liu, Xujie Song, Wenxuan Wang, Liming Xiao, Jiang Wu, Jingliang Duan, Shengbo Eben Li

    Abstract: Reinforcement learning (RL) has proven highly effective in addressing complex decision-making and control tasks. However, in most traditional RL algorithms, the policy is typically parameterized as a diagonal Gaussian distribution with learned mean and variance, which constrains their capability to acquire complex policies. In response to this problem, we propose an online RL algorithm termed diff…

    Submitted 9 October, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at NeurIPS 2024

  24. arXiv:2405.09713  [pdf, other]

    cs.CV cs.AI cs.CL

    SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

    Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

    Abstract: Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically withi…

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR

  25. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  26. arXiv:2404.00481  [pdf, other]

    stat.ML cs.LG eess.SY

    Convolutional Bayesian Filtering

    Authors: Wenhan Cao, Shiqi Liu, Chang Liu, Zeyu He, Stephen S.-T. Yau, Shengbo Eben Li

    Abstract: Bayesian filtering serves as the mainstream framework of state estimation in dynamic systems. Its standard version utilizes total probability rule and Bayes' law alternatively, where how to define and compute conditional probability is critical to state distribution inference. Previously, the conditional probability is assumed to be exactly known, which represents a measure of the occurrence proba…

    Submitted 30 March, 2024; originally announced April 2024.

  27. arXiv:2403.14447  [pdf, other]

    cs.CV cs.RO

    Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset

    Authors: Andrea Avogaro, Andrea Toaiari, Federico Cunico, Xiangmin Xu, Haralambos Dafas, Alessandro Vinciarelli, Emma Li, Marco Cristani

    Abstract: We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The…

    Submitted 23 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  28. arXiv:2403.14443  [pdf, other]

    cs.AI cs.CL cs.GT cs.LG cs.MA cs.SI

    Language Models Can Reduce Asymmetry in Information Markets

    Authors: Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf

    Abstract: This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The c…

    Submitted 21 March, 2024; originally announced March 2024.

  29. arXiv:2403.12848  [pdf, other]

    cs.CV

    Planner3D: LLM-enhanced graph prior meets 3D indoor scene explicit regularization

    Authors: Yao Wei, Martin Renqiang Min, George Vosselman, Li Erran Li, Michael Ying Yang

    Abstract: Compositional 3D scene synthesis has diverse applications across a spectrum of industries such as robotics, films, and video games, as it closely mirrors the complexity of real-world multi-object environments. Conventional works typically employ shape retrieval based frameworks which naturally suffer from limited shape diversity. Recent progresses have been made in object shape generation with gen…

    Submitted 26 August, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures

  30. arXiv:2403.12847  [pdf, other]

    cs.LG

    Policy Bifurcation in Safe Reinforcement Learning

    Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

    Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous l…

    Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  31. arXiv:2403.11807  [pdf, other]

    cs.AI cs.CL

    How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

    Authors: Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Michael R. Lyu

    Abstract: Decision-making is a complex process requiring diverse abilities, making it an excellent framework for evaluating Large Language Models (LLMs). Researchers have examined LLMs' decision-making through the lens of Game Theory. However, existing evaluations mainly focus on two-player scenarios where an LLM competes against another. Additionally, previous benchmarks suffer from test set leakage due to…

    Submitted 30 September, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: 11 pages of main text; 19 pages of appendices. Included models: GPT-3.5-{0613, 1106, 0125}, GPT-4-0125, Gemini-{1.0, 1.5}-Pro, LLaMA-3.1-{7, 70, 405}B, Mixtral-8x{7, 22}B, Qwen-2-72B

  32. arXiv:2403.11506  [pdf, other]

    cs.CV cs.AI

    End-To-End Underwater Video Enhancement: Dataset and Model

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu

    Abstract: Underwater video enhancement (UVE) aims to improve the visibility and frame quality of underwater videos, which has significant implications for marine research and exploration. However, existing methods primarily focus on developing image enhancement algorithms to enhance each frame independently. There is a lack of supervised datasets and models specifically tailored for UVE tasks. To fill this…

    Submitted 18 March, 2024; originally announced March 2024.

  33. arXiv:2403.03730  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning 3D object-centric representation through prediction

    Authors: John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

    Abstract: As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D environments without supervision, models that learn the same set of abilities with similar constraints faced by human infants are lacking. Towards this end, we de…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures. Project webpage can be found at https://jday54.github.io/opple_site/

    ACM Class: I.2.10; I.4.8; I.4.6; I.4.10; I.2.6

  34. arXiv:2403.01768  [pdf, other]

    eess.SY cs.AI

    Canonical Form of Datatic Description in Control Systems

    Authors: Guojian Zhan, Ziang Zheng, Shengbo Eben Li

    Abstract: The design of feedback controllers is undergoing a paradigm shift from modelic (i.e., model-driven) control to datatic (i.e., data-driven) control. Canonical form of state space model is an important concept in modelic control systems, exemplified by Jordan form, controllable form and observable form, whose purpose is to facilitate system analysis and controller synthesis. In the realm of datatic…

    Submitted 4 March, 2024; originally announced March 2024.

  35. arXiv:2402.19251  [pdf, other]

    cs.AI cs.RO

    A Cognitive-Based Trajectory Prediction Approach for Autonomous Driving

    Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Zhiyong Cui, Shengbo Eben Li, Chengzhong Xu

    Abstract: In autonomous vehicle (AV) technology, the ability to accurately predict the movements of surrounding vehicles is paramount for ensuring safety and operational efficiency. Incorporating human decision-making insights enables AVs to more effectively anticipate the potential actions of other vehicles, significantly improving prediction accuracy and responsiveness in dynamic environments. This paper…

    Submitted 29 February, 2024; originally announced February 2024.

  36. arXiv:2402.11588  [pdf, other]

    cs.CV cs.AI

    SDiT: Spiking Diffusion Model with Transformer

    Authors: Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-Ping Li

    Abstract: Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion mode…

    Submitted 24 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  37. arXiv:2402.06118  [pdf, other]

    cs.CV cs.AI

    ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

    Authors: Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li

    Abstract: By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning capabilities. However, the generated text often suffers from inaccurate grounding in the visual input, resulting in errors such as hallucination of nonexistent scene eleme…

    Submitted 13 October, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 10 pages, 3 figures

  38. arXiv:2402.04318  [pdf, other]

    cs.RO

    Human Observation-Inspired Trajectory Prediction for Autonomous Driving in Mixed-Autonomy Traffic Environments

    Authors: Haicheng Liao, Shangqian Liu, Yongkang Li, Zhenning Li, Chengyue Wang, Yunjian Li, Shengbo Eben Li, Chengzhong Xu

    Abstract: In the burgeoning field of autonomous vehicles (AVs), trajectory prediction remains a formidable challenge, especially in mixed autonomy environments. Traditional approaches often rely on computational methods such as time-series analysis. Our research diverges significantly by adopting an interdisciplinary approach that integrates principles of human cognition and observational behavior into traj…

    Submitted 8 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  39. arXiv:2401.16395  [pdf, other]

    cs.FL

    Deciding Subtyping for Asynchronous Multiparty Sessions

    Authors: Elaine Li, Felix Stutz, Thomas Wies

    Abstract: Multiparty session types (MSTs) are a type-based approach to verifying communication protocols, represented as global types in the framework. We present a precise subtyping relation for asynchronous MSTs with communicating state machines (CSMs) as implementation model. We address two problems: when can a local implementation safely substitute another, and when does an arbitrary CSM implement a glo…

    Submitted 29 January, 2024; originally announced January 2024.

  40. arXiv:2401.13920  [pdf, other]

    cs.LG cs.AI cs.CL

    LocMoE: A Low-Overhead MoE for Large Language Model Training

    Authors: Jing Li, Zhijie Sun, Xuan He, Li Zeng, Yi Lin, Entong Li, Binfan Zheng, Rongqian Zhao, Xin Chen

    Abstract: The Mixtures-of-Experts (MoE) model is a widespread distributed and integrated learning method for large language models (LLM), which is favored due to its ability to sparsify and expand models efficiently. However, the performance of MoE is limited by load imbalance and high latency of All-to-All communication, along with relatively redundant computation owing to large expert capacity. Load imbal…

    Submitted 23 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: 1. Update the font size of all figures. 2. Update the name of the proposed layer Grouped Average Pooling (GrAP). 3. Change the order of the Section Contribution Statement

  41. Frustrated Random Walks: A Fast Method to Compute Node Distances on Hypergraphs

    Authors: Enzhi Li, Scott Nickleach, Bilal Fadlallah

    Abstract: A hypergraph is a generalization of a graph that arises naturally when attribute-sharing among entities is considered. Compared to graphs, hypergraphs have the distinct advantage that they contain explicit communities and are more convenient to manipulate. An open problem in hypergraph research is how to accurately and efficiently calculate node distances on hypergraphs. Estimating node distances…

    Submitted 27 August, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 6 figures

    Journal ref: Phys. Rev. E 110 (2024), 024314

  42. arXiv:2401.10700  [pdf, other]

    cs.LG cs.AI cs.RO

    Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

    Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

    Abstract: Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below thresholds predetermined. This can lead to potentially unsafe outcomes, thus unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation.…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: ICLR 2024, 30 pages, 11 figures

  43. arXiv:2401.06341  [pdf, other]

    cs.CV cs.RO

    AffordanceLLM: Grounding Affordance from Vision Language Models

    Authors: Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li

    Abstract: Affordance grounding refers to the task of finding the area of an object with which one can interact. It is a fundamental but challenging task, as a successful solution requires the comprehensive understanding of a scene in multiple aspects including detection, localization, and recognition of objects with their parts, of geo-spatial configuration/layout of the scene, of 3D shapes and physics, as…

    Submitted 17 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  44. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  45. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  46. arXiv:2312.08715  [pdf, other]

    cs.RO

    Bayes3D: fast learning and inference in structured generative models of 3D objects and scenes

    Authors: Nishad Gothoskar, Matin Ghavami, Eric Li, Aidan Curtis, Michael Noseworthy, Karen Chung, Brian Patton, William T. Freeman, Joshua B. Tenenbaum, Mirko Klukas, Vikash K. Mansinghka

    Abstract: Robots cannot yet match humans' ability to rapidly learn the shapes of novel 3D objects and recognize them robustly despite clutter and occlusion. We present Bayes3D, an uncertainty-aware perception system for structured 3D scenes, that reports accurate posterior uncertainty over 3D object shape, pose, and scene composition in the presence of clutter and occlusion. Bayes3D delivers these capabilit…

    Submitted 14 December, 2023; originally announced December 2023.

  47. arXiv:2312.07636  [pdf, other]

    cs.LG cs.CV stat.ML

    Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply

    Authors: Chengting Yu, Fengzhao Zhang, Hanzhi Ma, Aili Wang, Erping Li

    Abstract: Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains supervisely based on local preliminary losses, thereby providing asynchronous and paral…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 9 figures, 12 tables

  48. arXiv:2312.06371  [pdf, other]

    cs.RO cs.AI

    BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving

    Authors: Haicheng Liao, Zhenning Li, Huanming Shen, Wenxuan Zeng, Dongping Liao, Guofa Li, Shengbo Eben Li, Chengzhong Xu

    Abstract: The ability to accurately predict the trajectory of surrounding vehicles is a critical hurdle to overcome on the journey to fully autonomous vehicles. To address this challenge, we pioneer a novel behavior-aware trajectory prediction model (BAT) that incorporates insights and findings from traffic psychology, human behavior, and decision-making. Our model consists of behavior-aware, interaction-aw…

    Submitted 15 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

  49. arXiv:2312.06240  [pdf, other]

    cs.CV

    UIEDP: Underwater Image Enhancement with Diffusion Prior

    Authors: Dazhao Du, Enhan Li, Lingyu Si, Fanjiang Xu, Jianwei Niu, Fuchun Sun

    Abstract: Underwater image enhancement (UIE) aims to generate clear images from low-quality underwater images. Due to the unavailability of clear reference images, researchers often synthesize them to construct paired datasets for training deep models. However, these synthesized images may sometimes lack quality, adversely affecting training outcomes. To address this issue, we propose UIE with Diffusion Pri…

    Submitted 11 December, 2023; originally announced December 2023.

  50. arXiv:2312.01836  [pdf, other]

    cs.RO cs.AI

    Integrated Drill Boom Hole-Seeking Control via Reinforcement Learning

    Authors: Haoqi Yan, Haoyuan Xu, Hongbo Gao, Fei Ma, Shengbo Eben Li, Jingliang Duan

    Abstract: Intelligent drill boom hole-seeking is a promising technology for enhancing drilling efficiency, mitigating potential safety hazards, and relieving human operators. Most existing intelligent drill boom control methods rely on a hierarchical control framework based on inverse kinematics. However, these methods are generally time-consuming due to the computational complexity of inverse kinematics an…

    Submitted 4 December, 2023; originally announced December 2023.