Showing 1–50 of 133 results for author: Tian, S

  1. arXiv:2410.00371  [pdf, other]

    cs.RO

    AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation

    Authors: Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar, Yijie Guo

    Abstract: Robotic manipulation in open-world settings requires not only task execution but also the ability to detect and learn from failures. While recent advances in vision-language models (VLMs) and large language models (LLMs) have improved robots' spatial reasoning and problem-solving abilities, they still struggle with failure recognition, limiting their real-world applicability. We introduce AHA, an…

    Submitted 30 September, 2024; originally announced October 2024.

    Comments: Appendix and details can be found in project website: https://aha-vlm.github.io/

  2. arXiv:2409.18973  [pdf, other]

    eess.SP cs.AI q-bio.NC

    EEG-EMG FAConformer: Frequency Aware Conv-Transformer for the fusion of EEG and EMG

    Authors: ZhengXiao He, Minghong Cai, Letian Li, Siyuan Tian, Ren-Jie Dai

    Abstract: Motor pattern recognition paradigms are the main forms of Brain-Computer Interfaces (BCI) aimed at motor function rehabilitation and are the most easily promoted applications. In recent years, many researchers have suggested encouraging patients to perform real motor control execution simultaneously in MI-based BCI rehabilitation training systems. Electromyography (EMG) signals are the most direct…

    Submitted 12 September, 2024; originally announced September 2024.

  3. arXiv:2409.17011  [pdf, other]

    cs.CL cs.DL

    LLM-CARD: Towards a Description and Landscape of Large Language Models

    Authors: Shengwei Tian, Lifeng Han, Erick Mendez Guzman, Goran Nenadic

    Abstract: With the rapid growth of the Natural Language Processing (NLP) field, a vast variety of Large Language Models (LLMs) continue to emerge for diverse NLP tasks. As an increasing number of papers are presented, researchers and developers face the challenge of information overload. Thus, it is particularly important to develop a system that can automatically extract and organise key information about…

    Submitted 28 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

    Comments: ongoing work, 16 pages

  4. arXiv:2409.12456  [pdf, other]

    cs.CV cs.RO

    Bayesian-Optimized One-Step Diffusion Model with Knowledge Distillation for Real-Time 3D Human Motion Prediction

    Authors: Sibo Tian, Minghui Zheng, Xiao Liang

    Abstract: Human motion prediction is a cornerstone of human-robot collaboration (HRC), as robots need to infer the future movements of human workers based on past motion cues to proactively plan their motion, ensuring safety in close collaboration scenarios. The diffusion model has demonstrated remarkable performance in predicting high-quality motion samples with reasonable diversity, but suffers from a slo…

    Submitted 19 September, 2024; originally announced September 2024.

  5. Agent Aggregator with Mask Denoise Mechanism for Histopathology Whole Slide Image Analysis

    Authors: Xitong Ling, Minxi Ouyang, Yizhi Wang, Xinrui Chen, Renao Yan, Hongbo Chu, Junru Cheng, Tian Guan, Sufang Tian, Xiaoping Liu, Yonghong He

    Abstract: Histopathology analysis is the gold standard for medical diagnosis. Accurate classification of whole slide images (WSIs) and region-of-interests (ROIs) localization can assist pathologists in diagnosis. The gigapixel resolution of WSI and the absence of fine-grained annotations make direct classification and analysis challenging. In weakly supervised learning, multiple instance learning (MIL) pres…

    Submitted 17 September, 2024; originally announced September 2024.

  6. arXiv:2409.03685  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    View-Invariant Policy Learning via Zero-Shot Novel View Synthesis

    Authors: Stephen Tian, Blake Wulfe, Kyle Sargent, Katherine Liu, Sergey Zakharov, Vitor Guizilini, Jiajun Wu

    Abstract: Large-scale visuomotor policy learning is a promising approach toward developing generalizable manipulation systems. Yet, policies that can be deployed on diverse embodiments, environments, and observational modalities remain elusive. In this work, we investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: obs…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted to CoRL 2024

  7. arXiv:2409.02979  [pdf, other]

    cs.CV

    Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors

    Authors: Haiyu Wu, Jaskirat Singh, Sicong Tian, Liang Zheng, Kevin W. Bowyer

    Abstract: This paper studies how to synthesize face images of non-existent persons, to create a dataset that allows effective training of face recognition (FR) models. Two important goals are (1) the ability to generate a large number of distinct identities (inter-class separation) with (2) a wide variation in appearance of each identity (intra-class variation). However, existing works 1) are typically limi…

    Submitted 21 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

  8. arXiv:2407.19279  [pdf, other]

    cs.RO

    Grasping Force Control and Adaptation for a Cable-Driven Robotic Hand

    Authors: Eric Mountain, Ean Weise, Sibo Tian, Beiwen Li, Xiao Liang, Minghui Zheng

    Abstract: This paper introduces a unique force control and adaptation algorithm for a lightweight and low-complexity five-fingered robotic hand, namely an Integrated-Finger Robotic Hand (IFRH). The force control and adaptation algorithm is intuitive to design, easy to implement, and improves the grasping functionality through feedforward adaptation automatically. Specifically, we have extended Youla-paramet…

    Submitted 27 July, 2024; originally announced July 2024.

  9. arXiv:2407.15488  [pdf, other]

    cs.CV

    DiffX: Guide Your Layout to Cross-Modal Generative Modeling

    Authors: Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

    Abstract: Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, such as chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-guid…

    Submitted 20 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  10. arXiv:2407.14676  [pdf, other]

    cs.CV

    On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition

    Authors: Zihu Wang, Lingqiao Liu, Scott Ricardo Figueroa Weston, Samuel Tian, Peng Li

    Abstract: Self-Supervised Learning (SSL) has become a prominent approach for acquiring visual representations across various tasks, yet its application in fine-grained visual recognition (FGVR) is challenged by the intricate task of distinguishing subtle differences between categories. To overcome this, we introduce a novel strategy that boosts SSL's ability to extract critical discriminative features vita…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  11. arXiv:2407.11585  [pdf, other]

    cs.CV cs.AI

    QVD: Post-training Quantization for Video Diffusion Models

    Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

    Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: accepted by ACMMM2024

  12. arXiv:2407.01418  [pdf, other]

    cs.RO cs.AI cs.LG

    RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing

    Authors: Bo Ai, Stephen Tian, Haochen Shi, Yixuan Wang, Cheston Tan, Yunzhu Li, Jiajun Wu

    Abstract: Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Robotics: Science and Systems (RSS), 2024. Project page: https://robo-pack.github.io/

    ACM Class: I.2.9; I.2.6; I.2.10

  13. arXiv:2406.16995  [pdf, other]

    q-bio.QM cs.AI

    A large language model for predicting T cell receptor-antigen binding specificity

    Authors: Xing Fang, Chenpeng Yu, Shiye Tian, Hui Liu

    Abstract: The human immune response depends on the binding of T-cell receptors (TCRs) to antigens (pTCR), which elicits the T cells to eliminate viruses, tumor cells, and other pathogens. The ability of the human immune system to respond to unknown viruses and bacteria stems from TCR diversity. However, this vast diversity poses challenges for TCR-antigen binding prediction methods. In this study, we p…

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.14288  [pdf, other]

    cs.LG cs.AI

    Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective

    Authors: Yunfei Liu, Jintang Li, Yuehe Chen, Ruofan Wu, Ericbk Wang, Jing Zhou, Sheng Tian, Shuheng Shen, Xing Fu, Changhua Meng, Weiqiang Wang, Liang Chen

    Abstract: Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes in a graph into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on graph augmentations and contrastive schemes, which may potentially in…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024 research track. Code available at https://github.com/EdisonLeeeee/MAGI

  15. arXiv:2406.05055  [pdf, other]

    cs.AI

    Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions

    Authors: Shi-Yu Tian, Zhi Zhou, Lin-Han Jia, Lan-Zhe Guo, Yu-Feng Li

    Abstract: Large language models (LLMs) have demonstrated impressive performance on reasoning tasks, which can be further improved through few-shot prompting techniques. However, the current evaluation primarily focuses on carefully constructed benchmarks and neglects the consideration of real-world reasoning problems that present missing and contradictory conditions, known as ill-defined problems. Our obser…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Preprint. arXiv admin note: text overlap with arXiv:2304.09797

  16. arXiv:2405.16205  [pdf]

    cs.AI cs.CL

    GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases

    Authors: Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Zhiyong Lu

    Abstract: Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 30 pages with 10 figures and/or tables

  17. arXiv:2405.15965  [pdf, other]

    cs.CV

    What is a Goldilocks Face Verification Test Set?

    Authors: Haiyu Wu, Sicong Tian, Aman Bhatta, Jacob Gutierrez, Grace Bezold, Genesis Argueta, Karl Ricanek Jr., Michael C. King, Kevin W. Bowyer

    Abstract: Face Recognition models are commonly trained with web-scraped datasets containing millions of images and evaluated on test sets emphasizing pose, age and mixed attributes. With train and test sets both assembled from web-scraped images, it is critical to ensure disjoint sets of identities between train and test sets. However, existing train and test sets have not considered this. Moreover, as accu…

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.13859  [pdf, other]

    cs.CV

    QGait: Toward Accurate Quantization for Gait Recognition with Binarized Input

    Authors: Senmao Tian, Haoyu Gao, Gangyi Hong, Shuyun Wang, JingJie Wang, Xin Yu, Shunli Zhang

    Abstract: Existing deep learning methods have made significant progress in gait recognition. Typically, appearance-based models binarize inputs into silhouette sequences. However, mainstream quantization methods prioritize minimizing task loss over quantization error, which is detrimental to gait recognition with binarized inputs. Minor variations in silhouette sequences can be diminished in the network's i…

    Submitted 22 May, 2024; originally announced May 2024.

  19. arXiv:2405.09779  [pdf, other]

    cs.RO

    Integrating Uncertainty-Aware Human Motion Prediction into Graph-Based Manipulator Motion Planning

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for mani…

    Submitted 15 May, 2024; originally announced May 2024.

  20. arXiv:2405.09403  [pdf, other]

    cs.CV

    Identity Overlap Between Face Recognition Train/Test Data: Causing Optimistic Bias in Accuracy Measurement

    Authors: Haiyu Wu, Sicong Tian, Jacob Gutierrez, Aman Bhatta, Kağan Öztürk, Kevin W. Bowyer

    Abstract: A fundamental tenet of pattern recognition is that overlap between training and testing sets causes an optimistic accuracy estimate. Deep CNNs for face recognition are trained for N-way classification of the identities in the training set. Accuracy is commonly estimated as average 10-fold classification accuracy on image pairs from test sets such as LFW, CALFW, CPLFW, CFP-FP and AgeDB-30. Because…

    Submitted 15 May, 2024; originally announced May 2024.

  21. arXiv:2405.07962  [pdf, other]

    cs.RO eess.SY

    KG-Planner: Knowledge-Informed Graph Neural Planning for Collaborative Manipulators

    Authors: Wansong Liu, Kareem Eltouny, Sibo Tian, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a novel knowledge-informed graph neural planner (KG-Planner) to address the challenge of efficiently planning collision-free motions for robots in high-dimensional spaces, considering both static and dynamic environments involving humans. Unlike traditional motion planners that struggle with finding a balance between efficiency and optimality, the KG-Planner takes a different a…

    Submitted 13 May, 2024; originally announced May 2024.

  22. arXiv:2404.09992  [pdf, other]

    cs.CV cs.AI cs.CL

    MMInA: Benchmarking Multihop Multimodal Internet Agents

    Authors: Ziniu Zhang, Shulin Tian, Liangyu Chen, Ziwei Liu

    Abstract: Autonomous embodied agents live on an Internet of multimedia websites. Can they hop around multimodal websites to complete complex user tasks? Existing benchmarks fail to assess them in a realistic, evolving environment for their embodiment across websites. To answer this question, we present MMInA, a multihop and multimodal benchmark to evaluate the embodied agents for compositional Internet task…

    Submitted 15 April, 2024; originally announced April 2024.

  23. arXiv:2404.04949  [pdf, other]

    cs.CL cs.CE

    SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning

    Authors: Yuhang Zhou, Zeping Li, Siyu Tian, Yuchen Ni, Sen Liu, Guangnan Ye, Hongfeng Chai

    Abstract: Large language models (LLMs) are increasingly being applied across various specialized fields, leveraging their extensive knowledge to empower a multitude of scenarios within these domains. However, each field encompasses a variety of specific tasks that require learning, and the diverse, heterogeneous data across these domains can lead to conflicts during model task transfer. In response to this…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 17 pages, 17 figures

  24. arXiv:2404.04483  [pdf]

    eess.IV cs.CV

    FastHDRNet: A new efficient method for SDR-to-HDR Translation

    Authors: Siyuan Tian, Hao Wang, Yiren Rong, Junhao Wang, Renjie Dai, Zhengxiao He

    Abstract: Modern displays nowadays possess the capability to render video content with a high dynamic range (HDR) and an extensive color gamut. However, the majority of available resources are still in standard dynamic range (SDR). Therefore, we need to identify an effective methodology for this objective. The existing deep neural network (DNN) based SDR to HDR conversion methods outperform conventional me…

    Submitted 11 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: 16 pages, 4 figures

  25. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  26. iBA: Backdoor Attack on 3D Point Cloud via Reconstructing Itself

    Authors: Yuhao Bian, Shengjing Tian, Xiuping Liu

    Abstract: The widespread deployment of Deep Neural Networks (DNNs) for 3D point cloud processing starkly contrasts with their susceptibility to security breaches, notably backdoor attacks. These attacks hijack DNNs during training, embedding triggers in the data that, once activated, cause the network to make predetermined errors while maintaining normal performance on unaltered data. This vulnerability pos…

    Submitted 9 September, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: 16 pages. in IEEE Transactions on Information Forensics and Security (2024)

  27. arXiv:2403.05753  [pdf, other]

    eess.IV cs.CV

    UDCR: Unsupervised Aortic DSA/CTA Rigid Registration Using Deep Reinforcement Learning and Overlap Degree Calculation

    Authors: Wentao Liu, Bowen Liang, Weijin Xu, Tong Tian, Qingsheng Lu, Xipeng Pan, Haoyuan Li, Siyu Tian, Huihua Yang, Ruisheng Su

    Abstract: The rigid registration of aortic Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) can provide 3D anatomical details of the vasculature for the interventional surgical treatment of conditions such as aortic dissection and aortic aneurysms, holding significant value for clinical research. However, the current methods for 2D/3D image registration are dependent on manual…

    Submitted 8 March, 2024; originally announced March 2024.

  28. arXiv:2403.04545  [pdf, other]

    cs.LG math.ST

    Improve Generalization Ability of Deep Wide Residual Network with A Suitable Scaling Factor

    Authors: Songtao Tian, Zixiong Yu

    Abstract: Deep Residual Neural Networks (ResNets) have demonstrated remarkable success across a wide range of real-world applications. In this paper, we identify a suitable scaling factor (denoted by $α$) on the residual branch of deep wide ResNets to achieve good generalization ability. We show that if $α$ is a constant, the class of functions induced by Residual Neural Tangent Kernel (RNTK) is asymptotica…

    Submitted 7 March, 2024; originally announced March 2024.

  29. Adaptive quantization with mixed-precision based on low-cost proxy

    Authors: Junzhe Chen, Qiao Yang, Senmao Tian, Shunli Zhang

    Abstract: It is critical to deploy complicated neural network models on hardware with limited resources. This paper proposes a novel model quantization method, named the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which contains three key modules. The hardware-aware module is designed by considering the hardware limitations, while an adaptive mixed-precision quantization module…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: accepted by icassp2024

    Journal ref: ICASSP2024

  30. arXiv:2402.13045  [pdf, other]

    cs.RO

    A Recurrent Neural Network Enhanced Unscented Kalman Filter for Human Motion Prediction

    Authors: Wansong Liu, Sibo Tian, Boyi Hu, Xiao Liang, Minghui Zheng

    Abstract: This paper presents a deep learning enhanced adaptive unscented Kalman filter (UKF) for predicting human arm motion in the context of manufacturing. Unlike previous network-based methods that solely rely on captured human motion data, which is represented as bone vectors in this paper, we incorporate a human arm dynamic model into the motion prediction algorithm and use the UKF to iteratively fore…

    Submitted 20 February, 2024; originally announced February 2024.

  31. arXiv:2402.11540  [pdf, other]

    cs.CV

    CPN: Complementary Proposal Network for Unconstrained Text Detection

    Authors: Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong

    Abstract: Existing methods for scene text detection can be divided into two paradigms: segmentation-based and anchor-based. While segmentation-based methods are well-suited for irregular shapes, they struggle with compact or overlapping layouts. Conversely, anchor-based approaches excel for complex layouts but suffer from irregular shapes. To strengthen their merits and overcome their respective demerits, w…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted to AAAI 2024

  32. arXiv:2402.01693  [pdf]

    cs.CL cs.AI

    Quality of Answers of Generative Large Language Models vs Peer Patients for Interpreting Lab Test Results for Lay Patients: Evaluation Study

    Authors: Zhe He, Balu Bhasuran, Qiao Jin, Shubo Tian, Karim Hanna, Cindy Shavor, Lisbeth Garcia Arguello, Patrick Murray, Zhiyong Lu

    Abstract: Lab results are often confusing and hard to understand. Large language models (LLMs) such as ChatGPT have opened a promising avenue for patients to get their questions answered. We aim to assess the feasibility of using LLMs to generate relevant, accurate, helpful, and unharmful responses to lab test-related questions asked by patients and to identify potential issues that can be mitigated with au…

    Submitted 23 January, 2024; originally announced February 2024.

  33. arXiv:2401.14807  [pdf, other]

    cs.CV

    PL-FSCIL: Harnessing the Power of Prompts for Few-Shot Class-Incremental Learning

    Authors: Songsong Tian, Lusi Li, Weijun Li, Hang Ran, Li Li, Xin Ning

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) aims to enable deep neural networks to learn new tasks incrementally from a small number of labeled samples without forgetting previously learned tasks, closely mimicking human learning patterns. In this paper, we propose a novel approach called Prompt Learning for FSCIL (PL-FSCIL), which harnesses the power of prompts in conjunction with a pre-trained V…

    Submitted 26 January, 2024; originally announced January 2024.

  34. arXiv:2401.13285  [pdf, other]

    cs.CV

    Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region

    Authors: Shengjing Tian, Yinan Han, Xiuping Liu, Xiantong Zhao

    Abstract: Single Object Tracking in LiDAR point cloud is one of the most essential parts of environmental perception, in which small objects are inevitable in real-world scenarios and will bring a significant barrier to the accurate location. However, the existing methods concentrate more on exploring universal architectures for common categories and overlook the challenges that small objects have long been…

    Submitted 24 January, 2024; originally announced January 2024.

  35. arXiv:2401.11489  [pdf]

    cs.CV cs.AI

    MapChange: Enhancing Semantic Change Detection with Temporal-Invariant Historical Maps Based on Deep Triplet Network

    Authors: Yinhe Liu, Sunan Shi, Zhuo Zheng, Jue Wang, Shiqi Tian, Yanfei Zhong

    Abstract: Semantic Change Detection (SCD) is recognized as both a crucial and challenging task in the field of image analysis. Traditional methods for SCD have predominantly relied on the comparison of image pairs. However, this approach is significantly hindered by substantial imaging differences, which arise due to variations in shooting times, atmospheric conditions, and angles. Such discrepancies lead t…

    Submitted 21 January, 2024; originally announced January 2024.

  36. arXiv:2401.11048  [pdf]

    cs.CL q-bio.QM

    PubTator 3.0: an AI-powered Literature Resource for Unlocking Biomedical Knowledge

    Authors: Chih-Hsuan Wei, Alexis Allot, Po-Ting Lai, Robert Leaman, Shubo Tian, Ling Luo, Qiao Jin, Zhizheng Wang, Qingyu Chen, Zhiyong Lu

    Abstract: PubTator 3.0 (https://www.ncbi.nlm.nih.gov/research/pubtator3/) is a biomedical literature resource using state-of-the-art AI techniques to offer semantic and relation searches for key concepts like proteins, genetic variants, diseases, and chemicals. It currently provides over one billion entity and relation annotations across approximately 36 million PubMed abstracts and 6 million full-text arti…

    Submitted 19 January, 2024; originally announced January 2024.

  37. arXiv:2401.06789  [pdf]

    cs.IR cs.AI cs.CL cs.LG

    Information Retrieval and Classification of Real-Time Multi-Source Hurricane Evacuation Notices

    Authors: Tingting Zhao, Shubo Tian, Jordan Daly, Melissa Geiger, Minna Jia, Jinfeng Zhang

    Abstract: For an approaching disaster, the tracking of time-sensitive critical information such as hurricane evacuation notices is challenging in the United States. These notices are issued and distributed rapidly by numerous local authorities that may spread across multiple states. They often undergo frequent updates and are distributed through diverse online portals lacking standard formats. In this study…

    Submitted 7 January, 2024; originally announced January 2024.

  38. arXiv:2312.10892  [pdf, other]

    eess.IV cs.CV q-bio.QM

    Deep Learning-based MRI Reconstruction with Artificial Fourier Transform (AFT)-Net

    Authors: Yanting Yang, Yiren Zhang, Zongyu Li, Jeffery Siyuan Tian, Matthieu Dagommer, Jia Guo

    Abstract: Deep complex-valued neural networks provide a powerful way to leverage complex number operations and representations and have succeeded in several phase-based applications. However, most previously published networks have not fully explored the impact of complex-valued networks in the frequency domain. Here, we introduce a unified complex-valued deep learning framework-Artificial Fourier Transform…

    Submitted 18 October, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

  39. arXiv:2312.07843  [pdf, ps, other]

    cs.RO

    Foundation Models in Robotics: Applications, Challenges, and the Future

    Authors: Roya Firoozi, Johnathan Tucker, Stephen Tian, Anirudha Majumdar, Jiankai Sun, Weiyu Liu, Yuke Zhu, Shuran Song, Ashish Kapoor, Karol Hausman, Brian Ichter, Danny Driess, Jiajun Wu, Cewu Lu, Mac Schwager

    Abstract: We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, foundation models pretrained on internet-scale data appear to have superior generalization capabilities, and in some instances display an emergent ability…

    Submitted 12 December, 2023; originally announced December 2023.

  40. arXiv:2311.16605  [pdf, other]

    cs.LG cs.AI

    LasTGL: An Industrial Framework for Large-Scale Temporal Graph Learning

    Authors: Jintang Li, Jiawang Dan, Ruofan Wu, Jing Zhou, Sheng Tian, Yunfei Liu, Baokun Wang, Changhua Meng, Weiqiang Wang, Yuchang Zhu, Liang Chen, Zibin Zheng

    Abstract: Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structure data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-e…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Preprint; Work in progress

  41. arXiv:2311.14566  [pdf, other]

    cs.RO

    Multi-tap Resistive Sensing and FEM Modeling enables Shape and Force Estimation in Soft Robots

    Authors: Sizhe Tian, Barnabas Gavin Cangan, Stefan Escaida Navarro, Artem Beger, Christian Duriez, Robert K. Katzschmann

    Abstract: We address the challenge of reliable and accurate proprioception in soft robots, specifically those with tight packaging constraints and relying only on internally embedded sensors. While various sensing approaches with single sensors have been tried, often with a constant curvature assumption, we look into sensing local deformations at multiple locations of the sensor. In our approach, we multi-t…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: 8 pages, 8 figures, to be published in Robotics and Automation Letters (RA-L)

  42. arXiv:2311.11208  [pdf, other]

    cs.CV

    LogicNet: A Logical Consistency Embedded Face Attribute Learning Network

    Authors: Haiyu Wu, Sicong Tian, Huayu Li, Kevin W. Bowyer

    Abstract: Ensuring logical consistency in predictions is a crucial yet overlooked aspect in multi-attribute classification. We explore the potential reasons for this oversight and introduce two pressing challenges to the field: 1) How can we ensure that a model, when trained with data checked for logical consistency, yields predictions that are logically consistent? 2) How can we achieve the same with data…

    Submitted 21 September, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

  43. arXiv:2311.10751  [pdf, other]

    cs.RO cs.AI cs.CL

    ProAgent: From Robotic Process Automation to Agentic Process Automation

    Authors: Yining Ye, Xin Cong, Shizuo Tian, Jiannan Cao, Hao Wang, Yujia Qin, Yaxi Lu, Heyang Yu, Huadong Wang, Yankai Lin, Zhiyuan Liu, Maosong Sun

    Abstract: From ancient water wheels to robotic process automation (RPA), automation technology has evolved throughout history to liberate human beings from arduous tasks. Yet, RPA struggles with tasks needing human-like intelligence, especially in elaborate design of workflow construction and dynamic decision-making in workflow execution. As Large Language Models (LLMs) have emerged human-like intelligence,…

    Submitted 23 November, 2023; v1 submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  44. arXiv:2311.08428  [pdf, other]

    q-bio.QM cs.LG

    Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease

    Authors: Tahmina Sultana Priya, Fan Leng, Anthony C. Luehrs, Eric W. Klee, Alina M. Allen, Konstantinos N. Lazaridis, Danfeng Yao, Shulan Tian

    Abstract: Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by the excessive accumulation of fat in the liver in individuals who do not consume significant amounts of alcohol, including risk factors like obesity, insulin resistance, type 2 diabetes, etc. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 11 pages

  45. Deep Learning-based 3D Point Cloud Classification: A Systematic Survey and Outlook

    Authors: Huang Zhang, Changshuo Wang, Shengwei Tian, Baoli Lu, Liping Zhang, Xin Ning, Xiao Bai

    Abstract: In recent years, point cloud representation has become one of the research hotspots in the field of computer vision, and has been widely used in many fields, such as autonomous driving, virtual reality, robotics, etc. Although deep learning techniques have achieved great success in processing regular structured 2D grid image data, there are still great challenges in processing irregular, unstructu…

    Submitted 5 November, 2023; originally announced November 2023.

    Journal ref: Displays 102456 (2023)

  46. arXiv:2311.01862  [pdf, other]

    cs.CL cs.DB

    $R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL

    Authors: Yuhang Zhou, Yu He, Siyu Tian, Yuchen Ni, Zhangyue Yin, Xiang Liu, Chuanjun Ji, Sen Liu, Xipeng Qiu, Guangnan Ye, Hongfeng Chai

    Abstract: While current tasks of converting natural language to SQL (NL2SQL) using Foundation Models have shown impressive achievements, adapting these approaches for converting natural language to Graph Query Language (NL2GQL) encounters hurdles due to the distinct nature of GQL compared to SQL, alongside the diverse forms of GQL. Moving away from traditional rule-based and slot-filling methodologies, we i…

    Submitted 1 July, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

  47. arXiv:2311.00754  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning to Design and Use Tools for Robotic Manipulation

    Authors: Ziang Liu, Stephen Tian, Michelle Guo, C. Karen Liu, Jiajun Wu

    Abstract: When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment toward accomplishing otherwise impossible tasks. Robots might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning are effective at designing locomotion agents. B…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: First two authors contributed equally. Accepted at CoRL 2023

  48. arXiv:2311.00750  [pdf, other]

    cs.CV cs.AI cs.LG

    Are These the Same Apple? Comparing Images Based on Object Intrinsics

    Authors: Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L. K. Yamins, Jiajun Wu

    Abstract: The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: First two authors contributed equally. Accepted at NeurIPS Datasets and Benchmarks Track 2023

  49. arXiv:2310.13409  [pdf, other]

    cs.CL

    Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading

    Authors: Yangyang Luo, Shiyu Tian, Caixia Yuan, Xiaojie Wang

    Abstract: Conversational Machine Reading (CMR) requires answering a user's initial question through multi-turn dialogue interactions based on a given document. Although there exist many effective methods, they largely neglected the alignment between the document and the user-provided information, which significantly affects the intermediate decision-making and subsequent follow-up question generation. To ad…

    Submitted 20 October, 2023; originally announced October 2023.

    Journal ref: EMNLP2023 Findings

  50. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io