Showing 1–50 of 74 results for author: Fang, K

  1. arXiv:2410.14940  [pdf, other]

    cs.LG cs.CL

    Baichuan Alignment Technical Report

    Authors: Mingan Lin, Fan Yang, Yanjun Shen, Haoze Sun, Tianpeng Li, Tao Zhang, Chenzheng Zhu, Tao Zhang, Miao Zheng, Xu Li, Yijie Zhou, Mingyang Chen, Yanzhao Qin, Youquan Li, Hao Liang, Fei Li, Yadong Li, Mang Wang, Guosheng Dong, Kun Fang, Jianhua Xu, Bin Cui, Wentao Zhang, Zenan Zhou, Weipeng Chen

    Abstract: We introduce Baichuan Alignment, a detailed analysis of the alignment techniques employed in the Baichuan series of models. This represents the industry's first comprehensive account of alignment methodologies, offering valuable insights for advancing AI research. We investigate the critical components that enhance model performance during the alignment process, including optimization methods, dat…

    Submitted 18 October, 2024; originally announced October 2024.

  2. arXiv:2410.14547  [pdf, other]

    quant-ph cs.IT

    Surpassing the fundamental limits of distillation with catalysts

    Authors: Kun Fang, Zi-Wen Liu

    Abstract: Quantum resource distillation is a fundamental task in quantum information science. Minimizing the distillation overhead, i.e., the amount of noisy source states required to produce some desired output state within some target error, is crucial for the scalability of quantum computation and communication. Here, we show that quantum catalysts -- an additional resource that facilitates the transform…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 13 pages, 3 figures; comments are welcome

  3. arXiv:2410.12376  [pdf, other]

    cs.AI

    ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing

    Authors: Qingming Lin, Rui Hu, Huaxia Li, Sensen Wu, Yadong Li, Kai Fang, Hailin Feng, Zhenhong Du, Liuchang Xu

    Abstract: Vector data is one of the two core data structures in geographic information science (GIS), essential for accurately storing and representing geospatial information. Shapefile, the most widely used vector data format, has become the industry standard supported by all major geographic information systems. However, processing this data typically requires specialized GIS knowledge and skills, creatin…

    Submitted 16 October, 2024; originally announced October 2024.

  4. arXiv:2410.11623  [pdf, other]

    cs.CV cs.AI cs.CL

    VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

    Authors: Sijie Cheng, Kechen Fang, Yangyang Yu, Sicheng Zhou, Bohao Li, Ye Tian, Tingguang Li, Lei Han, Yang Liu

    Abstract: Recent advancements in Multi-modal Large Language Models (MLLMs) have opened new avenues for applications in Embodied AI. Building on previous work, EgoThink, we introduce VidEgoThink, a comprehensive benchmark for evaluating egocentric video understanding capabilities. To bridge the gap between MLLMs and low-level control in Embodied AI, we design four key interrelated tasks: video question-answe…

    Submitted 15 October, 2024; originally announced October 2024.

  5. arXiv:2410.05262  [pdf, other]

    cs.CL

    TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles

    Authors: Qingchen Yu, Shichao Song, Ke Fang, Yunfeng Shi, Zifan Zheng, Hanyu Wang, Simin Niu, Zhiyu Li

    Abstract: As the application of Large Language Models (LLMs) expands, the demand for reliable evaluations increases. Existing LLM evaluation benchmarks primarily rely on static datasets, making it challenging to assess model performance in dynamic interactions with users. Moreover, these benchmarks often depend on specific background knowledge, complicating the measurement of a model's logical reasoning cap…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 22 pages

  6. arXiv:2409.17126  [pdf, other]

    cs.RO cs.AI cs.LG

    Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset

    Authors: Andrew Goldberg, Kavish Kondap, Tianshuang Qiu, Zehan Ma, Letian Fu, Justin Kerr, Huang Huang, Kaiyuan Chen, Kuan Fang, Ken Goldberg

    Abstract: Generative AI systems have shown impressive capabilities in creating text, code, and images. Inspired by the rich history of research in industrial "Design for Assembly", we introduce a novel problem: Generative Design-for-Robot-Assembly (GDfRA). The task is to generate an assembly based on a natural language prompt (e.g., "giraffe") and an image of available physical components, such as 3D-pr…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: 8 pages, 7 Figures

  7. arXiv:2409.14066  [pdf, other]

    cs.RO cs.AI cs.LG

    KALIE: Fine-Tuning Vision-Language Models for Open-World Manipulation without Robot Data

    Authors: Grace Tang, Swetha Rajkumar, Yifei Zhou, Homer Rich Walke, Sergey Levine, Kuan Fang

    Abstract: Building generalist robotic systems involves effectively endowing robots with the capabilities to handle novel objects in an open-world setting. Inspired by the advances of large pre-trained models, we propose Keypoint Affordance Learning from Imagined Environments (KALIE), which adapts pre-trained Vision Language Models (VLMs) for robotic control in a scalable manner. Instead of directly producin…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 8 pages, 7 figures

  8. arXiv:2409.10094  [pdf, other]

    cs.CV cs.LG

    DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection

    Authors: Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang

    Abstract: Out-of-Distribution (OoD) detection determines whether the given samples are from the training distribution of the classifier-under-protection, i.e., the In-Distribution (InD), or from a different OoD. Latest researches introduce diffusion models pre-trained on InD data to advocate OoD detection by transferring an OoD image into a generated one that is close to InD, so that one could capture the d…

    Submitted 16 September, 2024; originally announced September 2024.

  9. arXiv:2408.16228  [pdf, other]

    cs.RO cs.LG

    Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation

    Authors: Vivek Myers, Bill Chunyuan Zheng, Oier Mees, Sergey Levine, Kuan Fang

    Abstract: Learned language-conditioned robot policies often struggle to effectively adapt to new real-world tasks even when pre-trained across a diverse set of instructions. We propose a novel approach for few-shot adaptation to unseen tasks that exploits the semantic understanding of task decomposition provided by vision-language models (VLMs). Our method, Policy Adaptation via Language Optimization (PALO)…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 27 pages, 14 figures

  10. arXiv:2408.01258  [pdf, other]

    cs.RO

    Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation

    Authors: Jan Brüdigam, Ali-Adeeb Abbas, Maks Sorokin, Kuan Fang, Brandon Hung, Maya Guru, Stefan Sosnowski, Jiuguang Wang, Sandra Hirche, Simon Le Cleac'h

    Abstract: Robotic manipulation is challenging due to discontinuous dynamics, as well as high-dimensional state and action spaces. Data-driven approaches that succeed in manipulation tasks require large amounts of data and expert demonstrations, typically from humans. Existing manipulation planners are restricted to specific systems and often depend on specialized algorithms for using demonstration. Therefor…

    Submitted 2 August, 2024; originally announced August 2024.

  11. arXiv:2407.10341  [pdf, other]

    cs.RO cs.AI cs.LG

    Affordance-Guided Reinforcement Learning via Visual Prompting

    Authors: Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

    Abstract: Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as human demonstrations of success and failure, to learn task-specific reward functions. Recently, th…

    Submitted 1 October, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures. Robotics: Science and Systems (RSS) 2024, Task Specification for General-Purpose Intelligent Robots & Lifelong Robot Learning Workshops

  12. arXiv:2407.06027  [pdf, other]

    cs.CL

    PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

    Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul…

    Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  13. arXiv:2406.05654  [pdf, other]

    cs.CL cs.IR

    DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

    Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, ye…

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  14. arXiv:2404.00357  [pdf, other]

    cs.LG

    Revisiting Random Weight Perturbation for Efficiently Improving Generalization

    Authors: Tao Li, Qinghua Tao, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Mingzhen He, Xiaolin Huang

    Abstract: Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one led by sharpness-aware minimization (SAM) minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP), and the other minimizes the expected Bayes objecti…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted to TMLR 2024

  15. arXiv:2403.03174  [pdf, other]

    cs.RO cs.AI

    MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting

    Authors: Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

    Abstract: Open-world generalization requires robotic systems to have a profound understanding of the physical world and the user command to solve diverse and complex tasks. While the recent advancement in vision-language models (VLMs) has offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this paper, we introdu…

    Submitted 3 September, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  16. arXiv:2402.12052  [pdf, other]

    cs.CL

    Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

    Authors: Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen

    Abstract: The integration of large language models (LLMs) and search engines represents a significant evolution in knowledge acquisition methodologies. However, determining the knowledge that an LLM already possesses and the knowledge that requires the help of a search engine remains an unresolved issue. Most existing methods solve this problem through the results of preliminary answers or reasoning done by…

    Submitted 30 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 main conference. Repo: https://github.com/plageon/SlimPLM

  17. arXiv:2402.02949  [pdf, other]

    cs.LG stat.ML

    Kernel PCA for Out-of-Distribution Detection

    Authors: Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang, Jie Yang

    Abstract: Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown the insufficiency of Principal Component Analysis (PCA) straightforwardly applied on the features of DNNs in detecting OoD data from In-Distribution (InD) data. The failure of PCA suggests that the network features residing in OoD and InD are not well separated by simply proce…

    Submitted 20 October, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted by NeurIPS 2024

  18. arXiv:2312.12478  [pdf, other]

    cs.CV

    ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

    Authors: Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

    Abstract: The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text…

    Submitted 29 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  19. arXiv:2311.15596  [pdf, other]

    cs.CV cs.CL

    EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

    Authors: Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu

    Abstract: Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for adva…

    Submitted 28 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  20. arXiv:2310.18738  [pdf, other]

    cs.CL cs.LG

    TLM: Token-Level Masking for Transformers

    Authors: Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

    Abstract: Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers. In this paper, we propose a new regularization scheme based on token-level rather than structure-level to reduce overfitting. Specifically, we devise a novel Token-Level Masking (TLM) training strategy for Transformers to regularize the con…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: 13 pages. Accepted by EMNLP2023 main conference

  21. arXiv:2310.18026  [pdf, other]

    quant-ph cs.PL math.CO

    Symmetry-Based Quantum Circuit Mapping

    Authors: Di Yu, Kun Fang

    Abstract: Quantum circuit mapping is a crucial process in the quantum circuit compilation pipeline, facilitating the transformation of a logical quantum circuit into a list of instructions directly executable on a target quantum system. Recent research has introduced a post-compilation step known as remapping, which seeks to reconfigure the initial circuit mapping to mitigate quantum circuit errors arising…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures; comments are welcome

  22. arXiv:2310.15896  [pdf, other]

    cs.CL cs.HC

    BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT

    Authors: Yirong Chen, Zhenyu Wang, Xiaofen Xing, huimin zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu

    Abstract: Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, DoctorGLM, and etc. However, the limited information provided by users during single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independen…

    Submitted 4 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

  23. Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective

    Authors: Kun Fang, Qinghua Tao, Xiaolin Huang, Jie Yang

    Abstract: Existing Out-of-Distribution (OoD) detection methods address to detect OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits and gradients in Deep Neural Networks (DNNs). We in this work propose a new perspective upon loss landscape and mode ensemble to investigate OoD detection. In the optimization of DNNs, there exist many local optima in the parameter s…

    Submitted 15 July, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: published in International Journal of Computer Vision

  24. arXiv:2310.11021  [pdf, other]

    quant-ph cs.PL

    Dynamic quantum circuit compilation

    Authors: Kun Fang, Munan Zhang, Ruqi Shi, Yinan Li

    Abstract: Quantum computing has shown tremendous promise in addressing complex computational problems, yet its practical realization is hindered by the limited availability of qubits for computation. Recent advancements in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and significantly reducing the qubit requirements for executing quantum algorit…

    Submitted 21 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 51 pages, 32 figures; comments are welcome; v2 reorganize the writing and strengthen the results

  25. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie , et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  26. arXiv:2309.10305  [pdf, other]

    cs.CL

    Baichuan 2: Open Large-scale Language Models

    Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang , et al. (30 additional authors not shown)

    Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of lar…

    Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

  27. arXiv:2308.12952  [pdf, other]

    cs.RO cs.LG

    BridgeData V2: A Dataset for Robot Learning at Scale

    Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

    Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains,…

    Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: 9 pages

  28. arXiv:2308.12915  [pdf, other]

    cs.HC cs.AI

    Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI

    Authors: Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour

    Abstract: In this paper, we present "1001 Nights", an AI-native game that allows players lead in-game reality through co-created storytelling with the character driven by large language model. The concept is inspired by Wittgenstein's idea of the limits of one's world being determined by the bounds of their language. Using advanced AI tools like GPT-4 and Stable Diffusion, the second iteration of the game e…

    Submitted 18 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: The paper was accepted by The 19th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 23)

  29. arXiv:2307.08927  [pdf, other]

    cs.RO cs.AI

    Multi-Stage Cable Routing through Hierarchical Imitation Learning

    Authors: Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine

    Abstract: We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of m…

    Submitted 13 January, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: T-RO 2024

  30. arXiv:2307.00117  [pdf, other]

    cs.RO cs.LG

    Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

    Authors: Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

    Abstract: Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsig…

    Submitted 17 August, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

    Comments: 15 pages, 5 figures

  31. arXiv:2306.03346  [pdf, other]

    cs.LG cs.AI

    Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

    Authors: Chongyi Zheng, Benjamin Eysenbach, Homer Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

    Abstract: Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that the reinforcement le…

    Submitted 25 February, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: ICLR 2024 Spotlight (< 5%). Website (https://chongyi-zheng.github.io/stable_contrastive_rl) and code (https://github.com/chongyi-zheng/stable_contrastive_rl)

  32. SigRec: Automatic Recovery of Function Signatures in Smart Contracts

    Authors: Ting Chen, Zihao Li, Xiapu Luo, Xiaofeng Wang, Ting Wang, Zheyuan He, Kezhao Fang, Yufei Zhang, Hang Zhu, Hongwei Li, Yan Cheng, Xiaosong Zhang

    Abstract: Millions of smart contracts have been deployed onto Ethereum for providing various services, whose functions can be invoked. For this purpose, the caller needs to know the function signature of a callee, which includes its function id and parameter types. Such signatures are critical to many applications focusing on smart contracts, e.g., reverse engineering, fuzzing, attack detection, and profili…

    Submitted 11 May, 2023; originally announced May 2023.

  33. arXiv:2304.05148  [pdf, other]

    cs.AR cs.OS

    High-performance and Scalable Software-based NVMe Virtualization Mechanism with I/O Queues Passthrough

    Authors: Yiquan Chen, Zhen Jin, Yijing Wang, Yi Chen, Hao Yu, Jiexiong Xu, Jinlong Chen, Wenhai Lin, Kanghua Fang, Chengkun Wei, Qiang Liu, Yuan Xie, Wenzhi Chen

    Abstract: NVMe (Non-Volatile Memory Express) is an industry standard for solid-state drives (SSDs) that has been widely adopted in data centers. NVMe virtualization is crucial in cloud computing as it allows for virtualized NVMe devices to be used by virtual machines (VMs), thereby improving the utilization of storage resources. However, traditional software-based solutions have flexibility benefits but ofte…

    Submitted 11 April, 2023; originally announced April 2023.

  34. arXiv:2301.04027  [pdf]

    cs.LG cs.CE physics.ao-ph physics.geo-ph

    Differentiable modeling to unify machine learning and physical models and advance Geosciences

    Authors: Chaopeng Shen, Alison P. Appling, Pierre Gentine, Toshiyuki Bandai, Hoshin Gupta, Alexandre Tartakovsky, Marco Baity-Jesi, Fabrizio Fenicia, Daniel Kifer, Li Li, Xiaofeng Liu, Wei Ren, Yi Zheng, Ciaran J. Harman, Martyn Clark, Matthew Farthing, Dapeng Feng, Praveen Kumar, Doaa Aboelyazeed, Farshid Rahmani, Hylke E. Beck, Tadd Bindas, Dipankar Dwivedi, Kuai Fang, Marvin Höge , et al. (5 additional authors not shown)

    Abstract: Process-Based Modeling (PBM) and Machine Learning (ML) are often perceived as distinct paradigms in the geosciences. Here we present differentiable geoscientific modeling as a powerful pathway toward dissolving the perceived barrier between them and ushering in a paradigm shift. For decades, PBM offered benefits in interpretability and physical consistency but struggled to efficiently leverage lar…

    Submitted 26 December, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

    Journal ref: Nat Rev Earth Environ 4, 552-567 (2023)

  35. Quantum NETwork: from theory to practice

    Authors: Kun Fang, Jingtian Zhao, Xiufan Li, Yifei Li, Runyao Duan

    Abstract: The quantum internet is envisioned as the ultimate stage of the quantum revolution, which surpasses its classical counterpart in various aspects, such as the efficiency of data transmission, the security of network services, and the capability of information processing. Given its disruptive impact on the national security and the digital economy, a global race to build scalable quantum networks ha…

    Submitted 2 December, 2022; originally announced December 2022.

    Comments: 36 pages, 33 figures; comments are welcome

    Journal ref: Sci China Inf Sci, 2023, 66: 180509

  36. arXiv:2211.11489  [pdf, other]

    cs.CV cs.LG

    Efficient Generalization Improvement Guided by Random Weight Perturbation

    Authors: Tao Li, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Ming Yang, Xiaolin Huang

    Abstract: To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability. Recently, sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvements by minimizing the sharpness measure within a small neighborhood and achieves state-of-the-art performance. However, SAM requires t…

    Submitted 21 November, 2022; originally announced November 2022.

  37. arXiv:2211.10882  [pdf, other]

    cs.LG cs.CV

    On Multi-head Ensemble of Smoothed Classifiers for Certified Robustness

    Authors: Kun Fang, Qinghua Tao, Yingwen Wu, Tao Li, Xiaolin Huang, Jie Yang

    Abstract: Randomized Smoothing (RS) is a promising technique for certified robustness, and recently in RS the ensemble of multiple deep neural networks (DNNs) has shown state-of-the-art performances. However, such an ensemble brings heavy computation burdens in both training and certification, and yet under-exploits individual DNNs and their mutual effects, as the communication between these classifiers is…

    Submitted 20 November, 2022; originally announced November 2022.

  38. arXiv:2211.06134  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks

    Authors: Kuan Fang, Toki Migimatsu, Ajay Mandlekar, Li Fei-Fei, Jeannette Bohg

    Abstract: Solving real-world manipulation tasks requires robots to have a repertoire of skills applicable to a wide range of circumstances. When using learning-based methods to acquire such skills, the key challenge is to obtain training data that covers diverse and feasible variations of the task, which often requires non-trivial manual labor and domain knowledge. In this work, we introduce Active Task Ran…

    Submitted 18 April, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 9 pages, 5 figures

  39. arXiv:2211.04699  [pdf, other]

    cs.CL

    FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration

    Authors: Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang

    Abstract: To accomplish punctuation restoration, most existing methods focus on introducing extra information (e.g., part-of-speech) or addressing the class imbalance problem. Recently, large-scale transformer-based pre-trained language models (PLMS) have been utilized widely and obtained remarkable success. However, the PLMS are trained on the large dataset with marks, which may not fit well with the small…

    Submitted 9 November, 2022; originally announced November 2022.

    Comments: 5 pages. arXiv admin note: substantial text overlap with arXiv:2203.12487

  40. arXiv:2210.10537   

    cs.CV cs.AI

    Online LiDAR-Camera Extrinsic Parameters Self-checking

    Authors: Pengjin Wei, Guohang Yan, Yikang Li, Kun Fang, Jie Yang, Wei Liu

    Abstract: With the development of neural networks and the increasing popularity of automatic driving, the calibration of the LiDAR and the camera has attracted more and more attention. This calibration task is multi-modal, where the rich color and texture information captured by the camera and the accurate three-dimensional spatial information from the LiDAR is incredibly significant for downstream tasks. C…

    Submitted 14 January, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: There are some errors in the methodology section of the paper, which is currently being revised

  41. arXiv:2210.06601  [pdf, other]

    cs.RO cs.AI cs.LG

    Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks

    Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Gengchen Yan, Sergey Levine

    Abstract: The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks still remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement lear…

    Submitted 18 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: CoRL 2022

  42. arXiv:2208.06228  [pdf, other]

    stat.ML cs.CR cs.LG

    Unifying Gradients to Improve Real-world Robustness for Deep Networks

    Authors: Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

    Abstract: The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation…

    Submitted 24 August, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Journal ref: ACM Transactions on Intelligent Systems and Technology (TIST), 2023

  43. arXiv:2205.08129  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

    Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Sergey Levine

    Abstract: General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to train from scratc…

    Submitted 18 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

  44. arXiv:2203.12487  [pdf, other]

    cs.CL cs.SD eess.AS

    A Context-Aware Feature Fusion Framework for Punctuation Restoration

    Authors: Yangjun Wu, Kebin Fang, Yao Zhao

    Abstract: To accomplish the punctuation restoration task, most existing approaches focused on leveraging extra information (e.g., part-of-speech tags) or addressing the class imbalance problem. Recent works have widely applied the transformer-based language models and significantly improved their effectiveness. To the best of our knowledge, an inherent issue has remained neglected: the attention of individu…

    Submitted 23 March, 2022; originally announced March 2022.

  45. arXiv:2203.03182  [pdf, other]

    cs.CV

    CROON: Automatic Multi-LiDAR Calibration and Refinement Method in Road Scene

    Authors: Pengjin Wei, Guohang Yan, Yikang Li, Kun Fang, Xinyu Cai, Jie Yang, Wei Liu

    Abstract: Sensor-based environmental perception is a crucial part of the autonomous driving system. In order to get an excellent perception of the surrounding environment, an intelligent system would configure multiple LiDARs (3D Light Detection and Ranging) to cover the distant and near space of the car. The precision of perception relies on the quality of sensor calibration. This research aims at developi…

    Submitted 13 November, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 7 pages, 5 figures

  46. arXiv:2112.08686  [pdf, other]

    cs.NI

    Ruta: Dis-aggregated routing system over multi-cloud

    Authors: Kevin Fang

    Abstract: Over the years, the evolution of SDN has created multiple overlay technologies, which makes deploying end-to-end traffic engineering services inefficient and difficult. Ruta is designed as a unified encapsulation with Segment Routing, Crypto and NAT-Traversal capabilities over UDP. Ruta could be deployed as a cloud native SDN platform globally over multi-cloud and integrated with each applications on transport…

    Submitted 9 January, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  47. arXiv:2111.12229  [pdf, other]

    cs.LG

    Subspace Adversarial Training

    Authors: Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

    Abstract: Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust. However, a serious problem of catastrophic overfitting exists, i.e., the robust accuracy against projected gradient descent (PGD) attack suddenly drops to 0% during the training. In this paper, we approach this problem from a novel perspective of optimization and firstly reveal the close…

    Submitted 21 March, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

    Comments: CVPR2022

  48. arXiv:2110.14902  [pdf, other]

    cs.DC

    NetDAM: Network Direct Attached Memory with Programmable In-Memory Computing ISA

    Authors: Kevin Fang, David Peng

    Abstract: Data-intensive applications like distributed AI-training may require multi-terabytes memory capacity with multi-terabits bandwidth. We directly attach the memory to the Ethernet controller with some programmable logic to design an efficient hardware "template" for Memory pooling and in-memory / in-network computing. We built an FPGA prototype of the NetDAM, and we demonstrate MPI-Allreduce communica…

    Submitted 28 October, 2021; originally announced October 2021.

  49. arXiv:2110.14842  [pdf, other]

    quant-ph cs.IT math-ph math.ST

    Towards the ultimate limits of quantum channel discrimination

    Authors: Kun Fang, Gilad Gour, Xin Wang

    Abstract: This note studies the difficulty of discriminating quantum channels under operational regimes. First, we make a conjecture on the exponentially strong converse of quantum channel hypothesis testing under coherent strategies, meaning that any strategy that makes the Type II error decay with an exponent larger than the regularized channel relative entropy will unavoidably result in the Type I error co…

    Submitted 1 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

    Comments: comments are welcome

  50. arXiv:2106.13935  [pdf, other]

    cs.RO cs.AI cs.LG

    Discovering Generalizable Skills via Automated Generation of Diverse Tasks

    Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

    Abstract: The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover gene…

    Submitted 25 June, 2021; originally announced June 2021.

    Comments: RSS 2021