Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS

Authors: Wei Sun, Xiaosong Zhang, Fang Wan, Yanzhao Zhou, Yuan Li, Qixiang Ye, Jianbin Jiao

Abstract: Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing works rely on per-pixel image loss functions, such as L2 loss. In SfM-free methods, inaccurate initial poses lead to misalignment issue, which, under the constraints of per-pixel image loss functions, results in excessive gradients, causing unstable optimization and poor convergence for NVS. In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS. We use correspondences between the target and the rendered result to achieve better pixel alignment, facilitating the optimization of relative poses between frames. We then apply the learned poses to optimize the entire scene. Each 2D screen-space pixel is associated with its corresponding 3D Gaussians through approximated surface rendering to facilitate gradient back propagation. Experimental results underline the superior performance and time efficiency of the proposed approach compared to the state-of-the-art baselines.

Submitted 16 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2312.07504 by other authors

arXiv:2408.08563 [pdf, other]

Compact Efficient Polarizers for Relativistic Electron Beams

Authors: Kun Xue, Yue Cao, Feng Wan, Zhong-Peng Li, Qian Zhao, Si-Man Liu, Xin-Yu Liu, Li-Xiang Hu, Yong-Tao Zhao, Zhong-Feng Xu, Tong-Pu Yu, Jian-Xing Li

Abstract: Relativistic spin-polarized electron beams are important for fundamental research and the industry, but their generation currently requires conventional accelerators or ultrastrong laser facilities, limiting their accessibility and broad applications. Here, we put forward a novel method for constructing a compact efficient "polarizer" that achieves direct ultrafast conversion of relativistic dense electron beams into polarized ones, based on the beam "self-polarization" mechanism via simple beam-target interactions. In this scheme, as the electron beam grazes through the polarizer (a double-layer solid target), it ionizes the target and excites an asymmetric plasma field due to the plasma backflows. This field then reacts on the beam itself, triggering spontaneous radiative polarization and reflection of the beam, and ultimately yielding a dense polarized electron beam. Moreover, the double-layer target setup induces a plasma bubble that focuses the polarized beam and reshapes its polarization distribution. Our method is robust with respect to the beam and target parameters, and opens a new avenue for relativistic beam polarization with compact accessible devices, which would facilitate their broad applications and the development of related experiments, such as in strong-field QED studies, and polarized electron-positron and electron-ion colliders.

Submitted 18 September, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.07990 [pdf, other]

FuseChat: Knowledge Fusion of Chat Models

Authors: Fanqi Wan, Longguang Zhong, Ziyi Yang, Ruijun Chen, Xiaojun Quan

Abstract: While training large language models (LLMs) from scratch can indeed lead to models with distinct capabilities and strengths, it incurs substantial costs and may lead to redundancy in competencies. Knowledge fusion aims to integrate existing LLMs of diverse architectures and capabilities into a more potent LLM through lightweight continual training, thereby reducing the need for costly LLM development. In this work, we propose a new framework for the knowledge fusion of chat LLMs through two main stages, resulting in FuseChat. Firstly, we conduct pairwise knowledge fusion on source chat LLMs of varying structures and scales to create multiple target LLMs with identical structure and size via lightweight fine-tuning. During this process, a statistics-based token alignment approach is introduced as the cornerstone for fusing LLMs with different structures. Secondly, we merge these target LLMs within the parameter space, where we propose a novel method for determining the merging coefficients based on the magnitude of parameter updates before and after fine-tuning. We implement and validate FuseChat using six prominent chat LLMs with diverse architectures and scales, including OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, and Qwen-1.5-Chat-72B. Experimental results on two instruction-following benchmarks, AlpacaEval 2.0 and MT-Bench, demonstrate the superiority of FuseChat-7B over baselines of various sizes. Our model is even comparable to the larger Mixtral-8x7B-Instruct and approaches GPT-3.5-Turbo-1106 on MT-Bench. Our code, model weights, and data are public at https://github.com/fanqiwan/FuseAI.

Submitted 15 August, 2024; originally announced August 2024.

Comments: Work in progress
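
The second stage described in the FuseChat abstract above, merging fine-tuned models in parameter space with coefficients derived from update magnitudes, might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function name `merge_by_update_magnitude`, the choice of squared update norm as the magnitude measure, and the flat-list parameter representation are all assumptions made for this example.

```python
def merge_by_update_magnitude(base, targets, eps=1e-12):
    """Merge fine-tuned parameter vectors, weighting each by the size of its update.

    base    -- parameters before fine-tuning (list of floats)
    targets -- list of fine-tuned parameter vectors, each the same length as base
    Hypothetical helper: the weighting rule is an assumption, not taken from the paper.
    """
    # Update of each target model relative to the shared starting point.
    deltas = [[t - b for t, b in zip(target, base)] for target in targets]
    # Squared L2 norm of each update serves as its "magnitude" here.
    magnitudes = [sum(d * d for d in delta) for delta in deltas]
    total = sum(magnitudes) + eps  # eps avoids division by zero when nothing changed
    coeffs = [m / total for m in magnitudes]
    # Add the coefficient-weighted updates back onto the base parameters.
    merged = [
        b + sum(c * delta[i] for c, delta in zip(coeffs, deltas))
        for i, b in enumerate(base)
    ]
    return merged, coeffs


if __name__ == "__main__":
    # Toy values chosen only to exercise the function.
    merged, coeffs = merge_by_update_magnitude([0.0, 0.0], [[1.0, 0.0], [0.0, 3.0]])
    print(coeffs, merged)
```

In this sketch a model whose fine-tuning moved the parameters further receives a larger coefficient; a real system would presumably compute such coefficients per weight matrix rather than over one flat vector.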

arXiv:2408.04998 [pdf, other]

ProFuser: Progressive Fusion of Large Language Models

Authors: Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

Abstract: While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select the advantageous model during training. Existing fusion methods primarily focus on the training mode that uses cross entropy on ground truth in a teacher-forcing setup to measure a model's advantage, which may provide limited insight into model advantage. In this paper, we introduce a novel approach that enhances the fusion process by incorporating both the training and inference modes. Our method evaluates model advantage not only through cross entropy during training but also by considering inference outputs, providing a more comprehensive assessment. To combine the two modes effectively, we introduce ProFuser to progressively transition from inference mode to training mode. To validate ProFuser's effectiveness, we fused three models, including vicuna-7b-v1.5, Llama-2-7b-chat, and mpt-7b-8k-chat, and demonstrated the improved performance in knowledge, reasoning, and safety compared to baseline methods.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2407.16041 [pdf, other]

doi 10.1016/j.measurement.2024.115376

On Flange-based 3D Hand-Eye Calibration for Soft Robotic Tactile Welding

Authors: Xudong Han, Ning Guo, Yu Jie, He Wang, Fang Wan, Chaoyang Song

Abstract: This paper investigates the direct application of standardized designs on the robot for conducting robot hand-eye calibration by employing 3D scanners with collaborative robots. The well-established geometric features of the robot flange are exploited by directly capturing its point cloud data. In particular, an iterative method is proposed to facilitate point cloud processing toward a refined calibration outcome. Several extensive experiments are conducted over a range of collaborative robots, including Universal Robots UR5 & UR10 e-series, Franka Emika, and AUBO i5 using an industrial-grade 3D scanner Photoneo Phoxi S & M and a commercial-grade 3D scanner Microsoft Azure Kinect DK. Experimental results show that translational and rotational errors converge efficiently to less than 0.28 mm and 0.25 degrees, respectively, achieving a hand-eye calibration accuracy as high as the camera's resolution, probing the hardware limit. A welding seam tracking system is presented, combining the flange-based calibration method with soft tactile sensing. The experiment results show that the system enables the robot to adjust its motion in real-time, ensuring consistent weld quality and paving the way for more efficient and adaptable manufacturing processes.

Submitted 27 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: 25 pages, 14 figures, 2 tables, Accepted by Measurement

arXiv:2407.12449 [pdf, other]

Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation

Authors: Kaixin Bai, Lei Zhang, Zhaopeng Chen, Fang Wan, Jianwei Zhang

Abstract: Despite the substantial progress in deep learning, its adoption in industrial robotics projects remains limited, primarily due to challenges in data acquisition and labeling. Previous sim2real approaches using domain randomization require extensive scene and model optimization. To address these issues, we introduce an innovative physically-based structured light simulation system, generating both RGB and physically realistic depth images, surpassing previous dataset generation tools. We create an RGBD dataset tailored for robotic industrial grasping scenarios and evaluate it across various tasks, including object detection, instance segmentation, and embedding sim2real visual perception in industrial robotic grasping. By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings. Project details are available at https://baikaixinpublic.github.io/structured light 3D synthesizer/.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 7 pages, 2024 IEEE International Conference on Robotics and Automation

arXiv:2407.01094 [pdf, other]

Evaluation of Text-to-Video Generation Models: A Dynamics Perspective

Authors: Mingxiang Liao, Hannan Lu, Xinyu Zhang, Fang Wan, Tianyu Wang, Yuzhong Zhao, Wangmeng Zuo, Qixiang Ye, Jingdong Wang

Abstract: Comprehensive and constructive evaluation protocols play an important role in the development of sophisticated text-to-video (T2V) generation models. Existing evaluation protocols primarily focus on temporal consistency and content continuity, yet largely ignore the dynamics of video content. Dynamics are an essential dimension for measuring the visual vividness and the honesty of video content to text prompts. In this study, we propose an effective evaluation protocol, termed DEVIL, which centers on the dynamics dimension to evaluate T2V models. For this purpose, we establish a new benchmark comprising text prompts that fully reflect multiple dynamics grades, and define a set of dynamics scores corresponding to various temporal granularities to comprehensively evaluate the dynamics of each generated video. Based on the new benchmark and the dynamics scores, we assess T2V models with the design of three metrics: dynamics range, dynamics controllability, and dynamics-based quality. Experiments show that DEVIL achieves a Pearson correlation exceeding 90% with human ratings, demonstrating its potential to advance T2V generation models. Code is available at https://github.com/MingXiangL/DEVIL.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01050 [pdf, other]

Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning

Authors: Yenan Chen, Chuye Zhang, Pengxi Gu, Jianuo Qiu, Jiayi Yin, Nuofan Qiu, Guojing Huang, Bangchao Huang, Zishang Zhang, Hui Deng, Wei Zhang, Fang Wan, Chaoyang Song

Abstract: While the animals' Fin-to-Limb evolution has been well-researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of 'intelligent design under constraints' - hereafter referred to as constraint-driven design intelligence - in developing modern robotic limbs with superior energy efficiency. We propose a 3D-printable design of robotic limbs parametrically reconfigurable as a classical planar 4-bar linkage, an overconstrained Bennett linkage, and a spherical 4-bar linkage. These limbs adopt a co-axial actuation, identical to the modern legged robot platforms, with the added capability of upgrading into a wheel-legged system. Then, we implemented a large-scale, multi-terrain deep reinforcement learning framework to train these reconfigurable limbs for a comparative analysis of overconstrained locomotion in energy efficiency. Results show that the overconstrained limbs exhibit more efficient locomotion than planar limbs during forward and sideways walking over different terrains, including floors, slopes, and stairs, with or without random noises, by saving at least 22% mechanical energy in completing the traverse task, with the spherical limbs being the least efficient. It also achieves the highest average speed of 0.85 meters per second on flat terrain, which is 20% faster than the planar limbs. This study paves the path for an exciting direction for future research in overconstrained robotics leveraging evolutionary morphology and reconfigurable mechanism intelligence when combined with state-of-the-art methods in deep reinforcement learning.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 13 pages, 5 figures, Accepted and Presented at ReMAR2024

arXiv:2406.14136 [pdf, other]

One Fling to Goal: Environment-aware Dynamics for Goal-conditioned Fabric Flinging

Authors: Linhan Yang, Lei Yang, Haoran Sun, Zeqing Zhang, Haibin He, Fang Wan, Chaoyang Song, Jia Pan

Abstract: Dynamic fabric manipulation is commonly seen in manufacturing and domestic settings. While dynamically manipulating a fabric piece to reach a target state is highly efficient, this task presents considerable challenges due to the varying properties of different fabrics, complex dynamics when interacting with environments, and meeting required goal conditions. To address these challenges, we present One Fling to Goal, an algorithm capable of handling fabric pieces with diverse shapes and physical properties across various scenarios. Our method learns a graph-based dynamics model equipped with environmental awareness. With this dynamics model, we devise a real-time controller to enable high-speed fabric manipulation in one attempt, requiring less than 3 seconds to finish the goal-conditioned task. We experimentally validate our method on a goal-conditioned manipulation task in five diverse scenarios. Our method significantly improves this goal-conditioned task, achieving an average error of 13.2 mm in complex scenarios. Our method can be seamlessly transferred to real-world robotic systems and generalized to unseen scenarios in a zero-shot manner.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.10813 [pdf, other]

Self-Evolution Fine-Tuning for Policy Optimization

Authors: Ruijun Chen, Jiehao Liang, Shiping Gao, Fanqi Wan, Xiaojun Quan

Abstract: The alignment of large language models (LLMs) is crucial not only for unlocking their potential in specific tasks but also for ensuring that responses meet human expectations and adhere to safety and ethical principles. Current alignment methodologies face considerable challenges. For instance, supervised fine-tuning (SFT) requires extensive, high-quality annotated samples, while reinforcement learning from human feedback (RLHF) is complex and often unstable. In this paper, we introduce self-evolution fine-tuning (SEFT) for policy optimization, with the aim of eliminating the need for annotated samples while retaining the stability and efficiency of SFT. SEFT first trains an adaptive reviser to elevate low-quality responses while maintaining high-quality ones. The reviser then gradually guides the policy's optimization by fine-tuning it with enhanced responses. One of the prominent features of this method is its ability to leverage unlimited amounts of unannotated data for policy optimization through supervised fine-tuning. Our experiments on AlpacaEval 2.0 and MT-Bench demonstrate the effectiveness of SEFT. We also provide a comprehensive analysis of its advantages over existing alignment techniques.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10594 [pdf, other]

BlockPruner: Fine-grained Pruning for Large Language Models

Authors: Longguang Zhong, Fanqi Wan, Ruijun Chen, Xiaojun Quan, Liangzhi Li

Abstract: With the rapid growth in the size and complexity of large language models (LLMs), the costs associated with their training and inference have escalated significantly. Research indicates that certain layers in LLMs harbor substantial redundancy, and pruning these layers has minimal impact on the overall performance. While various layer pruning methods have been developed based on this insight, they generally overlook the finer-grained redundancies within the layers themselves. In this paper, we delve deeper into the architecture of LLMs and demonstrate that finer-grained pruning can be achieved by targeting redundancies in multi-head attention (MHA) and multi-layer perceptron (MLP) blocks. We propose a novel, training-free structured pruning approach called BlockPruner. Unlike existing layer pruning methods, BlockPruner segments each Transformer layer into MHA and MLP blocks. It then assesses the importance of these blocks using perplexity measures and applies a heuristic search for iterative pruning. We applied BlockPruner to LLMs of various sizes and architectures and validated its performance across a wide range of downstream tasks. Experimental results show that BlockPruner achieves more granular and effective pruning compared to state-of-the-art baselines.

Submitted 26 August, 2024; v1 submitted 15 June, 2024; originally announced June 2024.
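
The iterative, perplexity-guided block removal described in the BlockPruner abstract above could be sketched along the following lines. This is a toy illustration under stated assumptions, not the authors' code: the function `greedy_block_prune`, the greedy "remove the block whose absence hurts perplexity least" rule, and the stand-in perplexity function are all hypothetical, and a real system would evaluate perplexity by running the pruned model on calibration text.

```python
def greedy_block_prune(blocks, perplexity_fn, num_to_remove):
    """Iteratively drop the block whose removal yields the lowest perplexity.

    blocks        -- identifiers such as ("mha", 0) or ("mlp", 0), one per block
    perplexity_fn -- callable taking the list of kept blocks, returning a float
    Hypothetical helper: the greedy rule is an assumed stand-in for the
    heuristic search mentioned in the abstract.
    """
    kept = list(blocks)
    removed = []
    for _ in range(num_to_remove):
        # Score every candidate by the perplexity of the model without it.
        candidate = min(
            kept, key=lambda b: perplexity_fn([k for k in kept if k != b])
        )
        kept.remove(candidate)
        removed.append(candidate)
    return kept, removed


# Toy setup: each Transformer layer is split into an MHA block and an MLP block.
ALL_BLOCKS = [("mha", 0), ("mlp", 0), ("mha", 1), ("mlp", 1)]
# Made-up importance values used only to fake a perplexity measurement.
TOY_IMPORTANCE = {("mha", 0): 5.0, ("mlp", 0): 0.2, ("mha", 1): 0.1, ("mlp", 1): 3.0}


def toy_perplexity(kept_blocks):
    """Stand-in for a real evaluation: missing blocks add their importance."""
    return 10.0 + sum(v for b, v in TOY_IMPORTANCE.items() if b not in kept_blocks)


if __name__ == "__main__":
    kept, removed = greedy_block_prune(ALL_BLOCKS, toy_perplexity, num_to_remove=2)
    print("kept:", kept)
    print("removed:", removed)
```

Because the candidates are individual MHA and MLP blocks rather than whole layers, the sketch can drop the attention block of one layer while keeping its MLP block, which is the finer granularity the abstract emphasizes.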

arXiv:2405.16071 [pdf, other]

DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution

Authors: Yuzhong Zhao, Feng Liu, Yue Liu, Mingxiang Liao, Chen Gong, Qixiang Ye, Fang Wan

Abstract: Region-level multi-modality methods can translate referred image regions to human-preferred language descriptions. Unfortunately, most existing methods use fixed visual inputs and lack the resolution adaptability needed to produce precise language descriptions. In this study, we propose a dynamic resolution approach, referred to as DynRefer, to pursue high-accuracy region-level referring through mimicking the resolution adaptability of human visual cognition. DynRefer first implements stochastic vision-language alignment. It aligns desired language descriptions of multi-modality tasks with images of stochastic resolution, which are constructed by nesting a set of views around the referred region. DynRefer then implements dynamic multi-modality referring, which is realized by selecting views based on image and language priors. This allows the visual information used for referring to better match human preferences, thereby improving the representational adaptability of region-level multi-modality models. Extensive experiments show that DynRefer brings mutual improvement upon tasks including region-level captioning, open-vocabulary region recognition and attribute detection. Last but not least, DynRefer achieves new state-of-the-art on multiple region-level multi-modality tasks using a single model. Code is available at https://github.com/callsys/DynRefer.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Code is available at https://github.com/callsys/DynRefer

arXiv:2405.11186 [pdf, other]

Compact Spin-Polarized Positron Acceleration in Multi-Layer Microhole Array Films

Authors: Zhen-Ke Dou, Chong Lv, Yousef I. Salamin, Nan Zhang, Feng Wan, Zhong-Feng Xu, Jian-Xing Li

Abstract: Compact spin-polarized positron accelerators play a major role in promoting significant positron application research, which typically requires high acceleration gradients and a high polarization degree, both of which, however, are still very challenging to achieve. Here, we put forward a novel spin-polarized positron acceleration method which employs an ultrarelativistic high-density electron beam passing through any hole of multi-layer microhole array films to excite strong electrostatic and transition radiation fields. Positrons in the polarized electron-positron pair plasma, filled in the front of the multi-layer films, can be captured, accelerated, and focused by the electrostatic and transition radiation fields, while maintaining high polarization of above 90% and high acceleration gradient of about TeV/m. Multi-layer design allows for capturing more positrons and achieving cascade acceleration. Our method offers a promising solution for accelerator miniaturization, positron injection, and polarization maintaining, and also can be used to accelerate other charged particles.

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.07709 [pdf, other]

Ultrafast Structured Spin-Manipulation of Relativistic Lepton Beams

Authors: Zhong-Peng Li, Yu Wang, Ting Sun, Feng Wan, Yousef I. Salamin, Mamutjan Ababekri, Qian Zhao, Kun Xue, Ye Tian, Wen-Qing Wei, Jian-Xing Li

Abstract: Relativistic spin-polarized (SP) lepton beams are important for investigating spin-dependent interaction processes. In particular, spatially structured spin-polarized (SSP) lepton beams may find new applications in material, atomic, nuclear, high-energy physics and new physics beyond the Standard Model. However, realizing ultrafast generation and spin-manipulation of relativistic SSP lepton beams poses significant challenges. Here, we put forward a novel method of ultrafast (picosecond-timescale) generation of a relativistic SSP lepton beam via employing a moderate terahertz (THz) wave in a dielectric-lined waveguide (DWL). We first find that lepton beams with customizable spin-polarization structures can be generated by utilizing different electromagnetic modes, and optimizing the lepton velocity and THz phase velocity can improve efficiency of spin-manipulation and visibility of the SP structure. These SSP beams play a profound role in studying magnetic effects in material physics, chiral-selective chemistry, generation of structured $γ$-rays, etc., and open a new avenue for research on relativistic SP particles.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06426 [pdf, other]

Generation of Ultra-Collimated Polarized Attosecond $γ$-Rays via Beam Instabilities

Authors: Li-Jie Cui, Ke-Jia Wei, Chong Lv, Feng Wan, Yousef I. Salamin, Lei-Feng Cao, Jian-Xing Li

Abstract: Polarized attosecond $γ$-rays may offer excitation and hyperfine tracking of reactions relevant to nuclear physics, astrophysics, high-energy physics, etc. However, unfortunately, generation of a feasible and easy-to-deploy source is still a great challenge. Here, we put forward a novel method for producing ultra-collimated high-brilliance polarized attosecond $γ$-rays via the interaction of an unpolarized electron beam with a solid-density plasma. As a relativistic electron beam enters a solid-density plasma, it can be modulated into high-density clusters via the self-modulation instability of itself and further into attosecond slices due to its own hosing instability. This is accompanied by the generation of similar pulse-width $γ$-slices via nonlinear Compton scattering. The severe hosing instability breaks the symmetry of the excited electromagnetic fields, resulting in net linear polarization of $γ$-slices, which challenges the conventional perception that the interaction of an axially symmetric unpolarized electron beam with a uniform plasma cannot generate polarized radiation. In addition, we also obtain high-quality electron microbunches which may serve as an alternative source for prebunched free-electron lasers.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.02376 [pdf, other]

Non-invasive magnetocardiography of living rat based on diamond quantum sensor

Authors: Ziyun Yu, Yijin Xie, Guodong Jin, Yunbin Zhu, Qi Zhang, Fazhan Shi, Fang-yan Wan, Hongmei Luo, Ai-hui Tang, Xing Rong

Abstract: Magnetocardiography (MCG) has emerged as a sensitive and precise method to diagnose cardiovascular diseases, providing more diagnostic information than traditional technology. However, the sensor limitations of conventional MCG systems, such as large size and cryogenic requirement, have hindered the widespread application and in-depth understanding of this technology. In this study, we present a high-sensitivity, room-temperature MCG system based on the negatively charged Nitrogen-Vacancy (NV) centers in diamond. The magnetic cardiac signal of a living rat, characterized by an approximately 20 pT amplitude in the R-wave, is successfully captured through non-invasive measurement using this innovative solid-state spin sensor. To detect these extremely weak biomagnetic signals, we utilize sensitivity-enhancing techniques such as magnetic flux concentration. These approaches have enabled us to simultaneously achieve a magnetometry sensitivity of 9 $\text{pT}\cdot \text{Hz}^{-1/2}$ and a sensor scale of 5 $\text{mm}$. By extending the sensing scale of the NV centers from cellular and molecular level to macroscopic level of living creatures, we have opened the future of solid-state quantum sensing technologies in clinical environments.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2403.09363 [pdf, other]

Sentinel-Guided Zero-Shot Learning: A Collaborative Paradigm without Real Data Exposure

Authors: Fan Wan, Xingyu Miao, Haoran Duan, Jingjing Deng, Rui Gao, Yang Long

Abstract: With increasing concerns over data privacy and model copyrights, especially in the context of collaborations between AI service providers and data owners, an innovative SG-ZSL paradigm is proposed in this work. SG-ZSL is designed to foster efficient collaboration without the need to exchange models or sensitive data. It consists of a teacher model, a student model and a generator that links both model entities. The teacher model serves as a sentinel on behalf of the data owner, replacing real data, to guide the student model at the AI service provider's end during training. Considering the disparity of knowledge space between the teacher and student, we introduce two variants of the teacher model: the omniscient and the quasi-omniscient teachers. Under these teachers' guidance, the student model seeks to match the teacher model's performance and explores domains that the teacher has not covered. To trade off between privacy and performance, we further introduce two distinct security-level training protocols: white-box and black-box, enhancing the paradigm's adaptability. Despite the inherent challenges of real data absence in the SG-ZSL paradigm, it consistently outperforms in ZSL and GZSL tasks, notably in the white-box protocol. Our comprehensive evaluation further attests to its robustness and efficiency across various setups, including stringent black-box training protocol.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.16107 [pdf, other]

Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

Authors: Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, Wei Bi

Abstract: Recently, FuseLLM introduced the concept of knowledge fusion to transfer the collective knowledge of multiple structurally varied LLMs into a target LLM through lightweight continual training. In this report, we extend the scalability and flexibility of the FuseLLM framework to realize the fusion of chat LLMs, resulting in FusionChat. FusionChat comprises two main stages. Firstly, we undertake knowledge fusion for structurally and scale-varied source LLMs to derive multiple target LLMs of identical structure and size via lightweight fine-tuning. Then, these target LLMs are merged within the parameter space, wherein we propose a novel method for determining the merging weights based on the variation ratio of parameter matrices before and after fine-tuning. We validate our approach using three prominent chat LLMs with diverse architectures and scales, namely NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B. Experimental results spanning various chat domains demonstrate the superiority of FusionChat-7B across a broad spectrum of chat LLMs at 7B and 34B scales, even surpassing GPT-3.5 (March) and approaching Mixtral-8x7B-Instruct.

Submitted 28 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

Comments: Technical Report, work in progress
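
The merging stage described in the abstract above can be pictured with a minimal sketch. The abstract does not give the formula for the "variation ratio", so this sketch assumes, purely for illustration, that each target model's weight for a given parameter matrix is proportional to that matrix's mean squared change from the shared starting point. The names `variation` and `merge` and the flattened-list representation are hypothetical and are not taken from the report.

```python
from typing import Dict, List

# Parameter name -> flattened parameter matrix (hypothetical representation).
Params = Dict[str, List[float]]


def variation(base: List[float], tuned: List[float]) -> float:
    """Mean squared change of one matrix after fine-tuning.

    Assumption: this stands in for the 'variation ratio'; the abstract
    does not specify the actual definition.
    """
    return sum((t - b) ** 2 for b, t in zip(base, tuned)) / len(base)


def merge(base: Params, targets: List[Params]) -> Params:
    """Merge fine-tuned models of identical structure, matrix by matrix."""
    merged: Params = {}
    for name, base_vals in base.items():
        changes = [variation(base_vals, t[name]) for t in targets]
        total = sum(changes)
        if total == 0.0:
            # No model changed this matrix: fall back to a plain average.
            weights = [1.0 / len(targets)] * len(targets)
        else:
            # Models that changed the matrix more receive a larger weight.
            weights = [c / total for c in changes]
        merged[name] = [
            sum(w * t[name][i] for w, t in zip(weights, targets))
            for i in range(len(base_vals))
        ]
    return merged


if __name__ == "__main__":
    base = {"w": [0.0, 0.0]}
    targets = [{"w": [1.0, 1.0]}, {"w": [3.0, 3.0]}]
    print(merge(base, targets))
```

Normalizing per matrix rather than per model lets different target models dominate different layers; whether the report does this is not stated in the abstract.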

arXiv:2402.03634 [pdf, other]

Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection

Authors: Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou

Abstract: Multi-view 3D object detection systems often struggle with generating precise predictions due to the challenges in estimating depth from images, increasing redundant and incorrect detections. Our paper presents Ray Denoising, an innovative method that enhances detection accuracy by strategically sampling along camera rays to construct hard negative examples. These examples, visually challenging to differentiate from true positives, compel the model to learn depth-aware features, thereby improving its capacity to distinguish between true and false positives. Ray Denoising is designed as a plug-and-play module, compatible with any DETR-style multi-view 3D detectors, and it only minimally increases training computational costs without affecting inference speed. Our comprehensive experiments, including detailed ablation studies, consistently demonstrate that Ray Denoising outperforms strong baselines across multiple datasets. It achieves a 1.9\% improvement in mean Average Precision (mAP) over the state-of-the-art StreamPETR method on the NuScenes dataset. It shows significant performance gains on the Argoverse 2 dataset, highlighting its generalization capability. The code will be available at https://github.com/LiewFeng/RayDN.

Submitted 12 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2402.03031 [pdf, other]

Superconducting Qubits Above 20 GHz Operating over 200 mK

Authors: Alexander Anferov, Shannon P. Harvey, Fanghui Wan, Jonathan Simon, David I. Schuster

Abstract: Current state-of-the-art superconducting microwave qubits are cooled to extremely low temperatures to avoid sources of decoherence. Higher qubit operating temperatures would significantly increase the cooling power available, which is desirable for scaling up the number of qubits in quantum computing architectures and integrating qubits in experiments requiring increased heat dissipation. To operate superconducting qubits at higher temperatures, it is necessary to address both quasiparticle decoherence (which becomes significant for aluminum junctions above 160 mK) and dephasing from thermal microwave photons (which are problematic above 50 mK). Using low-loss niobium trilayer junctions, which have reduced sensitivity to quasiparticles due to niobium's higher superconducting transition temperature, we fabricate transmons with higher frequencies than previously studied, up to 24 GHz. We measure decoherence and dephasing times of about 1 us, corresponding to average qubit quality factors of approximately $10^5$, and find that decoherence is unaffected by quasiparticles up to 1 K. Without relaxation from quasiparticles, we are able to explore dephasing from purely thermal sources, finding that our qubits can operate up to approximately 250 mK while maintaining similar performance. The thermal resilience of these qubits creates new options for scaling up quantum processors, enables hybrid quantum experiments with high heat dissipation budgets, and introduces a material platform for even higher-frequency qubits.

Submitted 16 August, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: 18 pages, 15 Figures including supplemental information. arXiv admin note: text overlap with arXiv:2402.03031
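
As a rough consistency check on the figures quoted in the abstract above, one can assume the standard relation $Q = 2\pi f T$ between quality factor, qubit frequency $f$ and coherence time $T$. Neither this relation nor the representative value $f = 20$ GHz is stated in the abstract; both are assumptions made here for illustration.

$$Q = 2\pi f T \approx 2\pi \times (2\times 10^{10}\ \text{Hz}) \times (1\times 10^{-6}\ \text{s}) \approx 1.3\times 10^{5}$$

At the upper quoted frequency of 24 GHz the same estimate gives about $1.5\times 10^{5}$, so both values are of order $10^5$, in line with the quality factor reported in the abstract.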

arXiv:2402.01950 [pdf, other]

ConRF: Zero-shot Stylization of 3D Scenes with Conditioned Radiation Fields

Authors: Xingyu Miao, Yang Bai, Haoran Duan, Fan Wan, Yawen Huang, Yang Long, Yefeng Zheng

Abstract: Most of the existing works on arbitrary 3D NeRF style transfer required retraining on each single style condition. This work aims to achieve zero-shot controlled stylization in 3D scenes utilizing text or visual input as conditioning factors. We introduce ConRF, a novel method of zero-shot stylization. Specifically, due to the ambiguity of CLIP features, we employ a conversion process that maps the CLIP feature space to the style space of a pre-trained VGG network and then refine the CLIP multi-modal knowledge into a style transfer neural radiation field. Additionally, we use a 3D volumetric representation to perform local style transfer. By combining these operations, ConRF offers the capability to utilize either text or images as references, resulting in the generation of sequences with novel views enhanced by global or local stylization. Our experiment demonstrates that ConRF outperforms other existing methods for 3D scene and single-text stylization in terms of visual quality.

Submitted 6 March, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17910 [pdf, other]

ControlCap: Controllable Region-level Captioning

Authors: Yuzhong Zhao, Yue Liu, Zonghao Guo, Weijia Wu, Chen Gong, Fang Wan, Qixiang Ye

Abstract: Region-level captioning is challenged by the caption degeneration issue, in which pre-trained multimodal models tend to predict the most frequent captions but miss the less frequent ones. In this study, we propose a controllable region-level captioning (ControlCap) approach, which introduces control words to a multimodal model to address the caption degeneration issue. Specifically, ControlCap leverages a discriminative module to generate control words within the caption space to partition it into multiple sub-spaces. The multimodal model is constrained to generate captions within a few sub-spaces containing the control words, which increases the opportunity of hitting less frequent captions, alleviating the caption degeneration issue. Furthermore, interactive control words can be given by either a human or an expert model, which enables captioning beyond the training caption space, enhancing the model's generalization ability. Extensive experiments on Visual Genome and RefCOCOg datasets show that ControlCap improves the CIDEr score by 21.6 and 2.2, respectively, outperforming the state of the art by significant margins. Code is available at https://github.com/callsys/ControlCap.

Submitted 9 March, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: https://github.com/callsys/ControlCap

arXiv:2401.14075 [pdf, other]

Generation of High-Brilliance Polarized $γ$-Rays via Vacuum Dichroism-assisted Vacuum Birefringence

Authors: Chong Lv, Feng Wan, Yousef I. Salamin, Qian Zhao, Mamutjan Ababekri, Ruirui Xu, Jian-Xing Li

Abstract: We put forward a novel method to generate high-brilliance polarized $γ$-photon beams via vacuum dichroism (VD)-assisted vacuum birefringence (VB) effect. We split a linearly polarized (LP) laser pulse into two subpulses with the first one colliding with a dense unpolarized electron beam to generate LP $γ$ photons (via nonlinear Compton scattering), which then further collide with the second subpulse and are partially transformed into circularly polarized ones via the VB effect. We find that by manipulating the relative polarization of two subpulses, one can ``purify'' (i.e., enhance) the polarization of the $γ$-photon beam via the VD effect. Due to the VD assistance, the VB effect reaches optimal when the relative polarization is nearly $30^\circ$, not the widely used $45^\circ$ in the common VB detection methods. In addition, our method can be used to efficiently confirm the well-known VB effect itself, which has not been directly observed in experiments yet.

Submitted 30 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.10768 [pdf, other]

Knowledge Verification to Nip Hallucination in the Bud

Authors: Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

Abstract: While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at \url{https://github.com/fanqiwan/KCA}.

Submitted 21 September, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted to EMNLP 2024 (Main Conference)

arXiv:2401.10491 [pdf, other]

Knowledge Fusion of Large Language Models

Authors: Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

Abstract: While training large language models (LLMs) from scratch can generate models with distinct functionalities and strengths, it comes at significant costs and may result in redundant capabilities. Alternatively, a cost-effective and compelling approach is to merge existing pre-trained LLMs into a more potent model. However, due to the varying architectures of these LLMs, directly blending their weights is impractical. In this paper, we introduce the notion of knowledge fusion for LLMs, aimed at combining the capabilities of existing LLMs and transferring them into a single LLM. By leveraging the generative distributions of source LLMs, we externalize their collective knowledge and unique strengths, thereby potentially elevating the capabilities of the target model beyond those of any individual source LLM. We validate our approach using three popular LLMs with different architectures--Llama-2, MPT, and OpenLLaMA--across various benchmarks and tasks. Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation. Our code, model weights, and data are public at \url{https://github.com/fanqiwan/FuseLLM}.

Submitted 22 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: Accepted to ICLR 2024

arXiv:2401.04861 [pdf, other]

doi 10.1016/j.patcog.2024.110729

CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video

Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Yang Long, Yefeng Zheng

Abstract: The goal of our work is to generate high-quality novel views from monocular videos of complex and dynamic scenes. Prior methods, such as DynamicNeRF, have shown impressive performance by leveraging time-varying dynamic radiation fields. However, these methods have limitations when it comes to accurately modeling the motion of complex objects, which can lead to inaccurate and blurry renderings of details. To address this limitation, we propose a novel approach that builds upon a recent generalization NeRF, which aggregates nearby views onto new viewpoints. However, such methods are typically only effective for static scenes. To overcome this challenge, we introduce a module that operates in both the time and frequency domains to aggregate the features of object motion. This allows us to learn the relationship between frames and generate higher-quality images. Our experiments demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets. Specifically, our approach outperforms existing methods in terms of both the accuracy and visual quality of the synthesized views. Our code is available on https://github.com/xingy038/CTNeRF.

Submitted 26 June, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by Pattern Recognition

arXiv:2312.12295 [pdf, other]

Describing Robots from Design to Learning: Towards an Interactive Lifecycle Representation of Robots

Authors: Nuofan Qiu, Fang Wan, Chaoyang Song

Abstract: The robot development process is divided into several stages, which create barriers to the exchange of information between these different stages. We advocate for an interactive lifecycle representation, extending from robot morphology design to learning, and introduce the role of robot description formats in facilitating information transfer throughout this pipeline. We analyzed the relationship between design and simulation, enabling us to employ robot process automation methods for transferring information from the design phase to the learning phase in simulation. As part of this effort, we have developed an open-source plugin called ACDC4Robot for Fusion 360, which automates this process and transforms Fusion 360 into a user-friendly graphical interface for creating and editing robot description formats. Additionally, we offer an out-of-the-box robot model library to streamline and reduce repetitive tasks. All codes are hosted open-source. (\url{https://github.com/bionicdl-sustech/ACDC4Robot})

Submitted 19 December, 2023; originally announced December 2023.

Comments: 11 pages, 8 figures, 2 tables, submitted to ICRA2024 for review

arXiv:2312.09863 [pdf, other]

Proprioceptive State Estimation for Amphibious Tactile Sensing

Authors: Ning Guo, Xudong Han, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jian S. Dai, Fang Wan, Chaoyang Song

Abstract: This paper presents a novel vision-based proprioception approach for a soft robotic finger that can estimate and reconstruct tactile interactions in both terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compact in-finger camera captures high-framerate images of the finger's deformation during contact, extracting crucial tactile data in real-time. We present a volumetric discretized model of the soft finger and use the geometry constraints captured by the camera to find the optimal estimation of the deformed shape. The approach is benchmarked using a motion capture system with sparse markers and a haptic device with dense measurements. Both results show state-of-the-art accuracies, with a median error of 1.96 mm for overall body deformation, corresponding to 2.1% of the finger's length. More importantly, the state estimation is robust in both on-land and underwater environments as we demonstrate its usage for underwater object shape sensing. This combination of passive adaptation and real-time tactile sensing paves the way for amphibious robotic grasping applications.

Submitted 21 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 24 pages, 11 figures, 1 table, Conditionally Accepted for the Special Collection on Tactile Robotics in IEEE Transactions on Robotics

arXiv:2312.09822 [pdf, other]

SeeThruFinger: See and Grasp Anything with a Multi-Modal Soft Touch

Authors: Fang Wan, Zheng Wang, Wei Zhang, Chaoyang Song

Abstract: We present SeeThruFinger, a Vision-Based Tactile Sensing (VBTS) architecture using a markerless See-Thru-Network. It achieves simultaneous visual perception and tactile sensing while providing omni-directional, adaptive grasping for manipulation. Multi-modal perception of intrinsic and extrinsic interactions is critical in building intelligent robots that learn. Instead of adding various sensors for different modalities, a preferred solution is to integrate them into one elegant and coherent design, which is a challenging task. This study leverages the in-finger vision to inpaint occluded regions of the external environment, achieving coherent scene reconstruction for visual perception. By tracking real-time segmentation of the Soft Polyhedral Network's large-scale deformation, we achieved real-time markerless tactile sensing of 6D forces and torques. We demonstrate the capable performances of the SeeThruFinger for reactive grasping without using external cameras or dedicated force and torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multi-modal performance of the SeeThruFinger architecture to simultaneously achieve various capabilities, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force-and-torque sensing, and contact event detection, all within a single input from the in-finger vision of the See-Thru-Network in a markerless way. All codes are available at https://github.com/ancorasir/SeeThruFinger.

Submitted 20 September, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 19 pages, 15 figures, 1 table

arXiv:2311.14974 [pdf, other]

Active Surface with Passive Omni-Directional Adaptation of Soft Polyhedral Fingers for In-Hand Manipulation

Authors: Sen Li, Fang Wan, Chaoyang Song

Abstract: Track systems effectively distribute loads, augmenting traction and maneuverability on unstable terrains, leveraging their expansive contact areas. This tracked locomotion capability also aids in hand manipulation of not only regular objects but also irregular objects. In this study, we present the design of a soft robotic finger with an active surface on an omni-adaptive network structure, which can be easily installed on existing grippers and achieve stability and dexterity for in-hand manipulation. The system's active surfaces initially transfer the object from the fingertip segment with less compliance to the middle segment of the finger with superior adaptability. Despite the omni-directional deformation of the finger, in-hand manipulation can still be executed with controlled active surfaces. We characterized the soft finger's stiffness distribution and simplified models to assess the feasibility of repositioning and reorienting a grasped object. A set of experiments on in-hand manipulation was performed with the proposed fingers, demonstrating the dexterity and robustness of the strategy.

Submitted 25 November, 2023; originally announced November 2023.

Comments: 10 pages, 6 figures, 2 tables, submitted to ICRA 2024

arXiv:2311.01670 [pdf, other]

doi 10.1088/1361-6668/ad22ff

Low-loss Millimeter-wave Resonators with an Improved Coupling Structure

Authors: Alexander Anferov, Shannon P. Harvey, Fanghui Wan, Kan-Heng Lee, Jonathan Simon, David I. Schuster

Abstract: Millimeter-wave superconducting resonators are a useful tool for studying quantum device coherence in a new frequency domain. However, improving resonators is difficult without a robust and reliable method for coupling millimeter-wave signals to 2D structures. We develop and characterize a tapered transition structure coupling a rectangular waveguide to a planar slotline waveguide with better than 0.5 dB efficiency over 14 GHz, and use it to measure ground-shielded resonators in the W band (75 - 110 GHz). Having decoupled the resonators from radiative losses, we consistently achieve single-photon quality factors above $10^5$, with a two-level-system loss limit above $10^6$, and verify the effectiveness of oxide removal treatments to reduce loss. These values are 4-5 times higher than those previously reported in the W band, and much closer to typical planar microwave resonators. The improved losses demonstrated by these on-chip millimeter-wave devices shed new light on quantum decoherence in a different frequency regime, offer increased selectivity for high-frequency detectors, and enable new possibilities for hybrid quantum experiments integrating millimeter-wave frequencies.

Submitted 21 February, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 9 pages, 9 figures and appendices (3 pages, 2 figures)

arXiv:2310.20256 [pdf, other]

PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection

Authors: Tao Yang, Tianyuan Shi, Fanqi Wan, Xiaojun Quan, Qifan Wang, Bingzhe Wu, Jiaxiang Wu

Abstract: Recent advances in large language models (LLMs), such as ChatGPT, have showcased remarkable zero-shot performance across various NLP tasks. However, the potential of LLMs in personality detection, which involves identifying an individual's personality from their written texts, remains largely unexplored. Drawing inspiration from Psychological Questionnaires, which are carefully designed by psychologists to evaluate individual personality traits through a series of targeted items, we argue that these items can be regarded as a collection of well-structured chain-of-thought (CoT) processes. By incorporating these processes, LLMs can enhance their capabilities to make more reasonable inferences on personality from textual input. In light of this, we propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner. In particular, we employ an LLM as an AI assistant with a specialization in text analysis. We prompt the assistant to rate individual items at each turn and leverage the historical rating results to derive a conclusive personality preference. Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection, achieving an average F1 score improvement of 4.23/10.63 points on two benchmark datasets compared to the standard prompting method. Our code is available at https://github.com/TaoYang225/PsyCoT.

Submitted 4 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2310.09824 [pdf, other]

Overconstrained Locomotion

Authors: Haoran Sun, Bangchao Huang, Zishang Zhang, Ronghan Xu, Guojing Huang, Shihao Feng, Guangyi Huang, Jiayi Yin, Nuofan Qiu, Hua Chen, Wei Zhang, Jia Pan, Fang Wan, Chaoyang Song

Abstract: This paper studies the design, control, and learning of a novel robotic limb that produces overconstrained locomotion by employing the Bennett linkage for motion generation, capable of parametric reconfiguration between a reptile- and mammal-inspired morphology within a single quadruped. In contrast to the prevailing focus on planar linkages, this research delves into adopting overconstrained linkages as the limb mechanism. The overconstrained linkages have solid theoretical foundations in advanced kinematics but are under-explored in robotic applications. This study showcases the morphological superiority of Overconstrained Robotic Limbs (ORLs) that can transform into planar or spherical limbs, exemplified using the simplest case of a Bennett linkage as an ORL. We apply Model Predictive Control (MPC) to simulate a range of overconstrained locomotion tasks, revealing its superiority in energy efficiency against planar limbs when considering foothold distances and speeds. The results are further verified in overconstrained locomotion policies optimized from Reinforcement Learning (RL). From an evolutionary biology perspective, these findings highlight the mechanism distinctions in limb design between reptiles and mammals and represent the first documented instance of ORLs outperforming planar limb designs in dynamic locomotion.
Future studies will focus on deploying the model-based and learning-based overconstrained locomotion skills in the robotic hardware to close the Sim2Real gap for developing evolutionary-inspired, energy-efficient control of novel robotic limbs.

Submitted 30 July, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: 30 pages, 20 figures, 2 tables

arXiv:2310.09168 [pdf, other]

Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration

Authors: Fanqi Wan, Xinting Huang, Tao Yang, Xiaojun Quan, Wei Bi, Shuming Shi

Abstract: Instruction-tuning can be substantially optimized through enhanced diversity, resulting in models capable of handling a broader spectrum of tasks. However, existing data employed for such tuning often exhibit inadequate coverage of individual domains, limiting the scope for nuanced comprehension and interactions within these areas. To address this deficiency, we propose Explore-Instruct, a novel approach to enhance the data coverage to be used in domain-specific instruction-tuning through active exploration via Large Language Models (LLMs). Built upon representative domain use cases, Explore-Instruct explores a multitude of variations or possibilities by implementing a search algorithm to obtain diversified and domain-focused instruction-tuning data. Our data-centric analysis validates the effectiveness of this proposed approach in improving domain-specific instruction coverage. Moreover, our model's performance demonstrates considerable advancements over multiple baselines, including those utilizing domain-specific data enhancement. Our findings offer a promising opportunity to improve instruction coverage, especially in domain-specific contexts, thereby advancing the development of adaptable language models. Our code, model weights, and data are public at https://github.com/fanqiwan/Explore-Instruct.

Submitted 24 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Main Conference)

arXiv:2310.08877 [pdf, other]

Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System

Authors: Weizhou Shen, Yingqi Gao, Canbin Huang, Fanqi Wan, Xiaojun Quan, Wei Bi

Abstract: Developing an efficient retriever to retrieve knowledge from a large-scale knowledge base (KB) is critical for task-oriented dialogue systems to effectively handle localized and specialized tasks. However, widely used generative models such as T5 and ChatGPT often struggle to differentiate subtle differences among the retrieved KB records when generating responses, resulting in suboptimal quality of generated responses. In this paper, we propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision. In addition, our approach goes beyond considering solely retrieved entities and incorporates various meta knowledge to guide the generator, thus improving the utilization of knowledge. We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models. The results demonstrate that when combined with meta knowledge, the response generator can effectively leverage high-quality knowledge records from the retriever and enhance the quality of generated responses. The code and models of this paper are available at https://github.com/shenwzh3/MK-TOD.

Submitted 20 October, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 Main Conference

arXiv:2308.08538 [pdf, other]

doi 10.1177/02783649241238765

Proprioceptive Learning with Soft Polyhedral Networks

Authors: Xiaobo Liu, Xudong Han, Wei Hong, Fang Wan, Chaoyang Song

Abstract: Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of adaptive kinesthesia and viscoelastic proprioception by learning kinetic features. This design enables passive adaptations to omni-directional interactions, visually captured by a miniature high-speed motion tracking system embedded inside for proprioceptive learning. The results show that the soft network can infer real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions. We also incorporate viscoelasticity in proprioception during static adaptation by adding a creep and relaxation modifier to refine the predicted results. The proposed soft network combines simplicity in design, omni-adaptation, and proprioceptive sensing with high accuracy, making it a versatile solution for robotics at a low cost with more than 1 million use cycles for tasks such as sensitive and competitive grasping, and touch-based geometry reconstruction. This study offers new insights into vision-based proprioception for soft robots in adaptive grasping, soft manipulation, and human-robot interaction.

Submitted 27 July, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 20 pages, 10 figures, 2 tables, Published in the International Journal of Robotics Research

arXiv:2308.08510 [pdf, other]

Autoencoding a Soft Touch to Learn Grasping from On-land to Underwater

Authors: Ning Guo, Xudong Han, Xiaobo Liu, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jiansheng Dai, Fang Wan, Chaoyang Song

Abstract: Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerged in a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of grasping knowledge from on-land to underwater via a vision-based soft robotic finger that learns 6D forces and torques (FT) using a Supervised Variational Autoencoder (SVAE). A high-framerate camera captures the whole-body deformations while a soft robotic finger interacts with physical objects on-land and underwater. Results show that the trained SVAE model learned a series of latent representations of the soft mechanics transferable from land to water, presenting a superior adaptation to the changing environments against commercial FT sensors. Soft, delicate, and reactive grasping enabled by tactile intelligence enhances the gripper's underwater interaction with improved reliability and robustness at a much-reduced cost, paving the path for learning-based intelligent grasping to support fundamental scientific discoveries in environmental and ocean research.

Submitted 16 August, 2023; originally announced August 2023.

Comments: 17 pages, 5 figures, 1 table, submitted to Advanced Intelligent Systems for review

arXiv:2308.07225 [pdf, other]

doi 10.1109/TCSVT.2023.3305776

DS-Depth: Dynamic and Static Depth Estimation via a Fusion Cost Volume

Authors: Xingyu Miao, Yang Bai, Haoran Duan, Yawen Huang, Fan Wan, Xinxing Xu, Yang Long, Yefeng Zheng

Abstract: Self-supervised monocular depth estimation methods typically rely on the reprojection error to capture geometric relationships between successive frames in static environments. However, this assumption does not hold in scenarios with dynamic objects, leading to errors during the view synthesis stage, such as feature mismatch and occlusion, which can significantly reduce the accuracy of the generated depth maps. To address this problem, we propose a novel dynamic cost volume that exploits residual optical flow to describe moving objects, improving incorrectly occluded regions in static cost volumes used in previous work. Nevertheless, the dynamic cost volume inevitably generates extra occlusions and noise, thus we alleviate this by designing a fusion module that makes static and dynamic cost volumes compensate for each other. In other words, occlusion from the static volume is refined by the dynamic volume, and incorrect information from the dynamic volume is eliminated by the static volume. Furthermore, we propose a pyramid distillation loss to reduce photometric error inaccuracy at low resolutions and an adaptive photometric error loss to alleviate the flow direction of the large gradient in the occlusion regions. We conducted extensive experiments on the KITTI and Cityscapes datasets, and the results demonstrate that our model outperforms previously published baselines for self-supervised monocular depth estimation.

Submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.16401 [pdf, other]

Cascade of polarized Compton scattering and Breit-Wheeler pair production

Authors: Qian Zhao, Ting Sun, Kun Xue, Feng Wan, Jian-Xing Li

Abstract: Cascaded Compton scattering and Breit-Wheeler (BW) processes play fundamental roles in high-energy astrophysical sources and laser-driven quantum electrodynamics (QED) plasmas. A thorough comprehension of the polarization transfer in these cascaded processes is essential for elucidating the polarization mechanism of high-energy cosmic gamma rays and laser-driven QED plasmas. In this study, we employ analytical cross-sectional calculations and Monte Carlo (MC) numerical simulations to investigate the polarization transfer in the cascade of electron-seeded inverse Compton scattering (ICS) and BW process. Theoretical analysis indicates that the polarization of background photons can effectively transfer to final-state particles in the first-generation cascade due to helicity transfer. Through MC simulations involving polarized background photons and non-polarized seed electrons, we reveal the characteristic polarization curves as a function of particle energy produced by the cascaded processes of ICS and BW pair production. Our results demonstrate that the first-generation photons from ICS exhibit the non-decayed stair-shape polarization curves, in contrast to the linearly decayed ones of the first-generation electrons. Interestingly, this polarization curve trend can be reversed in the second-generation cascade, facilitated by the presence of polarized first-generation BW pairs with fluctuant polarization curves.
The cascade culminates with the production of second-generation BW pairs, due to diminished energy of second-generation photons below the threshold of BW process. Our findings provide crucial insights into the cascaded processes of Compton scattering and BW process, significantly contributing to the understanding and further exploration of laser-driven QED plasma creation in laboratory settings and high-energy astrophysics research.

Submitted 6 November, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.09756 [pdf, other]

Generative Prompt Model for Weakly Supervised Object Localization

Authors: Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

Abstract: Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale high-quality attention maps, which facilitate localizing full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), setting a solid baseline for WSOL with the generative model. Code is available at https://github.com/callsys/GenPromp.

Submitted 19 July, 2023; originally announced July 2023.

Journal ref: International Conference on Computer Vision (ICCV 2023)

arXiv:2306.14936 [pdf, other]

Manipulation of $γ$ ray polarization in Compton scattering

Authors: Yu Wang, Mamutjan Ababekri, Feng Wan, Jia-Xing Wen, Wen-Qing Wei, Zhong-Peng Li, Hai-Tao Kang, Bo Zhang, Yong-Tao Zhao, Wei-Min Zhou, Jian-Xing Li

Abstract: High-brilliance high-polarization $γ$ rays based on Compton scattering are of great significance in broad areas, such as nuclear, high-energy, astro-physics, etc. However, the transfer mechanism of spin angular momentum in the transition from linear, through weakly into strongly nonlinear processes is still unclear, which severely limits the simultaneous control of brilliance and polarization of high-energy $γ$ rays. In this work, we investigate the manipulation mechanism of high-quality polarized $γ$ rays in Compton scattering of the ultrarelativistic electron beam colliding with an intense laser pulse. We find that the contradiction lies in the simultaneous achievement of high-brilliance and high-polarization of $γ$ rays by increasing laser intensity, since the polarization is predominately contributed by the electron (laser photon) spin via multi-photon (single-photon) absorption channel. Moreover, we confirm that the signature of $γ$-ray polarization can be applied for observing the nonlinear effects (multi-photon absorption) of Compton scattering with moderate-intensity laser facilities.

Submitted 19 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.11288 [pdf, other]

Simulations of spin/polarization-resolved laser-plasma interactions in the nonlinear QED regime

Authors: Feng Wan, Chong Lv, Kun Xue, Zhen-Ke Dou, Qian Zhao, Mamutjan Ababekri, Wen-Qing Wei, Zhong-Peng Li, Yong-Tao Zhao, Jian-Xing Li

Abstract: Strong-field quantum electrodynamics (SF-QED) plays a crucial role in ultraintense laser matter interactions, and demands sophisticated techniques to understand the related physics with new degrees of freedom, including spin angular momentum. To investigate the impact of SF-QED processes, we have introduced spin/polarization-resolved nonlinear Compton scattering, nonlinear Breit-Wheeler and vacuum birefringence processes into our particle-in-cell (PIC) code. In this article, we will provide details of the implementation of these SF-QED modules and share known results that demonstrate exact agreement with existing single particle codes. By coupling normal PIC with spin/polarization-resolved SF-QED processes, we create a new theoretical platform to study strong field physics in currently running or planned petawatt or multi-petawatt laser facilities.

Submitted 26 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.04932 [pdf, other]

Jigsaw-based Benchmarking for Learning Robotic Manipulation

Authors: Xiaobo Liu, Fang Wan, Sheng Ge, Haokun Wang, Haoran Sun, Chaoyang Song

Abstract: Benchmarking provides experimental evidence of the scientific baseline to enhance the progression of fundamental research, which is also applicable to robotics. In this paper, we propose a method to benchmark metrics of robotic manipulation, which addresses the spatial-temporal reasoning skills for robot learning with the jigsaw game. In particular, our approach exploits a simple set of jigsaw pieces by designing a structured protocol, which can be highly customizable according to a wide range of task specifications. Researchers can selectively adopt the proposed protocol to benchmark their research outputs, on a comparable scale in the functional, task, and system-level of details. The purpose is to provide a potential look-up table for learning-based robot manipulation, commonly available in other engineering disciplines, to facilitate the adoption of robotics through calculated, empirical, and systematic experimental evidence.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 7 pages, 7 figures, accepted to 2023 IEEE International Conference on Advanced Robotics and Mechatronics (ICARM)

arXiv:2306.04928 [pdf, other]

Underwater Intention Recognition using Head Motion and Throat Vibration for Supernumerary Robotic Assistance

Authors: Yuqin Guo, Rongzheng Zhang, Wanghongjie Qiu, Harry Asada, Fang Wan, Chaoyang Song

Abstract: This study presents a multi-modal mechanism for recognizing human intentions while diving underwater, aiming to achieve natural human-robot interactions through an underwater superlimb for diving assistance. The underwater environment severely limits the divers' capabilities in intention expression, which becomes more challenging when they intend to operate tools while keeping control of body postures in 3D with the various diving suits and gears. The current literature is limited in underwater intention recognition, impeding the development of intelligent wearable systems for human-robot interactions underwater. Here, we present a novel solution to simultaneously detect head motion and throat vibrations under the water in a compact, wearable design. Experiment results show that using machine learning algorithms, we achieved high performance in integrating these two modalities to translate human intentions to robot control commands for an underwater superlimb system. This study's results paved the way for future development in underwater intention recognition and underwater human-robot interactions with supernumerary support.

Submitted 16 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 6 pages, 9 figures, 3 tables, accepted to IEEE CASE 2023

arXiv:2306.04142 [pdf, other]

doi 10.1103/PhysRevLett.131.175101

Generation of High-Density High-Polarization Positrons via Single-Shot Strong Laser-Foil Interaction

Authors: Kun Xue, Ting Sun, Ke-Jia Wei, Zhong-Peng Li, Qian Zhao, Feng Wan, Chong Lv, Yong-Tao Zhao, Zhong-Feng Xu, Jian-Xing Li

Abstract: We put forward a novel method for producing ultrarelativistic high-density high-polarization positrons through a single-shot interaction of a strong laser with a tilted solid foil. In our method, the driving laser ionizes the target, and the emitted electrons are accelerated and subsequently generate abundant $γ$ photons via the nonlinear Compton scattering, dominated by the laser. These $γ$ photons then generate polarized positrons via the nonlinear Breit-Wheeler process, dominated by a strong self-generated quasi-static magnetic field $\mathbf{B}^{\rm S}$. We find that placing the foil at an appropriate angle can result in a directional orientation of $\mathbf{B}^{\rm S}$, thereby polarizing positrons. Manipulating the laser polarization direction can control the angle between the $γ$ photon polarization and $\mathbf{B}^{\rm S}$, significantly enhancing the positron polarization degree. Our spin-resolved quantum electrodynamics particle-in-cell simulations demonstrate that employing a laser with a peak intensity of about $10^{23}$ W/cm$^2$ can obtain dense ($\gtrsim$ 10$^{18}$ cm$^{-3}$) polarized positrons with an average polarization degree of about 70% and a yield of above 0.1 nC per shot. Moreover, our method is feasible using currently available or upcoming laser facilities and robust with respect to the laser and target parameters. Such high-density high-polarization positrons hold great significance in laboratory astrophysics, high-energy physics and new physics beyond the Standard Model.

Submitted 26 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Journal ref: Phys. Rev. Lett. 131 (2023) 175101

arXiv:2305.10174 [pdf, other]

Utilising high-dimensional data in randomised clinical trials: a review of methods and practice

Authors: Svetlana Cherlin, Theophile Bigirumurame, Michael J Grayling, Jérémie Nsengimana, Luke Ouma, Aida Santaolalla, Fang Wan, S Faye Williamson, James M S Wason

Abstract: Introduction: Even in effectively conducted randomised trials, the probability of a successful study remains relatively low. With recent advances in the next-generation sequencing technologies, there is a rapidly growing number of high-dimensional data, including genetic, molecular and phenotypic information, that have improved our understanding of driver genes, drug targets, and drug mechanisms of action. The leveraging of high-dimensional data holds promise for increased success of clinical trials. Methods: We provide an overview of methods for utilising high-dimensional data in clinical trials. We also investigate the use of these methods in practice through a review of recently published randomised clinical trials that utilise high-dimensional genetic data. The review includes articles that were published between 2019 and 2021, identified through the PubMed database. Results: Out of 174 screened articles, 100 (57.5%) were randomised clinical trials that collected high-dimensional data. The most common clinical area was oncology (30%), followed by chronic diseases (28%), nutrition and ageing (18%) and cardiovascular diseases (7%). The most common types of data analysed were gene expression data (70%), followed by DNA data (21%). The most common method of analysis (36.3%) was univariable analysis. Articles that described multivariable analyses used standard statistical methods. Most of the clinical trials had two arms.
Discussion: New methodological approaches are required for more efficient analysis of the increasing amount of high-dimensional data collected in randomised clinical trials. We highlight the limitations and barriers to the current use of high-dimensional data in trials, and suggest potential avenues for improvement and future work.

Submitted 5 February, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: 17 pages, 3 figures, 2 tables

arXiv:2305.10149 [pdf, other]

Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog

Authors: Fanqi Wan, Weizhou Shen, Ke Yang, Xiaojun Quan, Wei Bi

Abstract: Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems to generate informative responses. Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses, leading to suboptimal retrieval performance when the knowledge base becomes large-scale. To address this, we propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever (MAKER) that includes an entity selector to search for relevant entities and an attribute selector to filter out irrelevant attributes. To train the retriever, we propose a novel distillation objective that derives supervision signals from the response generator. Experiments conducted on three standard benchmarks with both small and large-scale knowledge bases demonstrate that our retriever performs knowledge retrieval more effectively than existing methods. Our code has been made publicly available at https://github.com/18907305772/MAKER.

Submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023 (Main Conference)

arXiv:2305.09892 [pdf, other]

Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

Authors: Jinghao Deng, Fanqi Wan, Tao Yang, Xiaojun Quan, Rui Wang

Abstract: Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negatives. To address these issues, we propose ClusterNS (Clustering-aware Negative Sampling), a novel method that incorporates cluster information into contrastive learning for unsupervised sentence representation learning. We apply a modified K-means clustering algorithm to supply hard negatives and recognize in-batch false negatives during training, aiming to solve the two issues in one unified framework. Experiments on semantic textual similarity (STS) tasks demonstrate that our proposed ClusterNS compares favorably with baselines in unsupervised sentence representation learning. Our code has been made publicly available.

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted to Findings of ACL 2023, 16 pages

arXiv:2301.02571 [pdf]

doi 10.1088/1361-6668/acb17a

APC Nb$_3$Sn superconductors based on internal oxidation of Nb-Ta-Hf alloys

Authors: X Xu, X Peng, F Wan, J Rochester, G Bradford, J Jaroszynski, M Sumption

Abstract: In the last few years, a new type of Nb$_3$Sn superconducting composite, containing a high density of artificial pinning centers (APC) generated via an internal oxidation approach, has demonstrated a significantly superior performance relative to present, state-of-the-art commercial Nb$_3$Sn conductors. This was achieved via the internal oxidation of Nb-4at.%Ta-1at.%Zr alloy. On the other hand, our recent studies have shown that internal oxidation of Nb-Ta-Hf alloys can also lead to dramatic improvements in Nb$_3$Sn performance. In this work we follow up this latter approach, fabricating a 61-stack APC wire based on the internal oxidation of Nb-4at.%Ta-1at.%Hf alloy, and compare its critical current density (Jc) and irreversibility field (Birr) with APC wires made using Nb-4at.%Ta-1at.%Zr. A second goal of this work was to improve the filamentary design of APC wires in order to improve their wire quality and electromagnetic stability. Our new modifications have led to significantly improved RRR and stability in the conductors, while still keeping non-Cu Jc at or above the FCC Jc specification. Further improvement via optimization of the wire recipe and design is ongoing. Finally, additional work needed to make APC conductors ready for applications in magnets is discussed.

Submitted 27 November, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Matches published version

Report number: FERMILAB-PUB-22-576-TD
