
Construction contract risk identification based on knowledge-augmented language model

Authors: Saika Wong, Chunmo Zheng, Xing Su, Yinqiu Tang

Abstract: Contract review is an essential step in construction projects to prevent potential losses. However, the current methods for reviewing construction contracts lack effectiveness and reliability, leading to time-consuming and error-prone processes. While large language models (LLMs) have shown promise in revolutionizing natural language processing (NLP) tasks, they struggle with domain-specific knowledge and specialized issues. This paper presents a novel approach that leverages LLMs with construction contract knowledge to emulate the process of contract review by human experts. Our tuning-free approach incorporates construction contract domain knowledge to enhance language models for identifying construction contract risks. The use of natural language when building the domain knowledge base facilitates practical implementation. We evaluated our method on real construction contracts and achieved solid performance. Additionally, we investigated how large language models employ logical thinking during the task and provide insights and recommendations for future research.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12132 [pdf]

A knowledge representation approach for construction contract knowledge modeling

Authors: Chunmo Zheng, Saika Wong, Xing Su, Yinqiu Tang

Abstract: The emergence of large language models (LLMs) presents an unprecedented opportunity to automate construction contract management, reducing human errors and saving significant time and costs. However, LLMs may produce convincing yet inaccurate and misleading content due to a lack of domain expertise. To address this issue, expert-driven contract knowledge can be represented in a structured manner to constrain the automatic contract management process. This paper introduces the Nested Contract Knowledge Graph (NCKG), a knowledge representation approach that captures the complexity of contract knowledge using a nested structure. It includes a nested knowledge representation framework, an NCKG ontology built on the framework, and an implementation method. Furthermore, we present an LLM-assisted contract review pipeline enhanced with external knowledge in NCKG. Our pipeline achieves promising performance in contract risk review, shedding light on combining LLMs and knowledge graphs (KGs) for more reliable and interpretable contract management.

Submitted 21 September, 2023; originally announced September 2023.
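The core idea of a nested knowledge graph, where a whole statement can itself act as the subject or object of another statement, can be sketched with a small recursive data structure. This is a hypothetical illustration under our own assumptions, not the NCKG ontology or the paper's implementation method: the `Triple` class, the relation names, and the contract example are invented for clarity.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Union

# Hypothetical node type: a node is either a plain entity label or another triple.
Node = Union[str, "Triple"]


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) statement whose ends may themselves be triples."""
    subject: Node
    relation: str
    obj: Node

    def depth(self) -> int:
        """Nesting depth: 1 for a flat triple, plus 1 for each level of nested triples."""
        inner = [n.depth() for n in (self.subject, self.obj) if isinstance(n, Triple)]
        return 1 + max(inner, default=0)

    def walk(self) -> Iterator[Triple]:
        """Yield this triple and every triple nested inside it (pre-order)."""
        yield self
        for node in (self.subject, self.obj):
            if isinstance(node, Triple):
                yield from node.walk()


# Invented example: "Contractor pays liquidated damages" only applies when
# "completion is delayed", so the condition qualifies the whole obligation.
obligation = Triple("Contractor", "pays", "LiquidatedDamages")
condition = Triple("Completion", "is", "Delayed")
rule = Triple(obligation, "applies_if", condition)

print(rule.depth())                        # 2
print([t.relation for t in rule.walk()])   # ['applies_if', 'pays', 'is']
```

Keeping the condition attached to the obligation as one nested statement, rather than as two unrelated flat triples, is what lets a reviewer (human or LLM) see that the payment duty is conditional.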

arXiv:2309.08980 [pdf, other]

Differential Modulation for Short Packet Transmission in URLLC

Authors: Canjian Zheng, Fu-Chun Zheng, Jingjing Luo, Pengcheng Zhu, Xiaohu You, Daquan Feng

Abstract: One key feature of ultra-reliable low-latency communications (URLLC) in 5G is to support short packet transmission (SPT). However, the pilot overhead in SPT for channel estimation is relatively high, especially in high Doppler environments. In this paper, we advocate the adoption of differential modulation to support ultra-low latency services, which can ease the channel estimation burden and reduce the power and bandwidth overhead incurred in traditional coherent modulation schemes. Specifically, we consider a multi-connectivity (MC) scheme employing differential modulation to enable URLLC services. The popular selection combining and maximal ratio combining schemes are respectively applied to explore the diversity gain in the MC scheme. A first-order autoregressive model is further utilized to characterize the time-varying nature of the channel. Theoretically, the maximum achievable rate and minimum achievable block error rate under ergodic fading channels with PSK inputs and perfect CSI are first derived by using the non-asymptotic information-theoretic bounds. The performance of SPT with differential modulation and MC schemes is then analysed by characterizing the effect of differential modulation and time-varying channels as a reduction in the effective SNR. Simulation results show that differential modulation does offer a significant advantage over the pilot-assisted coherent scheme for SPT, especially in high Doppler environments.

Submitted 16 September, 2023; originally announced September 2023.

Comments: 15 pages, 9 figures
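To make the differential-modulation idea concrete, the sketch below encodes bits with differential BPSK (DBPSK), passes them through a first-order autoregressive (AR(1)) Rayleigh fading channel, and detects them by comparing consecutive received symbols, so no pilot-based channel estimate is needed. It is a minimal illustration under stated assumptions (BPSK rather than general PSK, a unit-power Gauss-Markov channel with correlation `rho`, additive white Gaussian noise, a single link with no selection or maximal ratio combining); the function names are hypothetical and this is not the paper's simulation code.

```python
from __future__ import annotations

import math
import random


def dbpsk_encode(bits: list[int]) -> list[complex]:
    """Differential BPSK: bit 1 flips the phase by pi, bit 0 keeps it.

    The first symbol is a known reference, so n bits become n + 1 symbols."""
    symbols = [1 + 0j]
    for b in bits:
        symbols.append(-symbols[-1] if b else symbols[-1])
    return symbols


def ar1_channel(n: int, rho: float, rng: random.Random) -> list[complex]:
    """Unit-power Rayleigh fading with h[k] = rho * h[k-1] + sqrt(1 - rho**2) * w[k]."""
    def cn01() -> complex:
        # Circularly symmetric complex Gaussian sample with unit variance.
        return complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))

    h = [cn01()]
    for _ in range(n - 1):
        h.append(rho * h[-1] + math.sqrt(1.0 - rho ** 2) * cn01())
    return h


def dbpsk_decode(received: list[complex]) -> list[int]:
    """Differential detection: a negative Re(y[k] * conj(y[k-1])) means the phase flipped."""
    return [1 if (y * prev.conjugate()).real < 0 else 0
            for prev, y in zip(received, received[1:])]


def simulate(bits: list[int], rho: float, snr_db: float, seed: int = 0) -> list[int]:
    """Send DBPSK symbols through an AR(1) channel plus AWGN and detect them without CSI."""
    rng = random.Random(seed)
    s = dbpsk_encode(bits)
    h = ar1_channel(len(s), rho, rng)
    noise_std = math.sqrt(0.5 * 10 ** (-snr_db / 10))  # per real dimension, unit symbol energy
    y = [hk * sk + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
         for hk, sk in zip(h, s)]
    return dbpsk_decode(y)


bits = [1, 0, 1, 1, 0, 0, 1, 0]
decoded = simulate(bits, rho=1.0, snr_db=100.0)  # static channel, negligible noise
print(decoded == bits)
```

With `rho` below 1 and finite SNR, detection errors appear because the channel drifts between consecutive symbols; the abstract above accounts for this effect as a reduction in the effective SNR.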

arXiv:2309.08901 [pdf, other]

Combinatorial curvature flows for generalized hyperbolic circle packings

Authors: Te Ba, Chao Zheng

Abstract: Generalized circle packings were introduced in \cite{Ba-Hu-Sun} as a generalization of tangential circle packings in hyperbolic background geometry. In this paper, we introduce the combinatorial Calabi flow, fractional combinatorial Calabi flow and combinatorial $p$-th Calabi flow for generalized hyperbolic circle packings. We establish several equivalent conditions regarding the longtime behaviors of these flows. This provides effective algorithms for finding the generalized circle packings with prescribed total geodesic curvatures.

Submitted 16 September, 2023; originally announced September 2023.

MSC Class: 52C26; 53E99; 57Q15

arXiv:2309.06685 [pdf, ps, other]

A discrete uniformization theorem for decorated piecewise Euclidean metrics on surfaces, II

Authors: Xu Xu, Chao Zheng

Abstract: In this paper, we study a natural discretization of the smooth Gaussian curvature on surfaces, which is defined as the quotient of the angle defect and the area of a geodesic disk at a vertex of a polyhedral surface. It is proved that each decorated piecewise Euclidean metric on surfaces with nonpositive Euler number is discrete conformal to a decorated piecewise Euclidean metric with this discrete curvature constant. We further investigate the prescribing combinatorial curvature problem for a parametrization of this discrete curvature and prove some Kazdan-Warner type results. The main tools are Bobenko-Lutz's discrete conformal theory for decorated piecewise Euclidean metrics on surfaces and variational principles with constraints.

Submitted 12 September, 2023; originally announced September 2023.

MSC Class: 52C26
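The curvature in this abstract is the angle defect at a vertex divided by the area of a geodesic disk there. A minimal numerical sketch, assuming the disk lies inside the vertex star (where the metric is a flat cone of angle theta, so the disk area is theta * r**2 / 2) and taking `radius` to be the vertex's decoration radius, which is our assumption; the paper's parametrization and constraints are not reproduced, and the function names are hypothetical.

```python
from __future__ import annotations

import math


def corner_angle(a: float, b: float, c: float) -> float:
    """Inner angle between sides a and b of a Euclidean triangle with third side c."""
    cos_angle = (a * a + b * b - c * c) / (2.0 * a * b)
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp guards against round-off


def discrete_curvature(incident: list[tuple[float, float, float]], radius: float) -> float:
    """Angle defect at a vertex divided by the area of a geodesic disk of `radius` there.

    Each tuple is (l_ij, l_ik, l_jk) for one triangle ijk incident to vertex i. The disk is
    assumed to stay inside the vertex star, a flat cone of angle theta, so its area is
    theta * radius**2 / 2."""
    theta = sum(corner_angle(a, b, c) for a, b, c in incident)  # cone angle at the vertex
    angle_defect = 2.0 * math.pi - theta
    disk_area = 0.5 * theta * radius ** 2
    return angle_defect / disk_area


# Vertex of a unit regular octahedron: four equilateral triangles, cone angle 4*pi/3.
octahedron_vertex = [(1.0, 1.0, 1.0)] * 4
print(discrete_curvature(octahedron_vertex, radius=0.5))  # approximately 4.0
```

For the octahedron vertex the defect is 2*pi/3 and the disk area is (4*pi/3)(0.25)/2 = pi/6, giving 4; a flat vertex with six equilateral triangles gives zero, as expected.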

arXiv:2309.05674 [pdf, other]

ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation

Authors: Xian Lin, Zengqiang Yan, Xianbo Deng, Chuansheng Zheng, Li Yu

Abstract: Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependence. Yet, relatively limited well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence properties on small-scale training data but suffer from limited receptive fields. Existing works are dedicated to exploring the combinations of CNN and transformers while ignoring attention collapse, leaving the potential of transformers under-explored. In this paper, we propose to build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. Specifically, ConvFormer consists of pooling, CNN-style self-attention (CSA), and convolutional feed-forward network (CFFN) corresponding to tokenization, self-attention, and feed-forward network in vanilla vision transformers. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction. In this way, CSA takes 2D feature maps as inputs and establishes long-range dependency by constructing self-attention matrices as convolution kernels with adaptive sizes. Following CSA, 2D convolution is utilized for feature refinement through CFFN. Experimental results on multiple datasets demonstrate the effectiveness of ConvFormer working as a plug-and-play module for consistent performance improvement of transformer-based frameworks. Code is available at https://github.com/xianlin7/ConvFormer.

Submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted by MICCAI 2023

arXiv:2309.05215 [pdf, other]

A discrete uniformization theorem for decorated piecewise Euclidean metrics on surfaces

Authors: Xu Xu, Chao Zheng

Abstract: In this paper, we introduce a new discretization of the Gaussian curvature on surfaces, which is defined as the quotient of the angle defect and the area of some dual cell of a weighted triangulation at the conic singularity. A discrete uniformization theorem for this discrete Gaussian curvature is established on surfaces with non-positive Euler number. The main tools are Bobenko-Lutz's discrete conformal theory for decorated piecewise Euclidean metrics on surfaces and variational principles with constraints.

Submitted 10 September, 2023; originally announced September 2023.

MSC Class: 52C26

arXiv:2309.05209 [pdf, other]

Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer

Authors: Puxun Tu, Hongfei Ye, Haochen Shi, Jeff Young, Meng Xie, Peiquan Zhao, Ce Zheng, Xiaoyi Jiang, Xiaojun Chen

Abstract: Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope and relies heavily on the skill of the ophthalmologist. While existing PCS guidance systems extract valuable information from surgical microscopic videos to enhance intraoperative proficiency, their guidance is not phase-specific, leading to redundant visual information. In this study, our major contribution is the development of a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase. Leveraging the inherent quasi-standardized nature of PCS procedures, we propose a two-stage surgical microscopic video recognition network. In the first stage, we implement a multi-task learning structure to segment the surgical limbus region and extract limbus region-focused spatial features for each frame. In the second stage, we propose the long-short spatiotemporal aggregation transformer (LS-SAT) network to model local fine-grained and global temporal relationships, and combine the extracted spatial features to recognize the current surgical phase. Additionally, we collaborate closely with ophthalmologists to design AR visual cues by utilizing techniques such as limbus ellipse fitting and regional restricted normal cross-correlation rotation computation. We evaluated the network on publicly available and in-house datasets, with comparison results demonstrating superior performance over related works. Ablation results further validated the effectiveness of the limbus region-focused spatial feature extractor and the combination of temporal features. Furthermore, the developed system was evaluated in a clinical setup, with results indicating remarkable accuracy and real-time performance, underscoring its potential for clinical applications.

Submitted 31 October, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

arXiv:2309.04295 [pdf, other]

FIMO: A Challenge Formal Dataset for Automated Theorem Proving

Authors: Chengwu Liu, Jianhao Shen, Huajian Xin, Zhengying Liu, Ye Yuan, Haiming Wang, Wei Ju, Chuanyang Zheng, Yichun Yin, Lin Li, Ming Zhang, Qun Liu

Abstract: We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes.

Submitted 5 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: Added a hyperlink to the dataset made accessible on GitHub

arXiv:2309.03882 [pdf, other]

Large Language Models Are Not Robust Multiple Choice Selectors

Authors: Chujie Zheng, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang

Abstract: Multiple choice questions (MCQs) serve as a common yet important task format in the evaluation of large language models (LLMs). This work shows that modern LLMs are vulnerable to option position changes in MCQs due to their inherent "selection bias", namely, they prefer to select specific option IDs as answers (like "Option A"). Through extensive empirical analyses with 20 LLMs on three benchmarks, we pinpoint that this behavioral bias primarily stems from LLMs' token bias, where the model a priori assigns more probability mass to specific option ID tokens (e.g., A/B/C/D) when predicting answers from the option IDs. To mitigate selection bias, we propose a label-free, inference-time debiasing method, called PriDe, which separates the model's prior bias for option IDs from the overall prediction distribution. PriDe first estimates the prior by permuting option contents on a small number of test samples, and then applies the estimated prior to debias the remaining samples. We demonstrate that it achieves interpretable and transferable debiasing with high computational efficiency. We hope this work can draw broader research attention to the bias and robustness of modern LLMs.

Submitted 21 February, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: ICLR 2024 Spotlight
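A simplified sketch of the two PriDe steps described in the abstract (estimate the option-ID prior by permuting option contents, then remove it from later predictions). The exact formulation here is our assumption, not necessarily the paper's: the prior is the softmax of log-probabilities averaged over all cyclic permutations of the option contents, and debiasing divides each observed probability by the prior and renormalizes. `toy_model` is a hypothetical stand-in for an LLM with a built-in preference for early option IDs, so the effect is visible.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

# Hypothetical stand-in for an LLM: maps an ordered list of option contents to a
# probability for each option ID position (A, B, C, ...).
ModelFn = Callable[[Sequence[str]], list[float]]


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_prior(model: ModelFn, samples: Sequence[Sequence[str]]) -> list[float]:
    """Average log-probabilities per option ID over all cyclic permutations of contents."""
    n = len(samples[0])
    log_sums = [0.0] * n
    count = 0
    for options in samples:
        for shift in range(n):
            permuted = list(options[shift:]) + list(options[:shift])
            probs = model(permuted)
            for i in range(n):
                log_sums[i] += math.log(probs[i])
            count += 1
    return softmax([s / count for s in log_sums])


def debias(observed: Sequence[float], prior: Sequence[float]) -> list[float]:
    """Divide out the option-ID prior from one prediction and renormalize."""
    scores = [p / q for p, q in zip(observed, prior)]
    total = sum(scores)
    return [s / total for s in scores]


# Toy model: a fixed preference for early option IDs times a content-dependent score.
ID_BIAS = [0.4, 0.3, 0.2, 0.1]
CONTENT_SCORE = {"w": 1.0, "x": 2.0, "y": 3.0, "z": 4.0}


def toy_model(options: Sequence[str]) -> list[float]:
    raw = [b * CONTENT_SCORE[o] for b, o in zip(ID_BIAS, options)]
    total = sum(raw)
    return [r / total for r in raw]


prior = estimate_prior(toy_model, [["w", "x", "y", "z"]])
print([round(p, 3) for p in prior])        # [0.4, 0.3, 0.2, 0.1]
debiased = debias(toy_model(["y", "w", "z", "x"]), prior)
print([round(p, 3) for p in debiased])     # [0.3, 0.1, 0.4, 0.2]
```

Averaging over cyclic permutations places every content at every position exactly once, so in this toy model the content terms cancel and only the positional preference survives; after debiasing, the prediction reflects the content scores alone.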

arXiv:2309.02457 [pdf, other]

Rigidity of generalized Thurston's sphere packings on 3-dimensional manifolds with boundary

Authors: Xu Xu, Chao Zheng

Abstract: Motivated by Guo-Luo's generalized circle packings on surfaces with boundary \cite{GL2}, we introduce the generalized Thurston's sphere packings on 3-dimensional manifolds with boundary. Then we investigate the rigidity of the generalized Thurston's sphere packings. We prove that the generalized Thurston's sphere packings are locally determined by the combinatorial scalar curvatures. We further prove the infinitesimal rigidity that the generalized Thurston's sphere packings cannot be deformed while keeping the combinatorial Ricci curvatures fixed.

Submitted 3 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2309.01205

MSC Class: 52C26

arXiv:2309.02318 [pdf, other]

TiAVox: Time-aware Attenuation Voxels for Sparse-view 4D DSA Reconstruction

Authors: Zhenghong Zhou, Huangxuan Zhao, Jiemin Fang, Dongqiao Xiang, Lei Chen, Lingxia Wu, Feihong Wu, Wenyu Liu, Chuansheng Zheng, Xinggang Wang

Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) plays a critical role in the diagnosis of many diseases, such as Arteriovenous Malformations (AVM) and Arteriovenous Fistulas (AVF). Despite its significant application value, the reconstruction of 4D DSA demands numerous views to effectively model the intricate vessels and radiocontrast flow, thereby implying a significant radiation dose. To address this high radiation issue, we propose a Time-aware Attenuation Voxel (TiAVox) approach for sparse-view 4D DSA reconstruction, which paves the way for high-quality 4D imaging. Additionally, 2D and 3D DSA imaging results can be generated from the reconstructed 4D DSA images. TiAVox introduces 4D attenuation voxel grids, which reflect attenuation properties from both spatial and temporal dimensions. It is optimized by minimizing discrepancies between the rendered images and sparse 2D DSA images. Without any neural network involved, TiAVox is physically interpretable: the parameters of each learnable voxel represent the attenuation coefficients. We validated the TiAVox approach on both clinical and simulated datasets, achieving a Peak Signal-to-Noise Ratio (PSNR) of 31.23 for novel view synthesis using only 30 views on the clinically sourced dataset, whereas traditional Feldkamp-Davis-Kress methods required 133 views. Similarly, with merely 10 views from the synthetic dataset, TiAVox yielded a PSNR of 34.32 for novel view synthesis and 41.40 for 3D reconstruction. We also conducted ablation studies to verify the essential components of TiAVox. The code will be publicly available.

Submitted 19 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: 10 pages, 8 figures
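TiAVox fits learnable, time-aware attenuation voxels directly to sparse 2D DSA images, with no neural network. The sketch below is a heavily simplified, hypothetical illustration of that idea, not the authors' method: it assumes a 2D spatial grid with two temporal keyframes blended linearly, parallel-beam projections along only two axis-aligned directions, images equal to line integrals of attenuation, and plain projected gradient descent on a squared error. All names and the toy data are ours.

```python
from __future__ import annotations

import math

Grid = list[list[float]]  # attenuation coefficients mu[x][y] on an n x n grid


def interp_weights(t: float, num_keys: int) -> list[tuple[int, float]]:
    """Linear interpolation weights between the two keyframes bracketing t in [0, 1]."""
    pos = t * (num_keys - 1)
    k0 = min(int(math.floor(pos)), num_keys - 2)
    frac = pos - k0
    return [(k0, 1.0 - frac), (k0 + 1, frac)]


def voxels_at(keyframes: list[Grid], t: float) -> Grid:
    """Attenuation grid at time t, blended from the neighbouring keyframes."""
    (k0, w0), (k1, w1) = interp_weights(t, len(keyframes))
    n = len(keyframes[0])
    return [[w0 * keyframes[k0][x][y] + w1 * keyframes[k1][x][y] for y in range(n)]
            for x in range(n)]


def project(mu: Grid, axis: int) -> list[float]:
    """Parallel-beam line integrals: one ray per x summing over y (axis=0), or vice versa."""
    n = len(mu)
    if axis == 0:
        return [sum(mu[x][y] for y in range(n)) for x in range(n)]
    return [sum(mu[x][y] for x in range(n)) for y in range(n)]


def fit(views: list[tuple[float, int, list[float]]], n: int, num_keys: int,
        steps: int = 300, lr: float = 0.02) -> tuple[list[Grid], list[float]]:
    """Projected gradient descent on the squared error between rendered and observed views.

    Each view is (t, axis, observed_projection). Returns the keyframes and the loss history."""
    keys = [[[0.0] * n for _ in range(n)] for _ in range(num_keys)]
    history = []
    for _ in range(steps):
        grads = [[[0.0] * n for _ in range(n)] for _ in range(num_keys)]
        loss = 0.0
        for t, axis, observed in views:
            rendered = project(voxels_at(keys, t), axis)
            residual = [r - o for r, o in zip(rendered, observed)]
            loss += sum(r * r for r in residual)
            for k, w in interp_weights(t, num_keys):
                for x in range(n):
                    for y in range(n):
                        ray = x if axis == 0 else y  # the ray passing through voxel (x, y)
                        grads[k][x][y] += 2.0 * w * residual[ray]
        for k in range(num_keys):
            for x in range(n):
                for y in range(n):
                    # Attenuation coefficients are non-negative: clamp after each step.
                    keys[k][x][y] = max(0.0, keys[k][x][y] - lr * grads[k][x][y])
        history.append(loss)
    return keys, history


# Toy ground truth: contrast sits in voxel (1, 1) at t = 0 and has spread to (1, 2) by t = 1.
n = 4
truth = [[[0.0] * n for _ in range(n)] for _ in range(2)]
truth[0][1][1] = 1.0
truth[1][1][1] = truth[1][1][2] = 1.0
views = [(t, axis, project(voxels_at(truth, t), axis)) for t in (0.0, 0.5, 1.0) for axis in (0, 1)]
keys, history = fit(views, n=n, num_keys=2)
print(history[0], history[-1])  # the loss should drop sharply
```

Because the voxel values are the attenuation coefficients themselves, the fitted keyframes can be inspected directly; with only six projections the problem is underdetermined, so a low loss does not by itself guarantee the fitted grid matches the ground truth.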

arXiv:2309.01205 [pdf, other]

Rigidity and deformation of generalized sphere packings on 3-dimensional manifolds with boundary

Authors: Xu Xu, Chao Zheng

Abstract: Motivated by Guo-Luo's generalized circle packings on surfaces with boundary \cite{GL2}, we introduce the generalized sphere packings on 3-dimensional manifolds with boundary. Then we investigate the rigidity of the generalized sphere packing metrics. We prove that the generalized sphere packing metric is determined by the combinatorial scalar curvature. To find the hyper-ideal polyhedral metrics on 3-dimensional manifolds with prescribed combinatorial scalar curvature, we introduce the combinatorial Ricci flow and combinatorial Calabi flow for the generalized sphere packings on 3-dimensional manifolds with boundary. Then we study the longtime existence and convergence for the solutions of these combinatorial curvature flows.

Submitted 3 September, 2023; originally announced September 2023.

Comments: To appear in CVPDE

arXiv:2308.16838 [pdf, ps, other]

On cohomological characterizations of endotrivial modules

Authors: Fei Xu, Chenyou Zheng

Abstract: Given a general finite group $G$, there are various finite categories whose cohomology theories are of great interest. Recently Balmer and Grodal gave some new characterizations of the groups of endotrivial modules, via Čech cohomology and category cohomology, respectively, defined on certain orbit categories. These two seemingly different approaches share a common root in topos theory. We shall demonstrate the connection, which leads to a better understanding as well as new characterizations of the group of endotrivial modules.

Submitted 31 August, 2023; originally announced August 2023.

MSC Class: 20C20; 18F10; 18F20; 18A25

arXiv:2308.14583 [pdf, other]

Learning to Read Analog Gauges from Synthetic Data

Authors: Juan Leon-Alcazar, Yazeed Alnumay, Cheng Zheng, Hassane Trigui, Sahejad Patel, Bernard Ghanem

Abstract: Manually reading and logging gauge data is time-inefficient, and the effort grows with the number of gauges. We present a computer vision pipeline that automates the reading of analog gauges. We propose a two-stage CNN pipeline that identifies the key structural components of an analog gauge and outputs an angular reading. To facilitate the training of our approach, we generate a synthetic dataset of realistic analog gauges with their corresponding annotations. To validate our proposal, an additional real-world dataset was collected with 4,813 manually curated images. When compared against state-of-the-art methodologies, our method reduces the average error by 4.55, a 52% relative improvement. The resources for this project will be made available at: https://github.com/fuankarion/automatic-gauge-reading.

Submitted 28 August, 2023; originally announced August 2023.

Journal ref: Winter Conference on Applications of Computer Vision 2024
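The abstract's pipeline ends in an angular reading. A sketch of only that final geometric step, under our own assumptions: keypoints for the dial center, needle tip, zero mark and full-scale mark are already available (for example from a detector), the scale is linear, and values increase clockwise. The keypoint names and functions are hypothetical; this is not the authors' two-stage CNN.

```python
from __future__ import annotations

import math

Point = tuple[float, float]  # (x, y) pixel coordinates, y growing downward


def image_angle(center: Point, point: Point) -> float:
    """Counter-clockwise angle of `point` around `center`, in [0, 2*pi).

    The y difference is negated because image rows grow downward."""
    dx = point[0] - center[0]
    dy = center[1] - point[1]
    return math.atan2(dy, dx) % (2.0 * math.pi)


def gauge_reading(center: Point, needle_tip: Point, min_mark: Point, max_mark: Point,
                  min_value: float, max_value: float) -> float:
    """Reading of a linear dial whose values increase clockwise from min_mark to max_mark."""
    a_min = image_angle(center, min_mark)
    sweep_needle = (a_min - image_angle(center, needle_tip)) % (2.0 * math.pi)
    sweep_scale = (a_min - image_angle(center, max_mark)) % (2.0 * math.pi)
    return min_value + (max_value - min_value) * sweep_needle / sweep_scale


# A 0-10 dial centred at (100, 100): zero at lower left, full scale at lower right
# (a 270-degree clockwise sweep), needle pointing straight up.
print(gauge_reading((100, 100), (100, 40), (50, 150), (150, 150), 0.0, 10.0))  # approximately 5.0
```

If the needle sits in the dead zone between full scale and zero, the needle sweep exceeds the scale sweep and the value exceeds `max_value`; a practical system would need to detect and handle that case.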

arXiv:2308.14267 [pdf, other]

Unleash Model Potential: Bootstrapped Meta Self-supervised Learning

Authors: Jingyao Wang, Zeen Song, Wenwen Qiang, Changwen Zheng

Abstract: The long-term goal of machine learning is to learn general visual representations from a small amount of data without supervision, mimicking three advantages of human cognition: i) no need for labels, ii) robustness to data scarcity, and iii) learning from experience. Self-supervised learning and meta-learning are two promising techniques to achieve this goal, but they both only partially capture the advantages and fail to address all the problems. Self-supervised learning struggles to overcome the drawbacks of data scarcity, while ignoring prior knowledge that can facilitate learning and generalization. Meta-learning relies on supervised information and suffers from a bottleneck of insufficient learning. To address these issues, we propose a novel Bootstrapped Meta Self-Supervised Learning (BMSSL) framework that aims to simulate the human learning process. We first analyze the close relationship between meta-learning and self-supervised learning. Based on this insight, we reconstruct tasks to leverage the strengths of both paradigms, achieving advantages i and ii. Moreover, we employ a bi-level optimization framework that alternates between solving specific tasks with a learned ability (first level) and improving this ability (second level), attaining advantage iii. To fully harness its power, we introduce a bootstrapped target based on meta-gradient to make the model its own teacher. We validate the effectiveness of our approach with comprehensive theoretical and empirical study.

Submitted 27 August, 2023; originally announced August 2023.

Comments: submitted to NIPS
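The bi-level structure in this abstract (a first level that solves a task with the current learned ability, a second level that improves that ability) can be illustrated with a generic MAML-style sketch on toy one-dimensional quadratic tasks. This is not BMSSL: there is no self-supervised task reconstruction and no bootstrapped meta-gradient target, and the task family, step sizes and function names are our assumptions.

```python
from __future__ import annotations


def inner_adapt(w: float, c: float, alpha: float) -> float:
    """First level: one gradient step on the task loss (w - c)**2 from the shared init w."""
    return w - alpha * 2.0 * (w - c)


def meta_gradient(w: float, c: float, alpha: float) -> float:
    """d/dw of the post-adaptation loss (w' - c)**2, differentiating through the inner step."""
    w_adapted = inner_adapt(w, c, alpha)
    return 2.0 * (w_adapted - c) * (1.0 - 2.0 * alpha)  # chain rule: dw'/dw = 1 - 2*alpha


def meta_train(task_centers: list[float], alpha: float = 0.1, beta: float = 0.1,
               steps: int = 200, w0: float = 0.0) -> float:
    """Second level: update the shared initialization with the average meta-gradient."""
    w = w0
    for _ in range(steps):
        g = sum(meta_gradient(w, c, alpha) for c in task_centers) / len(task_centers)
        w -= beta * g
    return w


w_meta = meta_train([1.0, 2.0, 6.0])
print(round(w_meta, 4))  # 3.0, the mean task center
```

For these quadratic tasks the averaged meta-gradient is proportional to the distance from the mean task center, so the learned initialization converges there: the point from which one inner step serves all tasks best on average.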

arXiv:2308.13678 [pdf, other]

Textureless Deformable Surface Reconstruction with Invisible Markers

Authors: Xinyuan Li, Yu Ji, Yanchen Liu, Xiaochen Hu, Jinwei Ye, Changxi Zheng

Abstract: Reconstructing and tracking deformable surfaces with little or no texture is a long-standing challenge. Fundamentally, the challenge stems from textureless surfaces lacking features for establishing cross-image correspondences. In this work, we present a novel type of marker to proactively enrich the object's surface features, and thereby ease 3D surface reconstruction and correspondence tracking. Our markers are made of fluorescent dyes, visible only under ultraviolet (UV) light and invisible under regular lighting conditions. Leveraging the markers, we design a multi-camera system that captures surface deformation under UV light and visible light in a time-multiplexing fashion. Under UV light, markers on the object emerge to enrich its surface texture, allowing high-quality 3D shape reconstruction and tracking. Under visible light, markers become invisible, allowing us to capture the object's original untouched appearance. We perform experiments on various challenging scenes, including hand gestures, facial expressions, waving cloth, and hand-object interaction. In all these cases, we demonstrate that our system is able to produce robust, high-quality 3D reconstruction and tracking.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.12952 [pdf, other]

BridgeData V2: A Dataset for Robot Learning at Scale

Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata

Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 9 pages

arXiv:2308.11362 [pdf, other]

Calibration of the Timing Performance of GECAM-C

Authors: Shuo Xiao, Ya-Qing Liu, Ke Gong, Zheng-Hua An, Shao-Lin Xiong, Xin-Qiao Li, Xiang-Yang Wen, Wen-Xi Peng, Da-Li Zhang, You-Li Tuo, Shi-Jie Zheng, Li-Ming Song, Ping Wang, Xiao-Yun Zhao, Yue Huang, Xiang Ma, Xiao-Jing Liu, Rui Qiao, Yan-Bing Xu, Sheng Yang, Fan Zhang, Yue Wang, Yan-Qiu Zhang, Wang-Chen Xue, Jia-Cong Liu , et al. (13 additional authors not shown)

Abstract: As a new member of the Gravitational wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM) after GECAM-A and GECAM-B, GECAM-C (originally called HEBS), which was launched on board the SATech-01 satellite on July 27, 2022, aims to monitor and localize X-ray and gamma-ray transients from $\sim$ 6 keV to 6 MeV. GECAM-C utilizes a similar design to GECAM but operates in a more complex orbital environment. In this work, we utilize the secondary particles simultaneously produced by the cosmic-ray events on orbit and recorded by multiple detectors to calibrate the relative timing accuracy between all detectors of GECAM-C. We find the result is 0.1 $μ\rm s$, which is the highest time resolution among all GRB detectors ever flown and very helpful in timing analyses such as minimum variable timescale and spectral lags, as well as in time delay localization. Besides, we calibrate the absolute time accuracy using the one-year Crab pulsar data observed by GECAM-C and Fermi/GBM, as well as GECAM-C and GECAM-B. The results are $2.02\pm 2.26\ μ\rm s$ and $5.82\pm 3.59\ μ\rm s$, respectively. Finally, we investigate the spectral lag between the different energy bands of Crab pulsar observed by GECAM and GBM, which is $\sim -0.2\ {\rm μs\ keV^{-1}}$.
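The relative-timing calibration rests on a simple idea: when one cosmic-ray event deposits secondary particles in two detectors at the same instant, the scatter of their recorded time differences measures how well the detectors' clocks agree. The sketch below illustrates only that idea under simplifying assumptions (a greedy coincidence match within a fixed window, clean event lists); the function name and inputs are hypothetical, and this is not the GECAM-C pipeline.

```python
import statistics


def relative_timing(events_a, events_b, window):
    """Match coincident events between two detectors and summarize their time offsets.

    events_a, events_b: sorted lists of arrival times (seconds), one list per detector.
    window: maximum separation (seconds) for two events to count as coincident.
    Returns (mean_offset, stdev_offset, n_pairs). Hypothetical helper for illustration.
    """
    diffs = []
    j = 0
    for t in events_a:
        # Skip events in detector B that are too early to match this event.
        while j < len(events_b) and events_b[j] < t - window:
            j += 1
        # Accept the candidate only if it falls inside the coincidence window.
        if j < len(events_b) and abs(events_b[j] - t) <= window:
            diffs.append(events_b[j] - t)
            j += 1  # each event in B is matched at most once
    if len(diffs) < 2:
        raise ValueError("need at least two coincident pairs")
    return statistics.mean(diffs), statistics.stdev(diffs), len(diffs)
```

On real data the standard deviation is the quantity of interest, while a nonzero mean indicates a fixed offset between the two detectors' clocks.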

Submitted 22 August, 2023; originally announced August 2023.

Comments: submitted

arXiv:2308.10522 [pdf, other]

Information Theory-Guided Heuristic Progressive Multi-View Coding

Authors: Jiangmeng Li, Hang Gao, Wenwen Qiang, Changwen Zheng

Abstract: Multi-view representation learning aims to capture comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning to different views in a pairwise manner, which is still scalable: view-specific noise is not filtered in learning view-shared representations; the fake negative pairs, where the negative terms are actually within the same class as the positive, and the real negative pairs are coequally treated; evenly measuring the similarities between terms might interfere with optimization. Importantly, few works study the theoretical framework of generalized self-supervised multi-view learning, especially for more than two views. To this end, we rethink the existing multi-view learning paradigm from the perspective of information theory and then propose a novel information theoretical framework for generalized multi-view learning. Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided hierarchical Progressive Multi-view Coding (IPMC). In the distribution-tier, IPMC aligns the distribution between views to reduce view-specific noise. In the set-tier, IPMC constructs self-adjusted contrasting pools, which are adaptively modified by a view filter. Lastly, in the instance-tier, we adopt a designed unified loss to learn representations and reduce the gradient interference. Theoretically and empirically, we demonstrate the superiority of IPMC over state-of-the-art methods.

Submitted 23 August, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: This paper was accepted by the journal Neural Networks (Elsevier) in 2023. arXiv admin note: substantial text overlap with arXiv:2109.02344

arXiv:2308.09944 [pdf, other]

doi 10.1016/j.neunet.2024.106320

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

Authors: Cunhang Fan, Jun Xue, Jianhua Tao, Jiangyan Yi, Chenglong Wang, Chengshi Zheng, Zhao Lv

Abstract: The rhythm of bonafide speech is often difficult to replicate, which causes the fundamental frequency (F0) of synthetic speech to differ significantly from that of real speech. It is therefore expected that the F0 feature contains discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to effectively model the F0 subband so as to improve the performance of FSD, the spatial reconstructed local attention Res2Net (SR-LA Res2Net) is proposed. Specifically, Res2Net is used as a backbone network to obtain multiscale information, and enhanced with a spatial reconstruction mechanism to avoid losing important information when the channel group is constantly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that our proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among all single systems.
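The equal error rate reported above is the operating point where the false acceptance rate (spoofed speech accepted as bonafide) equals the false rejection rate (bonafide speech rejected). A minimal sketch of how it can be approximated from detector scores follows, assuming higher scores mean "more likely bonafide"; it sweeps observed scores as thresholds and does not interpolate, so it is a coarse approximation rather than the official ASVspoof scoring tool.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a decision threshold.

    A trial is accepted as bonafide when its score >= threshold.
    Returns the average of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        # False rejection: bonafide trials scored below the threshold.
        frr = sum(s < th for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= th for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Perfectly separated scores give an EER of 0, and fully interleaved scores push it toward 0.5.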

Submitted 8 July, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

Comments: Accepted by Neural Networks

arXiv:2308.04706 [pdf, other]

doi 10.1145/3581783.3612591

Pareto Invariant Representation Learning for Multimedia Recommendation

Authors: Shanshan Huang, Haoxuan Li, Qingsong Li, Chunyuan Zheng, Li Liu

Abstract: Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and across environmental distributions.
We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning.
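The abstract does not say how the Pareto-optimal trade-off between the IRM and ERM losses is computed. One standard way to obtain a Pareto-stationary update for two objectives is the min-norm rule from multiple-gradient descent (MGDA): pick the weight alpha in [0, 1] that minimizes the norm of the combined gradient. The sketch below shows that swapped-in technique on plain gradient vectors; it is an illustration, not necessarily PaInvRL's procedure, and the function names are hypothetical.

```python
def min_norm_weight(g_erm, g_irm):
    """Return alpha in [0, 1] minimizing ||alpha * g_erm + (1 - alpha) * g_irm||^2."""
    diff = [a - b for a, b in zip(g_erm, g_irm)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weighting gives the same direction
    # Closed-form minimizer of the quadratic in alpha, clipped to the simplex.
    alpha = sum((b - a) * b for a, b in zip(g_erm, g_irm)) / denom
    return min(1.0, max(0.0, alpha))


def common_direction(g_erm, g_irm):
    """Min-norm convex combination of the two loss gradients."""
    alpha = min_norm_weight(g_erm, g_irm)
    return [alpha * a + (1 - alpha) * b for a, b in zip(g_erm, g_irm)]
```

Because the min-norm point of the convex hull has a nonnegative inner product with both gradients, stepping against it does not increase either loss to first order.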

Submitted 23 August, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: ACM MM 2023 full paper

arXiv:2308.04583 [pdf, other]

LATR: 3D Lane Detection from Monocular Images with Transformer

Authors: Yueru Luo, Chaoda Zheng, Xu Yan, Tang Kun, Chao Zheng, Shuguang Cui, Zhen Li

Abstract: 3D lane detection from monocular images is a fundamental yet challenging task in autonomous driving. Recent advances primarily rely on structural 3D surrogates (e.g., bird's eye view) built from front-view image features and camera parameters. However, the depth ambiguity in monocular images inevitably causes misalignment between the constructed surrogate feature map and the original image, posing a great challenge for accurate lane detection. To address the above issue, we present a novel LATR model, an end-to-end 3D lane detector that uses 3D-aware front-view features without transformed view representation. Specifically, LATR detects 3D lanes via cross-attention based on query and key-value pairs, constructed using our lane-aware query generator and dynamic 3D ground positional embedding. On the one hand, each query is generated based on 2D lane-aware features and adopts a hybrid embedding to enhance lane information. On the other hand, 3D space information is injected as positional embedding from an iteratively-updated 3D ground plane. LATR outperforms previous state-of-the-art methods on the synthetic Apollo dataset and the realistic OpenLane and ONCE-3DLanes datasets by large margins (e.g., an 11.4-point gain in F1 score on OpenLane). Code will be released at https://github.com/JMoonr/LATR .

Submitted 20 August, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023 (Oral)

arXiv:2308.03313 [pdf]

Quantifying the Impact of Large Language Models on Collective Opinion Dynamics

Authors: Chao Li, Xing Su, Haoying Han, Cong Xue, Chunmo Zheng, Chao Fan

Abstract: The process of opinion expression and exchange is a critical component of democratic societies. As people interact with large language models (LLMs) in an opinion-shaping process that differs from that of traditional media, the impacts of LLMs are increasingly recognized and have become a concern. However, the knowledge about how LLMs affect the process of opinion expression and exchange of social opinion networks is very limited. Here, we create an opinion network dynamics model to encode the opinions of LLMs, cognitive acceptability and usage strategies of individuals, and simulate the impact of LLMs on opinion dynamics in a variety of scenarios. The outcomes of the simulations inform effective demand-oriented opinion network interventions. The results from this study suggested that the output opinion of LLMs has a unique and positive effect on the collective opinion difference. The marginal effect of cognitive acceptability on collective opinion formation is nonlinear and shows a decreasing trend. When people partially rely on LLMs, the exchange process of opinion becomes more intense and the diversity of opinion becomes more favorable. In fact, there is 38.6% more opinion diversity when people all partially rely on LLMs, compared to prohibiting the use of LLMs entirely. The optimal diversity of opinion was found when the fractions of people who do not use, partially rely on, and fully rely on LLMs reached roughly 4:12:1.
Our experiments also find that by introducing extra agents with opposite/neutral/random opinions, we can effectively mitigate the impact of biased/toxic output from LLMs. Our findings provide valuable insights into opinion dynamics in the age of LLMs, highlighting the need for customized interventions tailored to specific scenarios to address the drawbacks of improper output and use of LLMs.
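The abstract does not give the update rule of its opinion network model. To make the three usage strategies (no use, partial reliance, full reliance) concrete, the sketch below swaps in a generic Deffuant-style bounded-confidence model on a fully mixed population with one fixed external LLM opinion. The encoding of reliance, the confidence bound `epsilon`, the rate `mu`, and the function name are all hypothetical and are not the authors' model.

```python
import random


def simulate_opinions(opinions, reliance, llm_opinion, epsilon=0.3, mu=0.5, steps=1000, seed=0):
    """Bounded-confidence opinion dynamics with an external LLM opinion (illustrative only).

    opinions: initial opinions in [-1, 1], one per agent.
    reliance: per-agent strategy, one of "none", "partial", "full" (hypothetical encoding).
    epsilon: confidence bound; mu: convergence rate in (0, 0.5].
    """
    rng = random.Random(seed)
    x = list(opinions)
    n = len(x)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        # Pairwise exchange: two agents move toward each other only if they are close enough.
        if abs(x[i] - x[j]) < epsilon:
            xi, xj = x[i], x[j]
            x[i] = xi + mu * (xj - xi)
            x[j] = xj + mu * (xi - xj)
        # Each interacting agent then consults the LLM according to its usage strategy.
        for k in (i, j):
            if reliance[k] == "full":
                x[k] = llm_opinion
            elif reliance[k] == "partial" and abs(x[k] - llm_opinion) < epsilon:
                x[k] += mu * (llm_opinion - x[k])
    return x
```

Every update is a convex combination of existing opinions, so all opinions stay inside the initial range; diversity can then be measured, for example, as the spread of the final opinions.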

Submitted 25 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: 21 pages, 4 figures, 2 tables

arXiv:2308.03202 [pdf, other]

Source-free Domain Adaptive Human Pose Estimation

Authors: Qucheng Peng, Ce Zheng, Chen Chen

Abstract: Human Pose Estimation (HPE) is widely used in various fields, including motion analysis, healthcare, and virtual reality. However, the high cost of labeling real-world datasets presents a significant challenge for HPE. To overcome this, one approach is to train HPE models on synthetic datasets and then perform domain adaptation (DA) on real-world data. Unfortunately, existing DA methods for HPE neglect data privacy and security by using both source and target data in the adaptation process. To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. We further propose a novel framework that consists of three models: source model, intermediate model, and target model, which explores the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise, and the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, and pose-specific contrastive learning and information maximization are proposed on the basis of this space. Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. The code is available at https://github.com/davidpengucf/SFDAHPE.
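Information maximization, as commonly used in source-free adaptation, combines two entropy terms: the mean per-sample entropy (minimized, so each prediction is confident) minus the entropy of the average prediction (maximized, so predictions do not collapse onto one outcome). The sketch below computes that generic objective over plain probability vectors; the paper applies its own pose-specific version over its spatial probability space, which this sketch does not reproduce, and the function names are hypothetical.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a probability vector; eps guards log(0)."""
    return -sum(q * math.log(q + eps) for q in p)


def information_maximization_loss(probs):
    """Generic IM objective: mean per-sample entropy minus entropy of the mean prediction.

    probs: list of probability vectors (each summing to 1), one per sample.
    """
    n, k = len(probs), len(probs[0])
    mean_entropy = sum(entropy(p) for p in probs) / n
    marginal = [sum(p[c] for p in probs) / n for c in range(k)]
    return mean_entropy - entropy(marginal)
```

Confident and diverse predictions reach the minimum of minus the log of the number of outcomes, whereas uniform or fully collapsed predictions both score 0.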

Submitted 18 August, 2023; v1 submitted 6 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV 2023

Journal ref: https://openaccess.thecvf.com/content/ICCV2023/papers/Peng_Source-free_Domain_Adaptive_Human_Pose_Estimation_ICCV_2023_paper.pdf

arXiv:2308.02571 [pdf, other]

ADRNet: A Generalized Collaborative Filtering Framework Combining Clinical and Non-Clinical Data for Adverse Drug Reaction Prediction

Authors: Haoxuan Li, Taojun Hu, Zetong Xiong, Chunyuan Zheng, Fuli Feng, Xiangnan He, Xiao-Hua Zhou

Abstract: Adverse drug reaction (ADR) prediction plays a crucial role in both health care and drug discovery for reducing patient mortality and enhancing drug safety. Recently, many studies have been devoted to effectively predicting drug-ADR incidence rates. However, these methods either did not effectively utilize non-clinical data, i.e., physical, chemical, and biological information about the drug, or did little to establish a link between content-based and pure collaborative filtering during the training phase. In this paper, we first formulate the prediction of multi-label ADRs as a drug-ADR collaborative filtering problem, and to the best of our knowledge, this is the first work to provide extensive benchmark results of previous collaborative filtering methods on two large publicly available clinical datasets. Then, by exploiting the easily accessible drug characteristics from non-clinical data, we propose ADRNet, a generalized collaborative filtering framework combining clinical and non-clinical data for drug-ADR prediction. Specifically, ADRNet has a shallow collaborative filtering module and a deep drug representation module, which can exploit the high-dimensional drug descriptors to further guide the learning of low-dimensional ADR latent embeddings, thereby incorporating the benefits of both collaborative filtering and representation learning. Extensive experiments are conducted on two publicly available real-world drug-ADR clinical datasets and two non-clinical datasets to demonstrate the accuracy and efficiency of the proposed ADRNet.
The code is available at https://github.com/haoxuanli-pku/ADRnet.
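Framing ADR prediction as collaborative filtering means treating drugs as "users", ADRs as "items", and incidence scores as observed ratings. The sketch below shows only a generic shallow matrix-factorization baseline trained by stochastic gradient descent; it omits ADRNet's deep drug-descriptor module, and the function names, dimensions, and hyperparameters are hypothetical.

```python
import random


def factorize(ratings, n_drugs, n_adrs, dim=4, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn drug and ADR embeddings whose dot product approximates observed incidence scores.

    ratings: list of (drug_index, adr_index, incidence_score) observations.
    Returns (P, Q): drug embeddings and ADR embeddings as lists of lists.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_drugs)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_adrs)]
    for _ in range(epochs):
        for d, a, r in ratings:
            err = r - predict(P, Q, d, a)
            for f in range(dim):
                pf, qf = P[d][f], Q[a][f]
                # SGD step on squared error with L2 regularization.
                P[d][f] += lr * (err * qf - reg * pf)
                Q[a][f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(P, Q, d, a):
    """Predicted incidence score for drug d and ADR a."""
    return sum(p * q for p, q in zip(P[d], Q[a]))
```

Unobserved drug-ADR pairs are then scored with `predict`, which is where a content-based module such as ADRNet's could inject drug descriptors for drugs with few observations.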

Submitted 3 August, 2023; originally announced August 2023.

Comments: RecSys '23

arXiv:2308.02506 [pdf, other]

Improving the Generalization Ability in Essay Coherence Evaluation through Monotonic Constraints

Authors: Chen Zheng, Huan Zhang, Yan Zhao, Yuxuan Lai

Abstract: Coherence is a crucial aspect of evaluating text readability and can be assessed through two primary factors when evaluating an essay in a scoring scenario. The first factor is logical coherence, characterized by the appropriate use of discourse connectives and the establishment of logical relationships between sentences. The second factor is the appropriateness of punctuation, as inappropriate punctuation can lead to confused sentence structure. To address these concerns, we propose a coherence scoring model consisting of a regression model with two feature extractors: a local coherence discriminative model and a punctuation correction model. We employ gradient-boosting regression trees as the regression model and impose monotonicity constraints on the input features. The results show that our proposed model generalizes better to unseen data. The model achieved third place in track 1 of NLPCC 2023 shared task 7. Additionally, we briefly introduce our solution for the remaining tracks, which achieves second place for track 2 and first place for both track 3 and track 4.

Submitted 25 July, 2023; originally announced August 2023.

Comments: 12 pages, 1 figure, accepted to NLPCC 2023

arXiv:2308.02271 [pdf, other]

Combinatorial curvature flows with surgery for inversive distance circle packings on surfaces

Authors: Xu Xu, Chao Zheng

Abstract: Inversive distance circle packings introduced by Bowers-Stephenson are natural generalizations of Thurston's circle packings on surfaces. To find piecewise Euclidean metrics on surfaces with prescribed combinatorial curvatures, we introduce the combinatorial Calabi flow, the fractional combinatorial Calabi flow and the combinatorial $p$-th Calabi flow for the Euclidean inversive distance circle packings. Due to the singularities possibly developed by these combinatorial curvature flows, the longtime existence and convergence of these combinatorial curvature flows have been a difficult problem for a long time. To handle the potential singularities along these combinatorial curvature flows, we do surgery along these flows by edge flipping under the weighted Delaunay condition. Using the discrete conformal theory recently established by Bobenko-Lutz for decorated piecewise Euclidean metrics on surfaces, we prove the longtime existence and global convergence for the solutions of these combinatorial curvature flows with surgery. This provides effective algorithms for finding piecewise Euclidean metrics on surfaces with prescribed combinatorial curvatures.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2308.00729 [pdf, other]

Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment

Authors: Hongbo Liu, Mingda Wu, Kun Yuan, Ming Sun, Yansong Tang, Chuanchuan Zheng, Xing Wen, Xiu Li

Abstract: Video quality assessment (VQA) has attracted growing attention in recent years. However, the great expense of annotating large-scale VQA datasets has become the main obstacle for current deep-learning methods. To surmount the constraint of insufficient training data, in this paper, we first consider the complete range of video distribution diversity (i.e., content, distortion, motion) and employ diverse pretrained models (e.g., architecture, pretext task, pre-training dataset) to benefit quality representation. An Adaptive Diverse Quality-aware feature Acquisition (Ada-DQA) framework is proposed to capture desired quality-related features generated by these frozen pretrained models. By leveraging the Quality-aware Acquisition Module (QAM), the framework is able to extract more essential and relevant features to represent quality. Finally, the learned quality representation is utilized as supplementary supervisory information, along with the supervision of the labeled quality score, to guide the training of a relatively lightweight VQA model in a knowledge distillation manner, which largely reduces the computational cost during inference. Experimental results on three mainstream no-reference VQA benchmarks clearly show the superior performance of Ada-DQA in comparison with current state-of-the-art approaches without using extra training data of VQA.

Submitted 1 August, 2023; originally announced August 2023.

Comments: 10 pages, 5 figures, to appear in ACM MM 2023

arXiv:2307.16813 [pdf, other]

Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment

Authors: Kun Yuan, Zishang Kong, Chuanchuan Zheng, Ming Sun, Xing Wen

Abstract: Video Quality Assessment (VQA), which aims to predict the perceptual quality of a video, has attracted rising attention with the rapid development of streaming media technology, such as Facebook, TikTok, Kwai, and so on. Compared with other sequence-based visual tasks (e.g., action recognition), VQA faces two under-estimated challenges unresolved in User Generated Content (UGC) videos. First, it is not rare for a few frames containing serious distortions (e.g., blocking, blurriness) to determine the perceptual quality of the whole video, while other sequence-based tasks require more frames of equal importance for representations. Second, the perceptual quality of a video exhibits a multi-distortion distribution, due to the differences in the duration and probability of occurrence for various distortions. In order to solve the above challenges, we propose Visual Quality Transformer (VQT) to extract quality-related sparse features more efficiently. Methodologically, a Sparse Temporal Attention (STA) is proposed to sample keyframes by analyzing the temporal correlation between frames, which reduces the computational complexity from $O(T^2)$ to $O(T \log T)$. Structurally, a Multi-Pathway Temporal Network (MPTN) utilizes multiple STA modules with different degrees of sparsity in parallel, capturing co-existing distortions in a video. Experimentally, VQT outperforms many state-of-the-art methods on three public no-reference VQA datasets.
Furthermore, VQT outperforms widely adopted industrial algorithms (i.e., VMAF and AVQT) on four full-reference VQA datasets.
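The complexity claim follows if each of the $T$ frames attends to only about $\log T$ keyframes instead of all $T$ frames. The abstract does not specify the keyframe rule, so the sketch below assumes a hypothetical one: keep the $\lceil \log_2 T \rceil$ frames whose features change most from the previous frame (lowest cosine similarity), then run scaled dot-product attention from every frame to those keyframes only. Function names and the selection rule are illustrative assumptions, not the VQT implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def select_keyframes(frames):
    """Pick ceil(log2 T) frames with the largest change from their predecessor."""
    T = len(frames)
    k = max(1, math.ceil(math.log2(T)))
    # Frame 0 has no predecessor, so it gets a fixed novelty score of 1.0.
    novelty = [1.0] + [1.0 - cosine(frames[t], frames[t - 1]) for t in range(1, T)]
    order = sorted(range(T), key=lambda t: novelty[t], reverse=True)
    return sorted(order[:k])


def sparse_attention(frames, key_idx):
    """Each frame attends only to the keyframes: T * k scores instead of T * T."""
    out = []
    for q in frames:
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, frames[j])) / scale for j in key_idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * frames[j][d] for wi, j in zip(w, key_idx)) / z for d in range(len(q))])
    return out
```

With $k = \lceil \log_2 T \rceil$ keyframes, the attention step computes $T \cdot k$ scores, which is the $O(T \log T)$ scaling quoted in the abstract.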

Submitted 31 July, 2023; originally announced July 2023.

Comments: 10 pages, 7 figures, to appear in ACM MM 2023

arXiv:2307.16629 [pdf]

Reliable Synthesis of Large-Area Monolayer WS2 Single Crystals, Films, and Heterostructures with Extraordinary Photoluminescence Induced by Water Intercalation

Authors: Qianhui Zhang, Jianfeng Lu, Ziyu Wang, Zhigao Dai, Yupeng Zhang, Fuzhi Huang, Qiaoliang Bao, Wenhui Duan, Michael S. Fuhrer, Changxi Zheng

Abstract: Two-dimensional (2D) transition metal dichalcogenides (TMDs) hold great potential for future low-energy optoelectronics owing to their unique electronic, optical, and mechanical properties. Chemical vapor deposition (CVD) is a widely used technique for the synthesis of large-area TMDs. However, due to high sensitivity to the growth environment, reliable synthesis of monolayer TMDs via CVD remains challenging. Here we develop a controllable CVD process for large-area synthesis of monolayer WS2 crystals, films, and in-plane graphene-WS2 heterostructures by cleaning the reaction tube with hydrochloric acid, sulfuric acid and aqua regia. The concise cleaning process can remove the residual contaminants attached to the CVD reaction tube and crucibles, reducing the nucleation density but enhancing the diffusion length of WS2 species. The photoluminescence (PL) mappings of a WS2 single crystal and film reveal that the extraordinary PL around the edges of a triangular single crystal is induced by ambient water intercalation at the WS2-sapphire interface. The extraordinary PL can be controlled by the choice of substrates with different wettabilities.

Submitted 31 July, 2023; originally announced July 2023.

Journal ref: Advanced Optical Materials, 6(12), p.1701347 (2018)

arXiv:2307.16618 [pdf]

A volatile polymer stamp for large-scale, etching-free, and ultraclean transfer and assembly of two-dimensional materials and its heterostructures

Authors: Zhigao Dai, Yupeng Wang, Lu Liu, Junkai Deng, Wen-Xin Tang, Qingdong Ou, Ziyu Wang, Md Hemayet Uddin, Guangyuan Si, Qianhui Zhang, Wenhui Duan, Michael S. Fuhrer, Changxi Zheng

Abstract: The intact transfer and assembly of two-dimensional (2D) materials and their heterostructures are critical for their integration into advanced electronic and optical devices. Herein, we report a facile technique called volatile polymer stamping (VPS) to achieve efficient transfer of 2D materials and assembly of large-scale heterojunctions with clean interfaces. The central feature of the VPS technique is the use of volatile polyphthalaldehyde (PPA) together with hydrophobic polystyrene (PS). While PS enables the direct delamination of 2D materials from hydrophilic substrates owing to water intercalation, PPA can protect 2D materials from solution attack and maintain their integrity during PS removal. Thereafter, PPA can be completely removed by thermal annealing at 180 °C. The proposed VPS technique overcomes the limitations of currently used transfer techniques, such as chemical etching during the delamination stage, solution tearing during cleaning, and contamination from polymer residues.

Submitted 31 July, 2023; originally announced July 2023.

Journal ref: Materials Today Physics, 27, p.100834 (2022)

arXiv:2307.15139 [pdf, other]

Online Clustered Codebook

Authors: Chuanxia Zheng, Andrea Vedaldi

Abstract: Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used in representation learning. However, optimizing the codevectors in existing VQ-VAE is not entirely trivial. One problem is codebook collapse, where only a small subset of codevectors receive gradients useful for their optimisation, whereas the majority simply "die off" and are never updated or used. This limits the effectiveness of VQ for learning larger codebooks in complex computer vision tasks that require high-capacity representations. In this paper, we present a simple alternative method for online codebook learning, Clustering VQ-VAE (CVQ-VAE). Our approach selects encoded features as anchors to update the "dead" codevectors, while optimising the codevectors that are alive via the original loss. This strategy brings unused codevectors closer in distribution to the encoded features, increasing the likelihood of being chosen and optimized. We extensively validate the generalization capability of our quantiser on various datasets, tasks (e.g. reconstruction and generation), and architectures (e.g. VQ-VAE, VQGAN, LDM). Our CVQ-VAE can be easily integrated into the existing models with just a few lines of code.
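The anchor idea can be illustrated with a simplified online step: track an exponential moving average of how often each codevector is selected, and move any codevector whose usage falls below a threshold onto an encoded feature from the current batch. This sketch makes several assumptions that are not taken from the paper: random anchor sampling, a hard reset instead of a usage-weighted blend, and hypothetical parameter values and names. Updating the live codevectors with the usual VQ loss is not shown.

```python
import random


def nearest(codebook, z):
    """Index of the codevector closest to feature z (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], z)))


def update_dead_codes(codebook, usage, features, decay=0.99, threshold=1e-3, rng=None):
    """One online step: refresh usage statistics, then re-anchor rarely used codevectors.

    codebook: list of codevectors (modified in place and returned).
    usage: exponential moving average of each code's selection frequency (modified in place).
    features: encoder outputs for the current batch, used as candidate anchors.
    """
    rng = rng or random.Random(0)
    counts = [0] * len(codebook)
    for z in features:
        counts[nearest(codebook, z)] += 1
    for k in range(len(codebook)):
        usage[k] = decay * usage[k] + (1 - decay) * counts[k] / len(features)
        if usage[k] < threshold:
            # Dead code: copy a randomly chosen encoded feature so it lands in the data distribution.
            codebook[k] = list(rng.choice(features))
    return codebook, usage
```

After a reset, the codevector sits among the encoded features, so it has a realistic chance of being selected (and therefore receiving gradients) in the next batches.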

Submitted 27 July, 2023; originally announced July 2023.

Comments: The project page: https://chuanxiaz.com/cvq/

arXiv:2307.14884 [pdf, other]

Individual and Averaged Power Density Spectra of X-ray bursts from SGR J1935+2154: Quasiperiodic Oscillation Search and Slopes

Authors: Shuo Xiao, Xiao-Bo Li, Wang-Chen Xue, Shao-Lin Xiong, Shuang-Nan Zhang, Wen-Xi Peng, Ai-Jun Dong, You-Li Tuo, Ce Cai, Xi-Hong Luo, Jiao-Jiao Yang, Yue Wang, Chao Zheng, Yan-Qiu Zhang, Jia-Cong Liu, Wen-Jun Tan, Chen-Wei Wang, Ping Wang, Cheng-Kui Li, Shu-Xu Yi, Shi-Jun Dang, Lun-Hua Shang, Ru-Shuang Zhao, Qing-Bo Ma, Wei Xie , et al. (7 additional authors not shown)

Abstract: The study of quasi-periodic oscillations (QPOs) and the continuum properties of power density spectra (PDS) can help shed light on the still elusive emission physics of magnetars and can serve as a window into the interiors of neutron stars through asteroseismology. In this work, we employ a Bayesian method to search for QPOs in hundreds of X-ray bursts from SGR J1935+2154 observed by Insight-HXMT, GECAM and Fermi/GBM from July 2014 to January 2022. Although no definitive QPO signal (significance $>3σ$) is detected in individual bursts or in the averaged periodograms of bursts grouped by duration, we identify several bursts exhibiting a possible QPO at $\sim$40 Hz, which is consistent with the QPO reported in the X-ray burst associated with FRB 200428. We investigate the PDS continuum properties and find that the distribution of the PDS slope in the simple power-law model peaks at $\sim$2.5, which is consistent with other magnetars but higher than the value of 5/3 commonly seen in gamma-ray bursts. In addition, the distribution of the break frequency in the broken power-law model peaks at $\sim$60 Hz. Finally, we report that the power-law index of the PDS is anti-correlated with, and follows a power-law dependence on, both the burst duration and the minimum variation timescale.
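The "PDS slope" above is the index $\alpha$ of a power-law continuum $P(f) \propto f^{-\alpha}$. The paper estimates it with a Bayesian method; the sketch below swaps in a plain least-squares line fit in log-log space, which is simpler and ignores the noise statistics of real periodograms, purely to make the quantity concrete. The synthetic spectrum is invented for illustration.

```python
import numpy as np


def powerlaw_slope(freqs, power):
    """Estimate alpha in P(f) = A * f**(-alpha) by a straight-line fit in log-log space."""
    slope, _intercept = np.polyfit(np.log10(freqs), np.log10(power), deg=1)
    return -slope


# Synthetic, noise-free spectrum with alpha = 2.5 (illustration only).
freqs = np.linspace(1.0, 500.0, 200)
power = 3.0 * freqs ** -2.5
print(round(powerlaw_slope(freqs, power), 3))  # 2.5
```

A slope of 2.5 means the power drops by a factor of $10^{2.5} \approx 316$ per decade in frequency, steeper than the $f^{-5/3}$ continuum quoted for gamma-ray bursts.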

Submitted 27 July, 2023; originally announced July 2023.

Comments: comments welcome

arXiv:2307.14696 [pdf, other]

doi 10.1088/2058-9565/ad3d80

Non-Markovian Quantum Gate Set Tomography

Authors: Ze-Tong Li, Cong-Cong Zheng, Fan-Xu Meng, Han Zeng, Tian Luan, Zai-Chen Zhang, Xu-Tao Yu

Abstract: Engineering quantum devices requires reliable characterization of the quantum system, including qubits, quantum operations (also known as instruments) and the quantum noise. Recently, quantum gate set tomography (GST) has emerged as a powerful technique for self-consistently describing quantum states, gates, and measurements. However, non-Markovian correlations between the quantum system and environment impact the reliability of GST. To address this, we propose a self-consistent operational framework called instrument set tomography (IST) for non-Markovian GST. Based on the stochastic quantum process, the instrument set describes instruments and system-environment (SE) correlations. We introduce a linear inversion IST (LIST) to describe instruments and SE correlations without physical constraints. The disharmony of linear relationships between instruments is detected. Furthermore, we propose a physically constrained statistical method based on the maximum likelihood estimation for IST (MLE-IST) with adjustable dimensions. MLE-IST shows significant flexibility in adapting to different types of devices, such as noisy intermediate-scale quantum (NISQ) devices, by adjusting the model and constraints. Experimental results demonstrate the effectiveness and necessity of simultaneously describing instruments and SE correlations. Remarkably, real-chip experiments indicate that a polynomial number of parameters with respect to the Markovian order are sufficient to characterize non-Markovian quantum noise in current NISQ devices. Consequently, IST provides an essential and self-consistent framework for characterizing, benchmarking, and developing quantum devices in terms of the instrument set.

Submitted 14 April, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: Quantum Science and Technology 2024

arXiv:2307.13223 [pdf, ps, other]

On the classification of discrete conformal structures on surfaces

Authors: Xu Xu, Chao Zheng

Abstract: Glickenstein \cite{Glickenstein} and Glickenstein-Thomas \cite{GT} introduced discrete conformal structures on surfaces in an axiomatic approach and studied their classification. In this paper, we give a full classification of the discrete conformal structures on surfaces, which completes Glickenstein-Thomas' classification. As a result, we find some new classes of discrete conformal structures on surfaces, including some of the generalized circle packing metrics introduced by Guo-Luo \cite{GL2}. The relationships between the discrete conformal structures on surfaces and 3-dimensional hyperbolic geometry are also discussed.

Submitted 19 August, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.13066 [pdf, other]

Ultra-Wideband Technology: Characteristics, Applications and Challenges

Authors: Chutao Zheng, Yuchu Ge, Anfu Guo

Abstract: Ultra-wideband (UWB) technology is a wireless communication technology designed for short-range applications. It is characterized by its ability to generate and transmit radio-frequency energy over an extensive frequency range. This paper provides an overview of UWB technology, including its definition, two representative schemes, and some key characteristics that distinguish it from other types of communication. It also analyses some widely used applications of UWB technology and highlights some of the challenges associated with implementing UWB in real-world scenarios. Furthermore, this paper extends the discussion of UWB technology to terahertz (THz) technology, providing an overview of the current status of terahertz communication and analysing the advantages, challenges, and corresponding solutions pertaining to ultra-wideband THz communication.

Submitted 13 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

arXiv:2307.11934 [pdf, other]

LAMP: Leveraging Language Prompts for Multi-person Pose Estimation

Authors: Shengnan Hu, Ce Zheng, Zixiang Zhou, Chen Chen, Gita Sukthankar

Abstract: Human-centric visual understanding is an important desideratum for effective human-robot interaction. In order to navigate crowded public places, social robots must be able to interpret the activity of the surrounding humans. This paper addresses one key aspect of human-centric visual understanding, multi-person pose estimation. Achieving good performance on multi-person pose estimation in crowded scenes is difficult due to the challenges of occluded joints and instance separation. In order to tackle these challenges and overcome the limitations of image features in representing invisible body parts, we propose a novel prompt-based pose inference strategy called LAMP (Language Assisted Multi-person Pose estimation). By utilizing the text representations generated by a well-trained language model (CLIP), LAMP can facilitate the understanding of poses on the instance and joint levels, and learn more robust visual representations that are less susceptible to occlusion. This paper demonstrates that language-supervised training boosts the performance of single-stage multi-person pose estimation, and both instance-level and joint-level prompts are valuable for training. The code is available at https://github.com/shengnanh20/LAMP.

Submitted 26 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.11100 [pdf, other]

CSSL-RHA: Contrastive Self-Supervised Learning for Robust Handwriting Authentication

Authors: Jingyao Wang, Luntian Mou, Changwen Zheng, Wen Gao

Abstract: Handwriting authentication is a valuable tool used in various fields, such as fraud prevention and cultural heritage protection. However, it remains a challenging task due to the complex features, severe damage, and lack of supervision. In this paper, we propose a novel Contrastive Self-Supervised Learning framework for Robust Handwriting Authentication (CSSL-RHA) to address these issues. It can dynamically learn complex yet important features and accurately predict writer identities. Specifically, to remove the negative effects of imperfections and redundancy, we design an information-theoretic filter for pre-processing and propose a novel adaptive matching scheme to represent images as patches of local regions dominated by more important features. Through online optimization at inference time, the most informative patch embeddings are identified as the "most important" elements. Furthermore, we employ contrastive self-supervised training with a momentum-based paradigm to learn more general statistical structures of handwritten data without supervision. We conduct extensive experiments on five benchmark datasets and our manually annotated dataset EN-HA, which demonstrate the superiority of our CSSL-RHA compared to baselines. Additionally, we show that our proposed model can still effectively achieve authentication even under abnormal circumstances, such as data falsification and corruption.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 10 pages, 4 figures, 3 tables, submitted to ACM MM 2023

arXiv:2307.08924 [pdf, other]

Towards Task Sampler Learning for Meta-Learning

Authors: Jingyao Wang, Wenwen Qiang, Xingzhe Su, Changwen Zheng, Fuchun Sun, Hui Xiong

Abstract: Meta-learning aims to learn general knowledge from diverse training tasks constructed from limited data, and then transfer it to new tasks. It is commonly believed that increasing task diversity will enhance the generalization ability of meta-learning models. However, this paper challenges this view through empirical and theoretical analysis. We obtain three conclusions: (i) there is no universal task sampling strategy that can guarantee the optimal performance of meta-learning models; (ii) over-constraining task diversity may incur the risk of under-fitting or over-fitting during training; and (iii) the generalization performance of meta-learning models is affected by task diversity, task entropy, and task difficulty. Based on these insights, we design a novel task sampler, called Adaptive Sampler (ASr). ASr is a plug-and-play module that can be integrated into any meta-learning framework. It dynamically adjusts task weights according to task diversity, task entropy, and task difficulty, thereby obtaining the optimal probability distribution for meta-training tasks. Finally, we conduct experiments on a series of benchmark datasets across various scenarios, and the results demonstrate that ASr has clear advantages.

Submitted 2 June, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: accepted by IJCV

arXiv:2307.08913 [pdf, ps, other]

Towards the Sparseness of Projection Head in Self-Supervised Learning

Authors: Zeen Song, Xingzhe Su, Jingyao Wang, Wenwen Qiang, Changwen Zheng, Fuchun Sun

Abstract: In recent years, self-supervised learning (SSL) has emerged as a promising approach for extracting valuable representations from unlabeled data. One successful SSL method is contrastive learning, which aims to bring positive examples closer while pushing negative examples apart. Many current contrastive learning approaches utilize a parameterized projection head. Through a combination of empirical analysis and theoretical investigation, we provide insights into the internal mechanisms of the projection head and its relationship with the phenomenon of dimensional collapse. Our findings demonstrate that the projection head enhances the quality of representations by computing the contrastive loss in a projected subspace. We therefore hypothesize that only a subset of features is necessary when minimizing the contrastive loss of a mini-batch of data. Theoretical analysis further suggests that a sparse projection head can enhance generalization, leading us to introduce SparseHead, a regularization term that effectively constrains the sparsity of the projection head and can be seamlessly integrated with any SSL approach. Our experimental results validate the effectiveness of SparseHead, demonstrating its ability to improve the performance of existing contrastive methods.

Submitted 19 July, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: 9 pages, 3 figures

arXiv:2307.08199 [pdf, other]

Unbiased Image Synthesis via Manifold Guidance in Diffusion Models

Authors: Xingzhe Su, Daixi Jia, Fengge Wu, Junsuo Zhao, Changwen Zheng, Wenwen Qiang

Abstract: Diffusion models are a potent class of generative models capable of producing high-quality images. However, they often inadvertently favor certain data attributes, undermining the diversity of generated images. This issue is starkly apparent in skewed datasets like CelebA, where the initial dataset disproportionately favors females over males by 57.9%, and this bias is amplified in the generated data, where female representation outstrips male representation by 148%. In response, we propose a plug-and-play method named Manifold Guidance Sampling, which is also the first unsupervised method to mitigate bias in DDPMs. Leveraging the inherent structure of the data manifold, this method steers the sampling process towards a more uniform distribution, effectively dispersing the clustering of biased data. Without the need to modify the existing model or perform additional training, it significantly mitigates data bias and enhances the quality and unbiasedness of the generated images.

Submitted 15 April, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

arXiv:2307.07079 [pdf, other]

The Minimum Variation Timescales of X-ray bursts from SGR J1935+2154

Authors: Shuo Xiao, Jiao-Jiao Yang, Xi-Hong Luo, Shao-Lin Xiong, Yuan-Hong Qu, Shuang-Nan Zhang, Wang-Chen Xue, Xiao-Bo Li, You-Li Tuo, Ai-Jun Dong, Ru-Shuang Zhao, Shi-Jun Dang, Lun-Hua Shang, Qing-Bo Ma, Ce Cai, Jin Wang, Ping Wang, Cheng-Kui Li, Shu-Xu Yi, Zhen Zhang, Ming-Yu Ge, Shi-Jie Zheng, Li-Ming Song, Wen-Xi Peng, Xiang-Yang Wen , et al. (12 additional authors not shown)

Abstract: The minimum variation timescale (MVT) of soft gamma-ray repeaters can be an important probe for estimating the emission region in pulsar-like models, as well as the Lorentz factor and radius of the possible relativistic jet in gamma-ray burst (GRB)-like models, thus revealing their progenitors and physical mechanisms. In this work, we systematically study the MVTs of hundreds of X-ray bursts (XRBs) from SGR J1935+2154 observed by Insight-HXMT, GECAM and Fermi/GBM from July 2014 to January 2022 using the Bayesian Block algorithm. We find that the MVTs peak at $\sim$2 ms, corresponding to a light-travel-time size of about 600 km, which supports a magnetospheric origin in pulsar-like models. The shock radius and the Lorentz factor of the jet are also constrained in GRB-like models. Interestingly, the MVT of the XRB associated with FRB 200428 is $\sim$70 ms, which is longer than that of most bursts and implies a special radiation mechanism. In addition, the median MVT is 7 ms, shorter than the median MVTs of 40 ms and 480 ms for short and long GRBs, respectively. As in GRBs, however, the MVT is independent of duration. Finally, we investigate the energy dependence of the MVT and suggest that there is marginal evidence for a power-law relationship like that of GRBs, but with a rate of variation at least about an order of magnitude smaller. These features may provide an approach to identifying bursts of magnetar origin.
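The step from a $\sim$2 ms MVT to a size of about 600 km is the light-travel-time (causality) bound. As a worked check, assuming $c \approx 3\times10^{5}$ km s$^{-1}$ and ignoring relativistic corrections:

```latex
R \lesssim c\,\delta t_{\rm MVT}
  \approx \left(3\times10^{5}\ \mathrm{km\,s^{-1}}\right)\left(2\times10^{-3}\ \mathrm{s}\right)
  = 600\ \mathrm{km}
```

For a relativistic outflow with Lorentz factor $\Gamma$ (the GRB-like case), the same timescale instead corresponds to an emission radius of order $2\Gamma^{2} c\,\delta t$, which is why an MVT can constrain $\Gamma$ and the radius jointly.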

Submitted 13 July, 2023; originally announced July 2023.

Comments: accepted for publication in ApJS

arXiv:2307.05689 [pdf, other]

Magnetar emergence in a peculiar gamma-ray burst from a compact star merger

Authors: H. Sun, C. -W. Wang, J. Yang, B. -B. Zhang, S. -L. Xiong, Y. -H. I. Yin, Y. Liu, Y. Li, W. -C. Xue, Z. Yan, C. Zhang, W. -J. Tan, H. -W. Pan, J. -C. Liu, H. -Q. Cheng, Y. -Q. Zhang, J. -W. Hu, C. Zheng, Z. -H. An, C. Cai, L. Hu, C. Jin, D. -Y. Li, X. -Q. Li, H. -Y. Liu , et al. (19 additional authors not shown)

Abstract: The central engine that powers gamma-ray bursts (GRBs), the most powerful explosions in the universe, has still not been identified. Besides hyper-accreting black holes, rapidly spinning and highly magnetized neutron stars, known as millisecond magnetars, have been suggested to power both long and short GRBs. The presence of a magnetar engine following compact star mergers is of particular interest, as it would provide essential constraints on the poorly understood equation of state of neutron stars. Indirect indications of a magnetar engine in these merger sources have been observed in the form of plateau features in the X-ray afterglow light curves of some short GRBs. Additionally, some X-ray transients lacking gamma-ray bursts (GRB-less) have been identified as potential magnetar candidates originating from compact star mergers. Nevertheless, smoking-gun evidence for a magnetar engine in short GRBs is still lacking, and the associated theoretical challenges have been addressed. Here we present a comprehensive analysis of the broad-band prompt emission data of a peculiar, very bright GRB 230307A. Despite its apparently long duration, the prompt emission and host galaxy properties point toward a compact star merger origin, consistent with its association with a kilonova. More intriguingly, an extended X-ray emission component emerges as the $γ$-ray emission dies out, signifying the emergence of a magnetar central engine. We also identify an achromatic temporal break in the high-energy band during the prompt emission phase, which has never been observed in previous bursts and reveals a narrow jet with a half-opening angle of approximately $3.4^\circ$.

Submitted 11 July, 2023; originally announced July 2023.

Comments: 44 pages, 10 figures, 5 tables

arXiv:2307.04506 [pdf, ps, other]

Distributed Decisions on Optimal Load Balancing in Loss Networks

Authors: Qiong Liu, Chehao Wang, Ce Zheng

Abstract: When multiple users share a common link in direct transmission, packet loss and network collisions may occur due to the simultaneous arrival of traffic at the source node. To tackle this problem, users may resort to an indirect path: the packet flows are first relayed through a sidelink to another source node, then transmitted to the destination. This behavior raises the problems of packet routing and load balancing: (1) how to maximize the total traffic in a collaborative way; (2) how self-interested users independently choose routing strategies to minimize their individual packet loss. In this work, we propose a generalized mathematical framework to tackle the packet routing and load balancing issues in loss networks. In centralized scenarios with a planner, we provide a polynomial-time algorithm to compute the system optimum point where the total traffic rate is maximized. Conversely, in decentralized settings with autonomous users making distributed decisions, the system converges to an equilibrium where no user can reduce their loss probability through unilateral deviation. We thereby provide a full characterization of the Nash equilibrium and examine the efficiency loss stemming from selfish behaviors, both theoretically and empirically. In general, the performance degradation caused by selfish behaviors is not catastrophic; however, this gap is not monotonic and can take extreme values in certain specific scenarios.
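Loss probability is the quantity both the planner and the selfish users care about in the abstract above. A classical textbook building block for loss networks is the Erlang B formula, the blocking probability of a node with $c$ channels and no waiting room under Poisson arrivals with offered load $a$ (in Erlangs). Whether the paper uses exactly this model is an assumption; the sketch only illustrates why spreading load can reduce losses. It uses the numerically stable recursion $B(0,a)=1$, $B(k,a)=\frac{a\,B(k-1,a)}{k + a\,B(k-1,a)}$.

```python
def erlang_b(servers: int, load: float) -> float:
    """Blocking probability of an M/M/c/c loss system (Erlang B), via the stable recursion."""
    if servers < 0 or load < 0:
        raise ValueError("servers and load must be non-negative")
    b = 1.0  # B(0, a) = 1: with no channels every arrival is lost
    for k in range(1, servers + 1):
        b = load * b / (k + load * b)
    return b


# Hypothetical example: 2 Erlangs of total traffic and two single-channel source nodes.
all_on_one = erlang_b(1, 2.0)  # every user sends directly through the same node
balanced = erlang_b(1, 1.0)    # half of the traffic is relayed to the other node
print(round(all_on_one, 3), round(balanced, 3))  # 0.667 0.5
```

In this toy case, relaying half of the traffic lowers the fraction of lost packets from 2/3 to 1/2; the paper's framework addresses how such splits are chosen collaboratively or selfishly.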

Submitted 17 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

Comments: 8 pages, WiOPT workshop RAWNET

arXiv:2307.03758 [pdf, other]

doi 10.1109/MVT.2022.3153274

Federated Learning over a Wireless Network: Distributed User Selection through Random Access

Authors: Chen Sun, Shiyao Ma, Ce Zheng, Songtao Wu, Tao Cui, Lingjuan Lyu

Abstract: User selection has become crucial for decreasing the communication costs of federated learning (FL) over wireless networks. However, centralized user selection causes additional system complexity. This study proposes a network-intrinsic approach to distributed user selection that leverages the radio resource competition mechanism in random access. Taking the carrier-sense multiple access (CSMA) mechanism as an example of random access, we manipulate the contention window (CW) size to prioritize certain users for obtaining radio resources in each round of training. Training data bias is used as a target scenario for FL with user selection. Prioritization is based on the distance between the newly trained local model and the global model of the previous round. To avoid excessive contribution by certain users, a counting mechanism is used to ensure fairness. Simulations with various datasets demonstrate that this method can rapidly achieve convergence similar to that of the centralized user selection approach.
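The prioritization rule described above can be sketched as a mapping from each user's model distance to a contention window: a larger distance gives a smaller window and therefore a shorter expected backoff. The linear mapping between `cw_min` and `cw_max`, the fairness cap `max_wins`, and all function names are hypothetical choices for illustration, not the paper's exact scheme; collisions between equal backoffs are ignored.

```python
import numpy as np


def contention_windows(distances, cw_min=16, cw_max=1024):
    """Map model distances to contention windows: the farthest user gets cw_min."""
    d = np.asarray(distances, dtype=float)
    span = d.max() - d.min()
    # Normalize so 1.0 = largest distance (highest priority), 0.0 = smallest.
    score = (d - d.min()) / span if span > 0 else np.ones_like(d)
    return np.round(cw_max - score * (cw_max - cw_min)).astype(int)


def select_users(distances, wins, n_select, max_wins, rng):
    """One round of random-access user selection with a simple fairness counter."""
    cw = contention_windows(distances)
    # Each user draws a backoff uniformly from [0, CW); smaller backoff transmits first.
    backoff = rng.integers(0, cw).astype(float)
    # Users that have already been selected max_wins times sit out (hypothetical rule).
    backoff[np.asarray(wins) >= max_wins] = np.inf
    winners = np.argsort(backoff, kind="stable")[:n_select]
    return [int(u) for u in winners if np.isfinite(backoff[u])]
```

The appeal of this design is that no central scheduler ranks the users: priority is expressed only through each user's own backoff draw, so selection emerges from the existing medium-access contention.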

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.03177 [pdf, other]

PanoDiffusion: 360-degree Panorama Outpainting via Diffusion

Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham

Abstract: Generating complete 360-degree panoramas from narrow field-of-view images is an ongoing research problem, as omnidirectional RGB data is not readily available. Existing GAN-based approaches face some barriers to achieving higher-quality output and generalize poorly across different mask types. In this paper, we present PanoDiffusion, a 360-degree indoor RGB-D panorama outpainting model based on latent diffusion models (LDM). We introduce a new bi-modal latent diffusion structure that utilizes both RGB and depth panoramic data during training, which works surprisingly well to outpaint depth-free RGB images during inference. We further propose a novel technique of introducing progressive camera rotations during each diffusion denoising step, which leads to substantial improvement in panorama wraparound consistency. Results show that PanoDiffusion not only significantly outperforms state-of-the-art methods on RGB-D panorama outpainting by producing diverse, well-structured results for different types of masks, but can also synthesize high-quality depth panoramas to provide realistic 3D indoor models.

Submitted 20 March, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Project Page: https://sm0kywu.github.io/panodiffusion/

arXiv:2307.01010 [pdf, other]

GECAM Observations of the Galactic Magnetar SGR J1935+2154 during the 2021 and 2022 Burst Active Episodes. I. Burst Catalog

Authors: Sheng-Lun Xie, Ce Cai, Yun-Wei Yu, Shao-Lin Xiong, Lin Lin, Yi Zhao, Shuang-Nan Zhang, Li-Ming Song, Ping Wang, Xiao-Bo Li, Wang-Chen Xue, Peng Zhang, Chao Zheng, Yan-Qiu Zhang, Jia-Cong Liu, Chen-Wei Wang, Wen-Jun Tan, Yue Wang, Zheng-Hang Yu, Pei-Yi Feng, Jin-Peng Zhang, Shuo Xiao, Hai-Sheng Zhao, Wen-Long Zhang, Yan-Ting Zhang , et al. (12 additional authors not shown)

Abstract: A magnetar is a neutron star with an ultrahigh magnetic field ($\sim 10^{14}-10^{15}$ G) that usually manifests as a soft gamma-ray repeater (SGR) or an anomalous X-ray pulsar (AXP). SGR J1935+2154 is not only one of the most active magnetars detected so far, but also the only magnetar confirmed as a source of fast radio bursts (FRBs). The Gravitational wave high-energy Electromagnetic Counterpart All-sky Monitor (GECAM) is dedicated to monitoring gamma-ray transients all over the sky, including SGR bursts. Here we report GECAM observations of the burst activity of SGR J1935+2154 from January 2021 to December 2022, which form a unique and valuable data set for this important magnetar. With a targeted search of GECAM data, 164 bursts from SGR J1935+2154 were detected by GECAM-B and 97 bursts by GECAM-C, including the X-ray burst associated with a fast radio burst (FRB 20221014). We find that both the burst duration and the waiting time between two successive bursts follow lognormal distributions. The burst activity has a period of $134\pm20$ days, so the activity over these two years can generally be divided into four active episodes. Interestingly, the hardness ratio of the X-ray bursts tends to become softer and more concentrated over these two years, especially during the active episode in which FRBs were detected.
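A lognormal fit like the one reported above for burst durations and waiting times amounts to fitting a normal distribution to the logarithm of the data: the maximum-likelihood parameters are the mean and the (population, `ddof=0`) standard deviation of $\ln x$. The sketch below uses synthetic durations, not GECAM data, and the function name is illustrative.

```python
import numpy as np


def fit_lognormal(samples):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of ln(x)."""
    logs = np.log(np.asarray(samples, dtype=float))
    if np.any(~np.isfinite(logs)):
        raise ValueError("lognormal fit requires strictly positive samples")
    return logs.mean(), logs.std()  # ddof=0 is the maximum-likelihood estimate


rng = np.random.default_rng(42)
durations = rng.lognormal(mean=np.log(0.1), sigma=0.5, size=20_000)  # synthetic, in seconds
mu, sigma = fit_lognormal(durations)
print(f"median ≈ {np.exp(mu):.3f} s, sigma ≈ {sigma:.2f}")
```

Note that $e^{\mu}$ is the median of the fitted distribution, which is the natural "typical duration" to quote for a lognormal sample.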

Submitted 16 September, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.17179 [pdf, other]

Integrating Tick-level Data and Periodical Signal for High-frequency Market Making

Authors: Jiafa He, Cong Zheng, Can Yang

Abstract: We focus on the problem of market making in high-frequency trading. Market making is a critical function in financial markets that involves providing liquidity by buying and selling assets. However, the increasing complexity of financial markets and the high volume of data generated by tick-level trading make it challenging to develop effective market making strategies. To address this challenge, we propose a deep reinforcement learning approach that fuses tick-level data with periodic prediction signals to develop a more accurate and robust market making strategy. Results of market making strategies based on different deep reinforcement learning algorithms, in simulation scenarios and in real-data experiments on cryptocurrency markets, show that the proposed framework outperforms existing methods in terms of profitability and risk management.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.17178 [pdf, other]

Optimal Execution Using Reinforcement Learning

Authors: Cong Zheng, Jiafa He, Can Yang

Abstract: This work is about optimal order execution, where a large order is split into several small orders to minimize the implementation shortfall. Based on the diversity of cryptocurrency exchanges, we attempt to extract cross-exchange signals by aligning data from multiple exchanges for the first time. Unlike most previous studies, which focused on single-exchange information, we discuss the impact of cross-exchange signals on the agent's decision-making in the optimal execution problem. Experimental results show that cross-exchange signals can provide additional information for the optimal execution of cryptocurrency to facilitate the optimal execution process.
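Implementation shortfall measures execution cost relative to the arrival (decision) price; lower is better for the trader. For a buy order split into child orders with quantities $q_i$ filled at prices $p_i$ and arrival price $p_0$, one simplified per-unit form is $\mathrm{IS} = \frac{\sum_i q_i p_i}{\sum_i q_i} - p_0$, with the sign flipped for sells. The sketch below ignores fees and unfilled quantity, and its equal (TWAP-style) split and fill prices are hypothetical; it is not the paper's reward definition.

```python
from typing import Sequence


def twap_split(total_qty: float, n_slices: int) -> list[float]:
    """Split a parent order into equal child orders (a TWAP-style baseline)."""
    if n_slices <= 0:
        raise ValueError("n_slices must be positive")
    return [total_qty / n_slices] * n_slices


def implementation_shortfall(qtys: Sequence[float], prices: Sequence[float],
                             arrival_price: float, side: str = "buy") -> float:
    """Per-unit execution cost versus the arrival price (positive = worse than arrival).

    Simplified: ignores fees and any unfilled quantity.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    filled = sum(qtys)
    if filled <= 0:
        raise ValueError("no quantity filled")
    avg_price = sum(q * p for q, p in zip(qtys, prices)) / filled
    sign = 1.0 if side == "buy" else -1.0
    return sign * (avg_price - arrival_price)


child = twap_split(10.0, 4)           # four child orders of 2.5 units each
fills = [100.0, 100.4, 100.2, 100.6]  # hypothetical fill prices
print(round(implementation_shortfall(child, fills, arrival_price=100.0), 3))  # 0.3
```

An execution agent would replace the fixed equal split with a learned schedule (here, one informed by cross-exchange signals) and be rewarded for driving this cost down.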

Submitted 19 June, 2023; originally announced June 2023.
