-
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Authors:
Gongfan Fang,
Hongxu Yin,
Saurav Muralidharan,
Greg Heinrich,
Jeff Pool,
Jan Kautz,
Pavlo Molchanov,
Xinchao Wang
Abstract:
Large Language Models (LLMs) are distinguished by their massive parameter counts, which typically result in significant redundancy. This work introduces MaskLLM, a learnable pruning method that establishes Semi-structured (or "N:M") Sparsity in LLMs, aimed at reducing computational overhead during inference. Instead of developing a new importance criterion, MaskLLM explicitly models N:M patterns as a learnable distribution through Gumbel Softmax sampling. This approach facilitates end-to-end training on large-scale datasets and offers two notable advantages: 1) High-quality Masks - our method effectively scales to large datasets and learns accurate masks; 2) Transferability - the probabilistic modeling of mask distribution enables the transfer learning of sparsity across domains or tasks. We assessed MaskLLM using 2:4 sparsity on various LLMs, including LLaMA-2, Nemotron-4, and GPT-3, with sizes ranging from 843M to 15B parameters, and our empirical results show substantial improvements over state-of-the-art methods. For instance, leading approaches achieve a perplexity (PPL) of 10 or greater on Wikitext compared to the dense model's 5.12 PPL, but MaskLLM achieves a significantly lower 6.72 PPL solely by learning the masks with frozen weights. Furthermore, MaskLLM's learnable nature allows customized masks for lossless application of 2:4 sparsity to downstream tasks or domains. Code is available at https://github.com/NVlabs/MaskLLM.
Submitted 25 September, 2024;
originally announced September 2024.
-
Preparation for CSST: Star-galaxy Classification using a Rotationally Invariant Supervised Machine Learning Method
Authors:
Shiliang Zhang,
Guanwen Fang,
Jie Song,
Ran Li,
Yizhou Gu,
Zesen Lin,
Chichun Zhou,
Yao Dai,
Xu Kong
Abstract:
Most existing star-galaxy classifiers depend on the reduced information from catalogs, necessitating careful data processing and feature extraction. In this study, we employ a supervised machine learning method (GoogLeNet) to automatically classify stars and galaxies in the COSMOS field. Unlike traditional machine learning methods, we introduce several preprocessing techniques, including noise reduction and the unwrapping of denoised images in polar coordinates, applied to our carefully selected samples of stars and galaxies. By dividing the selected samples into training and validation sets in an 8:2 ratio, we evaluate the performance of the GoogLeNet model in distinguishing between stars and galaxies. The results indicate that the GoogLeNet model is highly effective, achieving accuracies of 99.6% and 99.9% for stars and galaxies, respectively. Furthermore, by comparing the results with and without preprocessing, we find that preprocessing can significantly improve classification accuracy (by approximately 2.0% to 6.0%) when the images are rotated. In preparation for the future launch of the China Space Station Telescope (CSST), we also evaluate the performance of the GoogLeNet model on the CSST simulation data. These results demonstrate a high level of accuracy (approximately 99.8%), indicating that this model can be effectively utilized for future observations with the CSST.
Submitted 20 September, 2024;
originally announced September 2024.
-
Accelerate Neural Subspace-Based Reduced-Order Solver of Deformable Simulation by Lipschitz Optimization
Authors:
Aoran Lyu,
Shixian Zhao,
Chuhua Xian,
Zhihao Cen,
Hongmin Cai,
Guoxin Fang
Abstract:
Reduced-order simulation is an emerging method for accelerating physical simulations with high DOFs, and recently developed neural-network-based methods with nonlinear subspaces have been proven effective in diverse applications as more concise subspaces can be detected. However, the complexity and landscape of simulation objectives within the subspace have not been optimized, which leaves room for enhancement of the convergence speed. This work focuses on this point by proposing a general method for finding optimized subspace mappings, enabling further acceleration of neural reduced-order simulations while capturing comprehensive representations of the configuration manifolds. We achieve this by optimizing the Lipschitz energy of the elasticity term in the simulation objective, and incorporating the cubature approximation into the training process to manage the high memory and time demands associated with optimizing the newly introduced energy. Our method is versatile and applicable to both supervised and unsupervised settings for optimizing the parameterizations of the configuration manifolds. We demonstrate the effectiveness of our approach through general cases in both quasi-static and dynamics simulations. Our method achieves acceleration factors of up to 6.83 while consistently preserving comparable simulation accuracy in various cases, including large twisting, bending, and rotational deformations with collision handling. This novel approach offers significant potential for accelerating physical simulations, and can be a good add-on to existing neural-network-based solutions in modeling complex deformable objects.
Submitted 5 September, 2024;
originally announced September 2024.
-
Motion-Driven Neural Optimizer for Prophylactic Braces Made by Distributed Microstructures
Authors:
Xingjian Han,
Yu Jiang,
Weiming Wang,
Guoxin Fang,
Simeon Gill,
Zhiqiang Zhang,
Shengfa Wang,
Jun Saito,
Deepak Kumar,
Zhongxuan Luo,
Emily Whiting,
Charlie C. L. Wang
Abstract:
Joint injuries, and their long-term consequences, present a substantial global health burden. Wearable prophylactic braces are an attractive potential solution to reduce the incidence of joint injuries by limiting joint movements that are related to injury risk. Given human motion and ground reaction forces, we present a computational framework that enables the design of personalized braces by optimizing the distribution of microstructures and elasticity. As varied brace designs yield different reaction forces that influence kinematics and kinetics analysis outcomes, the optimization process is formulated as a differentiable end-to-end pipeline in which the design domain of microstructure distribution is parameterized onto a neural network. The optimized distribution of microstructures is obtained via a self-learning process to determine the network coefficients according to a carefully designed set of losses and the integrated biomechanical and physical analyses. Since knees and ankles are the most commonly injured joints, we demonstrate the effectiveness of our pipeline by designing, fabricating, and testing prophylactic braces for the knee and ankle to prevent potentially harmful joint movements.
Submitted 29 August, 2024;
originally announced August 2024.
-
Learning Based Toolpath Planner on Diverse Graphs for 3D Printing
Authors:
Yuming Huang,
Yuhu Guo,
Renbo Su,
Xingjian Han,
Junhao Ding,
Tianyu Zhang,
Tao Liu,
Weiming Wang,
Guoxin Fang,
Xu Song,
Emily Whiting,
Charlie C. L. Wang
Abstract:
This paper presents a learning based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next 'best' node to visit. We construct the state spaces by the Local Search Graph (LSG) centered at different nodes on a graph, which is encoded by a carefully designed algorithm so that LSGs in similar configurations can be identified to re-use the earlier learned DQN priors for accelerating the computation of toolpath planning. Our method can cover different 3D printing applications by defining their corresponding reward functions. Toolpath planning problems in wire-frame printing, continuous fiber printing, and metallic printing are selected to demonstrate its generality. The performance of our planner has been verified by testing the resultant toolpaths in physical experiments. By using our planner, wire-frame models with up to 4.2k struts can be successfully printed, up to 93.3% of sharp turns on continuous fiber toolpaths can be avoided, and the thermal distortion in metallic printing can be reduced by 24.9%.
Submitted 17 August, 2024;
originally announced August 2024.
-
LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis
Authors:
Zhenxiong Tan,
Xinyin Ma,
Gongfan Fang,
Xinchao Wang
Abstract:
Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods. However, their performance, while impressive with short audio clips, faces challenges when extended to longer audio sequences. These challenges are due to the model's self-attention mechanism and training predominantly on 10-second clips, which complicates the extension to longer audio without adaptation. In response to these issues, we introduce a novel approach, LiteFocus, that enhances the inference of existing audio latent diffusion models in long audio synthesis. Having observed the attention pattern in self-attention, we employ a dual sparse form for attention calculation, designated as same-frequency focus and cross-frequency compensation, which curtails the attention computation under same-frequency constraints, while enhancing audio quality through cross-frequency refillment. LiteFocus reduces the inference time of a diffusion-based TTA model by 1.99x when synthesizing 80-second audio clips, while also obtaining improved audio quality.
Submitted 15 July, 2024;
originally announced July 2024.
-
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
Authors:
Guian Fang,
Wenbiao Yan,
Yuanfan Guo,
Jianhua Han,
Zutao Jiang,
Hang Xu,
Shengcai Liao,
Xiaodan Liang
Abstract:
Text-to-image diffusion models have significantly advanced in conditional image generation. However, these models usually struggle with accurately rendering images featuring humans, resulting in distorted limbs and other anomalies. This issue primarily stems from the insufficient recognition and evaluation of limb qualities in diffusion models. To address this issue, we introduce AbHuman, the first large-scale synthesized human benchmark focusing on anatomical anomalies. This benchmark consists of 56K synthesized human images, each annotated with detailed, bounding-box level labels identifying 147K human anomalies in 18 different categories. Based on this, the recognition of human anomalies can be established, which in turn enhances image generation through traditional techniques such as negative prompting and guidance. To further boost the improvement, we propose HumanRefiner, a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation. Specifically, HumanRefiner utilizes a self-diagnostic procedure to detect and correct issues related to both coarse-grained abnormal human poses and fine-grained anomaly levels, facilitating pose-reversible diffusion generation. Experimental results on the AbHuman benchmark demonstrate that HumanRefiner significantly reduces generative discrepancies, achieving a 2.9x improvement in limb quality compared to the state-of-the-art open-source generator SDXL and a 1.4x improvement over DALL-E 3 in human evaluations. Our data and code are available at https://github.com/Enderfga/HumanRefiner.
Submitted 9 July, 2024;
originally announced July 2024.
-
Isomorphic Pruning for Vision Models
Authors:
Gongfan Fang,
Xinyin Ma,
Michael Bi Mi,
Xinchao Wang
Abstract:
Structured pruning reduces the computational overhead of deep neural networks by removing redundant sub-structures. However, assessing the relative importance of different sub-structures remains a significant challenge, particularly in advanced vision models featuring novel mechanisms and architectures like self-attention, depth-wise convolutions, or residual connections. These heterogeneous substructures usually exhibit diverged parameter scales, weight distributions, and computational topology, introducing considerable difficulty to importance comparison. To overcome this, we present Isomorphic Pruning, a simple approach that demonstrates effectiveness across a range of network architectures such as Vision Transformers and CNNs, and delivers competitive performance across different model sizes. Isomorphic Pruning originates from an observation that, when evaluated under a pre-defined importance criterion, heterogeneous sub-structures demonstrate significant divergence in their importance distribution, as opposed to isomorphic structures that present similar importance patterns. This inspires us to perform isolated ranking and comparison on different types of sub-structures for more reliable pruning. Our empirical results on ImageNet-1K demonstrate that Isomorphic Pruning surpasses several pruning baselines designed specifically for Transformers or CNNs. For instance, we improve the accuracy of DeiT-Tiny from 74.52% to 77.50% by pruning an off-the-shelf DeiT-Base model. For ConvNext-Tiny, we improve performance from 82.06% to 82.18%, while reducing the number of parameters and memory usage. Code is available at https://github.com/VainF/Isomorphic-Pruning.
Submitted 5 July, 2024;
originally announced July 2024.
-
PruningBench: A Comprehensive Benchmark of Structural Pruning
Authors:
Haoling Li,
Changhao Li,
Mengqi Xue,
Gongfan Fang,
Sheng Zhou,
Zunlei Feng,
Huiqiong Wang,
Yong Wang,
Lechao Cheng,
Mingli Song,
Jie Song
Abstract:
Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed PruningBench, for structural pruning. PruningBench showcases the following three characteristics: 1) PruningBench employs a unified and consistent framework for evaluating the effectiveness of diverse structural pruning techniques; 2) PruningBench systematically evaluates 16 existing pruning methods, encompassing a wide array of models (e.g., CNNs and ViTs) and tasks (e.g., classification and detection); 3) PruningBench provides easily implementable interfaces to facilitate the implementation of future pruning methods, and enables subsequent researchers to incorporate their work into our leaderboards. We provide an online pruning platform http://pruning.vipazoo.cn for customizing pruning tasks and reproducing all results in this paper. Code will be made publicly available at https://github.com/HollyLee2000/PruningBench.
Submitted 20 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Authors:
Zigeng Chen,
Xinyin Ma,
Gongfan Fang,
Zhenxiong Tan,
Xinchao Wang
Abstract:
Diffusion models have garnered significant interest from the community for their great generative ability across various applications. However, their typical multi-step sequential-denoising nature gives rise to high cumulative latency, thereby precluding the possibilities of parallel computation. To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices. Our approach divides the cumbersome noise prediction model into multiple components, assigning each to a different device. To break the dependency chain between these components, it transforms the conventional sequential denoising into an asynchronous process by exploiting the high similarity between hidden states in consecutive diffusion steps. Consequently, each component can compute in parallel on a separate device. The proposed strategy significantly reduces inference latency while minimally impacting the generative quality. Specifically, for Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible degradation and a 4.0x speedup with only a slight reduction of 0.38 in CLIP Score, on four NVIDIA A5000 GPUs. Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performance. The code is available at https://github.com/czg1225/AsyncDiff.
Submitted 26 September, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Exploring flavour space of an economical SU(5) GUT in future proton decay measurements
Authors:
Gao-Xiang Fang,
Ye-Ling Zhou
Abstract:
We discuss the potential of future proton decay experiments on the exploration of the flavour space of grand unification. We focus on an economical SU(5) grand unified model (GUT) with the fermion sector extended by including only one copy of 24-plet. Neutrino masses are generated via type-(I+III) seesaw mechanism with the lightest neutrino massless. Gauge unification requires masses of fermions in the 24-plet to be hierarchical, in particular, the electroweak singlet and triplet heavy leptons to be around the canonical seesaw scale and TeV scale, respectively. We address how extra parameters in the flavour space which cannot be touched in flavour measurements can be tested by a multi-channel analysis in future proton decay measurements.
Submitted 10 June, 2024;
originally announced June 2024.
-
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Authors:
Xinyin Ma,
Gongfan Fang,
Michael Bi Mi,
Xinchao Wang
Abstract:
Diffusion Transformers have recently demonstrated unprecedented generative capabilities for various tasks. The encouraging results, however, come at the cost of slow inference, since each denoising step requires inference on a transformer model with a large number of parameters. In this study, we make an interesting and somewhat surprising observation: the computation of a large proportion of layers in the diffusion transformer, through introducing a caching mechanism, can be readily removed even without updating the model parameters. In the case of U-ViT-H/2, for example, we may remove up to 93.68% of the computation in the cache steps (46.84% for all steps), with less than 0.01 drop in FID. To achieve this, we introduce a novel scheme, named Learning-to-Cache (L2C), that learns to conduct caching in a dynamic manner for diffusion transformers. Specifically, by leveraging the identical structure of layers in transformers and the sequential nature of diffusion, we explore redundant computations between timesteps by treating each layer as the fundamental unit for caching. To address the challenge of the exponential search space in deep models for identifying layers to cache and remove, we propose a novel differentiable optimization objective. An input-invariant yet timestep-variant router is then optimized, which can finally produce a static computation graph. Experimental results show that L2C largely outperforms samplers such as DDIM and DPM-Solver, alongside prior cache-based methods at the same inference speed.
Submitted 3 June, 2024;
originally announced June 2024.
-
On Non-asymptotic Theory of Recurrent Neural Networks in Temporal Point Processes
Authors:
Zhiheng Chen,
Guanhua Fang,
Wen Yu
Abstract:
The temporal point process (TPP) is an important tool for modeling and predicting irregularly timed events across various domains. Recently, recurrent neural network (RNN)-based TPPs have shown practical advantages over traditional parametric TPP models. However, in the current literature, the theoretical understanding of neural TPPs remains nascent. In this paper, we establish the excess risk bounds of RNN-TPPs under many well-known TPP settings. We especially show that an RNN-TPP with no more than four layers can achieve vanishing generalization errors. Our technical contributions include the characterization of the complexity of the multi-layer RNN class, the construction of $\tanh$ neural networks for approximating dynamic event intensity functions, and the truncation technique for alleviating the issue of unbounded event sequences. Our results bridge the gap between TPP's application and neural network theory.
Submitted 2 June, 2024;
originally announced June 2024.
-
On Robust Clustering of Temporal Point Process
Authors:
Yuecheng Zhang,
Guanhua Fang,
Wen Yu
Abstract:
Clustering of event stream data is of great importance in many application scenarios, including but not limited to e-commerce, electronic health, online testing, and mobile music services. Existing clustering algorithms fail to take outlier data into consideration and are implemented without theoretical guarantees. In this paper, we propose a robust temporal point process clustering framework which works under mild assumptions and addresses several important issues in the event stream clustering problem. Specifically, we introduce a computationally efficient model-free distance function to quantify the dissimilarity between different event streams so that outliers can be detected and good initial clusters can be obtained. We further consider an expectation-maximization-type algorithm incorporating Catoni's influence function for robust estimation and fine-tuning of clusters. We also establish theoretical results including algorithmic convergence, estimation error bound, outlier detection, etc. Simulation results corroborate our theoretical findings and real data applications show the effectiveness of our proposed methodology.
Submitted 28 May, 2024;
originally announced May 2024.
-
Function based sim-to-real learning for shape control of deformable free-form surfaces
Authors:
Yingjun Tian,
Guoxin Fang,
Renbo Su,
Weiming Wang,
Simeon Gill,
Andrew Weightman,
Charlie C. L. Wang
Abstract:
For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem for determining the actuation parameters that can realize a target shape. However, the free-form surfaces obtained from simulators are always different from the physically deformed shapes due to the errors introduced by hardware and the simplification adopted in physical simulation. To fill the gap, we propose a novel deformation function based sim-to-real learning method that can map the geometric shape of a simulated model into its corresponding shape of the physical model. Unlike the existing sim-to-real learning methods that rely on completely acquired dense markers, our method accommodates sparsely distributed markers and can resiliently use all captured frames -- even for those in the presence of missing markers. To demonstrate its effectiveness, our sim-to-real method has been integrated into a neural network-based computational pipeline designed to tackle the inverse kinematic problem on a pneumatically actuated deformable mannequin.
Submitted 14 May, 2024;
originally announced May 2024.
-
Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot
Authors:
Zhanzhong Gu,
Xiangjian He,
Gengfa Fang,
Chengpei Xu,
Feng Xia,
Wenjing Jia
Abstract:
Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.
Submitted 3 May, 2024;
originally announced May 2024.
-
USmorph: An Updated Framework of Automatic Classification of Galaxy Morphologies and Its Application to Galaxies in the COSMOS Field
Authors:
Jie Song,
GuanWen Fang,
Shuo Ba,
Zesen Lin,
Yizhou Gu,
Chichun Zhou,
Tao Wang,
Cai-Na Hao,
Guilin Liu,
Hongxin Zhang,
Yao Yao,
Xu Kong
Abstract:
Morphological classification conveys abundant information on the formation, evolution, and environment of galaxies. In this work, we refine the two-step galaxy morphological classification framework (USmorph), which employs a combination of unsupervised machine learning (UML) and supervised machine learning (SML) techniques, along with a self-consistent and robust data preprocessing step. The updated method is applied to the galaxies with $I_{\rm mag}<25$ at $0.2<z<1.2$ in the COSMOS field. Based on their HST/ACS I-band images, we classify them into five distinct morphological types: spherical (SPH, 15,200), early-type disk (ETD, 17,369), late-type disk (LTD, 21,143), irregular disk (IRR, 28,965), and unclassified (UNC, 17,129). In addition, we have conducted both parametric and nonparametric morphological measurements. For galaxies with stellar masses exceeding $10^{9}M_{\odot}$, a gradual increase in effective radius from SPHs to IRRs is observed, accompanied by a decrease in the Sérsic index. Nonparametric morphologies reveal distinct distributions of galaxies across the $Gini-M_{20}$ and $C-A$ parameter spaces for different categories. Moreover, different categories exhibit significant dissimilarity in their $G_2$ and $Ψ$ distributions. We find morphology to be strongly correlated with redshift and stellar mass. The consistency of these classification results with expected correlations among multiple parameters underscores the validity and reliability of our classification method, rendering it a valuable tool for future studies.
Submitted 24 April, 2024;
originally announced April 2024.
-
Exploring Diverse Sounds: Identifying Outliers in a Music Corpus
Authors:
Le Cai,
Sam Ferguson,
Gengfa Fang,
Hani Alshamrani
Abstract:
Existing research on music recommendation systems primarily focuses on recommending similar music, thereby often neglecting diverse and distinctive musical recordings. Musical outliers can provide valuable insights due to the inherent diversity of music itself. In this paper, we explore music outliers, investigating their potential usefulness for music discovery and recommendation systems. We argue that not all outliers should be treated as noise, as they can offer interesting perspectives and contribute to a richer understanding of an artist's work. We introduce the concept of 'Genuine' music outliers and provide a definition for them. These genuine outliers can reveal unique aspects of an artist's repertoire and hold the potential to enhance music discovery by exposing listeners to novel and diverse musical experiences.
Submitted 9 April, 2024;
originally announced April 2024.
-
Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Authors:
Guangchi Fang,
Bing Wang
Abstract:
In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through intersection preserving and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our Mini-Splatting integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works. Code is available at https://github.com/fatPeter/mini-splatting.
Submitted 16 October, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
ChartThinker: A Contextual Chain-of-Thought Approach to Optimized Chart Summarization
Authors:
Mengsha Liu,
Daoyuan Chen,
Yaliang Li,
Guian Fang,
Ying Shen
Abstract:
Data visualization serves as a critical means for presenting data and mining its valuable insights. The task of chart summarization, through natural language processing techniques, facilitates in-depth data analysis of charts. However, existing approaches still show notable deficiencies in visual-language matching and reasoning ability. To address these limitations, this study constructs a large-scale dataset of comprehensive chart-caption pairs and fine-tuning instructions on each chart. Thanks to the broad coverage of various topics and visual styles within this dataset, a better matching degree can be achieved from the perspective of training data. Moreover, we propose an innovative chart summarization method, ChartThinker, which synthesizes deep analysis based on chains of thought and strategies of context retrieval, aiming to improve the logical coherence and accuracy of the generated summaries. Built upon the curated datasets, our trained model consistently exhibits superior performance in chart summarization tasks, surpassing 8 state-of-the-art models over 7 evaluation metrics. Our dataset and code are publicly accessible.
Submitted 24 April, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Design principles of nonlinear optical materials for Terahertz lasers
Authors:
Juan Han,
Yiwei Sun,
Xiamin Huang,
Wenjun Shuai,
Guangyou Fang,
Zhou Li
Abstract:
We have investigated both inter-band and intra-band second order nonlinear optical conductivity based on the velocity correlation formalism and the spectral expansion technique. We propose a scenario in which the second order intra-band process is nonzero while the inter-band process is zero. This occurs for a band structure with momentum asymmetry in the Brillouin zone. Very low-energy photons are blocked by the Pauli exclusion principle from participating in the inter-band process; however, they are permitted to participate in the intra-band process, with the band smeared by some impurity scattering. We establish a connection between the inter-band nonlinear optical conductivity in the velocity gauge and the shift vector in the length gauge for a two-band model. Using a quasiclassical kinetic approach, we demonstrate the importance of intra-band transitions in high harmonic generation for the single tilted Dirac cone model and hexagonal warping model. We confirm that the Kramers-Kronig relations break down for the limiting case of ($ω$, $-ω$) in the nonlinear optical conductivity. Finally, we calculate the superconducting transition temperature of NbN, the dielectric function of AlN, and the resistance of the NbN/AlN junction. The natural non-linearity of the Josephson junction brings a Josephson plasma with frequency in the Terahertz region.
Submitted 4 July, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
On provable privacy vulnerabilities of graph representations
Authors:
Ruofan Wu,
Guanhua Fang,
Qiying Pan,
Mingyang Zhang,
Tengfei Liu,
Weiqiang Wang
Abstract:
Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates the structural vulnerabilities in graph neural models where sensitive topological information can be inferred through edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of similarity-based edge reconstruction attacks (SERA), furnishing a non-asymptotic analysis of their reconstruction capacities. Moreover, we present empirical corroboration indicating that such attacks can perfectly reconstruct sparse graphs as graph size increases. Conversely, we establish that sparsity is a critical factor for SERA's effectiveness, as demonstrated through analysis and experiments on (dense) stochastic block models. Finally, we explore the resilience of private graph representations produced via the noisy aggregation (NAG) mechanism against SERA. Through theoretical analysis and empirical assessments, we affirm the mitigation of SERA using NAG. In parallel, we also empirically delineate instances wherein SERA demonstrates both efficacy and deficiency in its capacity to function as an instrument for elucidating the trade-off between privacy and utility.
Submitted 23 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Study on the mixing of $Ξ_c$ and $Ξ'_c$ by the transition $Ξ_{b}\toΞ^{(')}_c$
Authors:
Hong-Wei Ke,
Gang-Yang Fang,
Yan-Liang Shi
Abstract:
Recently, the LHCb collaboration has observed the decays $Ξ^0_{b}\toΞ^+_{c}D^-_s$ and $Ξ^-_{b}\toΞ^0_{c}D^-_s$. They measured the relative branching fractions times the ratio of beauty-baryon production cross-sections $\mathcal{R}(\frac{Ξ^0_b}{Λ_b})\equiv\frac{σ(Ξ_b^0)}{σ(Λ^0_b)}\times\frac{B(Ξ^0_{b}\toΞ^+_{c}D^-_s)}{B(Λ^0_{b}\toΛ^+_{c}D^-_s)}$ and $\mathcal{R}(\frac{Ξ^-_b}{Λ_b})\equiv\frac{σ(Ξ^-_b)}{σ(Λ^0_b)}\times\frac{B(Ξ^-_{b}\toΞ^0_{c}D^-_s)}{B(Λ^0_{b}\toΛ^+_{c}D^-_s)}$. Once the ratio $\frac{σ(Ξ_b^0)}{σ(Λ^0_b)}$ or $\frac{σ(Ξ_b^-)}{σ(Λ^0_b)}$ is known, one can determine the relative branching fractions, which can be used to examine the mixing of $Ξ_c$ and $Ξ'_c$. In previous literature, $Ξ_c$ and $Ξ'_c$ were assumed to belong to the SU(3)$_F$ antitriplet and sextet, respectively. However, recent experimental measurements such as the ratio $Γ(Ξ_{cc}\toΞ_cπ^+)/Γ(Ξ_{cc}\toΞ'_cπ^+)$ indicate that the spin-flavor structures of $Ξ_{c}$ and $Ξ'_{c}$ are a mixture of $Ξ^{\bar 3}_{c}$ and $Ξ^{6}_{c}$. The exact value of the mixing angle $θ$ is still under debate. In theoretical models, the mixing angle was fitted to be about $16.27^\circ\pm2.30^\circ$ or $85.54^\circ\pm2.30^\circ$ based on decay channels $Ξ_{cc} \toΞ^{(')}_{cc} $. In lattice calculations, however, a small angle ($1.2^\circ\pm0.1^\circ$) is preferred. To address this discrepancy and test the mixing of $Ξ_c$ and $Ξ'_c$, here we propose the analysis of semileptonic and non-leptonic decays of $Ξ_{b}\toΞ_{c}$ and $Ξ_{b}\toΞ^{'}_{c}$. We calculate the decay rate of $Ξ_{b}\toΞ_{c}$ and $Ξ_{b}\toΞ^{'}_{c}$ based on the light-front quark model and study the effect of the mixing angle on the ratios of weak decays $Ξ_{b}\toΞ_{c}$ and $Ξ_{b}\toΞ^{'}_{c}$.
Submitted 19 January, 2024;
originally announced January 2024.
-
Towards A Better Metric for Text-to-Video Generation
Authors:
Jay Zhangjie Wu,
Guian Fang,
Haoning Wu,
Xintao Wang,
Yixiao Ge,
Xiaodong Cun,
David Junhao Zhang,
Jia-Wei Liu,
Yuchao Gu,
Rui Zhao,
Weisi Lin,
Wynne Hsu,
Ying Shan,
Mike Zheng Shou
Abstract:
Generative models have demonstrated remarkable capability in synthesizing high-quality text, images, and videos. For video generation, contemporary text-to-video models exhibit impressive capabilities, crafting visually stunning videos. Nonetheless, evaluating such videos poses significant challenges. Current research predominantly employs automated metrics such as FVD, IS, and CLIP Score. However, these metrics provide an incomplete analysis, particularly in the temporal assessment of video content, thus rendering them unreliable indicators of true video quality. Furthermore, while user studies have the potential to reflect human perception accurately, they are hampered by their time-intensive and laborious nature, with outcomes that are often tainted by subjective bias. In this paper, we investigate the limitations inherent in existing metrics and introduce a novel evaluation pipeline, the Text-to-Video Score (T2VScore). This metric integrates two pivotal criteria: (1) Text-Video Alignment, which scrutinizes the fidelity of the video in representing the given text description, and (2) Video Quality, which evaluates the video's overall production caliber with a mixture of experts. Moreover, to evaluate the proposed metrics and facilitate future improvements on them, we present the TVGE dataset, collecting human judgements of 2,543 text-to-video generated videos on the two criteria. Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore in offering a better metric for text-to-video generation.
Submitted 15 January, 2024;
originally announced January 2024.
-
The Hubble Deep Hydrogen Alpha (HDH$α$) Project: I. Catalog of Emission-line Galaxies
Authors:
Shuairu Zhu,
Zhen-Ya Zheng,
James Rhoads,
Junxian Wang,
Linhua Jiang,
Chunyan Jiang,
Fang-Ting Yuan,
P. T. Rahna,
Weida Hu,
Ruqiu Lin,
Huanyuan Shan,
Chun Xu,
Leopoldo Infante,
L. Felipe Barrientos,
Xianzhong Zheng,
Guanwen Fang,
Zhixiong Liang
Abstract:
We present the first results of the Hubble Deep Hydrogen Alpha (HDH$α$) project, which analyzes the space-borne deep H$α$ narrowband imaging data in the GOODS-S region. The HDH$α$ data comprise 72 orbits' images taken with the HST ACS/WFC F658N filter. The exposure time varies across a total area of $\sim$76.1 $\rm{arcmin}^2$, adding up to a total exposure time of 195.7 ks, among which 68.8 ks are spent in the deepest region. These images are aligned, reprojected, and combined to have the same pixel grid as the Hubble Legacy Fields (HLF). The scientific goals of the HDH$α$ include establishing a sample of emission-line galaxies (ELGs) including [O III] emitters at $z\sim$ 0.3, [O II] emitters at $z\sim$ 0.8, and Lyman-$α$ emitters (LAEs) at $z \sim 4.4$, studying the line morphology of ELGs with high resolution imaging data, and statistically analyzing the line luminosity functions and line equivalent-width distributions of ELGs selected with HST. Furthermore, the HDH$α$ project enhances the legacy value of the GOODS-S field by contributing the first HST-based narrowband image to the existing data sets, which include the HST broadband data and other ancillary data from X-ray to radio taken by other facilities. In this paper, we describe the data reduction process of the HDH$α$, select ELGs based on HST's F658N and broadband data, validate the redshifts of the selected candidates by cross matching with the public spectroscopic catalogs in the GOODS-S, and present a final catalog of the confirmed [O III] emitters at $z\sim$ 0.3, [O II] emitters at $z\sim$ 0.8, and LAEs at $z \sim 4.4$.
Submitted 11 December, 2023;
originally announced December 2023.
-
SlimSAM: 0.1% Data Makes Segment Anything Slim
Authors:
Zigeng Chen,
Gongfan Fang,
Xinyin Ma,
Xinchao Wang
Abstract:
Current approaches for compressing the Segment Anything Model (SAM) yield commendable results, yet necessitate extensive data to train a new network from scratch. Employing conventional pruning techniques can remarkably reduce data requirements but would suffer from a degradation in performance. To address this challenging trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with far less training data. The essence of SlimSAM is encapsulated in the alternate slimming framework, which effectively enhances knowledge inheritance under severely limited training data availability and an exceptional pruning ratio. Diverging from prior techniques, our framework progressively compresses the model by alternately pruning and distilling distinct, decoupled sub-structures. Disturbed Taylor pruning is also proposed to address the misalignment between the pruning objective and training target, thereby boosting the post-distillation after pruning. SlimSAM yields significant performance improvements while demanding over 10 times less training data than any other existing compression method. Even when compared to the original SAM, SlimSAM approaches its performance while reducing parameter counts to merely 1.4% (9.1M), MACs to 0.8% (23G), and requiring only 0.1% (10k) of the SAM training data. The code is available at http://github.com/czg1225/SlimSAM.
Submitted 26 September, 2024; v1 submitted 8 December, 2023;
originally announced December 2023.
-
DeepCache: Accelerating Diffusion Models for Free
Authors:
Xinyin Ma,
Gongfan Fang,
Xinchao Wang
Abstract:
Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs, primarily attributed to the sequential denoising process and cumbersome model size. Traditional methods for compressing diffusion models typically involve extensive retraining, presenting cost and feasibility challenges. In this paper, we introduce DeepCache, a novel training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models, caching and retrieving features across adjacent denoising stages and thereby curtailing redundant computations. Utilizing the property of the U-Net, we reuse the high-level features while updating the low-level features in a very cheap way. This innovative strategy, in turn, enables a speedup factor of 2.3$\times$ for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1$\times$ for LDM-4-G with a slight decrease of 0.22 in FID on ImageNet. Our experiments also demonstrate DeepCache's superiority over existing pruning and distillation methods that necessitate retraining and its compatibility with current sampling techniques. Furthermore, we find that under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS. The code is available at https://github.com/horseee/DeepCache
Submitted 7 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Exceptional Mechanical Performance by Spatial Printing with Continuous Fiber: Curved Slicing, Toolpath Generation and Physical Verification
Authors:
Guoxin Fang,
Tianyu Zhang,
Yuming Huang,
Zhizhou Zhang,
Kunal Masania,
Charlie C. L. Wang
Abstract:
This work explores a spatial printing method to fabricate continuous fiber-reinforced thermoplastic composites (CFRTPCs), which can achieve exceptional mechanical performance. For models giving complex 3D stress distribution under loads, typical planar-layer based fiber placement usually fails to provide sufficient reinforcement because the fiber orientations are constrained to planes. The effectiveness of fiber reinforcement could be maximized by using multi-axis additive manufacturing (MAAM) to better control the orientation of continuous fibers in 3D-printed composites. Here, we propose a computational approach to generate 3D toolpaths that satisfy two major reinforcement objectives: 1) following the maximal stress directions in critical regions and 2) connecting multiple load-bearing regions by continuous fibers. Principal stress lines are first extracted in an input solid model to identify critical regions. Curved layers aligned with maximal stresses in these critical regions are generated by computing an optimized scalar field and extracting its iso-surfaces. Then, topological analysis and operations are applied to each curved layer to generate a computational domain that preserves fiber continuity between load-bearing regions. Lastly, continuous fiber toolpaths aligned with maximal stresses are generated on each surface layer by computing an optimized scalar field and extracting its iso-curves. A hardware system with dual robotic arms is employed to conduct the physical MAAM tasks, depositing polymer or fiber reinforced polymer composite materials by applying a force normal to the extrusion plane to aid consolidation. Compared with planar-layer based printing results in tension, up to 644% failure load and 240% stiffness are observed on shapes fabricated by our spatial printing method.
Submitted 25 January, 2024; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Multimodal Identification of Alzheimer's Disease: A Review
Authors:
Guian Fang,
Mengsha Liu,
Yi Zhong,
Zhuolin Zhang,
Jiehui Huang,
Zhenchao Tang,
Calvin Yu-Chian Chen
Abstract:
Alzheimer's disease (AD) is a progressive neurological disorder characterized by cognitive impairment and memory loss. With the growing aging population, the incidence of AD is continuously rising, making early diagnosis and intervention an urgent need. In recent years, a considerable number of teams have applied computer-aided diagnostic techniques to early classification research of AD. Most studies have utilized imaging modalities such as magnetic resonance imaging (MRI), positron emission tomography (PET), and electroencephalogram (EEG). However, there have also been studies that attempted to use other modalities as input features for the models, such as sound, posture, biomarkers, cognitive assessment scores, and their fusion. Experimental results have shown that the combination of multiple modalities often leads to better performance compared to a single modality. Therefore, this paper focuses on different modalities and their fusion: it elucidates the mechanisms of various modalities, explores which methods should be combined to better harness their utility, and analyzes and summarizes recent literature on the early classification of AD, in order to explore more possibilities for modality combinations.
Submitted 6 October, 2023;
originally announced November 2023.
-
A Direct Approach for Solving Cloud Computing Task Assignment with Soft Deadlines
Authors:
Guang Fang,
Yuxiang Zhao
Abstract:
Job scheduling in cloud computing environments is a critical yet complex problem. Cloud computing user job requirements are highly dynamic and uncertain, while cloud computing resources are heterogeneous and constrained. This paper studies the online resource allocation problem for elastic computing jobs with soft deadlines in cloud computing environments. The main contributions include: 1) Integer linear programming modeling is used to design an auction time scheduling framework with three key modules - resource allocation, evaluation, and operation, which can dynamically allocate resources in closed loops. 2) Methods such as time-based single resource utilization evaluation and weighted average evaluation are proposed to evaluate resource usage efficiency. 3) Soft acceptance protocols are introduced to achieve elastic online resource allocation. 4) The time complexity of the proposed algorithms is analyzed and proven to be polynomial time, demonstrating efficiency. 5) Modular design makes the framework extensible. This paper provides a structured cloud computing auction framework as a reference for building practical cloud resource management systems. Future work may explore more complex models of random arrival and multi-dimensional resource constraints, evaluate algorithm performance on real cloud workloads, and further enhance system robustness, efficiency and fairness.
Submitted 22 December, 2023; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Real-space sampling of terahertz waveforms under scanning tunneling microscope
Authors:
Hongbo Li,
Tianwu Wang,
Wenyin Wei,
Kai Zhang,
Jing-yin Xu,
Yirong Wu,
Guangyou Fang
Abstract:
Terahertz scanning tunneling microscopy (THz-STM) has emerged as a potent technique for probing ultrafast nanoscale dynamics with exceptional spatiotemporal precision, whereby the acquisition of THz near-field waveforms holds paramount significance. While substantial efforts have been dedicated to retrieving the waveform utilizing the photoemission current or a molecular sensor, these methods are challenged by intensive thermal effects or complex sample preparations. In this study, we introduce a universal approach for real-time characterization of THz near-field waveforms within the tunnel junction, achieving sub-nanometer spatial resolution. Utilizing the gating mechanism intrinsic to the STM junction, coherent scanning of a gated strong THz pulse over a weak THz pulse is achieved, facilitating direct measurement of the waveform. Notably, employing a custom-built Carrier-Envelope Phase (CEP) shifter, THz-CEP has been successfully characterized in the tunnel junction. Moreover, sub-nanometer spatial resolution of the THz-driven field emission current has been shown, underscoring the nanoscale resolving ability of our methodology. Ultimately, the potential application of this method for local THz time domain spectroscopy imaging has been demonstrated through point-to-point waveform sampling over the Au (111) surface.
Submitted 20 March, 2024; v1 submitted 26 October, 2023;
originally announced October 2023.
-
Solution to the conflict between the resolved and unresolved galaxy stellar mass estimation from the perspective of JWST
Authors:
Jie Song,
GuanWen Fang,
Zesen Lin,
Yizhou Gu,
Xu Kong
Abstract:
By utilizing the spatially-resolved photometry of galaxies at $0.2<z<3.0$ in the CEERS field, we estimate the resolved and unresolved stellar mass via spectral energy distribution (SED) fitting to study the discrepancy between them. We first compare $M_{\ast}$ derived from photometry with and without the JWST wavelength coverage and find that $M_{\ast}$ can be overestimated by up to 0.2 dex when lacking rest-frame NIR data. The SED fitting process tends to overestimate both stellar age and dust attenuation in the absence of rest-frame NIR data, consequently leading to a larger observed mass-to-light ratio and hence an elevated $M_{\ast}$. With the inclusion of the JWST NIR photometry, we find no significant disparity between the resolved and unresolved stellar mass estimates, providing a plausible solution to the conflict between them out to $z\sim 3$. Further investigation demonstrates that reliable $M_{\ast}$ estimates can be obtained, regardless of whether they are derived from spatially resolved or spatially unresolved photometry, so long as the reddest filter included in the SED fitting has a rest-frame wavelength larger than 10000 Å.
Submitted 18 October, 2023;
originally announced October 2023.
-
Improving Compositional Text-to-image Generation with Large Vision-Language Models
Authors:
Song Wen,
Guian Fang,
Renrui Zhang,
Peng Gao,
Hao Dong,
Dimitris Metaxas
Abstract:
Recent advancements in text-to-image models, particularly diffusion models, have shown significant promise. However, compositional text-to-image models frequently encounter difficulties in generating high-quality images that accurately align with input texts describing multiple objects, variable attributes, and intricate spatial relationships. To address this limitation, we employ large vision-language models (LVLMs) for multi-dimensional assessment of the alignment between generated images and their corresponding input texts. Utilizing this assessment, we fine-tune the diffusion model to enhance its alignment capabilities. During the inference phase, an initial image is produced using the fine-tuned diffusion model. The LVLM is then employed to pinpoint areas of misalignment in the initial image, which are subsequently corrected using the image editing algorithm until no further misalignments are detected by the LVLM. The resultant image is consequently more closely aligned with the input text. Our experimental results validate that the proposed methodology significantly improves text-image alignment in compositional image generation, particularly with respect to object number, attribute binding, spatial relationships, and aesthetic quality.
Submitted 10 October, 2023;
originally announced October 2023.
-
Spring-IMU Fusion Based Proprioception for Feedback Control of Soft Manipulators
Authors:
Yinan Meng,
Guoxin Fang,
Jiong Yang,
Yuhu Guo,
Charlie C. L. Wang
Abstract:
This paper presents a novel framework to realize proprioception and closed-loop control for soft manipulators. Deformations with large elongation and large bending can be precisely predicted using geometry-based sensor signals obtained from the inductive springs and the inertial measurement units (IMUs) with the help of machine learning techniques. Multiple geometric signals are fused into robust pose estimations, and a data-efficient training process is achieved after applying the strategy of sim-to-real transfer. As a result, we can achieve proprioception that is robust to the variation of external loading and has an average error of 0.7% across the workspace on a pneumatic-driven soft manipulator. The realized proprioception on soft manipulator is then contributed to building a sensor-space based algorithm for closed-loop control. A gradient descent solver is developed to drive the end-effector to achieve the required poses by iteratively computing a sequence of reference sensor signals. A conventional controller is employed in the inner loop of our algorithm to update actuators (i.e., the pressures in chambers) for approaching a reference signal in the sensor-space. The systematic function of closed-loop control has been demonstrated in tasks like path following and pick-and-place under different external loads.
Submitted 25 September, 2023;
originally announced September 2023.
-
Empirical Risk Minimization for Losses without Variance
Authors:
Guanhua Fang,
Ping Li,
Gennady Samorodnitsky
Abstract:
This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
Submitted 7 September, 2023;
originally announced September 2023.
-
Learning Causality-inspired Representation Consistency for Video Anomaly Detection
Authors:
Yang Liu,
Zhaoyang Xia,
Mengyang Zhao,
Donglai Wei,
Yuzheng Wang,
Liu Siao,
Bobo Ju,
Gaoyun Fang,
Jing Liu,
Liang Song
Abstract:
Video anomaly detection is an essential yet challenging task in the multimedia community, with promising applications in smart cities and secure communities. Existing methods attempt to learn abstract representations of regular events with statistical dependence to model the endogenous normality, which discriminates anomalies by measuring the deviations to the learned distribution. However, conventional representation learning is only a crude description of video normality and lacks an exploration of its underlying causality. The learned statistical dependence is unreliable for diverse regular events in the real world and may cause high false alarms due to overgeneralization. Inspired by causal representation learning, we think that there exists a causal variable capable of adequately representing the general patterns of regular events in which anomalies will present significant variations. Therefore, we design a causality-inspired representation consistency (CRC) framework to implicitly learn the unobservable causal variables of normality directly from available normal videos and detect abnormal events with the learned representation consistency. Extensive experiments show that the causality-inspired normality is robust to regular events with label-independent shifts, and the proposed CRC framework can quickly and accurately detect various complicated anomalies from real-world surveillance videos.
Submitted 3 August, 2023;
originally announced August 2023.
-
Copula for Instance-wise Feature Selection and Ranking
Authors:
Hanyu Peng,
Guanhua Fang,
Ping Li
Abstract:
Instance-wise feature selection and ranking methods can achieve a good selection of task-friendly features for each sample in the context of neural networks. However, existing approaches that assume feature subsets to be independent are imperfect when considering the dependency between features. To address this limitation, we propose to incorporate the Gaussian copula, a powerful mathematical technique for capturing correlations between variables, into the current feature selection framework with no additional changes needed. Experimental results on both synthetic and real datasets, in terms of performance comparison and interpretability, demonstrate that our method is capable of capturing meaningful correlations.
Submitted 1 August, 2023;
originally announced August 2023.
-
Vector Field Based Volume Peeling for Multi-Axis Machining
Authors:
Neelotpal Dutta,
Tianyu Zhang,
Guoxin Fang,
Ismail E. Yigit,
Charlie C. L. Wang
Abstract:
This paper presents an easy-to-control volume peeling method for multi-axis machining based on the computation taken on vector fields. The current scalar field based methods are not flexible and the vector-field based methods do not guarantee the satisfaction of the constraints in the final results. We first conduct an optimization formulation to compute an initial vector field that is well aligned with those anchor vectors specified by users according to different manufacturing requirements. The vector field is further optimized to be an irrotational field so that it can be completely realized by a scalar field's gradients. Iso-surfaces of the scalar field will be employed as the layers of working surfaces for multi-axis volume peeling in the rough machining. Algorithms are also developed to remove and process singularities of the fields. Our method has been tested on a variety of models and verified by physical experimental machining.
Submitted 4 October, 2023; v1 submitted 1 August, 2023;
originally announced August 2023.
-
Evolution of Non-parametric Morphology of Galaxies in the JWST CEERS Field at $z\simeq$0.8-3.0
Authors:
Yao Yao,
Jie Song,
Xu Kong,
Guanwen Fang,
Hong-Xin Zhang,
Xinkai Chen
Abstract:
Galaxy morphology is one of the most fundamental ways to describe galaxy properties, but the morphology we observe may be affected by wavelength and spatial resolution, which may introduce systematic bias when comparing galaxies at different redshifts. Taking advantage of the broad wavelength coverage from optical to near-IR and the high-resolution NIRCam instrument of JWST, we measure the non-parametric morphological parameters of a total of 1376 galaxies at $z\simeq$0.8-3.0 in the CEERS field through an optimized code called statmorph_csst. We divide our sample into three redshift intervals and investigate the wavelength- and redshift-dependence of the morphological parameters. We also explore how the widely-used galaxy type classification methods based on the morphological parameters depend on wavelength and spatial resolution. We find that there are variations in all morphological parameters with rest-frame wavelength ($λ_{\rm rf}$), especially at the short wavelength end, and that $λ_{\rm rf}$ mainly affects the classification between late-type and early-type galaxies. As $λ_{\rm rf}$ increases, the galaxies on the $G-M_{20}$ diagram move to the upper left with a slope of -0.23$\pm$0.03 on average. We find that spatial resolution mainly affects the merger identification. The merger fraction in F200W resolution can be $\ga$2 times larger than that in F444W resolution. Furthermore, we compare the morphological parameter evolution of galaxies with different stellar masses. We find that there are differences in the morphological evolution of high- and low-mass (log$M_*\geqslant$10 and 9$<$log$M_*<$10) galaxies in the studied redshift range, which may be caused by their different evolution paths.
Submitted 26 July, 2023;
originally announced July 2023.
-
The Classification of Galaxy Morphology in H-band of COSMOS-DASH Field: a combination-based machine learning clustering model
Authors:
Yao Dai,
Jun Xu,
Jie Song,
Guanwen Fang,
Chichun Zhou,
Shuo Ba,
Yizhou Gu,
Zesen Lin,
Xu Kong
Abstract:
By applying our previously developed two-step scheme for galaxy morphology classification, we present a catalog of galaxy morphology for H-band selected massive galaxies in the COSMOS-DASH field, which includes 17292 galaxies with stellar mass $M_{\star}>10^{10}~M_{\odot}$ at $0.5<z<2.5$. The classification scheme is designed to provide a complete morphology classification for galaxies via a combination of two machine-learning steps. We first use an unsupervised machine learning method (i.e., bagging-based multi-clustering) to cluster galaxies into five categories: spherical (SPH), early-type disk (ETD), late-type disk (LTD), irregular (IRR), and unclassified (UNC). About 48% of galaxies (8258/17292) are successfully clustered during this step. For the remaining sample, we adopt a supervised machine learning method (i.e., GoogLeNet) to classify them, during which galaxies that are well-classified in the previous step are taken as our training set. Consequently, we obtain a morphology classification result for the full sample. The t-SNE test shows that galaxies in our sample can be well aggregated. We also measure the parametric and nonparametric morphologies of these galaxies. We find that the Sérsic index increases from IRR to SPH and the effective radius decreases from IRR to SPH, consistent with the corresponding definitions. Galaxies from different categories are separately distributed in the $G$--$M_{20}$ space. Such consistencies with other characteristic descriptions of galaxy morphology demonstrate the reliability of our classification result, ensuring that it can be used as a basic catalog for further galaxy studies.
Submitted 6 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
A comparison of cosmological models with high-redshift quasars
Authors:
Liuyuan Fan,
Guanwen Fang,
Jian Hu
Abstract:
The non-linear relationship between the monochromatic X-ray and UV luminosities in quasars offers the possibility of using high-z quasars as standard candles for cosmological testing. In this paper, we use a high-quality catalog of 1598 quasars extending to redshift 6, to compare the flat and uniformly expanding cosmological model, $R_h$ = ct and $Λ$CDM cosmological models which are the most debated. The quasar samples are mainly from the XMM-Newton and the Sloan Digital Sky Survey (SDSS). The final result is that the Akaike Information Criterion favors $Λ$CDM over $R_h$=ct with a relative probability of 86.30% versus 13.70%.
Submitted 29 June, 2023;
originally announced June 2023.
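The relative probabilities quoted in the abstract above are of the kind produced by the standard Akaike-weight formula, in which each model's weight is proportional to exp(-AIC/2). The following is a minimal sketch of that arithmetic, not the authors' code. The function names are hypothetical, and the AIC gap of about 3.68 is back-computed here from the quoted 86.30% figure; the abstract does not report it.

```python
import math


def akaike_weights(aic_values):
    """Return the relative probability of each model from its AIC value."""
    best = min(aic_values)
    # Subtracting the best AIC keeps the exponentials numerically stable.
    rel = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


def delta_aic_from_probability(p_favored):
    """Invert the two-model formula: AIC gap implied by a relative probability."""
    return 2.0 * math.log(p_favored / (1.0 - p_favored))


# Hypothetical AIC values: only their difference matters for the weights.
weights = akaike_weights([100.0, 103.681])
implied_gap = delta_aic_from_probability(0.8630)
```

Only the difference between the AIC values enters the result, so a split of 86.30% versus 13.70% between two models corresponds to the favored model having an AIC lower by roughly 3.7.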
-
Wise in Vaccine Allocation
Authors:
Baiqiao Yin,
Jiaqing Yuan,
Weichen Lv,
Jiehui Huang,
Guian Fang
Abstract:
The paper uses machine learning and mathematical modeling to predict future vaccine distribution and solve the problem of allocating vaccines to different types of hospitals. They collected data and analyzed it, finding factors such as nearby residents, transportation, and medical personnel that impact distribution. They used the results to create a model and allocate vaccines to central and community hospitals and health centers in Hangzhou Gongshu District and Harbin Daoli District based on the model. They provide an explanation for the vaccine distribution based on their model and conclusions.
Submitted 12 June, 2023;
originally announced June 2023.
-
LTCR: Long-Text Chinese Rumor Detection Dataset
Authors:
Ziyang Ma,
Mengsha Liu,
Guian Fang,
Ying Shen
Abstract:
False information can spread quickly on social media, negatively influencing the citizens' behaviors and responses to social events. To better detect all of the fake news, especially long texts which are harder to find completely, a Long-Text Chinese Rumor detection dataset named LTCR is proposed. The LTCR dataset provides a valuable resource for accurately detecting misinformation, especially in the context of complex fake news related to COVID-19. The dataset consists of 1,729 and 500 pieces of real and fake news, respectively. The average lengths of real and fake news are approximately 230 and 152 characters. We also propose \method, Salience-aware Fake News Detection Model, which achieves the highest accuracy (95.85%), fake news recall (90.91%) and F-score (90.60%) on the dataset. (https://github.com/Enderfga/DoubleCheck)
Submitted 13 June, 2023; v1 submitted 12 June, 2023;
originally announced June 2023.
-
A Cover Time Study of a non-Markovian Algorithm
Authors:
Guanhua Fang,
Gennady Samorodnitsky,
Zhiqiang Xu
Abstract:
Given a traversal algorithm, cover time is the expected number of steps needed to visit all nodes in a given graph. A smaller cover time means a higher exploration efficiency of traversal algorithm. Although random walk algorithms have been studied extensively in the existing literature, there has been no cover time result for any non-Markovian method. In this work, we stand on a theoretical perspective and show that the negative feedback strategy (a count-based exploration method) is better than the naive random walk search. In particular, the former strategy can locally improve the search efficiency for an arbitrary graph. It also achieves smaller cover times for special but important graphs, including clique graphs, tree graphs, etc. Moreover, we make connections between our results and reinforcement learning literature to give new insights on why classical UCB and MCTS algorithms are so useful. Various numerical results corroborate our theoretical findings.
Submitted 11 August, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
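The comparison described in the abstract above can be illustrated with a small simulation on a clique. This is a sketch under stated assumptions rather than the paper's algorithm: it assumes that "negative feedback" means moving to the least-visited neighbor with ties broken uniformly at random, and that the start node counts as already visited. All function names are hypothetical.

```python
import random


def complete_graph(n):
    """Adjacency lists of a clique on n nodes."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def random_walk(neighbors, visits, rng):
    """Naive random walk: pick a neighbor uniformly at random."""
    return rng.choice(neighbors)


def negative_feedback(neighbors, visits, rng):
    """Assumed count-based rule: pick a least-visited neighbor at random."""
    fewest = min(visits[u] for u in neighbors)
    return rng.choice([u for u in neighbors if visits[u] == fewest])


def cover_steps(graph, start, choose_next, rng):
    """Number of steps until every node of a connected graph is visited."""
    visits = {v: 0 for v in graph}
    visits[start] = 1
    unvisited = len(graph) - 1
    current, steps = start, 0
    while unvisited > 0:
        current = choose_next(graph[current], visits, rng)
        if visits[current] == 0:
            unvisited -= 1
        visits[current] += 1
        steps += 1
    return steps


def mean_cover_steps(graph, choose_next, trials=200, seed=0):
    """Monte Carlo estimate of the cover time from node 0."""
    rng = random.Random(seed)
    total = sum(cover_steps(graph, 0, choose_next, rng) for _ in range(trials))
    return total / trials


clique = complete_graph(20)
rw_estimate = mean_cover_steps(clique, random_walk)
nf_estimate = mean_cover_steps(clique, negative_feedback)
```

On a clique with n nodes, the assumed count-based rule never revisits a node before all others have been seen, so it covers the graph in exactly n - 1 steps, while the random walk behaves like a coupon-collector process and needs more steps on average.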
-
RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment
Authors:
Guian Fang,
Zutao Jiang,
Jianhua Han,
Guansong Lu,
Hang Xu,
Shengcai Liao,
Xiaodan Liang
Abstract:
Recent advances in text-to-image diffusion models have achieved remarkable success in generating high-quality, realistic images from textual descriptions. However, these approaches have faced challenges in precisely aligning the generated visual content with the textual concepts described in the prompts. In this paper, we propose a two-stage coarse-to-fine semantic re-alignment method, named RealignDiff, aimed at improving the alignment between text and images in text-to-image diffusion models. In the coarse semantic re-alignment phase, a novel caption reward, leveraging the BLIP-2 model, is proposed to evaluate the semantic discrepancy between the generated image caption and the given text prompt. Subsequently, the fine semantic re-alignment stage employs a local dense caption generation module and a re-weighting attention modulation module to refine the previously generated images from a local semantic view. Experimental results on the MS-COCO benchmark demonstrate that the proposed two-stage coarse-to-fine semantic re-alignment method outperforms other baseline re-alignment techniques by a substantial margin in both visual quality and semantic similarity with the input prompt.
Submitted 27 November, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
LLM-Pruner: On the Structural Pruning of Large Language Models
Authors:
Xinyin Ma,
Gongfan Fang,
Xinchao Wang
Abstract:
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in the deployment, inference, and training stages. With the LLM being a general-purpose task solver, we explore its compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM. One challenge to achieving this is the enormous size of the LLM's training corpus, which makes both data transfer and model post-training over-burdensome. Thus, we tackle the compression of LLMs within the bounds of two constraints: being task-agnostic and minimizing the reliance on the original training dataset. Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures based on gradient information, maximally preserving the majority of the LLM's functionality. The performance of pruned models can then be efficiently recovered through a tuning technique, LoRA, in merely 3 hours, requiring only 50K data. We validate LLM-Pruner on three LLMs, including LLaMA, Vicuna, and ChatGLM, and demonstrate that the compressed models still exhibit satisfactory capabilities in zero-shot classification and generation. The code is available at: https://github.com/horseee/LLM-Pruner
Submitted 27 September, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Structural Pruning for Diffusion Models
Authors:
Gongfan Fang,
Xinyin Ma,
Xinchao Wang
Abstract:
Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs). The impressive capability of these models, however, often entails significant computational overhead during both training and inference. To tackle this challenge, we present Diff-Pruning, an efficient compression method tailored for learning lightweight diffusion models from pre-existing ones, without the need for extensive re-training. The essence of Diff-Pruning is encapsulated in a Taylor expansion over pruned timesteps, a process that disregards non-contributory diffusion steps and ensembles informative gradients to identify important weights. Our empirical assessment, undertaken across several datasets, highlights two primary benefits of our proposed method: 1) Efficiency: it enables approximately a 50% reduction in FLOPs at a mere 10% to 20% of the original training expenditure; 2) Consistency: the pruned diffusion models inherently preserve generative behavior congruent with their pre-trained models. Code is available at https://github.com/VainF/Diff-Pruning.
Submitted 30 September, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
The effect of environment on the properties of the most massive galaxies at $0.5<z<2.5$ in the cosmos-dash field
Authors:
Jie Song,
Guanwen Fang,
Yizhou Gu,
Zesen Lin,
Xu Kong
Abstract:
How the environment influences the most massive galaxies is still unclear. To explore the environmental effects on morphology and star formation in the most massive galaxies at high redshift, we select galaxies with stellar mass $\log(M_{\star}/M_{\odot})>11$ at $0.5<z<2.5$ in the COSMOS-DASH field, which is the largest field with near-infrared photometrical observations using HST/WFC3 to date. Combining with the newly published COSMOS2020 catalog, we estimate the localized galaxy overdensity using a density estimator within the Bayesian probability framework. With the overdensity map, no significant environmental dependence is found in the distributions of Sérsic index and effective radius. When we consider the star formation state, galaxies in lower density are found to have higher median specific star formation rate (sSFR) at $0.5<z<1.5$. But for star-forming galaxies only, sSFR is independent of the environment within the whole redshift range, indicating that the primary effect of the environment might be to control the quiescent fraction. Based on these observations, the possible environmental quenching process for these massive galaxies might be mergers.
Submitted 17 May, 2023;
originally announced May 2023.
-
Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
Authors:
Bo Li,
Gexiang Fang,
Yang Yang,
Quansen Wang,
Wei Ye,
Wen Zhao,
Shikun Zhang
Abstract:
The capability of Large Language Models (LLMs) like ChatGPT to comprehend user intent and provide reasonable responses has made them extremely popular lately. In this paper, we focus on assessing the overall ability of ChatGPT using 7 fine-grained information extraction (IE) tasks. Specifically, we present a systematic analysis by measuring ChatGPT's performance, explainability, calibration, and faithfulness, resulting in 15 keys from either ChatGPT or domain experts. Our findings reveal that ChatGPT's performance in the Standard-IE setting is poor, but it surprisingly exhibits excellent performance in the OpenIE setting, as evidenced by human evaluation. In addition, our research indicates that ChatGPT provides high-quality and trustworthy explanations for its decisions. However, ChatGPT tends to be overconfident in its predictions, which results in low calibration. Furthermore, ChatGPT demonstrates a high level of faithfulness to the original text in the majority of cases. We manually annotate and release the test sets of 7 fine-grained IE tasks, containing 14 datasets, to further promote research. The datasets and code are available at https://github.com/pkuserc/ChatGPT_for_IE.
Submitted 23 April, 2023;
originally announced April 2023.
-
DMMG: Dual Min-Max Games for Self-Supervised Skeleton-Based Action Recognition
Authors:
Shannan Guan,
Xin Yu,
Wei Huang,
Gengfa Fang,
Haiyan Lu
Abstract:
In this work, we propose a new Dual Min-Max Games (DMMG) based self-supervised skeleton action recognition method by augmenting unlabeled data in a contrastive learning framework. Our DMMG consists of a viewpoint variation min-max game and an edge perturbation min-max game. These two min-max games adopt an adversarial paradigm to perform data augmentation on the skeleton sequences and graph-structured body joints, respectively. Our viewpoint variation min-max game focuses on constructing various hard contrastive pairs by generating skeleton sequences from various viewpoints. These hard contrastive pairs help our model learn representative action features, thus facilitating model transfer to downstream tasks. Moreover, our edge perturbation min-max game specializes in building diverse hard contrastive samples through perturbing connectivity strength among graph-based body joints. The connectivity-strength varying contrastive pairs enable the model to capture minimal sufficient information of different actions, such as representative gestures for an action while preventing the model from overfitting. By fully exploiting the proposed DMMG, we can generate sufficient challenging contrastive pairs and thus achieve discriminative action feature representations from unlabeled skeleton data in a self-supervised manner. Extensive experiments demonstrate that our method achieves superior results under various evaluation protocols on widely-used NTU-RGB+D and NTU120-RGB+D datasets.
Submitted 22 February, 2023;
originally announced February 2023.