-
Dormancy and Reawakening Over Years: Eight New Recurrent Changing-Look AGNs
Authors:
Shu Wang,
Jong-Hak Woo,
Elena Gallo,
Donghoon Son,
Qian Yang,
Junjie Jin,
Hengxiao Guo,
Minzhi Kong
Abstract:
We report the discovery of eight new recurrent changing-look (CL) active galactic nuclei (AGNs), including seven re-brightening turn-off AGNs and one fading turn-on AGN. These systems are valuable for placing constraints on the duration of dim and bright states, which may be linked to the AGN duty cycle or disk instability. Long-term optical light curve analysis reveals that many objects in our sample exhibit a prolonged plateau during the dim states lasting 4 to 7 years, with a gradual turn-on/off process. We observe no significant difference between the turn-on and turn-off timescales, and this timescale is broadly consistent with the heating/cooling front propagation timescale. The comparison between optical and infrared variations supports the interpretation that these transitions are driven by changes in accretion disk emission rather than dust obscuration. Our discovery significantly increases the previously identified recurrent CL AGN sample from eleven objects to nineteen, demonstrating that some AGNs can enter dormancy and reawaken on timescales of a few years, which provides useful information for understanding AGN episodic accretion.
Submitted 20 October, 2024;
originally announced October 2024.
-
Li$_{14}$Mn$_{2}$S$_{9}$ and Li$_{10}$Si$_{2}$S$_{9}$ as a pair of all-electrochem-active electrode and solid-state electrolyte with chemical compatibility and low interface resistance
Authors:
Qifan Yang,
Jing Xu,
Xiao Fu,
Jingchen Lian,
Liqi Wang,
Xuhe Gong,
Zibin Wang,
Ruijuan Xiao,
Hong Li
Abstract:
In solid-state batteries (SSBs), improving the physical contact at the electrode-electrolyte interface is essential for achieving better performance and durability. On the one hand, it is necessary to find solid-state electrolytes (SSEs) with high ionic conductivity that do not react with the electrode; on the other hand, it is necessary to design all-electrochem-active (AEA) electrodes that contain no SSEs or other non-active substances. In this work, we propose a pair of AEA electrode and SSE with the same structural framework and excellent interface compatibility, Li$_{14}$Mn$_{2}$S$_{9}$ and Li$_{10}$Si$_{2}$S$_{9}$, and confirm its feasibility by ab-initio molecular dynamics (AIMD) simulations and machine learning interatomic potential based molecular dynamics (MLIP-based MD) simulations, providing a new approach to promote interfacial stability in SSBs.
Submitted 20 October, 2024;
originally announced October 2024.
-
Advancing Large Language Model Attribution through Self-Improving
Authors:
Lei Huang,
Xiaocheng Feng,
Weitao Ma,
Liang Zhao,
Yuchun Fan,
Weihong Zhong,
Dongliang Xu,
Qing Yang,
Hongtao Liu,
Bing Qin
Abstract:
Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive to obtain. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations or more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.
Submitted 17 October, 2024;
originally announced October 2024.
-
From Babbling to Fluency: Evaluating the Evolution of Language Models in Terms of Human Language Acquisition
Authors:
Qiyuan Yang,
Pengda Wang,
Luke D. Plonsky,
Frederick L. Oswald,
Hanjie Chen
Abstract:
We examine the language capabilities of language models (LMs) from the critical perspective of human language acquisition. Building on classical language development theories, we propose a three-stage framework to assess the abilities of LMs, ranging from preliminary word understanding to complex grammar and complex logical reasoning. Using this framework, we evaluate the generative capacities of LMs using methods from linguistic research. Results indicate that although recent LMs outperform earlier models in overall performance, their developmental trajectory does not strictly follow the path of human language acquisition. Notably, in generation tasks, LMs are more similar to human performance in areas where information is easier to extract from the corpus, such as average word length, clauses, and auxiliary verbs. Newer LMs did not exhibit significant progress in terms of specific dimensions, such as clauses and auxiliary verbs, where the variation across corpora is relatively limited. Register theory offers a plausible explanation for these observations, suggesting that the linguistic features of the training data have a substantial impact on the models' abilities.
Submitted 17 October, 2024;
originally announced October 2024.
-
Multi-modal graph neural networks for localized off-grid weather forecasting
Authors:
Qidong Yang,
Jonathan Giezendanner,
Daniel Salles Civitarese,
Johannes Jakubik,
Eric Schmitt,
Anirban Chandra,
Jeremy Vila,
Detlef Hohl,
Chris Hill,
Campbell Watson,
Sherrie Wang
Abstract:
Urgent applications like wildfire management and renewable energy generation require precise, localized weather forecasts near the Earth's surface. However, weather forecast products from machine learning or numerical weather models are currently generated on a global regular grid, on which a naive interpolation cannot accurately reflect fine-grained weather patterns close to the ground. In this work, we train a heterogeneous graph neural network (GNN) end-to-end to downscale gridded forecasts to off-grid locations of interest. This multi-modal GNN takes advantage of local historical weather observations (e.g., wind, temperature) to correct the gridded weather forecast at different lead times towards locally accurate forecasts. Each data modality is modeled as a different type of node in the graph. Using message passing, the node at the prediction location aggregates information from its heterogeneous neighbor nodes. Experiments using weather stations across the Northeastern United States show that our model outperforms a range of data-driven and non-data-driven off-grid forecasting methods. Our approach demonstrates how the gap between global large-scale weather models and locally accurate predictions can be bridged to inform localized decision-making.
Submitted 16 October, 2024;
originally announced October 2024.
-
Exploring Quantum Aspects of Dark Matter Axions and Dark Photons Transitions within a Resonant Cavity
Authors:
Ruifeng Zheng,
Puxian Wei,
Qiaoli Yang
Abstract:
When axionic dark matter interacts with a static magnetic field, it can convert into photons with energy near the axion's mass. Classical analysis shows that incorporating a resonant cavity significantly enhances this conversion rate, forming the basis for many experiments aimed at detecting dark matter axions. However, one question remains: Does the axion-photon conversion rate increase for a single axion-photon transition? Answering this question could lead to optimizations in searching for axions by integrating quantum measurement techniques. In this paper, we demonstrate that at the quantum level, single axion-photon transitions are amplified by the cavity's quality factor $Q$; even if dark matter particles lack coherence, the conversion rate is still enhanced. Furthermore, the quantum perspective reveals an additional factor of $π/2$ in the transition rate compared to the classical result. We also provide an analysis of the scenario involving dark photon dark matter.
Submitted 16 October, 2024;
originally announced October 2024.
-
Disentangling data distribution for Federated Learning
Authors:
Xinyuan Zhao,
Hanlin Gu,
Lixin Fan,
Qiang Yang,
Yuxing Han
Abstract:
Federated Learning (FL) facilitates collaborative training of a global model whose performance is boosted by private data owned by distributed clients, without compromising data privacy. Yet the wide applicability of FL is hindered by entanglement of data distributions across different clients. This paper demonstrates for the first time that by disentangling data distributions FL can in principle achieve efficiencies comparable to those of distributed systems, requiring only one round of communication. To this end, we propose a novel FedDistr algorithm, which employs stable diffusion models to decouple and recover data distributions. Empirical results on the CIFAR100 and DomainNet datasets show that FedDistr significantly enhances model utility and efficiency in both disentangled and near-disentangled scenarios while ensuring privacy, outperforming traditional federated learning methods.
Submitted 16 October, 2024;
originally announced October 2024.
-
Landau-based Schubert analysis
Authors:
Song He,
Xuhang Jiang,
Jiahao Liu,
Qinglin Yang
Abstract:
We revisit the conjectural method called Schubert analysis for generating the alphabet of symbol letters for Feynman integrals, which was based on geometries of intersecting lines associated with corresponding cut diagrams. We explain the effectiveness of this somewhat mysterious method by relating such geometries to the corresponding Landau singularities, which also amounts to "uplifting" Landau singularities of a Feynman integral to its symbol letters. We illustrate this Landau-based Schubert analysis using various multi-loop Feynman integrals in four dimensions and present an automated Mathematica notebook for it. We then apply the method to a simplified problem of studying alphabets of physical quantities such as scattering amplitudes and form factors in planar ${\cal N}=4$ super-Yang-Mills. By focusing on a small set of Landau diagrams (as opposed to all relevant Feynman integrals), we show how this method nicely produces the two-loop alphabet of $n$-point MHV amplitudes and that of the $n=4$ MHV form factors. A byproduct of our analysis is an explicit representation of any symbol alphabet obtained this way as the union of various type-$A$ cluster algebras.
Submitted 15 October, 2024;
originally announced October 2024.
-
TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Authors:
Jiazhi Guan,
Quanwei Yang,
Kaisiyuan Wang,
Hang Zhou,
Shengyi He,
Zhiliang Xu,
Haocheng Feng,
Errui Ding,
Jingdong Wang,
Hongtao Xie,
Youjian Zhao,
Ziwei Liu
Abstract:
Recently, 2D speaking avatars have increasingly participated in everyday scenarios due to the fast development of facial animation techniques. However, most existing works neglect the explicit control of human bodies. In this paper, we propose to drive not only the faces but also the torso and gesture movements of a speaking figure. Inspired by recent advances in diffusion models, we propose the Motion-Enhanced Textural-Aware ModeLing for SpeaKing Avatar Reenactment (TALK-Act) framework, which enables high-fidelity avatar reenactment from only short footage of monocular video. Our key idea is to enhance the textural awareness with explicit motion guidance in diffusion modeling. Specifically, we carefully construct 2D and 3D structural information as intermediate guidance. While recent diffusion models adopt a side network for control information injection, they fail to synthesize temporally stable results even with person-specific fine-tuning. We propose a Motion-Enhanced Textural Alignment module to enhance the bond between driving and target signals. Moreover, we build a Memory-based Hand-Recovering module to help with the difficulties in hand-shape preserving. After pre-training, our model can achieve high-fidelity 2D avatar reenactment with only 30 seconds of person-specific data. Extensive experiments demonstrate the effectiveness and superiority of our proposed framework. Resources can be found at https://guanjz20.github.io/projects/TALK-Act.
Submitted 14 October, 2024;
originally announced October 2024.
-
Model-Based Differentially Private Knowledge Transfer for Large Language Models
Authors:
Zhaomin Wu,
Jizhou Guo,
Junyi Hou,
Bingsheng He,
Lixin Fan,
Qiang Yang
Abstract:
As large language models (LLMs) become increasingly prevalent in web services, effectively leveraging domain-specific knowledge while ensuring privacy has become critical. Existing methods, such as retrieval-augmented generation (RAG) and differentially private data synthesis, often compromise either the utility of domain knowledge or the privacy of sensitive data, limiting their applicability in specialized domains. To address these challenges, we propose Llamdex, a novel framework that integrates privacy-preserving, domain-specific models into LLMs. Our approach significantly enhances the accuracy of domain-specific tasks, achieving up to a 26% improvement compared to existing methods under the same differential privacy constraints. Experimental results show that Llamdex not only improves the accuracy of LLM responses but also maintains comparable inference efficiency to the original LLM, highlighting its potential for real-world applications.
Submitted 14 October, 2024;
originally announced October 2024.
-
Exploiting Memory-aware Q-distribution Prediction for Nuclear Fusion via Modern Hopfield Network
Authors:
Qingchuan Ma,
Shiao Wang,
Tong Zheng,
Xiaodong Dai,
Yifeng Wang,
Qingquan Yang,
Xiao Wang
Abstract:
This study addresses the critical challenge of predicting the Q-distribution in long-term stable nuclear fusion tasks, a key component for advancing clean energy solutions. We introduce an innovative deep learning framework that employs Modern Hopfield Networks to incorporate associative memory from historical shots. Utilizing a newly compiled dataset, we demonstrate the effectiveness of our approach in enhancing Q-distribution prediction. The proposed method represents a significant advancement by leveraging historical memory information for the first time in this context, showcasing improved prediction accuracy and contributing to the optimization of nuclear fusion research.
Submitted 11 October, 2024;
originally announced October 2024.
-
Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion
Authors:
Shiao Wang,
Yifeng Wang,
Qingchuan Ma,
Xiao Wang,
Ning Yan,
Qingquan Yang,
Guosheng Xu,
Jin Tang
Abstract:
Q-distribution prediction is a crucial research direction in controlled nuclear fusion, with deep learning emerging as a key approach to solving prediction challenges. In this paper, we leverage deep learning techniques to tackle the complexities of Q-distribution prediction. Specifically, we explore multimodal fusion methods in computer vision, integrating 2D line image data with the original 1D data to form a bimodal input. Additionally, we employ the Transformer's attention mechanism for feature extraction and the interactive fusion of bimodal information. Extensive experiments validate the effectiveness of our approach, significantly reducing prediction errors in Q-distribution.
Submitted 11 October, 2024;
originally announced October 2024.
-
MMLF: Multi-modal Multi-class Late Fusion for Object Detection with Uncertainty Estimation
Authors:
Qihang Yang,
Yang Zhao,
Hong Cheng
Abstract:
Autonomous driving necessitates advanced object detection techniques that integrate information from multiple modalities to overcome the limitations associated with single-modal approaches. The challenges of aligning diverse data in early fusion and the complexities, along with overfitting issues introduced by deep fusion, underscore the efficacy of late fusion at the decision level. Late fusion ensures seamless integration without altering the original detector's network structure. This paper introduces a pioneering Multi-modal Multi-class Late Fusion method, designed for late fusion to enable multi-class detection. Fusion experiments conducted on the KITTI validation and official test datasets illustrate substantial performance improvements, presenting our model as a versatile solution for multi-modal object detection in autonomous driving. Moreover, our approach incorporates uncertainty analysis into the classification fusion process, rendering our model more transparent and trustworthy and providing more reliable insights into category predictions.
Submitted 11 October, 2024;
originally announced October 2024.
-
FedL2G: Learning to Guide Local Training in Heterogeneous Federated Learning
Authors:
Jianqing Zhang,
Yang Liu,
Yang Hua,
Jian Cao,
Qiang Yang
Abstract:
Data and model heterogeneity are two core issues in Heterogeneous Federated Learning (HtFL). In scenarios with heterogeneous model architectures, aggregating model parameters becomes infeasible, leading to the use of prototypes (i.e., class representative feature vectors) for aggregation and guidance. However, prototype-based methods still experience a mismatch between the extra guiding objective and the client's original local objective when aligned with global prototypes. Thus, we propose a Federated Learning-to-Guide (FedL2G) method that adaptively learns to guide local training in a federated manner and ensures the extra guidance is beneficial to clients' original tasks. With theoretical guarantees, FedL2G efficiently implements the learning-to-guide process using only first-order derivatives w.r.t. model parameters and achieves a non-convex convergence rate of O(1/T). We conduct extensive experiments on two data heterogeneity and six model heterogeneity settings using 14 heterogeneous model architectures (e.g., CNNs and ViTs) to demonstrate FedL2G's superior performance compared to six counterparts.
Submitted 8 October, 2024;
originally announced October 2024.
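The abstract above describes prototypes as class representative feature vectors that are aggregated in place of model parameters. As a rough illustration of that generic idea only (not the FedL2G method itself), the sketch below computes per-class mean feature vectors on each client and combines them on a server with a sample-count weighted average. The function names `local_prototypes` and `aggregate_prototypes` and the weighting rule are assumptions made for this example.

```python
from collections import defaultdict


def local_prototypes(features, labels):
    """Return ({label: mean feature vector}, {label: sample count}) for one client."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    protos = {
        label: [s / counts[label] for s in total] for label, total in sums.items()
    }
    return protos, dict(counts)


def aggregate_prototypes(client_results):
    """Combine client prototypes into global ones.

    Assumption: each class prototype is weighted by the number of local
    samples of that class; other weighting schemes are possible.
    """
    weighted = {}
    totals = defaultdict(int)
    for protos, counts in client_results:
        for label, proto in protos.items():
            n = counts[label]
            if label not in weighted:
                weighted[label] = [0.0] * len(proto)
            weighted[label] = [w + n * x for w, x in zip(weighted[label], proto)]
            totals[label] += n
    return {label: [w / totals[label] for w in vec] for label, vec in weighted.items()}


# Two hypothetical clients that both hold samples of class 0.
client_a = local_prototypes([[1.0, 1.0], [3.0, 3.0]], [0, 0])
client_b = local_prototypes([[5.0, 5.0]], [0])
global_protos = aggregate_prototypes([client_a, client_b])
```

Because only fixed-size feature vectors are exchanged, clients with different model architectures can still take part, provided their feature dimensions agree.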
-
Towards Ultra-Low-Power Neuromorphic Speech Enhancement with Spiking-FullSubNet
Authors:
Xiang Hao,
Chenxiang Ma,
Qu Yang,
Jibin Wu,
Kay Chen Tan
Abstract:
Speech enhancement is critical for improving speech intelligibility and quality in various audio devices. In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids. This work proposes an ultra-low-power speech enhancement system based on the brain-inspired spiking neural network (SNN) called Spiking-FullSubNet. Spiking-FullSubNet follows a full-band and sub-band fusion approach to effectively capture both global and local spectral information. To enhance the efficiency of computationally expensive sub-band modeling, we introduce a frequency partitioning method inspired by the sensitivity profile of the human peripheral auditory system. Furthermore, we introduce a novel spiking neuron model that can dynamically control the input information integration and forgetting, enhancing the multi-scale temporal processing capability of the SNN, which is critical for speech denoising. Experiments conducted on the recent Intel Neuromorphic Deep Noise Suppression (N-DNS) Challenge dataset show that Spiking-FullSubNet surpasses state-of-the-art methods by large margins in terms of both speech quality and energy efficiency metrics. Notably, our system won the championship of the Intel N-DNS Challenge (Algorithmic Track), opening up a myriad of opportunities for ultra-low-power speech enhancement at the edge. Our source code and model checkpoints are publicly available at https://github.com/haoxiangsnr/spiking-fullsubnet.
Submitted 7 October, 2024;
originally announced October 2024.
-
GlobeSumm: A Challenging Benchmark Towards Unifying Multi-lingual, Cross-lingual and Multi-document News Summarization
Authors:
Yangfan Ye,
Xiachong Feng,
Xiaocheng Feng,
Weitao Ma,
Libo Qin,
Dongliang Xu,
Qing Yang,
Hongtao Liu,
Bing Qin
Abstract:
News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. Nevertheless, the lack of a benchmark inhibits researchers from adequately studying this invaluable problem. To tackle this, we have meticulously constructed the GLOBESUMM dataset by first collecting a wealth of multilingual news reports and restructuring them into an event-centric format. Additionally, we introduce the method of protocol-guided prompting for high-quality and cost-effective reference annotation. In MCMS, we also highlight the challenge of conflicts between news reports, in addition to the issues of redundancies and omissions, further enhancing the complexity of GLOBESUMM. Through extensive experimental analysis, we validate the quality of our dataset and elucidate the inherent challenges of the task. We firmly believe that GLOBESUMM, given its challenging nature, will greatly contribute to the multilingual communities and the evaluation of LLMs.
Submitted 5 October, 2024;
originally announced October 2024.
-
Peetre conjecture on real interpolation spaces of Besov spaces and Grid K functional
Authors:
Qixiang Yang,
Haibo Yang,
Bin Zou,
Jianxun He
Abstract:
In this paper, Peetre's conjecture about the real interpolation space of Besov spaces is solved completely by using the classification of vertices of cuboids defined by wavelet coefficients and the wavelet's grid structure. Littlewood-Paley analysis provides only a decomposition of the function on the ring. We extend Lorentz's rearrangement function and Hunt's Marcinkiewicz interpolation theorem to more general cases. We use the method of calculating the topological quantity of the grid to replace traditional methods of data classification such as the gradient descent method and distributed algorithms.
We developed a series of new techniques to solve this longstanding open problem. These techniques make up for the deficiency of the Lions-Peetre iterative theorem in dealing with strong nonlinearity. Using the properties of the wavelet basis, a series of functional nonlinearities are studied. Using the lattice property of wavelets, we study the lattice topology. By three kinds of topology nonlinearities, we give the specific wavelet expression of the K functional.
Submitted 12 September, 2024;
originally announced October 2024.
-
Interpolation spaces of Besov hierarchical spaces and non-linearities defined by vertex K functional and grid topology
Authors:
Qixiang Yang,
Haibo Yang
Abstract:
In 1967, Peetre proposed to give a precise description of the real interpolation space for Besov hierarchical spaces $l^{s,q}(A)$. In 1974, Cwikel proved that the Lions-Peetre formula for $(l^{q_0}(A_0), l^{q_1}(A_1))_{θ,r}$ has no reasonable generalization for any $r\neq q$. In this paper, we apply wavelets to transform the study of the real interpolation space into the study of nonlinear functional structure and nonlinear topological structure. We completely solve Peetre's longstanding open problem.
Submitted 12 September, 2024;
originally announced October 2024.
-
Extending Context Window of Large Language Models from a Distributional Perspective
Authors:
Yingsheng Wu,
Yuxuan Gu,
Xiaocheng Feng,
Weihong Zhong,
Dongliang Xu,
Qing Yang,
Hongtao Liu,
Bing Qin
Abstract:
Scaling the rotary position embedding (RoPE) has become a common method for extending the context window of RoPE-based large language models (LLMs). However, existing scaling methods often rely on empirical approaches and lack a profound understanding of the internal distribution within RoPE, resulting in suboptimal performance in extending the context window length. In this paper, we propose to optimize the context window extension task from the view of rotary angle distribution. Specifically, we first estimate the distribution of the rotary angles within the model and analyze the extent to which length extension perturbs this distribution. Then, we present a novel extension strategy that minimizes the disturbance between rotary angle distributions to maintain consistency with the pre-training phase, enhancing the model's capability to generalize to longer sequences. Experimental results compared to strong baseline methods demonstrate that our approach reduces the distributional disturbance by up to 72% when extending LLaMA2's context window to 8k, and by up to 32% when extending to 16k. On the LongBench-E benchmark, our method achieves an average improvement of up to 4.33% over existing state-of-the-art methods. Furthermore, our method maintains the model's performance on the Hugging Face Open LLM benchmark after context window extension, with an average performance fluctuation ranging only from -0.12 to +0.22.
Submitted 3 October, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
A simple emulator that enables interpretation of parameter-output relationships, applied to two climate model PPEs
Authors:
Qingyuan Yang,
Gregory S Elsaesser,
Marcus Van Lier-Walqui,
Trude Eidhammer
Abstract:
We present a new additive method, nicknamed sage for Simplified Additive Gaussian processes Emulator, to emulate climate model Perturbed Parameter Ensembles (PPEs). It estimates the value of a climate model output as the sum of additive terms. Each additive term is the mean of a Gaussian Process, and corresponds to the impact of a parameter or parameter group on the variable of interest. This design caters to the sparsity of PPEs which are characterized by limited ensemble members and high dimensionality of the parameter space. sage quantifies the variability explained by different parameters and parameter groups, providing additional insights on the parameter-climate model output relationship. We apply the method to two climate model PPEs and compare it to a fully connected Neural Network. The two methods have comparable performance with both PPEs, but sage provides insights on parameter and parameter group importance as well as diagnostics useful for optimizing PPE design. Insights gained are valid regardless of the emulator method used, and have not been previously addressed. Our work highlights that analyzing the PPE used to train an emulator is different from analyzing data generated from an emulator trained on the PPE, as the former provides more insights on the data structure in the PPE which could help inform the emulator design.
Submitted 8 October, 2024; v1 submitted 30 September, 2024;
originally announced October 2024.
-
The Future of HCI-Policy Collaboration
Authors:
Qian Yang,
Richmond Y Wong,
Steven J Jackson,
Sabine Junginger,
Margaret D Hagan,
Thomas Gilbert,
John Zimmerman
Abstract:
Policies significantly shape computation's societal impact, a crucial HCI concern. However, challenges persist when HCI professionals attempt to integrate policy into their work or affect policy outcomes. Prior research considered these challenges at the ``border'' of HCI and policy. This paper asks: What if HCI considers policy integral to its intellectual concerns, placing system-people-policy interaction not at the border but nearer the center of HCI research, practice, and education? What if HCI fosters a mosaic of methods and knowledge contributions that blend system, human, and policy expertise in various ways, just like HCI has done with blending system and human expertise? We present this re-imagined HCI-policy relationship as a provocation and highlight its usefulness: It spotlights previously overlooked system-people-policy interaction work in HCI. It unveils new opportunities for HCI's futuring, empirical, and design projects. It allows HCI to coordinate its diverse policy engagements, enhancing its collective impact on policy outcomes.
Submitted 29 September, 2024;
originally announced September 2024.
-
Programming on Bitcoin: A Survey of Layer 1 and Layer 2 Technologies in Bitcoin Ecosystem
Authors:
Guofu Liao,
Taotao Wang,
Qing Yang,
Yihan Xia,
Long Shi,
Xiang Zhao,
Xiaoxiao Wu,
Shengli Zhang,
Anthony Chan,
Richard Yuen
Abstract:
This paper surveys innovative protocols that enhance the programming functionality of the Bitcoin blockchain, a key part of the "Bitcoin Ecosystem." Bitcoin utilizes the Unspent Transaction Output (UTXO) model and a stack-based script language for efficient peer-to-peer payments, but it faces limitations in programming capability and throughput. The 2021 Taproot upgrade introduced the Schnorr signature algorithm and P2TR transaction type, significantly improving Bitcoin's privacy and programming capabilities. This upgrade has led to the development of protocols like Ordinals, Atomicals, and BitVM, which enhance Bitcoin's programming functionality and enrich its ecosystem. We explore the technical aspects of the Taproot upgrade and examine Bitcoin Layer 1 protocols that leverage Taproot's features to program non-fungible tokens (NFTs) into transactions, including Ordinals and Atomicals, along with the fungible token standards BRC-20 and ARC-20.
Additionally, we categorize certain Bitcoin ecosystem protocols as Layer 2 solutions similar to Ethereum's, analyzing their impact on Bitcoin's performance. By analyzing data from the Bitcoin blockchain, we gather metrics on block capacity, miner fees, and the growth of Taproot transactions. Our findings confirm the positive effects of these protocols on Bitcoin's mainnet, bridging gaps in the literature regarding Bitcoin's programming capabilities and ecosystem protocols and providing valuable insights for practitioners and researchers.
Submitted 29 September, 2024;
originally announced September 2024.
-
Theory of Pressure Dependence of Superconductivity in Bilayer Nickelate La$_3$Ni$_2$O$_{7}$
Authors:
Kai-Yue Jiang,
Yu-Han Cao,
Qing-Geng Yang,
Hong-Yan Lu,
Qiang-Hua Wang
Abstract:
A recent experiment shows that the superconducting transition temperature in the Ruddlesden-Popper bilayer La$_3$Ni$_2$O$_{7}$ decreases monotonically with increasing pressure above 14 GPa. In order to unravel the underlying mechanism for this unusual dependence, we performed theoretical investigations by combining density functional theory (DFT) and the unbiased functional renormalization group (FRG). Our DFT calculations show that the Fermi pockets are essentially unchanged with increasing pressure (above 14 GPa), but the bandwidth is enlarged, and in particular the interlayer hopping integral between the nickel $3d_{3z^2-r^2}$ orbitals is enhanced. From the DFT band structure, we construct the bilayer tight-binding model in terms of the nickel $3d_{3z^2-r^2}$ and $3d_{x^2-y^2}$ orbitals. On this basis, we investigate the superconductivity induced by correlation effects using FRG calculations. We consistently find $s_\pm$-wave pairing triggered by spin fluctuations, but the latter are weakened by pressure and lead to a transition temperature that decreases with pressure, in qualitative agreement with the experiment. We emphasize that the itinerancy of the $d$-orbitals is important and captured naturally in our FRG calculations, and we argue that the unusual pressure dependence would be unnatural, if not impossible, in the otherwise local-moment picture of the nickel $d$-orbitals. This sheds light on the pertinent microscopic description of, and more importantly the mechanism of, superconductivity in La$_3$Ni$_2$O$_{7}$.
Submitted 26 September, 2024;
originally announced September 2024.
-
BioZero: An Efficient and Privacy-Preserving Decentralized Biometric Authentication Protocol on Open Blockchain
Authors:
Junhao Lai,
Taotao Wang,
Shengli Zhang,
Qing Yang,
Soung Chang Liew
Abstract:
Digital identity plays a vital role in enabling secure access to resources and services in the digital world. Traditional identity authentication methods, such as password-based and biometric authentications, have limitations in terms of security, privacy, and scalability. Decentralized authentication approaches leveraging blockchain technology have emerged as a promising solution. However, existing decentralized authentication methods often rely on indirect identity verification (e.g. using passwords or digital signatures as authentication credentials) and face challenges such as Sybil attacks. In this paper, we propose BioZero, an efficient and privacy-preserving decentralized biometric authentication protocol that can be implemented on open blockchain. BioZero leverages Pedersen commitment and homomorphic computation to protect user biometric privacy while enabling efficient verification. We enhance the protocol with non-interactive homomorphic computation and employ zero-knowledge proofs for secure on-chain verification. The unique aspect of BioZero is that it is fully decentralized and can be executed by blockchain smart contracts in a very efficient way. We analyze the security of BioZero and validate its performance through a prototype implementation. The results demonstrate the effectiveness, efficiency, and security of BioZero in decentralized authentication scenarios. Our work contributes to the advancement of decentralized identity authentication using biometrics.
Submitted 25 September, 2024;
originally announced September 2024.
-
SEE: Semantically Aligned EEG-to-Text Translation
Authors:
Yitian Tao,
Yan Liang,
Luoyu Wang,
Yongqing Li,
Qing Yang,
Han Zhang
Abstract:
Decoding neurophysiological signals into language is of great research interest within brain-computer interface (BCI) applications. Electroencephalography (EEG), known for its non-invasiveness, ease of use, and cost-effectiveness, has been a popular method in this field. However, current EEG-to-Text decoding approaches face challenges due to the huge domain gap between EEG recordings and raw texts, inherent data bias, and small closed vocabularies. In this paper, we propose SEE: Semantically Aligned EEG-to-Text Translation, a novel method aimed at improving EEG-to-Text decoding by seamlessly integrating two modules into a pre-trained BART language model. These two modules include (1) a Cross-Modal Codebook that learns cross-modal representations to enhance feature consolidation and mitigate domain gap, and (2) a Semantic Matching Module that fully utilizes pre-trained text representations to align multi-modal features extracted from EEG-Text pairs while considering noise caused by false negatives, i.e., data from different EEG-Text pairs that have similar semantic meanings. Experimental results on the Zurich Cognitive Language Processing Corpus (ZuCo) demonstrate the effectiveness of SEE, which enhances the feasibility of accurate EEG-to-Text decoding.
Submitted 14 September, 2024;
originally announced September 2024.
-
High-fidelity near-diffraction-limited projection through scattering with reference-less transmission matrix
Authors:
Jingshan Zhong,
Quanzhi Li,
Zhong Wen,
Qilin Deng,
Haonan Zhang,
Weizheng Jin,
Qing Yang
Abstract:
Image projection through scattering media has applications ranging from light delivery through multimode fiber to near-eye displays. Conventional methods utilize the transmission matrix (TM) measured by interfering with a reference beam. However, this measurement is noise-sensitive, often resulting in artifacts that degrade the projection quality. Here we propose to characterize the scattering by computationally retrieving the TM from intensity-only measurements and to solve the projection problem formulated with the retrieved TM by optimization. We experimentally validate the proposed method by projecting through a multimode fiber. Compared to the conventional methods, it projects improved-quality images with resolution near the diffraction limit, and simplifies the experimental setup by eliminating the reference. It paves the way for applications of high-quality near-diffraction-limited projection through scattering.
Submitted 23 September, 2024;
originally announced September 2024.
-
Prior Knowledge Distillation Network for Face Super-Resolution
Authors:
Qiu Yang,
Xiao Sun,
Xin-yu Li,
Feng-Qi Cui,
Yu-Tong Guo,
Shuang-Zhen Hu,
Ping Luo,
Si-Ying Li
Abstract:
The purpose of face super-resolution (FSR) is to reconstruct high-resolution (HR) face images from low-resolution (LR) inputs. With the continuous advancement of deep learning technologies, contemporary prior-guided FSR methods initially estimate facial priors and then use this information to assist in the super-resolution reconstruction process. However, ensuring the accuracy of prior estimation remains challenging, and straightforward cascading and convolutional operations often fail to fully leverage prior knowledge. Inaccurate or insufficiently utilized prior information inevitably degrades FSR performance. To address this issue, we propose a prior knowledge distillation network (PKDN) for FSR, which involves transferring prior information from the teacher network to the student network. This approach enables the network to learn priors during the training stage while relying solely on low-resolution facial images during the testing stage, thus mitigating the adverse effects of prior estimation inaccuracies. Additionally, we incorporate robust attention mechanisms to design a parsing map fusion block that effectively utilizes prior information. To prevent feature loss, we retain multi-scale features during the feature extraction stage and employ them in the subsequent super-resolution reconstruction process. Experimental results on benchmark datasets demonstrate that our PKDN approach surpasses existing FSR methods in generating high-quality face images.
Submitted 22 September, 2024;
originally announced September 2024.
-
CFSP: An Efficient Structured Pruning Framework for LLMs with Coarse-to-Fine Activation Information
Authors:
Yuxin Wang,
Minghua Ma,
Zekun Wang,
Jingchang Chen,
Huiming Fan,
Liping Shan,
Qing Yang,
Dongliang Xu,
Ming Liu,
Bing Qin
Abstract:
The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially at high sparsity ratios. To this end, we introduce an efficient structured pruning framework named CFSP, which leverages both Coarse (interblock) and Fine-grained (intrablock) activation information as an importance criterion to guide pruning. The pruning is highly efficient, as it only requires one forward pass to compute feature activations. Specifically, we first allocate the sparsity budget across blocks based on their importance and then retain important weights within each block. In addition, we introduce a recovery fine-tuning strategy that adaptively allocates training overhead based on coarse-grained importance to further improve performance. Experimental results demonstrate that CFSP outperforms existing methods on diverse models across various sparsity budgets. Our code will be available at https://github.com/wyxscir/CFSP.
Submitted 20 September, 2024;
originally announced September 2024.
-
An adapted large language model facilitates multiple medical tasks in diabetes care
Authors:
Lai Wei,
Zhen Ying,
Muyang He,
Yutong Chen,
Qian Yang,
Yanzhe Hong,
Jiaping Lu,
Xiaoying Li,
Weiran Huang,
Ying Chen
Abstract:
Diabetes is a chronic disease that poses a significant global health burden, and optimizing diabetes management requires multi-stakeholder collaboration. Large language models (LLMs) have shown promise in various healthcare scenarios, but their effectiveness across a diverse range of diabetes tasks remains unproven. In this study, we introduced a framework to train and validate diabetes-specific LLMs. We first developed a comprehensive data processing pipeline that includes data collection, filtering, augmentation and refinement. This approach contributes to creating a high-quality, diabetes-specific dataset and several evaluation benchmarks entirely from scratch. Utilizing the collected training dataset, we fine-tuned a diabetes-specific LLM family that demonstrated state-of-the-art proficiency in understanding and processing various diabetes tasks compared to other LLMs. Furthermore, clinical studies showed the potential applications of our models in diabetes care, including providing personalized healthcare, assisting medical education, and streamlining clinical tasks. In conclusion, our study introduced a framework to develop and evaluate a diabetes-specific LLM family, and highlighted its potential to enhance clinical practice and provide personalized, data-driven diabetes support for different end users. The code is provided via GitHub at https://github.com/waltonfuture/Diabetica.
Submitted 19 September, 2024;
originally announced September 2024.
-
One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
Authors:
Finn Lukas Busch,
Timon Homberger,
Jesús Ortega-Peimbert,
Quantao Yang,
Olov Andersson
Abstract:
The capability to efficiently search for objects in complex environments is fundamental for many real-world robot applications. Recent advances in open-vocabulary vision models have resulted in semantically-informed object navigation methods that allow a robot to search for an arbitrary object without prior training. However, these zero-shot methods have so far treated the environment as unknown for each consecutive query. In this paper we introduce a new benchmark for zero-shot multi-object navigation, allowing the robot to leverage information gathered from previous searches to more efficiently find new objects. To address this problem we build a reusable open-vocabulary feature map tailored for real-time object search. We further propose a probabilistic-semantic map update that mitigates common sources of errors in semantic feature extraction and leverages this semantic uncertainty for informed multi-object exploration. We evaluate our method on a set of object navigation tasks both in simulation and with a real robot, running in real-time on a Jetson Orin AGX. We demonstrate that it outperforms existing state-of-the-art approaches on both single- and multi-object navigation tasks. Additional videos, code and the multi-object navigation benchmark will be available on https://finnbsch.github.io/OneMap.
Submitted 18 September, 2024;
originally announced September 2024.
-
Selective Switching Between Two Band-Edge Alignments in Ternary Pentagonal CdSeTe Monolayer: Atom-Valley Locking
Authors:
Zhi-Qiang Wen,
Qiu Yang,
Shu-Hao Cao,
Zhao-Yi Zeng,
Hua-Yun Geng,
Xiang-Rong Chen
Abstract:
In the field of photocatalytic water splitting, no current studies have explicitly investigated the coexistence of multiple band-edge alignments in two-dimensional (2D) materials with intrinsic electric fields. In this Letter, we designed the ternary pentagonal CdSeTe monolayer, and proposed a novel concept called atom-valley locking, which could provide multiple band-edge positions. In the CdSeTe monolayer, two distinct valleys emerge in the electronic structure, one contributed by Se atoms and the other by Te atoms, with a spontaneous polarization of 187 meV between them. This phenomenon can be attributed to the localization of valley electrons and the breaking of four-fold rotational reflection symmetry, yet it does not rely on the breaking of time-reversal symmetry. Due to the atom-dependent valley distribution, two types of band-edge alignments can be identified. Moreover, selective switching between them can be achieved by strain engineering, thereby enabling precise control over the site of the hydrogen evolution reaction. Our findings open up new opportunities for exploring valley polarization and provide unique insights into the photocatalytic applications of 2D materials with intrinsic electric fields.
Submitted 15 September, 2024;
originally announced September 2024.
-
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Authors:
Qi Yang,
Binjie Mao,
Zili Wang,
Xing Nie,
Pengfei Gao,
Ying Guo,
Cheng Zhen,
Pengfei Yan,
Shiming Xiang
Abstract:
Foley is a term commonly used in filmmaking, referring to the addition of everyday sound effects to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a particular type of automatic foley task, presents inherent challenges related to audio-visual synchronization. These challenges encompass maintaining the content consistency between the input video and the generated audio, as well as the alignment of temporal and loudness properties within the video. To address these issues, we construct a controllable video-to-audio synthesis model, termed Draw an Audio, which supports multiple input instructions through drawn masks and loudness signals. To ensure content consistency between the synthesized audio and target video, we introduce the Mask-Attention Module (MAM), which employs masked video instruction to enable the model to focus on regions of interest. Additionally, we implement the Time-Loudness Module (TLM), which uses an auxiliary loudness signal to ensure the synthesis of sound that aligns with the video in both loudness and temporal dimensions. Furthermore, we have extended a large-scale V2A dataset, named VGGSound-Caption, by annotating caption prompts. Extensive experiments on challenging benchmarks across two large-scale V2A datasets verify that Draw an Audio achieves state-of-the-art performance. Project page: https://yannqi.github.io/Draw-an-Audio/.
Submitted 9 September, 2024;
originally announced September 2024.
-
Multi-feature Compensatory Motion Analysis for Reaching Motions Over a Discretely Sampled Workspace
Authors:
Qihan Yang,
Yuri Gloumakov,
Adam J. Spiers
Abstract:
The absence of functional arm joints, such as the wrist, in upper extremity prostheses leads to compensatory motions in the users' daily activities. Compensatory motions have been previously studied for varying task protocols and evaluation metrics. However, the movement targets' spatial locations in previous protocols were not standardised and were not comparable between studies, and the evaluation metrics were rudimentary. This work analysed compensatory motions in the final pose of subjects reaching across a discretely sampled 7×7 2D grid of targets under unbraced (normative) and braced (compensatory) conditions. For the braced condition, a bracing system was applied to simulate a transradial prosthetic limb by restricting participants' wrist joints. A total of 1372 reaching poses were analysed, and a Compensation Index was proposed to indicate the severity level of compensation. This index combined joint spatial location analysis, joint angle analysis, separability analysis, and machine learning (clustering) analysis. The individual analysis results and the final Compensation Index were presented in heatmap format to correspond to the spatial layout of the workspace, revealing the spatial dependency of compensatory motions. The results indicate that compensatory motions occur mainly in a right trapezoid region in the upper left area and a vertical trapezoid region in the middle left area for right-handed subjects reaching horizontally and vertically. Such results might guide motion selection in clinical rehabilitation, occupational therapy, and prosthetic evaluation to help avoid residual limb pain and overuse syndromes.
Submitted 23 August, 2024;
originally announced September 2024.
-
Dual conformal invariant kinematics and folding of Grassmannian cluster algebras
Authors:
Jian-Rong Li,
Changjian Su,
Qinglin Yang
Abstract:
In the study of quantum field theory, Grassmannian manifolds $\text{Gr}(4,n)$ are closely related to the $D{=}4$ kinematic input for $n$-particle scattering processes, and their combinatorial and geometrical structures have been widely applied in studying conformally invariant physical theories and their scattering amplitudes. Recently, \cite{HLY21} observed that constraining the $D{=}4$ kinematic input to its $D{=}3$ subspace can be interpreted as folding Grassmannian cluster algebras $\mathbb{C}[\text{Gr}(4,n)]$. In this paper, we deduce general expressions for these constraints in terms of Plücker variables of $\text{Gr}(4,n)$ directly from the definition of the $D{=}3$ subspace, and propose a series of initial quivers for the algebra $\mathbb{C}[\text{Gr}(4,n)]$ whose folding conditions exactly meet the constraints, which finally proves the observation.
Submitted 8 September, 2024;
originally announced September 2024.
-
Semantic Communication for Efficient Point Cloud Transmission
Authors:
Shangzhuo Xie,
Qianqian Yang,
Yuyi Sun,
Tianxiao Han,
Zhaohui Yang,
Zhiguo Shi
Abstract:
As three-dimensional acquisition technologies like LiDAR cameras advance, the need for efficient transmission of 3D point clouds is becoming increasingly important. In this paper, we present a novel semantic communication (SemCom) approach for efficient 3D point cloud transmission. Different from existing methods that rely on downsampling and feature extraction for compression, our approach utilizes a parallel structure to separately extract both global and local information from point clouds. This system is composed of five key components: local semantic encoder, global semantic encoder, channel encoder, channel decoder, and semantic decoder. Our numerical results indicate that this approach surpasses both the traditional Octree compression methodology and alternative deep learning-based strategies in terms of reconstruction quality. Moreover, our system is capable of achieving high-quality point cloud reconstruction under adverse channel conditions, specifically maintaining a reconstruction quality of over 37dB even with severe channel noise.
Submitted 5 September, 2024;
originally announced September 2024.
-
Incorporating Like-Minded Peers to Overcome Friend Data Sparsity in Session-Based Social Recommendations
Authors:
Chunyan An,
Yunhan Li,
Qiang Yang,
Winston K. G. Seah,
Zhixu Li,
Conghao Yang
Abstract:
Session-based Social Recommendation (SSR) leverages social relationships within online networks to enhance the performance of Session-based Recommendation (SR). However, existing SSR algorithms often encounter the challenge of "friend data sparsity". Moreover, significant discrepancies can exist between the purchase preferences of social network friends and those of the target user, reducing the influence of friends relative to the target user's own preferences. To address these challenges, this paper introduces the concept of "Like-minded Peers" (LMP), representing users whose preferences align with the target user's current session based on their historical sessions. This is the first work, to our knowledge, that uses LMP to enhance the modeling of social influence in SSR. This approach not only alleviates the problem of friend data sparsity but also effectively incorporates users with similar preferences to the target user. We propose a novel model named Transformer Encoder with Graph Attention Aggregator Recommendation (TEGAARec), which includes the TEGAA module and the GAT-based social aggregation module. The TEGAA module captures and merges both long-term and short-term interests for target users and LMP users. Concurrently, the GAT-based social aggregation module is designed to aggregate the target users' dynamic interests and social influence in a weighted manner. Extensive experiments on four real-world datasets demonstrate the efficacy and superiority of our proposed model and ablation studies are done to illustrate the contributions of each component in TEGAARec.
Submitted 6 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Multi-Sources Fusion Learning for Multi-Points NLOS Localization in OFDM System
Authors:
Bohao Wang,
Zitao Shuai,
Chongwen Huang,
Qianqian Yang,
Zhaohui Yang,
Richeng Jin,
Ahmed Al Hammadi,
Zhaoyang Zhang,
Chau Yuen,
Mérouane Debbah
Abstract:
Accurate localization of mobile terminals is a pivotal aspect of integrated sensing and communication systems. Traditional fingerprint-based localization methods, which infer coordinates from channel information within pre-set rectangular areas, often face challenges due to the heterogeneous distribution of fingerprints inherent in non-line-of-sight (NLOS) scenarios, particularly within orthogonal frequency division multiplexing systems. To overcome this limitation, we develop a novel multi-sources information fusion learning framework referred to as the Autosync Multi-Domains NLOS Localization (AMDNLoc). Specifically, AMDNLoc employs a two-stage matched filter fused with a target tracking algorithm and iterative centroid-based clustering to automatically and irregularly segment NLOS regions, ensuring uniform distribution within channel state information across frequency, power, and time-delay domains. Additionally, the framework utilizes a segment-specific linear classifier array, coupled with deep residual network-based feature extraction and fusion, to establish the correlation function between fingerprint features and coordinates within these regions. Simulation results reveal that AMDNLoc achieves an impressive NLOS localization accuracy of 1.46 meters on typical wireless artificial intelligence research datasets and demonstrates significant improvements in interpretability, adaptability, and scalability.
Submitted 4 September, 2024;
originally announced September 2024.
-
Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
Authors:
Jinglin Liang,
Jin Zhong,
Hanlin Gu,
Zhongqi Lu,
Xingxing Tang,
Gang Dai,
Shuangping Huang,
Lixin Fan,
Qiang Yang
Abstract:
Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experience replay, are not directly applicable to FCCL. Existing FCCL methods mitigate forgetting by generating historical data through federated training of GANs or data-free knowledge distillation. However, these approaches often suffer from unstable training of generators or low-quality generated data, limiting their guidance for the model. To address this challenge, we propose a novel method of data replay based on diffusion models. Instead of training a diffusion model, we employ a pre-trained conditional diffusion model to reverse-engineer each class, searching the corresponding input conditions for each class within the model's input space, significantly reducing computational resources and time consumption while ensuring effective generation. Furthermore, we enhance the classifier's domain generalization ability on generated and real data through contrastive learning, indirectly improving the representational capability of generated data for real data. Comprehensive experiments demonstrate that our method significantly outperforms existing baselines. Code is available at https://github.com/jinglin-liang/DDDR.
Submitted 3 September, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He
Authors:
F. Alemanno,
Q. An,
P. Azzarello,
F. C. T. Barbato,
P. Bernardini,
X. J. Bi,
I. Cagnoli,
M. S. Cai,
E. Casilli,
E. Catanzani,
J. Chang,
D. Y. Chen,
J. L. Chen,
Z. F. Chen,
P. Coppin,
M. Y. Cui,
T. S. Cui,
Y. X. Cui,
H. T. Dai,
A. De Benedittis,
I. De Mitri,
F. de Palma,
A. Di Giovanni,
Q. Ding,
T. K. Dong
, et al. (126 additional authors not shown)
Abstract:
Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.
Submitted 30 August, 2024;
originally announced August 2024.
-
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Authors:
Shengpeng Ji,
Ziyue Jiang,
Xize Cheng,
Yifu Chen,
Minghui Fang,
Jialong Zuo,
Qian Yang,
Ruiqi Li,
Ziang Zhang,
Xiaoda Yang,
Rongjie Huang,
Yidi Jiang,
Qian Chen,
Siqi Zheng,
Wen Wang,
Zhou Zhao
Abstract:
Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domain: 1) Extreme compression. By compressing the layers of quantizers and the temporal dimension of the discrete codec, one second of audio at a 24 kHz sampling rate requires only a single quantizer with 40 or 75 tokens. 2) Improved subjective quality. Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information. Specifically, we achieve these results by designing a broader VQ space, extended contextual windows, and improved attention networks, as well as introducing a powerful multi-scale discriminator and an inverse Fourier transform structure. We conducted extensive reconstruction experiments in the domains of speech, audio, and music. WavTokenizer exhibited strong performance across various objective and subjective metrics compared to state-of-the-art models. We also tested semantic information, VQ utilization, and adaptability to generative models. Comprehensive ablation studies confirm the necessity of each module in WavTokenizer. The related code, demos, and pre-trained models are available at https://github.com/jishengpeng/WavTokenizer.
Submitted 29 August, 2024;
originally announced August 2024.
-
Galaxies Lighting Up: Discovery of Seventy New Turn-on Changing-look Quasars
Authors:
Qian Yang,
Paul J. Green,
Xue-Bing Wu,
Michael Eracleous,
Linhua Jiang,
Yuming Fu
Abstract:
"Changing-look quasars" (CLQs), discovered less than a decade ago, show dramatic, rapid changes in optical/UV continuum and broad line emission. The majority of CLQs have been found dimming as "turn-off" CLQs because most selection methods start from samples of spectroscopically-confirmed quasars. We present here a sample of 82 spectroscopically confirmed "turn-on" CLQs, 70 of which are newly identified. The turn-on CLQs are selected from spectroscopically classified galaxies with subsequent significant and dramatic variability in both the optical and mid-infrared bands, indicating a mechanism of changing accretion rate of the supermassive black holes rather than variable obscuration. Based on their bright state Eddington ratios, turn-on CLQs are associated with lower accretion rates compared to turn-off CLQs or typical SDSS quasars with similar redshift and magnitude distributions, even though turn-on CLQs have lower black hole masses. Most turn-on CLQs reside in host galaxies that follow local relations between the central black hole mass and host galaxy properties, such as stellar mass and velocity dispersion. However, their host galaxies have higher mass than normal inactive galaxies, with star formation rates more similar to hosts of Type 2 AGN than to the overall galaxy population.
Submitted 28 August, 2024;
originally announced August 2024.
-
HEAD: A Bandwidth-Efficient Cooperative Perception Approach for Heterogeneous Connected and Autonomous Vehicles
Authors:
Deyuan Qu,
Qi Chen,
Yongqi Zhu,
Yihao Zhu,
Sergei S. Avedisov,
Song Fu,
Qing Yang
Abstract:
In cooperative perception studies, there is often a trade-off between communication bandwidth and perception performance. While current feature fusion solutions are known for their excellent object detection performance, transmitting the entire sets of intermediate feature maps requires substantial bandwidth. Furthermore, these fusion approaches are typically limited to vehicles that use identical detection models. Our goal is to develop a solution that supports cooperative perception across vehicles equipped with different modalities of sensors. This method aims to deliver improved perception performance compared to late fusion techniques and precision similar to state-of-the-art intermediate fusion, while requiring an order of magnitude less bandwidth. We propose HEAD, a method that fuses features from the classification and regression heads in 3D object detection networks. Our method is compatible with heterogeneous detection networks such as LiDAR PointPillars, SECOND, VoxelNet, and camera Bird's-eye View (BEV) Encoder. Given the naturally smaller feature size in the detection heads, we design a self-attention mechanism to fuse the classification head and a complementary feature fusion layer to fuse the regression head. Our experiments, comprehensively evaluated on the V2V4Real and OPV2V datasets, demonstrate that HEAD is a fusion method that effectively balances communication bandwidth and perception performance.
Submitted 27 August, 2024;
originally announced August 2024.
-
A Survey on Reinforcement Learning Applications in SLAM
Authors:
Mohammad Dehghani Tezerjani,
Mohammad Khoshnazar,
Mohammadhamed Tangestanizadeh,
Qing Yang
Abstract:
The emergence of mobile robotics, particularly in the automotive industry, introduces a promising era of enriched user experiences and adept handling of complex navigation challenges. The realization of these advancements necessitates a focused technological effort and the successful execution of numerous intricate tasks, particularly in the critical domain of Simultaneous Localization and Mapping (SLAM). Various artificial intelligence (AI) methodologies, such as deep learning and reinforcement learning, present viable solutions to address the challenges in SLAM. This study specifically explores the application of reinforcement learning in the context of SLAM. By enabling the agent (the robot) to iteratively interact with and receive feedback from its environment, reinforcement learning facilitates the acquisition of navigation and mapping skills, thereby enhancing the robot's decision-making capabilities. This approach offers several advantages, including improved navigation proficiency, increased resilience, reduced dependence on sensor precision, and refinement of the decision-making process. The findings of this study, which provide an overview of reinforcement learning's utilization in SLAM, reveal significant advancements in the field. The investigation also highlights the evolution and innovative integration of these techniques.
Submitted 25 August, 2024;
originally announced August 2024.
-
NEXUS: the North ecliptic pole EXtragalactic Unified Survey
Authors:
Yue Shen,
Ming-Yang Zhuang,
Junyao Li,
Adam J. Burgasser,
Xiaohui Fan,
Jenny E. Greene,
Gautham Narayan,
Alice E. Shapley,
Fengwu Sun,
Feige Wang,
Qian Yang
Abstract:
NEXUS is a JWST Multi-Cycle (Cycles 3-5; 368 primary hrs) GO Treasury imaging and spectroscopic survey around the North Ecliptic Pole. It contains two overlapping tiers. The Wide tier ($\sim 400~{\rm arcmin}^2$) performs NIRCam/WFSS 2.4-5 micron grism spectroscopy with three epochs over 3 years (final continuum ${\rm S/N/pixel>3}$ at F444W$<22.2$). The Deep tier ($\sim 50~{\rm arcmin}^2$) performs high-multiplexing NIRSpec 0.6-5.3 micron MOS/PRISM spectroscopy for $\sim 10,000$ targets, over 18 epochs with a 2-month cadence (epoch/final continuum ${\rm S/N/pixel>3}$ at F200W$\lesssim 27/29$). All epochs have simultaneous multi-band NIRCam and MIRI imaging ($5\sigma$ final depths of $\sim 28-29$ in NIRCam and $\sim 25$ in MIRI). The field is within the continuous viewing zone of JWST, and is fully covered by the Euclid Ultra-Deep Field, with 0.9-2 micron deep Euclid spectroscopy and cadenced photometry. NEXUS has three science pillars. First, with its massive and nearly complete (flux-limited) spectroscopic samples and deep photometry, it will perform efficient classification and physical characterization of galaxies and AGNs from $z\sim 1$ to Cosmic Dawn. With the large contiguous area coverage, it will measure the spatial clustering and demography of the first galaxies and SMBHs at $z>6$. Second, multi-epoch observations enable systematic time-domain investigations, focusing on $z>3$ transients and low-mass AGN reverberation mapping. Third, the comprehensive data set will enable knowledge transfer to other legacy fields, create data challenges, and initiate benchmark work for future space missions. With rapid public releases of processed data and an open invitation for collaboration, NEXUS aims for broad and swift community engagement, to become a powerhouse to drive transformative advancements in multiple key science areas of astronomy.
Submitted 22 August, 2024;
originally announced August 2024.
-
Research on Improved U-net Based Remote Sensing Image Segmentation Algorithm
Authors:
Qiming Yang,
Zixin Wang,
Shinan Liu,
Zizheng Li
Abstract:
In recent years, although the U-Net network has made significant progress in the field of image segmentation, it still faces performance bottlenecks in remote sensing image segmentation. In this paper, we propose introducing the SimAM and CBAM attention mechanisms into U-Net. The experimental results show that adding the SimAM and CBAM modules alone improves the model by 17.41% and 12.23% in MIoU, respectively, and that the MPA and Accuracy are also significantly improved. After fusing the two, the improvement in MIoU rises to 19.11%, and the MPA and Accuracy improve by 16.38% and 14.8% respectively, showing excellent segmentation accuracy and visual effect with strong generalization ability and robustness. This study opens up a new path for remote sensing image segmentation technology and has important reference value for algorithm selection and improvement.
Submitted 22 August, 2024;
originally announced August 2024.
-
Green Probabilistic Semantic Communication over Wireless Networks
Authors:
Ruopeng Xu,
Zhaohui Yang,
Yijie Mao,
Chongwen Huang,
Qianqian Yang,
Lexi Xu,
Wei Xu,
Zhaoyang Zhang
Abstract:
In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduced, utilizing the semantic compression ratio (SCR) as a parameter to connect the computation and communication processes of information transmission. Based on the rate-splitting multiple access (RSMA) technology, we derive mathematical expressions for system transmission energy consumption and related formulations. Subsequently, the multi-user green semantic communication system is modeled, and an optimization problem is formulated that minimizes system energy consumption, jointly considering the computation and communication processes under given constraints. To solve this optimization problem, we propose an alternating optimization algorithm that tackles the sub-problems of power allocation and beamforming design, semantic compression ratio, and computation capacity allocation. Simulation results validate the effectiveness of our approach, demonstrating the superiority of our system over methods using space division multiple access (SDMA) and non-orthogonal multiple access (NOMA) instead of RSMA, and highlighting the benefits of our PSC compression model.
Submitted 21 August, 2024;
originally announced August 2024.
-
Physics-Driven AI Correction in Laser Absorption Sensing Quantification
Authors:
Ruiyuan Kang,
Panos Liatsis,
Meixia Geng,
Qingjie Yang
Abstract:
Laser absorption spectroscopy (LAS) quantification is a popular tool for measuring the temperature and concentration of gases. It has low error tolerance, whereas current ML-based solutions cannot guarantee the reliability of their measurements. In this work, we propose a new framework, SPEC, to address this issue. In addition to the conventional ML estimator-based estimation mode, SPEC also includes a Physics-driven Anomaly Detection module (PAD) to assess the error of the estimation, and a Correction mode designed to correct unreliable estimations. The Correction mode is a network-based optimization algorithm, which uses the guidance of the error to iteratively correct the estimation. A hybrid surrogate error model is proposed to estimate the error distribution, which contains an ensemble of networks to simulate reconstruction error, and true feasible error computation. A greedy ensemble search is proposed to find the optimal correction robustly and efficiently from the gradient guidance of the surrogate model. The proposed SPEC is validated on test scenarios that lie outside the training distribution. The results show that SPEC can significantly improve the estimation quality, and the Correction mode outperforms current network-based optimization algorithms. In addition, SPEC is reconfigurable: it can be easily adapted to different quantification tasks by changing PAD, without retraining the ML estimator.
Submitted 20 August, 2024;
originally announced August 2024.
-
Exploiting Fine-Grained Prototype Distribution for Boosting Unsupervised Class Incremental Learning
Authors:
Jiaming Liu,
Hongyuan Liu,
Zhili Qin,
Wei Han,
Yulu Fan,
Qinli Yang,
Junming Shao
Abstract:
The dynamic nature of open-world scenarios has attracted more attention to class incremental learning (CIL). However, existing CIL methods typically presume the availability of complete ground-truth labels throughout the training process, an assumption rarely met in practical applications. Consequently, this paper explores a more challenging problem of unsupervised class incremental learning (UCIL). The essence of addressing this problem lies in effectively capturing comprehensive feature representations and discovering unknown novel classes. To achieve this, we first model the knowledge of class distribution by exploiting fine-grained prototypes. Subsequently, a granularity alignment technique is introduced to enhance the unsupervised class discovery. Additionally, we propose a strategy to minimize overlap between novel and existing classes, thereby preserving historical knowledge and mitigating the phenomenon of catastrophic forgetting. Extensive experiments on five datasets demonstrate that our approach significantly outperforms current state-of-the-art methods, indicating the effectiveness of the proposed method.
Submitted 19 August, 2024;
originally announced August 2024.
-
MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions
Authors:
Qinchen Yang,
Zejun Xie,
Hua Wei,
Desheng Zhang,
Yu Yang
Abstract:
Urban traffic is subject to disruptions that cause extended waiting time and safety issues at signalized intersections. While numerous studies have addressed the issue of intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of this research is to mitigate the adverse effects of traffic signal malfunction, such as traffic congestion and collisions, by optimizing the control of neighboring functioning signals. To achieve this goal, this paper presents a novel traffic signal control framework (MalLight), which leverages an Influence-aware State Aggregation Module (ISAM) and an Influence-aware Reward Aggregation Module (IRAM) to achieve coordinated control of surrounding traffic signals. To the best of our knowledge, this study pioneers the application of a Reinforcement Learning (RL)-based approach to address the challenges posed by traffic signal malfunction. Empirical investigations conducted on real-world datasets substantiate the superior performance of our proposed methodology over conventional and deep learning-based alternatives in the presence of signal malfunction, with reduction of throughput alleviated by as much as 48.6%.
Submitted 12 September, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding
Authors:
Dawei Dai,
Yuanhui Zhang,
Long Xu,
Qianlan Yang,
Xiaojing Shen,
Shuyin Xia,
Guoyin Wang
Abstract:
Previous advancements in pathology image understanding primarily involved developing models tailored to specific tasks. Recent studies have demonstrated that large vision-language models can enhance the performance of various downstream tasks in medical image understanding. In this study, we developed a domain-specific large language-vision assistant (PA-LLaVA) for pathology image understanding. Specifically, (1) we first construct a human pathology image-text dataset by cleaning public medical image-text data for domain-specific alignment; (2) using the proposed image-text data, we first train a pathology language-image pretraining (PLIP) model as the specialized visual encoder for pathology images, and then develop a scale-invariant connector to avoid the information loss caused by image scaling; (3) we adopt two-stage learning to train PA-LLaVA, with the first stage for domain alignment and the second stage for the end-to-end visual question & answering (VQA) task. In experiments, we evaluate PA-LLaVA on both supervised and zero-shot VQA datasets; our model achieved the best overall performance among multimodal models of similar scale. The ablation experiments also confirmed the effectiveness of our design. We posit that our PA-LLaVA model and the datasets presented in this work can promote research in the field of computational pathology. All codes are available at: https://github.com/ddw2AIGROUP2CQUPT/PA-LLaVA
Submitted 18 August, 2024;
originally announced August 2024.