Long Term Memory: The Foundation of AI Self-Evolution

Authors: Xun Jiang, Feng Li, Han Zhao, Jiaying Wang, Jun Shao, Shihao Xu, Shu Zhang, Weiling Chen, Xavier Tang, Yize Chen, Mengyue Wu, Weizhi Ma, Mengdi Wang, Tianqiao Chen

Abstract: Large language models (LLMs) like GPTs, trained on vast datasets, have demonstrated impressive capabilities in language understanding, reasoning, and planning, achieving human-level performance in various tasks. Most studies focus on enhancing these models by training on ever-larger datasets to build more powerful foundation models. While training stronger models is important, enabling models to evolve during inference is equally crucial, a process we refer to as AI self-evolution. Unlike large-scale training, self-evolution may rely on limited data or interactions. Inspired by the columnar organization of the human cerebral cortex, we hypothesize that AI models could develop cognitive abilities and build internal representations through iterative interactions with their environment. To achieve this, models need long-term memory (LTM) to store and manage processed interaction data. LTM supports self-evolution by representing diverse experiences across environments and agents. In this report, we explore AI self-evolution and its potential to enhance models during inference. We examine LTM's role in lifelong learning, allowing models to evolve based on accumulated interactions. We outline the structure of LTM and the systems needed for effective data retention and representation. We also classify approaches for building personalized models with LTM data and show how these models achieve self-evolution through interaction. Using LTM, our multi-agent framework OMNE achieved first place on the GAIA benchmark, demonstrating LTM's potential for AI self-evolution. Finally, we present a roadmap for future research, emphasizing the importance of LTM for advancing AI technology and its practical applications.

Submitted 21 October, 2024; originally announced October 2024.

Comments: 56 pages, 13 figures

arXiv:2410.15569 [pdf, other]

Online Pseudo-Label Unified Object Detection for Multiple Datasets Training

Authors: XiaoJun Tang, Jingru Wang, Zeyu Shangguan, Darun Tang, Yuyu Liu

Abstract: The Unified Object Detection (UOD) task aims to achieve object detection of all merged categories through training on multiple datasets, and is of great significance in comprehensive object detection scenarios. In this paper, we conduct a thorough analysis of the cross-dataset missing-annotation issue, and propose an Online Pseudo-Label Unified Object Detection scheme. Our method uses a periodically updated teacher model to generate pseudo-labels for the unlabelled objects in each sub-dataset. This periodic update strategy better ensures that the accuracy of the teacher model reaches a local maximum and maximizes the quality of the pseudo-labels. In addition, we survey the influence of overlapped region proposals on the accuracy of box regression. We propose a category-specific box regression and a pseudo-label RPN head to improve the recall rate of the Region Proposal Network (RPN). Our experimental results on commonly used benchmarks (e.g. COCO, Objects365 and OpenImages) indicate that our online pseudo-label UOD method achieves higher accuracy than existing SOTA methods.

Submitted 20 October, 2024; originally announced October 2024.

arXiv:2410.15333 [pdf, other]

The ALMA-QUARKS Survey: Fibers' role in star formation unveiled in an intermediate-mass protocluster region of the Vela D cloud

Authors: Dongting Yang, HongLi Liu, Tie Liu, Anandmayee Tej, Xunchuan Liu, Jinhua He, Guido Garay, Amelia Stutz, Lei Zhu, Sheng-Li Qin, Fengwei Xu, Pak-Shing Li, Mika Juvela, Pablo Garcia, Paul F. Goldsmith, Siju Zhang, Xindi Tang, Patricio Sanhueza, Shanghuo Li, Chang Won Lee, Swagat Ranjan Das, Wenyu Jiao, Xiaofeng Mai, Prasanta Gorai, Yichen Zhang, et al. (10 additional authors not shown)

Abstract: In this paper, we present a detailed analysis of the IRS 17 filament within the intermediate-mass protocluster IRAS 08448-4343 (of $\sim\,10^3\,\rm M_{\odot}$), using ALMA data from the ATOMS 3-mm and QUARKS 1.3-mm surveys. The IRS 17 filament, which spans $\sim$54000 au ($0.26\,\rm pc$) in length and $\sim$4000 au ($0.02\,\rm pc$) in width, exhibits a complex, multi-component velocity field, and harbours hierarchical substructures. These substructures include three bundles of seven velocity-coherent fibers, and 29 dense ($n\sim 10^8\,\rm cm^{-3}$) condensations. The fibers have a median length of $\sim 4500\,\rm au$ and a median width of $\sim 1400\,\rm au$. Among these fibers, four are identified as ``fertile", each hosting at least three dense condensations, which are regarded as the ``seeds" of star formation. While the detected cores are randomly spaced within the IRS\,17 filament based on the 3-mm dust continuum image, periodic spacing ($\sim1600\,\rm au$) of condensations is observed in the fertile fibers according to the 1.3-mm dust map, consistent with the predictions of linear isothermal cylinder fragmentation models. These findings underscore the crucial role of fibers in star formation and suggest a hierarchical fragmentation process that extends from the filament to the fibers, and ultimately, to the smallest-scale condensations.

Submitted 20 October, 2024; originally announced October 2024.

Comments: 19 pages, 10 figures, 4 tables, accepted by ApJ

arXiv:2410.15275 [pdf]

MAD: Move AI Decompiler to Improve Transparency and Auditability on Non-Open-Source Blockchain Smart Contract

Authors: Eason Chen, Xinyi Tang, Zimo Xiao, Chuangji Li, Shizhuo Li, Wu Tingguan, Siyun Wang, Kostas Kryptos Chalkias

Abstract: Web3 aims to enhance user control over data and assets, but this vision is challenged by non-transparent, scam-prone applications and vulnerable smart contracts. While code audits are one solution to this problem, the lack of smart contracts source code on many blockchain platforms, such as Sui, hinders the ease of auditing. A promising approach to this issue is the use of a decompiler to reverse-engineer smart contract bytecode. However, existing decompilers for Sui produce code that is difficult to understand and cannot be directly recompiled. To address this, we developed the Move AI Decompiler (MAD), a Large Language Model (LLM)-powered web application that decompiles smart contract bytecodes on Sui into logically correct, human-readable, and re-compilable source code. Our evaluation shows that MAD produces logically correct code that successfully passes original unit tests and achieves a 66.7% recompilation success rate on real-world smart contracts. Additionally, in a user study involving 12 developers, MAD significantly reduced the auditing workload compared to using traditional decompilers. Participants found MAD's outputs comparable to the original source code, simplifying the process of smart contract logic comprehension and auditing. Despite some limitations, such as occasional hallucinations and compile errors, MAD still provides significant improvements over traditional decompilers. MAD has practical implications for blockchain smart contract transparency, auditing, and education. It empowers users to review and audit non-open-source smart contracts, fostering trust and accountability. Additionally, MAD's approach could potentially extend to other smart contract languages, like Solidity, promoting transparency across various blockchains.

Submitted 20 October, 2024; originally announced October 2024.

arXiv:2410.15257 [pdf, other]

Learning-Augmented Algorithms for the Bahncard Problem

Authors: Hailiang Zhao, Xueyan Tang, Peng Chen, Shuiguang Deng

Abstract: In this paper, we study learning-augmented algorithms for the Bahncard problem. The Bahncard problem is a generalization of the ski-rental problem, where a traveler needs to irrevocably and repeatedly decide between a cheap short-term solution and an expensive long-term one with an unknown future. Even though the problem is canonical, only a primal-dual-based learning-augmented algorithm was explicitly designed for it. We develop a new learning-augmented algorithm, named PFSUM, that incorporates both history and short-term future to improve online decision making. We derive the competitive ratio of PFSUM as a function of the prediction error and conduct extensive experiments to show that PFSUM outperforms the primal-dual-based algorithm.

Submitted 19 October, 2024; originally announced October 2024.

Comments: This paper has been accepted by the 38th Conference on Neural Information Processing Systems (NeurIPS 2024)
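
The trade-off described in the abstract above (pay full price per trip, or buy a card that discounts later trips) can be illustrated with a small cost calculator. This is only a sketch of the problem setting, not the PFSUM algorithm. The parameter names (`card_price`, `discount`, `validity`) and the half-open validity interval are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

def bahncard_cost(
    trips: List[Tuple[float, float]],   # (travel time, regular ticket price)
    card_starts: List[float],           # times at which a card is bought
    card_price: float,                  # hypothetical: price of one card
    discount: float,                    # hypothetical: price factor with a card, in [0, 1)
    validity: float,                    # hypothetical: how long a card stays valid
) -> float:
    """Total cost of a trip sequence under a fixed card-purchase schedule."""
    total = card_price * len(card_starts)
    for time, price in trips:
        # Assumption: a card bought at s covers trips in [s, s + validity).
        covered = any(s <= time < s + validity for s in card_starts)
        total += discount * price if covered else price
    return total

def break_even_spend(card_price: float, discount: float) -> float:
    """Regular-price spend within one validity window at which a card pays off.

    With a card the window costs card_price + discount * spend; without it,
    spend. The two are equal when spend = card_price / (1 - discount).
    """
    return card_price / (1.0 - discount)

trips = [(0, 100.0), (10, 100.0), (400, 100.0)]
no_card = bahncard_cost(trips, [], card_price=60.0, discount=0.5, validity=365)
one_card = bahncard_cost(trips, [0], card_price=60.0, discount=0.5, validity=365)
```

An online algorithm has to choose `card_starts` without seeing future trips, which is where predictions about the short-term future can help.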

arXiv:2410.12229 [pdf, other]

Comprehending Knowledge Graphs with Large Language Models for Recommender Systems

Authors: Ziqiang Cui, Yunpeng Weng, Xing Tang, Fuyuan Lyu, Dugang Liu, Xiuqiang He, Chen Ma

Abstract: Recently, the introduction of knowledge graphs (KGs) has significantly advanced recommender systems by facilitating the discovery of potential associations between items. However, existing methods still face several limitations. First, most KGs suffer from missing facts or limited scopes. This can lead to biased knowledge representations, thereby constraining the model's performance. Second, existing methods typically convert textual information into IDs, resulting in the loss of natural semantic connections between different items. Third, existing methods struggle to capture high-order relationships in global KGs due to their inefficient layer-by-layer information propagation mechanisms, which are prone to introducing significant noise. To address these limitations, we propose a novel method called CoLaKG, which leverages large language models (LLMs) for knowledge-aware recommendation. The extensive world knowledge and remarkable reasoning capabilities of LLMs enable them to supplement KGs. Additionally, the strong text comprehension abilities of LLMs allow for a better understanding of semantic information. Based on this, we first extract subgraphs centered on each item from the KG and convert them into textual inputs for the LLM. The LLM then outputs its comprehension of these item-centered subgraphs, which are subsequently transformed into semantic embeddings. Furthermore, to utilize the global information of the KG, we construct an item-item graph using these semantic embeddings, which can directly capture higher-order associations between items. Both the semantic embeddings and the structural information from the item-item graph are effectively integrated into the recommendation model through our designed representation alignment and neighbor augmentation modules. Extensive experiments on four real-world datasets demonstrate the superiority of our method.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.12207 [pdf, other]

Divide-Verify-Refine: Aligning LLM Responses with Complex Instructions

Authors: Xianren Zhang, Xianfeng Tang, Hui Liu, Zongyu Wu, Qi He, Dongwon Lee, Suhang Wang

Abstract: Recent studies show that LLMs, particularly open-source models, struggle to follow complex instructions with multiple constraints. Despite the importance, methods to improve LLMs' adherence to such constraints remain unexplored, and current research focuses on evaluating this ability rather than developing solutions. While a few studies enhance constraint adherence through model tuning, this approach is computationally expensive and heavily reliant on training data quality. An alternative is to leverage LLMs' self-correction capabilities, allowing them to adjust responses to better meet specified constraints. However, this self-correction ability of LLMs is limited by the feedback quality, as LLMs cannot autonomously generate reliable feedback or detect errors. Moreover, the self-refinement process heavily depends on few-shot examples that illustrate how to modify responses to meet constraints. As constraints in complex instructions are diverse and vary widely, manually crafting few-shot examples for each constraint type can be labor-intensive and sub-optimal. To deal with these two challenges, we propose the Divide-Verify-Refine (DVR) framework with three steps: (1) Divide complex instructions into single constraints and prepare appropriate tools; (2) Verify: To address the feedback quality problem, these tools will rigorously verify responses and provide reliable feedback; (3) Refine: To address the constraint diversity challenge, we design a refinement repository that collects successful refinement processes and uses them as few-shot demonstrations for future cases, allowing LLMs to learn from the past experience during inference. Additionally, we develop a new dataset of complex instructions, each containing 1-6 constraints. Experiments show that the framework significantly improves performance, doubling LLama3.1-8B's constraint adherence on instructions with 6 constraints.

Submitted 16 October, 2024; originally announced October 2024.

Comments: Under review

arXiv:2410.11437 [pdf, other]

Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs

Authors: Sihang Zhao, Youliang Yuan, Xiaoying Tang, Pinjia He

Abstract: Multimodal Large Language Models (MLLMs) demonstrate a strong understanding of the real world and can even handle complex tasks. However, they still fail on some straightforward visual question-answering (VQA) problems. This paper dives deeper into this issue, revealing that models tend to err when answering easy questions (e.g. Yes/No questions) about an image, even though they can correctly describe it. We refer to this model behavior discrepancy between difficult and simple questions as model laziness. To systematically investigate model laziness, we manually construct LazyBench, a benchmark that includes Yes/No, multiple choice, short answer questions, and image description tasks that are related to the same subjects in the images. Based on LazyBench, we observe that laziness widely exists in current advanced MLLMs (e.g. GPT-4o, Gemini-1.5-pro, Claude 3 and LLaVA-v1.5-13B), and it is more pronounced on stronger models. We also analyze the VQA v2 (LLaVA-v1.5-13B) benchmark and find that about half of its failure cases are caused by model laziness, which further highlights the importance of ensuring that the model fully utilizes its capability. To this end, we conduct preliminary exploration on how to mitigate laziness and find that chain of thought (CoT) can effectively address this issue.

Submitted 15 October, 2024; originally announced October 2024.

Comments: EMNLP 2024 Findings

arXiv:2410.11327 [pdf, other]

Sequential LLM Framework for Fashion Recommendation

Authors: Han Liu, Xianfeng Tang, Tianlang Chen, Jiapeng Liu, Indu Indu, Henry Peng Zou, Peng Dai, Roberto Fernandez Galan, Michael D Porter, Dongmei Jia, Ning Zhang, Lian Xiong

Abstract: The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. To address these issues, we propose a sequential fashion recommendation framework that leverages a pre-trained large language model (LLM) enhanced with recommendation-specific prompts. Our framework employs parameter-efficient fine-tuning with extensive fashion data and introduces a novel mix-up-based retrieval technique for translating text into relevant product suggestions. Extensive experiments show our proposed framework significantly enhances fashion recommendation performance.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.11267 [pdf, other]

FedCCRL: Federated Domain Generalization with Cross-Client Representation Learning

Authors: Xinpeng Wang, Xiaoying Tang

Abstract: Domain Generalization (DG) aims to train models that can effectively generalize to unseen domains. However, in the context of Federated Learning (FL), where clients collaboratively train a model without directly sharing their data, most existing DG algorithms are not directly applicable to the FL setting due to privacy constraints, as well as the limited data quantity and domain diversity at each client. To tackle these challenges, we propose FedCCRL, a novel federated domain generalization method that significantly improves the model's ability to generalize to unseen domains without compromising privacy or incurring excessive computational and communication costs. Specifically, we adapt MixStyle to the federated setting to transfer domain-specific features while AugMix is employed to perturb domain-invariant features. Furthermore, we leverage supervised contrastive loss for representation alignment and utilize Jensen-Shannon divergence to ensure consistent predictions between original and augmented samples. Extensive experimental results demonstrate that FedCCRL achieves the state-of-the-art performances on the PACS, OfficeHome and miniDomainNet datasets across varying numbers of clients. Code is available at https://github.com/SanphouWang/FedCCRL.

Submitted 16 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.
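
The abstract above uses the Jensen-Shannon divergence as a consistency measure between predictions on original and augmented samples. A minimal sketch of that quantity for two discrete distributions is given below, using only the standard definition; how FedCCRL weights or combines it (for example, across more than two predictions) is not stated in the abstract and is not assumed here. The function names are illustrative.

```python
import math
from typing import Sequence

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the KL terms are well defined.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical class probabilities for an original and an augmented sample.
original = [0.7, 0.2, 0.1]
augmented = [0.6, 0.3, 0.1]
consistency_penalty = js_divergence(original, augmented)
```

Unlike the KL divergence, this quantity is symmetric and bounded above by log 2 (in nats), which makes it convenient as a consistency term.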

arXiv:2410.11187 [pdf, other]

Multiview Scene Graph

Authors: Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng

Abstract: A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction, 3D bounding boxes in object detection, or voxel grids in occupancy prediction, or topological, such as pose graphs with loop closures in SLAM or visibility graphs in SfM. In this work, we propose to build Multiview Scene Graphs (MSG) from unposed images, representing a scene topologically with interconnected place and object nodes. The task of building MSG is challenging for existing representation learning methods since it needs to jointly address visual place recognition, object detection, and object association from images with limited fields of view and potentially large viewpoint changes. To evaluate any method tackling this task, we developed an MSG dataset and annotation based on a public 3D dataset. We also propose an evaluation metric based on the intersection-over-union score of MSG edges. Moreover, we develop a novel baseline method built on mainstream pretrained vision models, combining visual place recognition and object association into one Transformer decoder architecture. Experiments demonstrate our method has superior performance compared to existing relevant baselines.

Submitted 14 October, 2024; originally announced October 2024.

Comments: To be published in NeurIPS 2024. Website at https://ai4ce.github.io/MSG/
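
The abstract above mentions an evaluation metric based on the intersection-over-union of graph edges. The sketch below shows what an edge-level IoU between a predicted and a ground-truth graph could look like. It is an illustration under stated assumptions, not the paper's exact metric: edges are treated as undirected, node identities are assumed to be already matched between the two graphs, and two empty edge sets are scored as 1.0 by convention.

```python
from typing import Hashable, Iterable, Set, Tuple, FrozenSet

Edge = Tuple[Hashable, Hashable]

def _as_undirected(edges: Iterable[Edge]) -> Set[FrozenSet[Hashable]]:
    """Normalize edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(edge) for edge in edges}

def edge_iou(pred_edges: Iterable[Edge], true_edges: Iterable[Edge]) -> float:
    """Intersection-over-union of two edge sets."""
    pred, true = _as_undirected(pred_edges), _as_undirected(true_edges)
    union = pred | true
    if not union:
        return 1.0  # assumption: two empty graphs count as a perfect match
    return len(pred & true) / len(union)

# Hypothetical place nodes (p1, p2) and object node (o1).
predicted = [("p1", "p2"), ("p1", "o1")]
ground_truth = [("p2", "p1"), ("p2", "o1")]
score = edge_iou(predicted, ground_truth)  # one shared edge out of three distinct edges
```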

arXiv:2410.10847 [pdf, other]

Lotus: learning-based online thermal and latency variation management for two-stage detectors on edge devices

Authors: Yifan Gong, Yushu Wu, Zheng Zhan, Pu Zhao, Liangkai Liu, Chao Wu, Xulong Tang, Yanzhi Wang

Abstract: Two-stage object detectors exhibit high accuracy and precise localization, especially for identifying small objects that are favorable for various edge applications. However, the high computation costs associated with two-stage detection methods cause more severe thermal issues on edge devices, incurring dynamic runtime frequency change and thus large inference latency variations. Furthermore, the dynamic number of proposals in different frames leads to various computations over time, resulting in further latency variations. The significant latency variations of detectors on edge devices can harm user experience and waste hardware resources. To avoid thermal throttling and provide stable inference speed, we propose Lotus, a novel framework that is tailored for two-stage detectors to dynamically scale CPU and GPU frequencies jointly in an online manner based on deep reinforcement learning (DRL). To demonstrate the effectiveness of Lotus, we implement it on NVIDIA Jetson Orin Nano and Mi 11 Lite mobile platforms. The results indicate that Lotus can consistently and significantly reduce latency variation, achieve faster inference, and maintain lower CPU and GPU temperatures under various settings.

Submitted 1 October, 2024; originally announced October 2024.

Comments: DAC'24, code is available at: https://github.com/wuyushuwys/LOTUS

arXiv:2410.10628 [pdf, other]

Test smells in LLM-Generated Unit Tests

Authors: Wendkûuni C. Ouédraogo, Yinghua Li, Kader Kaboré, Xunzhu Tang, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

Abstract: The use of Large Language Models (LLMs) in automated test generation is gaining popularity, with much of the research focusing on metrics like compilability rate, code coverage and bug detection. However, an equally important quality metric is the presence of test smells, that is, design flaws or anti-patterns in test code that hinder maintainability and readability. In this study, we explore the diffusion of test smells in LLM-generated unit test suites and compare them to those found in human-written ones. We analyze a benchmark of 20,500 LLM-generated test suites produced by four models (GPT-3.5, GPT-4, Mistral 7B, and Mixtral 8x7B) across five prompt engineering techniques, alongside a dataset of 780,144 human-written test suites from 34,637 projects. Leveraging TsDetect, a state-of-the-art tool capable of detecting 21 different types of test smells, we identify and analyze the prevalence and co-occurrence of various test smells in both human-written and LLM-generated test suites. Our findings reveal new insights into the strengths and limitations of LLMs in test generation. First, regarding prevalence, we observe that LLMs frequently generate tests with common test smells, such as Magic Number Test and Assertion Roulette. Second, in terms of co-occurrence, certain smells, like Long Test and Useless Test, tend to co-occur in LLM-generated suites, influenced by specific prompt techniques. Third, we find that project complexity and LLM-specific factors, including model size and context length, significantly affect the prevalence of test smells. Finally, the patterns of test smells in LLM-generated tests often mirror those in human-written tests, suggesting potential data leakage from training datasets. These insights underscore the need to refine LLM-based test generation for cleaner code and suggest improvements in both LLM capabilities and software testing practices.

Submitted 14 October, 2024; originally announced October 2024.

arXiv:2410.09704 [pdf]

EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation

Authors: Milos Vukadinovic, Xiu Tang, Neal Yuan, Paul Cheng, Debiao Li, Susan Cheng, Bryan He, David Ouyang

Abstract: Echocardiography is the most widely used cardiac imaging modality, capturing ultrasound video data to assess cardiac structure and function. Artificial intelligence (AI) in echocardiography has the potential to streamline manual tasks and improve reproducibility and precision. However, most echocardiography AI models are single-view, single-task systems that do not synthesize complementary information from multiple views captured during a full exam, and thus lead to limited performance and scope of applications. To address this problem, we introduce EchoPrime, a multi-view, view-informed, video-based vision-language foundation model trained on over 12 million video-report pairs. EchoPrime uses contrastive learning to train a unified embedding model for all standard views in a comprehensive echocardiogram study with representation of both rare and common diseases and diagnoses. EchoPrime then utilizes view-classification and a view-informed anatomic attention model to weight video-specific interpretations that accurately map the relationship between echocardiographic views and anatomical structures. With retrieval-augmented interpretation, EchoPrime integrates information from all echocardiogram videos in a comprehensive study and performs holistic comprehensive clinical echocardiography interpretation. In datasets from two independent healthcare systems, EchoPrime achieves state-of-the-art performance on 23 diverse benchmarks of cardiac form and function, surpassing the performance of both task-specific approaches and prior foundation models. Following rigorous clinical evaluation, EchoPrime can assist physicians in the automated preliminary assessment of comprehensive echocardiography.

Submitted 12 October, 2024; originally announced October 2024.

Comments: 30 pages, 3 tables, 3 figures

arXiv:2410.08888 [pdf, other]

Simulating anisotropic diffusion processes with smoothed particle hydrodynamics

Authors: Xiaojing Tang, Oskar Haidn, Xiangyu Hu

Abstract: Diffusion problems with anisotropic features arise in various areas of science and engineering. As a Lagrangian mesh-less method, SPH has a special advantage in addressing diffusion problems due to the benefit of dealing with the advection term. But its application to solving anisotropic diffusion is still limited since a robust and general SPH formulation is required to obtain accurate approximations of second derivatives. In this paper, we modify a second derivatives model based on the SPH formulation to obtain a full version of Hessian matrix consisting of the Laplacian operator elements. To verify the proposed SPH scheme, firstly, the diffusion of a scalar which distributes following a pre-function within a thin structure is performed by using anisotropic resolution coupling anisotropic kernel. With various anisotropic ratios, excellent agreements with the theoretical solution are achieved. Then, the anisotropic diffusion of a contaminant in fluid is simulated. The simulation results are very consistent with corresponding analytical solutions, showing that the present algorithm can obtain smooth solution without the spurious oscillations for contaminant transport problems with discontinuities, and achieve second-order accuracy. Subsequently, we utilize this newly developed SPH formulation to tackle the problem of the fluid diffusion through a thin porous membrane and the anisotropic transport of transmembrane potential within the left ventricle, demonstrating the capabilities of the proposed SPH framework in solving the complex anisotropic problems.

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2410.08175 [pdf, ps, other]

A note on the symplectic classification of almost-toric systems

Authors: Xiudi Tang

Abstract: Since simple semitoric systems were classified about fifteen years ago, and semitoric systems five years ago, we want to move a step forward to almost-toric systems. We give a classification of compact almost-toric systems in dimension four up to fiber-preserving symplectomorphisms, in terms of the base, Taylor series, and twisting indices, analogous to the five invariants for semitoric systems. For convenience, we specify an ordering of focus-focus values and a choice of two cut rays at each of them.

Submitted 10 October, 2024; originally announced October 2024.

Comments: 12 pages

MSC Class: 53D20 (Primary) 53A15 (Secondary)

arXiv:2410.08126 [pdf, other]

Mars: Situated Inductive Reasoning in an Open-World Environment

Authors: Xiaojuan Tang, Jiaqi Li, Yitao Liang, Song-chun Zhu, Muhan Zhang, Zilong Zheng

Abstract: Large Language Models (LLMs) trained on massive corpora have shown remarkable success in knowledge-intensive tasks. Yet, most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and performing reasoning with the acquired knowledge -- \textit{situated inductive reasoning}, is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning. It introduces counter-commonsense game mechanisms by modifying terrain, survival setting and task dependency while adhering to certain principles. In Mars, agents need to actively interact with their surroundings, derive useful rules and perform decision-making tasks in specific contexts. We conduct experiments on various RL-based and LLM-based methods, finding that they all struggle on this challenging situated inductive reasoning benchmark. Furthermore, we explore \textit{Induction from Reflection}, where we instruct agents to perform inductive reasoning from history trajectory. The superior performance underscores the importance of inductive reasoning in Mars. Through Mars, we aim to galvanize advancements in situated inductive reasoning and set the stage for developing the next generation of AI systems that can reason in an adaptive and context-sensitive way.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.07508 [pdf, other]

MOLA: Enhancing Industrial Process Monitoring Using Multi-Block Orthogonal Long Short-Term Memory Autoencoder

Authors: Fangyuan Ma, Cheng Ji, Jingde Wang, Wei Sun, Xun Tang, Zheyu Jiang

Abstract: In this work, we introduce MOLA: a Multi-block Orthogonal Long short-term memory Autoencoder paradigm, to conduct accurate, reliable fault detection of industrial processes. To achieve this, MOLA effectively extracts dynamic orthogonal features by introducing an orthogonality-based loss function to constrain the latent space output. This helps eliminate the redundancy in the features identified, thereby improving the overall monitoring performance. On top of this, a multi-block monitoring structure is proposed, which categorizes the process variables into multiple blocks by leveraging expert process knowledge about their associations with the overall process. Each block is associated with its specific Orthogonal Long short-term memory Autoencoder model, whose extracted dynamic orthogonal features are monitored by distance-based Hotelling's $T^2$ statistics and quantile-based cumulative sum (CUSUM) designed for multivariate data streams that are nonparametric and heterogeneous in nature. Compared to having a single model accounting for all process variables, such a multi-block structure improves the overall process monitoring performance significantly, especially for large-scale industrial processes. Finally, we propose an adaptive weight-based Bayesian fusion (W-BF) framework to aggregate all block-wise monitoring statistics into a global statistic that we monitor for faults, with the goal of improving fault detection speed by assigning weights to blocks based on the sequential order in which alarms are raised. We demonstrate the efficiency and effectiveness of our MOLA framework by applying it to the Tennessee Eastman Process and comparing the performance with various benchmark methods.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 21 pages, 9 figures, 9 tables. Submitted to Processes

arXiv:2410.07286 [pdf, other]

Benchmarking Data Heterogeneity Evaluation Approaches for Personalized Federated Learning

Authors: Zhilong Li, Xiaohu Wu, Xiaoli Tang, Tiantian He, Yew-Soon Ong, Mengmeng Chen, Qiqi Liu, Qicheng Lao, Xiaoxiao Li, Han Yu

Abstract: There is growing research interest in measuring the statistical heterogeneity of clients' local datasets. Such measurements are used to estimate the suitability for collaborative training of personalized federated learning (PFL) models. Currently, these research endeavors are taking place in silos and there is a lack of a unified benchmark to provide a fair and convenient comparison among various approaches in common settings. We aim to bridge this important gap in this paper. The proposed benchmarking framework currently includes six representative approaches. Extensive experiments have been conducted to compare these approaches under five standard non-IID FL settings, providing much needed insights into which approaches are advantageous under which settings. The proposed framework offers useful guidance on the suitability of various data divergence measures in FL systems. It is beneficial for keeping related research activities on the right track in terms of: (1) designing PFL schemes, (2) selecting appropriate data heterogeneity evaluation approaches for specific FL application scenarios, and (3) addressing fairness issues in collaborative model training. The code is available at https://github.com/Xiaoni-61/DH-Benchmark.

Submitted 9 October, 2024; originally announced October 2024.

Comments: Accepted to FL@FM-NeurIPS'24

arXiv:2410.05643 [pdf, other]

TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Authors: Yongxin Guo, Jingyu Liu, Mingda Li, Xiaoying Tang, Qingbin Liu, Xi Chen

Abstract: Video Temporal Grounding (VTG) is a crucial capability for video understanding models and plays a vital role in downstream tasks such as video browsing and editing. To effectively handle various tasks simultaneously and enable zero-shot prediction, there is a growing trend in employing video LLMs for VTG tasks. However, current video LLM-based methods rely exclusively on natural language generation, lacking the ability to model the clear structure inherent in videos, which restricts their effectiveness in tackling VTG tasks. To address this issue, this paper first formally introduces a causal event modeling framework, which represents videos as sequences of events and predicts the current event using previous events, video inputs, and textual instructions. Each event consists of three components: timestamps, salient scores, and textual captions. We then propose a novel task-interleaved video LLM called TRACE to effectively implement the causal event modeling framework in practice. TRACE processes visual frames, timestamps, salient scores, and text as distinct tasks, employing various encoders and decoding heads for each. Task tokens are arranged in an interleaved sequence according to the causal event modeling framework's formulation. Extensive experiments on various VTG tasks and datasets demonstrate the superior performance of TRACE compared to state-of-the-art video LLMs. Our model and code are available at \url{https://github.com/gyxxyg/TRACE}.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04886 [pdf, other]

High Information Density and Low Coverage Data Storage in DNA with Efficient Channel Coding Schemes

Authors: Yi Ding, Xuan He, Tuan Thanh Nguyen, Wentu Song, Zohar Yakhini, Eitan Yaakobi, Linqiang Pan, Xiaohu Tang, Kui Cai

Abstract: DNA-based data storage has been attracting significant attention due to its extremely high density, low power consumption, and long duration compared to traditional data storage mediums. Despite the recent advancements in DNA data storage technology, significant challenges remain. In particular, various types of errors can occur during the processes of DNA synthesis, storage, and sequencing, including substitution errors, insertion errors, and deletion errors. Furthermore, the entire oligo may be lost. In this work, we report a DNA-based data storage architecture that incorporates efficient channel coding schemes, including different types of error-correcting codes (ECCs) and constrained codes, for both the inner coding and outer coding for the DNA data storage channel. We also carry out large-scale experiments to validate our proposed DNA data storage architecture. Specifically, 1.61 and 1.69 MB of data are encoded into 30,000 oligos each, with information densities of 1.731 and 1.815, respectively. It has been found that the stored information can be fully recovered without any error at average coverages of 4.5 and 6.0, respectively. Compared to previous experimental studies, our architecture achieves higher information density and lower coverage, demonstrating the efficiency of the proposed channel coding schemes.

Submitted 7 October, 2024; v1 submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.03274 [pdf, other]

Performance assessment of the HERD calorimeter with a photo-diode read-out system for high-energy electron beams

Authors: O. Adriani, G. Ambrosi, M. Antonelli, Y. Bai, X. Bai, T. Bao, M. Barbanera, E. Berti, P. Betti, G. Bigongiari, M. Bongi, V. Bonvicini, S. Bottai, I. Cagnoli, W. Cao, J. Casaus, D. Cerasole, Z. Chen, X. Cui, R. D'Alessandro, L. Di Venere, C. Diaz, Y. Dong, S. Detti, M. Duranti , et al. (41 additional authors not shown)

Abstract: The measurement of cosmic rays at energies exceeding 100 TeV per nucleon is crucial for enhancing the understanding of high-energy particle propagation and acceleration models in the Galaxy. HERD is a space-borne calorimetric experiment that aims to extend the current direct measurements of cosmic rays to unexplored energies. The payload is scheduled to be installed on the Chinese Space Station in 2027. The primary peculiarity of the instrument is its capability to measure particles coming from all directions, with the main detector being a deep, homogeneous, 3D calorimeter. The active elements are read out using two independent systems: one based on wavelength shifter fibers coupled to CMOS cameras, and the other based on photo-diodes read-out with custom front-end electronics. A large calorimeter prototype was tested in 2023 during an extensive beam test campaign at CERN. In this paper, the performance of the calorimeter for high-energy electron beams, as obtained from the photo-diode system data, is presented. The prototype demonstrated excellent performance, e.g., an energy resolution better than 1% for electrons at 250 GeV. A comparison between beam test data and Monte Carlo simulation data is also presented.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2410.03177 [pdf]

Hybrid Centralized-Distributed Resource Allocation Based on Deep Reinforcement Learning for Cooperative D2D Communications

Authors: Yang Yu, Xiaoqing Tang

Abstract: Device-to-device (D2D) technology enables direct communication between adjacent devices within cellular networks. Due to its high data rate, low latency, and performance improvement in spectrum and energy efficiency, it has been widely investigated and applied as a critical technology in 5G New Radio (NR). In addition to conventional overlay and underlay D2D communications, cooperative D2D communication, which can achieve a win-win situation between cellular users (CUs) and D2D users (DUs) through cooperative relaying technique, has attracted extensive attention from academic and industrial circles in the past decade. This paper delves into optimizing joint spectrum allocation, power control, and link-matching between multiple CUs and DUs for cooperative D2D communications, using weighted sum energy efficiency (WSEE) as the performance metric to address the challenges of green communication and sustainable development. This integer programming problem can be decomposed into a classic weighted bipartite graph matching and a series of nonconvex spectrum allocation and power control problems between potentially matched cellular and D2D link pairs. To address this issue, we propose a hybrid centralized-distributed scheme based on deep reinforcement learning (DRL) and the Kuhn-Munkres (KM) algorithm. Leveraging the latter, the CUs and DUs autonomously optimize spectrum allocation and power control by only utilizing local information. Then, the base station (BS) determines the link matching. Simulation results reveal that it achieves near-optimal performance and significantly enhances the network convergence speed with low signaling overheads. In addition, we also propose and utilize cooperative link sets for corresponding D2D links to accelerate the proposed scheme and reduce signaling exchange further.

Submitted 4 October, 2024; originally announced October 2024.

Comments: 12 pages, 9 figures

arXiv:2409.20305 [pdf, other]

Mixed-Precision Embeddings for Large-Scale Recommendation Models

Authors: Shiwei Li, Zhuoqi Hu, Xing Tang, Haozhao Wang, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

Abstract: Embedding techniques have become essential components of large databases in the deep learning era. By encoding discrete entities, such as words, items, or graph nodes, into continuous vector spaces, embeddings facilitate more efficient storage, retrieval, and processing in large databases. Especially in the domain of recommender systems, millions of categorical features are encoded as unique embedding vectors, which facilitates the modeling of similarities and interactions among features. However, numerous embedding vectors can result in significant storage overhead. In this paper, we aim to compress the embedding table through quantization techniques. Given that features vary in importance levels, we seek to identify an appropriate precision for each feature to balance model accuracy and memory usage. To this end, we propose a novel embedding compression method, termed Mixed-Precision Embeddings (MPE). Specifically, to reduce the size of the search space, we first group features by frequency and then search precision for each feature group. MPE further learns the probability distribution over precision levels for each feature group, which can be used to identify the most suitable precision with a specially designed sampling strategy. Extensive experiments on three public datasets demonstrate that MPE significantly outperforms existing embedding compression methods. Remarkably, MPE achieves about 200x compression on the Criteo dataset without compromising the prediction accuracy.

Submitted 17 October, 2024; v1 submitted 30 September, 2024; originally announced September 2024.

Comments: under submission

arXiv:2409.18815 [pdf, other]

Seeing the Invisible through Speckle Images

Authors: Weiru Fan, Xiaobin Tang, Xingqi Xu, Huizhu Hu, Vladislav V. Yakovlev, Shi-Yao Zhu, Da-Wei Wang, Delong Zhang

Abstract: Scattering obscures information carried by waves by producing a speckle pattern, posing a common challenge across various fields, including microscopy and astronomy. Traditional methods for extracting information from speckles often rely on significant physical assumptions, complex devices, or intricate algorithms. Recently, machine learning has emerged as a scalable and widely adopted tool for interpreting speckle patterns. However, most current machine learning techniques depend heavily on supervised training with extensive labeled datasets, which is problematic when labels are unavailable. To address this, we propose a strategy based on unsupervised learning for speckle recognition and evaluation, enabling the capture of high-level information, such as object classes, directly from speckles without labeled data. By deriving invariant features from speckles, this method allows for the classification of speckles and facilitates diverse applications in image sensing. We experimentally validated our strategy through two significant applications: a noninvasive glucose monitoring system capable of differentiating time-lapse glucose concentrations, and a high-throughput communication system utilizing multimode fibers in dynamic environments. The versatility of this method holds promise for a broad range of far-reaching applications, including biomedical diagnostics, quantum network decoupling, and remote sensing.

Submitted 27 September, 2024; originally announced September 2024.

arXiv:2409.18724 [pdf]

Cross-Domain Keyword Extraction with Keyness Patterns

Authors: Dongmei Zhou, Xuri Tang

Abstract: Domain dependence and annotation subjectivity pose challenges for supervised keyword extraction. Based on the premises that second-order keyness patterns are existent at the community level and learnable from annotated keyword extraction datasets, this paper proposes a supervised ranking approach to keyword extraction that ranks keywords with keyness patterns consisting of independent features (such as sublanguage domain and term length) and three categories of dependent features -- heuristic features, specificity features, and representavity features. The approach uses two convolutional-neural-network based models to learn keyness patterns from keyword datasets and overcomes annotation subjectivity by training the two models with a bootstrap sampling strategy. Experiments demonstrate that the approach not only achieves state-of-the-art performance on ten keyword datasets in general supervised keyword extraction, with an average top-10-F-measure of 0.316, but also robust cross-domain performance, with an average top-10-F-measure of 0.346 on four datasets that are excluded from the training process. Such cross-domain robustness is attributed to the fact that community-level keyness patterns are limited in number and temperately independent of language domains, the distinction between independent features and dependent features, and the sampling training strategy that balances excess risk and lack of negative training data.

Submitted 27 September, 2024; originally announced September 2024.

Comments: 26 pages, 14 figures

ACM Class: H.3.1; H.3.3

arXiv:2409.17289 [pdf, other]

Steering LLM Summarization with Visual Workspaces for Sensemaking

Authors: Xuxin Tang, Eric Krokos, Can Liu, Kylie Davidson, Kirsten Whitley, Naren Ramakrishnan, Chris North

Abstract: Large Language Models (LLMs) have been widely applied in summarization due to their speedy and high-quality text generation. Summarization for sensemaking involves information compression and insight extraction. Human guidance in sensemaking tasks can prioritize and cluster relevant information for LLMs. However, users must translate their cognitive thinking into natural language to communicate with LLMs. Can we use more readable and operable visual representations to guide the summarization process for sensemaking? Therefore, we propose introducing an intermediate step--a schematic visual workspace for human sensemaking--before the LLM generation to steer and refine the summarization process. We conduct a series of proof-of-concept experiments to investigate the potential for enhancing the summarization by GPT-4 through visual workspaces. Leveraging a textual sensemaking dataset with a ground truth summary, we evaluate the impact of a human-generated visual workspace on LLM-generated summarization of the dataset and assess the effectiveness of space-steered summarization. We categorize several types of extractable information from typical human workspaces that can be injected into engineered prompts to steer the LLM summarization. The results demonstrate how such workspaces can help align an LLM with the ground truth, leading to more accurate summarization results than without the workspaces.

Submitted 25 September, 2024; originally announced September 2024.

Comments: 11 figures, 7 pages

arXiv:2409.16614 [pdf, ps, other]

Large radiation back-flux from Monte Carlo simulations of fusion neutron-material interactions

Authors: Michael A. Lively, Danny Perez, Blas Uberuaga, Yanzeng Zhang, Xian-Zhu Tang

Abstract: Radiation back-fluxes, generated from neutron-material interactions in fusion power reactors, can dramatically impact the plasma dynamics, e.g., by seeding runaway electrons during disruptions via Compton scattering of background electrons by wall-emitted gamma radiation. Here, we quantify these back-fluxes, including neutrons, gamma rays, and electrons, using Monte Carlo calculations for a range of structural material candidates and first wall thicknesses. The radiation back-flux magnitudes are remarkably large, with neutron and gamma radiation back-fluxes on the same order of magnitude as the incident fusion neutron flux. Electron back-fluxes are two orders of magnitude lower, but are emitted at sufficiently high energies to provide a relatively large back-current through the sheath which may cause sheath reversal. Material configuration plays a key role in determining back-flux magnitudes. The structural material chiefly determines the neutron back-flux magnitude, while the first wall thickness principally attenuates the gamma ray and electron back-fluxes. In addition to prompt back-fluxes, which are emitted immediately after fusion neutrons impact the surface, significant delayed gamma ray and electron back-fluxes arise from nuclear decay processes in the activated materials. These delayed back-flux magnitudes range from 2%--7% of the prompt back-fluxes, and remain present during transients when fusion no longer occurs. During disruptions, build-up of delayed gamma radiation back-flux represents a potential runaway electron seeding mechanism, posing additional challenges for disruption mitigation in a power reactor compared with non-nuclear plasma operations. This work highlights the impact of these radiation back-fluxes on plasma performance and demonstrates the importance of considering back-flux generation in materials selection for fusion power reactors.

Submitted 7 October, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

Comments: 33 pages (19 main, 14 supplementary). 29 figures (12 main, 17 supplementary). For submission to the Journal of Nuclear Materials

Report number: LA-UR-24-29015

arXiv:2409.16493 [pdf, other]

NoTeeline: Supporting Real-Time, Personalized Notetaking with LLM-Enhanced Micronotes

Authors: Faria Huq, Abdus Samee, David Chuan-en Lin, Xiaodi Alice Tang, Jeffrey P. Bigham

Abstract: Taking notes quickly while effectively capturing key information can be challenging, especially when watching videos that present simultaneous visual and auditory streams. Manually taken notes often miss crucial details due to the fast-paced nature of the content, while automatically generated notes fail to incorporate user preferences and discourage active engagement with the content. To address this, we propose an interactive system, NoTeeline, for supporting real-time, personalized notetaking. Given 'micronotes', NoTeeline automatically expands them into full-fledged notes using a Large Language Model (LLM). The generated notes build on the content of micronotes by adding relevant details while maintaining consistency with the user's writing style. In a within-subjects study (n=12), we found that NoTeeline creates high-quality notes that capture the essence of their micronotes with 93.2% factual correctness and accurately align with their writing style (8.33% improvement). Using NoTeeline, participants could capture their desired notes with significantly reduced mental effort, writing 47.0% less text and completing their notes in 43.9% less time compared to a manual notetaking baseline. Our results suggest that NoTeeline enables users to integrate LLM assistance in a familiar notetaking workflow while ensuring consistency with their preferences.

Submitted 15 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

Comments: Early Draft. Paper under review

arXiv:2409.15907 [pdf, other]

Enhancing Text-to-SQL Capabilities of Large Language Models via Domain Database Knowledge Injection

Authors: Xingyu Ma, Xin Tian, Lingxiang Wu, Xuepeng Wang, Xueming Tang, Jinqiao Wang

Abstract: Text-to-SQL is a subtask in semantic parsing that has seen rapid progress with the evolution of Large Language Models (LLMs). However, LLMs face challenges due to hallucination issues and a lack of domain-specific database knowledge (such as table schema and cell values). As a result, they can make errors in generating table names and columns, and in matching values to the correct columns in SQL statements. This paper introduces a method of knowledge injection to enhance LLMs' ability to understand schema contents by incorporating prior knowledge. This approach improves their performance in Text-to-SQL tasks. Experimental results show that pre-training LLMs on domain-specific database knowledge and fine-tuning them on downstream Text-to-SQL tasks significantly improves the Execution Match (EX) and Exact Match (EM) metrics across various models. This effectively reduces errors in generating column names and matching values to the columns. Furthermore, the knowledge-injected models can be applied to many downstream Text-to-SQL tasks, demonstrating the generalizability of the approach presented in this paper.

Submitted 24 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by ECAI 2024

arXiv:2409.15890 [pdf, other]

HLB: Benchmarking LLMs' Humanlikeness in Language Use

Authors: Xufeng Duan, Bei Xiao, Xuemei Tang, Zhenguang G. Cai

Abstract: As synthetic data becomes increasingly prevalent in training language models, particularly through generated dialogue, concerns have emerged that these models may deviate from authentic human language patterns, potentially losing the richness and creativity inherent in human communication. This highlights the critical need to assess the humanlikeness of language models in real-world language use. In this paper, we present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs) using 10 psycholinguistic experiments designed to probe core linguistic aspects, including sound, word, syntax, semantics, and discourse (see https://huggingface.co/spaces/XufengDuan/HumanLikeness). To anchor these comparisons, we collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments. For rigorous evaluation, we developed a coding algorithm that accurately identified language use patterns, enabling the extraction of response distributions for each task. By comparing the response distributions between human participants and LLMs, we quantified humanlikeness through distributional similarity. Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels. Importantly, we found that improvements in other performance metrics did not necessarily lead to greater humanlikeness, and in some cases, even resulted in a decline.
By introducing psycholinguistic methods to model evaluation, this benchmark offers the first framework for systematically assessing the humanlikeness of LLMs in language use.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15830 [pdf, other]

Self-mediation of runaway electrons via self-excited wave-wave and wave-particle interactions

Authors: Qile Zhang, Yanzeng Zhang, Qi Tang, Xian-Zhu Tang

Abstract: Nonlinear dynamics of runaway electron induced wave instabilities can significantly modify the runaway distribution critical to tokamak operations. Here we present the first-ever fully kinetic simulations of runaway-driven instabilities towards nonlinear saturation in a warm plasma as in tokamak start up. It is found that the slow-X modes grow an order of magnitude faster than the whistler modes, and they parametrically decay to produce whistlers much faster than those directly driven by runaways. These parent-daughter waves, as well as secondary and tertiary wave instabilities, initiate a chain of wave-particle resonances that strongly diffuse runaways to the backward direction. This reduces almost half of the current carried by high-energy runaways, over a time scale orders of magnitude faster than experimental shot duration. These results beyond quasilinear analysis may impact anisotropic energetic electrons broadly in laboratory, space and astrophysics.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15115 [pdf]

Compact Broadband Light Source Based on Noise-Like Pulses

Authors: Fanglin Chen, Xiahui Tang, Ming Tang, Luming Zhao

Abstract: We report on broadband generation based on noise-like pulse (NLP) fiber lasers at 1.55 μm and 1.06 μm, respectively. The 1.55 μm laser system can generate a broadband spectrum with a 20 dB bandwidth of up to 205 nm, while the 1.06 μm one can achieve a 20 dB bandwidth of 341 nm after amplification and spectral broadening. Simulation results reproduce experimental details and highlight the role of nonlinear effects in achieving broad spectral outputs, underscoring the suitability of NLPs for advanced applications.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 8 pages, 6 figures

arXiv:2409.14796 [pdf]

Research on Dynamic Data Flow Anomaly Detection based on Machine Learning

Authors: Liyang Wang, Yu Cheng, Hao Gong, Jiacheng Hu, Xirui Tang, Iris Li

Abstract: The sophistication and diversity of contemporary cyberattacks have rendered the use of proxies, gateways, firewalls, and encrypted tunnels as a standalone defensive strategy inadequate. Consequently, the proactive identification of data anomalies has emerged as a prominent area of research within the field of data security. The majority of extant studies concentrate on sample equilibrium data, with the consequence that the detection effect is not optimal in the context of unbalanced data. In this study, the unsupervised learning method is employed to identify anomalies in dynamic data flows. Initially, multi-dimensional features are extracted from real-time data, and a clustering algorithm is utilised to analyse the patterns of the data. This enables the potential outliers to be automatically identified. By clustering similar data, the model is able to detect data behaviour that deviates significantly from normal traffic without the need for labelled data. The results of the experiments demonstrate that the proposed method exhibits high accuracy in the detection of anomalies across a range of scenarios. Notably, it demonstrates robust and adaptable performance, particularly in the context of unbalanced data.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14034 [pdf]

Cost-Effective Community-Hierarchy-Based Mutual Voting Approach for Influence Maximization in Complex Networks

Authors: Yi Liu, Xiaoan Tang, Witold Pedrycz, Qiang Zhang

Abstract: Various types of promising techniques have come into being for influence maximization whose aim is to identify influential nodes in complex networks. In essence, real-world applications usually have high requirements on the balance between time complexity and accuracy of influential nodes identification. To address the challenges of imperfect node influence measurement and inefficient seed nodes selection mechanism in such class of foregoing techniques, this article proposes a novel approach called Cost-Effective Community-Hierarchy-Based Mutual Voting for influence maximization in complex networks. First, we develop a method for measuring the importance of different nodes in networks based on an original concept of Dual-Scale Community-Hierarchy Information that synthesizes both hierarchy structural information and community structural information of nodes. The community structural information contained in the nodes is measured by a new notion of Hierarchical-Community Entropy. Second, we develop a method named Cost-Effective Mutual-Influence-based Voting for seed nodes selection. Hereinto, a low-computational-cost mutual voting mechanism and an updating strategy called Lazy Score Updating Strategy are newly constructed for optimizing the selecting of seed nodes. Third, we develop a balance index to evaluate the performance of different methods in striking the tradeoff between time complexity and the accuracy of influential nodes identification. Finally, we demonstrate the approach performance over ten public datasets.
The extensive experiments show that the proposed approach outperforms 16 state-of-the-art techniques on the balance between time complexity and accuracy of influential nodes identification. Compared with the method with the second highest value of the balance index, our approach can be improved by at most 9.29%.

Submitted 21 September, 2024; originally announced September 2024.

arXiv:2409.12576 [pdf, other]

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Authors: Zhengguang Zhou, Jing Li, Huaxia Li, Nemo Chen, Xu Tang

Abstract: Tuning-free personalized image generation methods have achieved significant success in maintaining facial consistency, i.e., identities, even with multiple characters. However, the lack of holistic consistency in scenes with multiple characters hampers these methods' ability to create a cohesive narrative. In this paper, we introduce StoryMaker, a personalization solution that preserves not only facial consistency but also clothing, hairstyles, and body consistency, thus facilitating the creation of a story through a series of images. StoryMaker incorporates conditions based on face identities and cropped character images, which include clothing, hairstyles, and bodies. Specifically, we integrate the facial identity information with the cropped character images using the Positional-aware Perceiver Resampler (PPR) to obtain distinct character features. To prevent intermingling of multiple characters and the background, we separately constrain the cross-attention impact regions of different characters and the background using MSE loss with segmentation masks. Additionally, we train the generation network conditioned on poses to promote decoupling from poses. A LoRA is also employed to enhance fidelity and quality. Experiments underscore the effectiveness of our approach. StoryMaker supports numerous applications and is compatible with other societal plug-ins. Our source codes and model weights are available at https://github.com/RedAIGC/StoryMaker.

Submitted 19 September, 2024; originally announced September 2024.

Comments: 12 pages, 5 figures

arXiv:2409.10793 [pdf, other]

Impact of current uncertainties in the 12C+12C nuclear reaction rate on intermediate-mass stars and massive white dwarfs

Authors: Francisco C. De Gerónimo, Marcelo M. Miller Bertolami, Tiara Battich, Xiaodong Tang, Márcio Catelan, Alejandro H. Córsico, Yunjun Li, Xiao Fang, Leandro G. Althaus

Abstract: Recent determinations of the total rate of the 12C+12C nuclear reaction show non-negligible differences with the reference reaction rate commonly used in previous stellar simulations. In addition, the current uncertainties in determining each exit channel constitute one of the main uncertainties in shaping the inner structure of super asymptotic giant branch stars that could have a measurable impact on the properties of pulsating ultra-massive white dwarfs (WDs). We explore how new determinations of the nuclear reaction rate and its branching ratios affect the evolution of WD progenitors. We show that the current uncertainties in the branching ratios constitute the main uncertainty factor in determining the inner composition of ultra-massive WDs and their progenitors. We found that the use of extreme branching ratios leads to differences in the central abundances of 20Ne of at most 17%, which are translated into differences of at most 1.3 and 0.8% in the cooling times and size of the crystallized core. However, the impact on the pulsation properties is small, less than 1 s for the asymptotic period spacing. We found that the carbon burns partially in the interior of ultra-massive WD progenitors within a particular range of masses, leaving a hybrid CONe-core composition in their cores. The evolution of these new kinds of predicted objects differs substantially from the evolution of objects with pure CO cores.
Differences in the size of the crystallized core and cooling times of up to 15 and 6%, respectively leading to distinct patterns in the period spacing distribution.

Submitted 16 September, 2024; originally announced September 2024.

Comments: 12 pages, 14 figures. Accepted for publication in ApJ

arXiv:2409.09726 [pdf, other]

High Definition Map Mapping and Update: A General Overview and Future Directions

Authors: Benny Wijaya, Kun Jiang, Mengmeng Yang, Tuopu Wen, Yunlong Wang, Xuewei Tang, Zheng Fu, Taohua Zhou, Diange Yang

Abstract: Along with the rapid growth of autonomous vehicles (AVs), more and more demands are required for environment perception technology. Among others, HD mapping has become one of the more prominent roles in helping the vehicle realize essential tasks such as localization and path planning. While increasing research efforts have been directed toward HD Map development. However, a comprehensive overview of the overall HD map mapping and update framework is still lacking. This article introduces the development and current state of the algorithm involved in creating HD map mapping and its maintenance. As part of this study, the primary data preprocessing approach of processing raw data to information ready to feed for mapping and update purposes, semantic segmentation, and localization are also briefly reviewed. Moreover, the map taxonomy, ontology, and quality assessment are extensively discussed, the map data's general representation method is presented, and the mapping algorithm ranging from SLAM to transformers learning-based approaches are also discussed. The development of the HD map update algorithm, from change detection to the update methods, is also presented. Finally, the authors discuss possible future developments and the remaining challenges in HD map mapping and update technology. This paper simultaneously serves as a position paper and tutorial to those new to HD map mapping and update domains.

Submitted 15 September, 2024; originally announced September 2024.

Comments: 30 Pages, 13 figures

arXiv:2409.08478 [pdf, other]

On a class of coupled obstacle systems

Authors: Lili Du, Xu Tang, Cong Wang

Abstract: In this paper, we explore cooperative and competitive coupled obstacle systems, which, up to now, are new type obstacle systems and formed by coupling two equations belonging to classical obstacle problem. On one hand, applying the constrained minimizer in variational methods we establish the existence of solutions for the systems. Moreover, the optimal regularity of solutions is obtained, which is the cornerstone for further research on so-called free boundary. Furthermore, as coefficient $λ\to0$, there exists a sequence of solutions converging to solutions of the single classical obstacle equation. On the other hand, motivated by the heartstirring ideas of single classical obstacle problem, based on the corresponding blowup methods, Weiss type monotonicity formula and Monneau type monotonicity formula of systems to be studied, we investigate the regularity of free boundary, and on the regular and singular points in particular, as it should be, which is more challenging but exceedingly meaningful in solving free boundary problems.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.07503 [pdf, other]

AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs

Authors: Lijia Lv, Weigang Zhang, Xuehai Tang, Jie Wen, Feng Liu, Jizhong Han, Songlin Hu

Abstract: Jailbreak vulnerabilities in Large Language Models (LLMs) refer to methods that extract malicious content from the model by carefully crafting prompts or suffixes, which has garnered significant attention from the research community. However, traditional attack methods, which primarily focus on the semantic level, are easily detected by the model. These methods overlook the difference in the model's alignment protection capabilities at different output stages. To address this issue, we propose an adaptive position pre-fill jailbreak attack approach for executing jailbreak attacks on LLMs. Our method leverages the model's instruction-following capabilities to first output pre-filled safe content, then exploits its narrative-shifting abilities to generate harmful content. Extensive black-box experiments demonstrate our method can improve the attack success rate by 47% on the widely recognized secure model (Llama2) compared to existing approaches. Our code can be found at: https://github.com/Yummy416/AdaPPA.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.05977 [pdf]

AI for Mathematics Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean4

Authors: Xichen Tang

Abstract: Using computerized verifiable formal languages like Lean 4 to prove mathematical theorems has a significant impact on mathematical formalization. Lean 4 offers prominent potential for advancing mathematical reasoning. However, existing efforts are limited to mathematical formalization languages in substantial online corpora and are dedicated to keeping pace with rapidly evolving languages. To bridge the gap between the traditional and computerized proof, my approach to formalizing theorem proving involves generating formal steps and complete proofs using Large Language Models (LLMs) based on Natural Language (NL) proofs. The method is to introduce the basic structure and tactics in general, determine how AI can assist the mathematical formalization process to improve its performance, and give examples of solving problems in Lean 4 comparing to NL, mainly in IMO, and a sample theorem proving in abstract algebra.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05893 [pdf, other]

Latent Space Dynamics Learning for Stiff Collisional-radiative Models

Authors: Xuping Xie, Qi Tang, Xianzhu Tang

Abstract: Collisional-radiative (CR) models describe the atomic processes in a plasma by tracking the population density in the ground and excited states for each charge state of the atom or ion. These models predict important plasma properties such as charge state distributions and radiative emissivity and opacity. Accurate CR modeling is essential in radiative plasma modeling for magnetic fusion, especially when significant amount of impurities are introduced into the plasmas. In radiative plasma simulations, a CR model, which is a set of high-dimensional stiff ordinary differential equations (ODE), needs to be solved on each grid point in the configuration space, which can overwhelm the plasma simulation cost. In this work, we propose a deep learning method that discovers the latent space and learns its corresponding latent dynamics, which can capture the essential physics to make accurate predictions at much lower online computational cost. To facilitate coupling of the latent space CR dynamics with the plasma simulation model in physical variables, our latent space in the autoencoder must be a grey box, consisting of a physical latent space and a data-driven or blackbox latent space. It has been demonstrated that the proposed architecture can accurately predict both the full-order CR dynamics and the critical physical quantity of interest, the so-called radiative power loss rate.

Submitted 1 September, 2024; originally announced September 2024.

Comments: 27 pages, 22 figures

Report number: LA-UR-24-26289

arXiv:2409.05387 [pdf, other]

doi 10.1145/3680528.3687609

Decoupling Contact for Fine-Grained Motion Style Transfer

Authors: Xiangjun Tang, Linjun Wu, He Wang, Yiqian Wu, Bo Hu, Songnan Li, Xu Gong, Yuchen Liao, Qilong Kou, Xiaogang Jin

Abstract: Motion style transfer changes the style of a motion while retaining its content and is useful in computer animations and games. Contact is an essential component of motion style transfer that should be controlled explicitly in order to express the style vividly while enhancing motion naturalness and quality. However, it is unknown how to decouple and control contact to achieve fine-grained control in motion style transfer. In this paper, we present a novel style transfer method for fine-grained control over contacts while achieving both motion naturalness and spatial-temporal variations of style. Based on our empirical evidence, we propose controlling contact indirectly through the hip velocity, which can be further decomposed into the trajectory and contact timing, respectively. To this end, we propose a new model that explicitly models the correlations between motions and trajectory/contact timing/style, allowing us to decouple and control each separately. Our approach is built around a motion manifold, where hip controls can be easily integrated into a Transformer-based decoder. It is versatile in that it can generate motions directly as well as be used as post-processing for existing methods to improve quality and contact controllability. In addition, we propose a new metric that measures a correlation pattern of motions based on our empirical evidence, aligning well with human perception in terms of motion naturalness. Based on extensive evaluation, our method outperforms existing methods in terms of style expressivity and motion quality.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04674 [pdf]

Optimization-Based Image Reconstruction Regularized with Inter-Spectral Structural Similarity for Limited-Angle Dual-Energy Cone-Beam CT

Authors: Junbo Peng, Tonghe Wang, Richard L. J. Qiu, Chih-Wei Chang, Justin Roper, David S. Yu, Xiangyang Tang, Xiaofeng Yang

Abstract: Background: Limited-angle (LA) dual-energy (DE) cone-beam CT (CBCT) is considered as a potential solution to achieve fast and low-dose DE imaging on current CBCT scanners without hardware modification. However, its clinical implementations are hindered by the challenging image reconstruction from LA projections. While optimization-based and deep learning-based methods have been proposed for image reconstruction, their utilization is limited by the requirement for X-ray spectra measurement or paired datasets for model training. Purpose: This work aims to facilitate the clinical applications of fast and low-dose DECBCT by developing a practical solution for image reconstruction in LA-DECBCT. Methods: An inter-spectral structural similarity-based regularization was integrated into the iterative image reconstruction in LA-DECBCT. By enforcing the similarity between the DE images, LA artifacts were efficiently reduced in the reconstructed DECBCT images. The proposed method was evaluated using four physical phantoms and three digital phantoms, demonstrating its efficacy in quantitative DECBCT imaging. Results: In all the studies, the proposed method achieves accurate image reconstruction without visible residual artifacts from LA-DECBCT projection data. In the digital phantom study, the proposed method reduces the mean-absolute-error (MAE) from 419 to 14 HU for the High-energy CBCT and 591 to 20 HU for the low-energy CBCT.
Conclusions: The proposed method achieves accurate image reconstruction without the need for X-ray spectra measurement for optimization or paired datasets for model training, showing great practical value in clinical implementations of LA-DECBCT.

Submitted 6 September, 2024; originally announced September 2024.

arXiv:2409.02797 [pdf, ps, other]

Joint Beamforming for Backscatter Integrated Sensing and Communication

Authors: Zongyao Zhao, Tiankuo Wei, Zhenyu Liu, Xinke Tang, Xiao-Ping Zhang, Yuhan Dong

Abstract: Integrated sensing and communication (ISAC) is a key technology of next generation wireless communication. Backscatter communication (BackCom) plays an important role for internet of things (IoT). Then the integration of ISAC with BackCom technology enables low-power data transmission while enhancing the system sensing ability, which is expected to provide a potentially revolutionary solution for IoT applications. In this paper, we propose a novel backscatter-ISAC (B-ISAC) system and focus on the joint beamforming design for the system. We formulate the communication and sensing model of the B-ISAC system and derive the metrics of communication and sensing performance respectively, i.e., communication rate and detection probability. We propose a joint beamforming scheme aiming to optimize the communication rate under sensing constraint and power budget. A successive convex approximation (SCA) based algorithm and an iterative algorithm are developed for solving the complicated non-convex optimization problem. Numerical results validate the effectiveness of the proposed scheme and associated algorithms. The proposed B-ISAC system has broad application prospect in IoT scenarios.

Submitted 4 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

Comments: 6 pages, 4 figures, IEEE Global Communications Conference (Globecom) 2024. This paper is the conference version of the following work: arXiv:2407.19235

arXiv:2409.02469 [pdf, other]

UAV-Mounted Movable Antenna: Joint Optimization of UAV Placement and Antenna Configuration

Authors: Xiao-Wei Tang, Yunmei Shi, Yi Huang, Qingqing Wu

Abstract: Recently, movable antennas (MAs) have garnered immense attention due to their capability to favorably alter channel conditions through agile movement. In this letter, we delve into a spectrum sharing system enabled by unmanned aerial vehicle (UAV) mounted MAs, thereby introducing a new degree of freedom vertically alongside the horizontal local mobility for MAs. Our objective is to maximize the minimum beamforming gain for secondary users (SUs) while ensuring that interference to the primary users (PUs) remains below a predefined threshold, which necessitates a joint optimization involving the UAV's height, the antenna weight vector (AWV), and the antenna position vector (APV). However, the formulated optimization problem is non-convex and challenging to solve optimally. To tackle this issue, we propose an alternating optimization algorithm that optimizes the UAV's height, APV and AWV in an iterative manner, thus yielding a near-optimal solution. Numerical results demonstrate the superiority of the proposed scheme as well as its ability to deliver full beamforming gain to SUs with reduced computational complexity.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02421

MusicMamba: A Dual-Feature Modeling Approach for Generating Chinese Traditional Music with Modal Precision

Authors: Jiatao Chen, Tianming Xie, Xing Tang, Jing Wang, Wenjing Dong, Bing Shi

Abstract: In recent years, deep learning has significantly advanced the MIDI domain, solidifying music generation as a key application of artificial intelligence. However, existing research primarily focuses on Western music and encounters challenges in generating melodies for Chinese traditional music, especially in capturing modal characteristics and emotional expression. To address these issues, we propose a new architecture, the Dual-Feature Modeling Module, which integrates the long-range dependency modeling of the Mamba Block with the global structure capturing capabilities of the Transformer Block. Additionally, we introduce the Bidirectional Mamba Fusion Layer, which integrates local details and global structures through bidirectional scanning, enhancing the modeling of complex sequences. Building on this architecture, we propose the REMI-M representation, which more accurately captures and generates modal information in melodies. To support this research, we developed FolkDB, a high-quality Chinese traditional music dataset encompassing various styles and totaling over 11 hours of music. Experimental results demonstrate that the proposed architecture excels in generating melodies with Chinese traditional music characteristics, offering a new and effective solution for music generation.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.01347

Target-Driven Distillation: Consistency Distillation with Target Timestep Selection and Decoupled Guidance

Authors: Cunzheng Wang, Ziyuan Guo, Yuxuan Duan, Huaxia Li, Nemo Chen, Xu Tang, Yao Hu

Abstract: Consistency distillation methods have demonstrated significant success in accelerating generative tasks of diffusion models. However, since previous consistency distillation methods use simple and straightforward strategies in selecting target timesteps, they usually struggle with blurs and detail losses in generated images. To address these limitations, we introduce Target-Driven Distillation (TDD), which (1) adopts a delicate selection strategy of target timesteps, increasing the training efficiency; (2) utilizes decoupled guidances during training, making TDD open to post-tuning on guidance scale during inference periods; (3) can be optionally equipped with non-equidistant sampling and x0 clipping, enabling a more flexible and accurate way for image sampling. Experiments verify that TDD achieves state-of-the-art performance in few-step generation, offering a better choice among consistency distillation models.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.01128

Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning

Authors: Jinglin Liang, Jin Zhong, Hanlin Gu, Zhongqi Lu, Xingxing Tang, Gang Dai, Shuangping Huang, Lixin Fan, Qiang Yang

Abstract: Federated Class Continual Learning (FCCL) merges the challenges of distributed client learning with the need for seamless adaptation to new classes without forgetting old ones. The key challenge in FCCL is catastrophic forgetting, an issue that has been explored to some extent in Continual Learning (CL). However, due to privacy preservation requirements, some conventional methods, such as experience replay, are not directly applicable to FCCL. Existing FCCL methods mitigate forgetting by generating historical data through federated training of GANs or data-free knowledge distillation. However, these approaches often suffer from unstable training of generators or low-quality generated data, limiting their guidance for the model. To address this challenge, we propose a novel method of data replay based on diffusion models. Instead of training a diffusion model, we employ a pre-trained conditional diffusion model to reverse-engineer each class, searching the corresponding input conditions for each class within the model's input space, significantly reducing computational resources and time consumption while ensuring effective generation. Furthermore, we enhance the classifier's domain generalization ability on generated and real data through contrastive learning, indirectly improving the representational capability of generated data for real data. Comprehensive experiments demonstrate that our method significantly outperforms existing baselines. Code is available at https://github.com/jinglin-liang/DDDR.

Submitted 3 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

Comments: Accepted by ECCV 2024 Oral

arXiv:2408.16634

RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model

Authors: Zhuan Shi, Jing Yan, Xiaoli Tang, Lingjuan Lyu, Boi Faltings

Abstract: The increasing sophistication of text-to-image generative models has led to complex challenges in defining and enforcing copyright infringement criteria and protection. Existing methods, such as watermarking and dataset deduplication, fail to provide comprehensive solutions due to the lack of standardized metrics and the inherent complexity of addressing copyright infringement in diffusion models. To deal with these challenges, we propose a Reinforcement Learning-based Copyright Protection (RLCP) method for Text-to-Image Diffusion Model, which minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset. Our approach begins with the introduction of a novel copyright metric grounded in copyright law and court precedents on infringement. We then utilize the Denoising Diffusion Policy Optimization (DDPO) framework to guide the model through a multi-step decision-making process, optimizing it using a reward function that incorporates our proposed copyright metric. Additionally, we employ KL divergence as a regularization term to mitigate some failure modes and stabilize RL fine-tuning. Experiments conducted on 3 mixed datasets of copyright and non-copyright images demonstrate that our approach significantly reduces copyright infringement risk while maintaining image quality.

Submitted 2 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.12052 by other authors

Showing 1–50 of 1,750 results for author: Tang, X