Showing 51–100 of 1,512 results for author: Yang, Q

  1. arXiv:2408.08716  [pdf]

    cond-mat.mtrl-sci physics.comp-ph

    Tailoring light holes in $β$-$Ga_{2}O_{3}$ via Anion-Anion Antibonding Coupling

    Authors: Ke Xu, Qiaolin Yang, Wenhao Liu, Rong Zhang, Zhi Wang, Jiandong Ye

    Abstract: A significant limitation of wide-bandgap materials is their low hole mobility related to localized holes with heavy effective masses ($m_h^*$). We identify in low-symmetric wide-bandgap compounds an anion-anion antibonding coupling (AAAC) effect as the intrinsic factor behind hole localization, which explains the extremely heavy $m_h^*$ and self-trapped hole (STH) formation observed in gallium oxi…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 22 pages, 1 table, 5 figures

  2. arXiv:2408.08696  [pdf, other]

    cs.CL cs.LG

    Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

    Authors: Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: The rapid growth in the parameters of large language models (LLMs) has made inference latency a fundamental bottleneck, limiting broader application of LLMs. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm, leveraging the parallel capabilities of modern hardware. Some speculative decoding methods rely on additional structures to guess…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: under review

  3. arXiv:2408.08527  [pdf, other]

    cs.CV cs.AI

    Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading

    Authors: Li Pan, Yupei Zhang, Qiushi Yang, Tan Li, Xiaohan Xing, Maximus C. F. Yeung, Zhen Chen

    Abstract: Recently, multimodal deep learning, which integrates histopathology slides and molecular biomarkers, has achieved a promising performance in glioma grading. Despite great progress, due to the intra-modality complexity and inter-modality heterogeneity, existing studies suffer from inadequate histopathology representation learning and inefficient molecular-pathology knowledge alignment. These two is…

    Submitted 16 August, 2024; originally announced August 2024.

  4. arXiv:2408.07500  [pdf, other]

    cs.CV

    Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach

    Authors: Shizhou Zhang, Wenlong Luo, De Cheng, Qingchun Yang, Lingyan Ran, Yinghui Xing, Yanning Zhang

    Abstract: In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. G2A-VReID dataset has the following characteristics: 1) Drastic view changes; 2) L…

    Submitted 2 September, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Published at ECCV 2024

  5. arXiv:2408.07354  [pdf]

    physics.optics physics.app-ph

    In-line fiber optic optofluidic sensor based on a fully open Fabry-Perot interferometer

    Authors: Dewen Duan, Qian Kang, Qianhui Yang, Zihao Zhao, Na Li, Guan-Xiang Du, Yi-Yuan Xie

    Abstract: We present an all-fiber, fully open Fabry-Perot interferometer (FPI) cavity that is suitable for fluidic measurement applications. Fabrication of the FPI involves the alignment and bonding of three optical fiber sections using either ceramic glue or low-temperature melting glass. The fabrication procedure allows the protection of the cleaved optical fiber end faces, which serve as the two mirrors…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 6 pages, 10 figures

    MSC Class: 7805

    Journal ref: Laser Phys. 34 115102 (2024)

  6. arXiv:2408.06652  [pdf, other]

    hep-ph hep-th physics.atom-ph physics.optics

    Search for QCD Axions in light of String Theory

    Authors: Qiaoli Yang, Runchao Huang

    Abstract: The QCD axion stands as one of the most promising candidates for resolving the strong CP problem. However, the value of the axion's decay constant $f_a$ and, by extension, its mass $m_a$, remain uncertain within the framework of effective field theory, posing a challenge for experimental detection. Fortunately, fields such as cosmology and astrophysics can offer crucial clues about potential mass…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures

  7. arXiv:2408.05043  [pdf, other]

    astro-ph.CO gr-qc

    Anisotropy of Nanohertz Gravitational Wave Background and Individual Sources from Supermassive Binary Black Holes: Probe of Cosmic Large Scale Structure

    Authors: Qing Yang, Xiao Guo, Zhoujian Cao, Xiaoyun Shao, Xi Yuan

    Abstract: Several pulsar timing array (PTA) groups have recently claimed the detection of nanohertz gravitational wave (GW) background, but the origin of this GW signal remains unclear. Nanohertz GWs generated by supermassive binary black holes (SMBBHs) are one of the most important GW sources in the PTA band. Utilizing data from numerical cosmology simulation, we generate mock SMBBHs within the observable…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 26 pages, 18 figures, 1 table

  8. arXiv:2408.04777  [pdf]

    eess.IV cs.CV

    Deep Learning-based Unsupervised Domain Adaptation via a Unified Model for Prostate Lesion Detection Using Multisite Bi-parametric MRI Datasets

    Authors: Hao Li, Han Liu, Heinrich von Busch, Robert Grimm, Henkjan Huisman, Angela Tong, David Winkel, Tobias Penzkofer, Ivan Shabunin, Moon Hyung Choi, Qingsong Yang, Dieter Szolar, Steven Shea, Fergus Coakley, Mukesh Harisinghani, Ipek Oguz, Dorin Comaniciu, Ali Kamen, Bin Lou

    Abstract: Our hypothesis is that UDA using diffusion-weighted images, generated with a unified model, offers a promising and reliable strategy for enhancing the performance of supervised learning models in multi-site prostate lesion detection, especially when various b-values are present. This retrospective study included data from 5,150 patients (14,191 samples) collected across nine different imaging cent…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at Radiology: Artificial Intelligence. Journal reference and external DOI will be added once published

    Journal ref: Radiology: Artificial Intelligence 2024;6(5):e230521

  9. arXiv:2408.04499  [pdf, other]

    cs.LG

    Knowledge-Aided Semantic Communication Leveraging Probabilistic Graphical Modeling

    Authors: Haowen Wan, Qianqian Yang, Jiancheng Tang, Zhiguo Shi

    Abstract: In this paper, we propose a semantic communication approach based on probabilistic graphical model (PGM). The proposed approach involves constructing a PGM from a training dataset, which is then shared as common knowledge between the transmitter and receiver. We evaluate the importance of various semantic features and present a PGM-based compression algorithm designed to eliminate predictable port…

    Submitted 8 August, 2024; originally announced August 2024.

  10. arXiv:2408.04222  [pdf, other]

    hep-th hep-ph

    From squared amplitudes to energy correlators

    Authors: Song He, Xuhang Jiang, Qinglin Yang, Yao-Qi Zhang

    Abstract: The leading order $N$-point energy correlators of maximally supersymmetric Yang-Mills theory in the limit where the $N$ detectors are collinear can be expressed as an integral of the $1\to N$ splitting function, which is given by the $(N{+}3)$-point squared super-amplitudes at tree level. This provides yet another example that the integrand of certain physical observable -- $N$-point energy correl…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10 pages, several figures and tables, and an ancillary file with squared amplitudes up to 12 points, and explicit EC integrands up to 7 points

  11. Resonant Beam Enabled DoA Estimation in Passive Positioning System

    Authors: Yixuan Guo, Qingwei Jiang, Mengyuan Xu, Wen Fang, Qingwen Liu, Gang Yan, Qunhui Yang, Hai Lu

    Abstract: The rapid advancement of the next generation of communications and internet of things (IoT) technologies has made the provision of location-based services for diverse devices an increasingly pressing necessity. Localizing devices with/without intelligent computing abilities, including both active and passive devices is essential, especially in indoor scenarios. For traditional RF positioning syste…

    Submitted 7 August, 2024; originally announced August 2024.

  12. arXiv:2408.04168  [pdf, other]

    cs.AI

    Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions

    Authors: Qingbin Zeng, Qinglong Yang, Shunan Dong, Heming Du, Liang Zheng, Fengli Xu, Yong Li

    Abstract: This paper considers a scenario in city navigation: an AI agent is provided with language descriptions of the goal location with respect to some well-known landmarks; By only observing the scene around, including recognizing landmarks and road network connections, the agent has to make decisions to navigate to the goal location without instructions. This problem is very challenging, because it req…

    Submitted 17 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  13. AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    Authors: Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

    Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

  14. arXiv:2408.02907  [pdf, other]

    cs.CL

    Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering

    Authors: Tiezheng Guo, Chen Wang, Yanyi Liu, Jiawei Tang, Pan Li, Sai Xu, Qingwen Yang, Xianlin Gao, Zhi Li, Yingyou Wen

    Abstract: Retrieving external knowledge and prompting large language models with relevant information is an effective paradigm to enhance the performance of question-answering tasks. Previous research typically handles paragraphs from external documents in isolation, resulting in a lack of context and ambiguous references, particularly in multi-document and complex tasks. To overcome these challenges, we pr…

    Submitted 5 August, 2024; originally announced August 2024.

  15. arXiv:2408.01791  [pdf]

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  16. arXiv:2408.01708  [pdf, other]

    cs.CV

    AVESFormer: Efficient Transformer Design for Real-Time Audio-Visual Segmentation

    Authors: Zili Wang, Qi Yang, Linsu Shi, Jiazhong Yu, Qinghua Liang, Fei Li, Shiming Xiang

    Abstract: Recently, transformer-based models have demonstrated remarkable performance on audio-visual segmentation (AVS) tasks. However, their expensive computational cost makes real-time inference impractical. By characterizing attention maps of the network, we identify two key obstacles in AVS models: 1) attention dissipation, corresponding to the over-concentrated attention weights by Softmax within rest…

    Submitted 3 August, 2024; originally announced August 2024.

  17. arXiv:2408.00381  [pdf, other]

    cs.IT eess.SY

    Statistical AoI Guarantee Optimization for Supporting xURLLC in ISAC-enabled V2I Networks

    Authors: Yanxi Zhang, Mingwu Yao, Qinghai Yang, Dongqi Yan, Xu Zhang, Xu Bao, Muyu Mei

    Abstract: This paper addresses the critical challenge of supporting next-generation ultra-reliable and low-latency communication (xURLLC) within integrated sensing and communication (ISAC)-enabled vehicle-to-infrastructure (V2I) networks. We incorporate channel evaluation and retransmission mechanisms for real-time reliability enhancement. Using stochastic network calculus (SNC), we establish a theoretical…

    Submitted 1 August, 2024; originally announced August 2024.

  18. arXiv:2408.00368  [pdf, other]

    eess.SP

    Illumination Design for Joint Imaging and Wireless Power Transfer Systems

    Authors: Qianyu Yang, Haiyang Zhang, Chunguo Li, Ruiqi Liu, Baoyun Wang

    Abstract: This paper presents a novel concept termed Integrated Imaging and Wireless Power Transfer (IWPT), wherein the integration of imaging and wireless power transfer functionalities is achieved on a unified hardware platform. IWPT leverages a transmitting array to efficiently illuminate a specific Region of Interest (ROI), enabling the extraction of ROI's scattering coefficients while concurrently prov…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  19. arXiv:2407.21538  [pdf]

    cond-mat.mes-hall cond-mat.soft physics.chem-ph

    In-plane dielectric constant and conductivity of confined water

    Authors: R. Wang, M. Souilamas, A. Esfandiar, R. Fabregas, S. Benaglia, H. Nevison-Andrews, Q. Yang, J. Normansell, P. Ares, G. Ferrari, A. Principi, A. K. Geim, L. Fumagalli

    Abstract: Water is essential for almost every aspect of life on our planet and, unsurprisingly, its properties have been studied in great detail. However, disproportionately little remains known about the electrical properties of interfacial and strongly confined water where its structure deviates from that of bulk water, becoming distinctly layered. The structural change is expected to affect water's condu…

    Submitted 31 July, 2024; originally announced July 2024.

  20. arXiv:2407.20264  [pdf, other]

    eess.SP

    Beam Focusing for Near-Field Multi-User Localization

    Authors: Qianyu Yang, Anna Guerra, Francesco Guidi, Nir Shlezinger, Haiyang Zhang, Davide Dardari, Baoyun Wang, Yonina C. Eldar

    Abstract: Extremely large-scale antenna arrays are poised to play a pivotal role in sixth-generation (6G) networks. Utilizing such arrays often results in a near-field spherical wave transmission environment, enabling the generation of focused beams, which introduces new degrees of freedom for wireless localization. In this paper, we consider a beam-focusing design for localizing multiple sources in the rad…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures

  21. arXiv:2407.17897  [pdf]

    cond-mat.mtrl-sci math-ph

    A general thermodynamically consistent phase-field-micromechanics model of sintering with coupled diffusion and grain motion

    Authors: Qingcheng Yang, Arkadz Kirshtein

    Abstract: Sintering is a pivotal technology for processing ceramic and metallic powders into solid objects. A profound understanding of microstructure evolution during sintering is essential for manufacturing products with tailored properties. While various phase-field models have been proposed to simulate microstructure evolution in solid-state sintering, correctly incorporating the crucial densification m…

    Submitted 25 July, 2024; originally announced July 2024.

  22. arXiv:2407.17715  [pdf, ps, other]

    hep-th astro-ph.CO

    Differential equations and recursive solutions for cosmological amplitudes

    Authors: Song He, Xuhang Jiang, Jiahao Liu, Qinglin Yang, Yao-Qi Zhang

    Abstract: Recently considerable efforts have been devoted to computing cosmological correlators and the corresponding wavefunction coefficients, as well as understanding their analytical structures. In this note, we revisit the computation of these ``cosmological amplitudes" associated with any tree or loop graph for conformal scalars with time-dependent interactions in the power-law FRW universe, directly…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 43 pages; many figures

  23. arXiv:2407.16341  [pdf, other]

    cs.CV

    Motion Capture from Inertial and Vision Sensors

    Authors: Xiaodong Chen, Wu Liu, Qian Bao, Xinchen Liu, Quanwei Yang, Ruoli Dai, Tao Mei

    Abstract: Human motion capture is the foundation for many computer vision and graphics tasks. While industrial motion capture systems with complex camera arrays or expensive wearable sensors have been widely adopted in movie and game production, consumer-affordable and easy-to-use solutions for personal applications are still far from mature. To utilize a mixture of a monocular camera and very few inertial…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 17 pages, 9 figures

  24. arXiv:2407.15488  [pdf, other]

    cs.CV

    DiffX: Guide Your Layout to Cross-Modal Generative Modeling

    Authors: Zeyu Wang, Jingyu Lin, Yifei Qian, Yi Huang, Shicen Tian, Bosong Chai, Juncan Deng, Qu Yang, Lan Du, Cunjian Chen, Kejie Huang

    Abstract: Diffusion models have made significant strides in language-driven and layout-driven image generation. However, most diffusion models are limited to visible RGB image generation. In fact, human perception of the world is enriched by diverse viewpoints, such as chromatic contrast, thermal illumination, and depth information. In this paper, we introduce a novel diffusion model for general layout-guid…

    Submitted 20 October, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  25. arXiv:2407.15435  [pdf, other]

    cs.CV

    Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures

    Authors: Ruizhe Wang, Chunliang Hua, Tomakayev Shingys, Mengyuan Niu, Qingxin Yang, Lizhong Gao, Yi Zheng, Junyan Yang, Qiao Wang

    Abstract: The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream t…

    Submitted 25 September, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

  26. arXiv:2407.14197  [pdf, other]

    cs.CV

    A Benchmark for Gaussian Splatting Compression and Quality Assessment Study

    Authors: Qi Yang, Kaifa Yang, Yuke Xing, Yiling Xu, Zhu Li

    Abstract: To fill the gap of traditional GS compression method, in this paper, we first propose a simple and effective GS data compression anchor called Graph-based GS Compression (GGSC). GGSC is inspired by graph signal processing theory and uses two branches to compress the primitive center and attributes. We split the whole GS sample via KDTree and clip the high-frequency components after the graph Fouri…

    Submitted 19 July, 2024; originally announced July 2024.

  27. arXiv:2407.14006  [pdf, other]

    eess.AS cs.SD

    MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

    Authors: Qian Yang, Jialong Zuo, Zhe Su, Ziyue Jiang, Mingze Li, Zhou Zhao, Feiyang Chen, Zhefeng Wang, Baoxing Huai

    Abstract: We introduce an open source high-quality Mandarin TTS dataset MSceneSpeech (Multiple Scene Speech Dataset), which is intended to provide resources for expressive speech synthesis. MSceneSpeech comprises numerous audio recordings and texts performed and recorded according to daily life scenarios. Each scenario includes multiple speakers and a diverse range of prosodic styles, making it suitable for…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  28. arXiv:2407.13117  [pdf, other]

    cs.CY cs.MM

    SOMONITOR: Explainable Marketing Data Processing and Analysis with Large Language Models

    Authors: Qi Yang, Sergey Nikolenko, Marlo Ongpin, Ilia Gossoudarev, Yu-Yi Chu-Farseeva, Aleksandr Farseev

    Abstract: Online marketing faces formidable challenges in managing and interpreting immense volumes of data necessary for competitor analysis, content research, and strategic branding. It is impossible to review hundreds to thousands of transient online content items by hand, and partial analysis often leads to suboptimal outcomes and poorly performing campaigns. We introduce an explainable AI framework SoM…

    Submitted 17 July, 2024; originally announced July 2024.

  29. arXiv:2407.12517  [pdf, other]

    cs.LG

    Evaluating the transferability potential of deep learning models for climate downscaling

    Authors: Ayush Prasad, Paula Harder, Qidong Yang, Prasanna Sattegeri, Daniela Szwarcman, Campbell Watson, David Rolnick

    Abstract: Climate downscaling, the process of generating high-resolution climate data from low-resolution simulations, is essential for understanding and adapting to climate change at regional and local scales. Deep learning approaches have proven useful in tackling this problem. However, existing studies usually focus on training models for one specific task, location and variable, which are therefore limi…

    Submitted 17 July, 2024; originally announced July 2024.

  30. arXiv:2407.12014  [pdf, other]

    cs.HC cs.CY

    Surprising Performances of Students with Autism in Classroom with NAO Robot

    Authors: Qin Yang, Huan Lu, Dandan Liang, Shengrong Gong, Huanghao Feng

    Abstract: Autism is a developmental disorder that manifests in early childhood and persists throughout life, profoundly affecting social behavior and hindering the acquisition of learning and social skills in those diagnosed. As technological advancements progress, an increasing array of technologies is being utilized to support the education of students with Autism Spectrum Disorder (ASD), aiming to improv…

    Submitted 26 June, 2024; originally announced July 2024.

  31. arXiv:2407.11536  [pdf, other]

    cs.CL cs.AI

    Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise

    Authors: Qimin Yang, Rongsheng Wang, Jiexin Chen, Runqi Su, Tao Tan

    Abstract: Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain specific question and answer datasets, the professional domain knowledge and Q\&A abilities of these models have significantly improved, for example, medical professional LLMs that use fine-tuning of doctor-patient Q\&A data exhibit extraordinary disease diagnostic abilities…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 1 figure. Accepted by the Workshop on Long-Context Foundation Models (LCFM) at ICML 2024

  32. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, codes and scripts will be open-sourced soon

  33. arXiv:2407.10416  [pdf, other]

    cs.AR

    SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

    Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

    Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively ha…

    Submitted 14 July, 2024; originally announced July 2024.

  34. arXiv:2407.10285  [pdf, other]

    cs.CV

    Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models

    Authors: Qinyu Yang, Haoxin Chen, Yong Zhang, Menghan Xia, Xiaodong Cun, Zhixun Su, Ying Shan

    Abstract: In order to improve the quality of synthesized videos, currently, one predominant method involves retraining an expert diffusion model and then implementing a noising-denoising process for refinement. Despite the significant training costs, maintaining consistency of content between the original and enhanced videos remains a major challenge. To tackle this challenge, we propose a novel formulation…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024, Project Page: https://yangqy1110.github.io/NC-SDEdit/, Code Repo: https://github.com/yangqy1110/NC-SDEdit/

    ACM Class: I.2; I.4.3

  35. arXiv:2407.09806  [pdf, other]

    cs.CV

    Asynchronous Feedback Network for Perceptual Point Cloud Quality Assessment

    Authors: Yujie Zhang, Qi Yang, Ziyu Shan, Yiling Xu

    Abstract: Recent years have witnessed the success of the deep learning-based technique in research of no-reference point cloud quality assessment (NR-PCQA). For a more accurate quality prediction, many previous studies have attempted to capture global and local feature in a bottom-up manner, but ignored the interaction and promotion between them. To solve this problem, we propose a novel asynchronous feedba…

    Submitted 13 July, 2024; originally announced July 2024.

  36. arXiv:2407.08559  [pdf]

    physics.app-ph

    Study of a Novel Capacitive Pressure Sensor Using Spiral Comb Electrodes

    Authors: Wenjie Chen, Qi Yang, Qi Liu, Yiqun Zhang, Liang He, Yuanlin Xia, Zhuqing Wang, Yubo Huang, Jianfeng Chen, Cao Xia

    Abstract: For traditional capacitive pressure sensors, high nonlinearity and poor sensitivity greatly limited their sensing applications. Hence, an innovative design of capacitors based on spiral comb electrodes is proposed for high-sensitivity pressure detection in this work. Compared to traditional capacitive pressure sensors with straight plate electrodes, the proposed sensor with the spiral electrodes i…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 20 pages, 14 figures

    MSC Class: -

  37. arXiv:2407.08165  [pdf, other]

    eess.IV cs.CV

    Explicit-NeRF-QA: A Quality Assessment Database for Explicit NeRF Model Compression

    Authors: Yuke Xing, Qi Yang, Kaifa Yang, Yilin Xu, Zhu Li

    Abstract: In recent years, Neural Radiance Fields (NeRF) have demonstrated significant advantages in representing and synthesizing 3D scenes. Explicit NeRF models facilitate the practical NeRF applications with faster rendering speed, and also attract considerable attention in NeRF compression due to its huge storage cost. To address the challenge of the NeRF compression study, in this paper, we construct a…

    Submitted 20 September, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures, 2 tables, conference

  38. arXiv:2407.07840  [pdf, other]

    cs.CV cs.CL

    Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison

    Authors: Qian Yang, Weixiang Yan, Aishwarya Agrawal

    Abstract: Despite tremendous advancements, current state-of-the-art Vision-Language Models (VLMs) are still far from perfect. They tend to hallucinate and may generate biased responses. In such circumstances, having a way to assess the reliability of a given response generated by a VLM is quite useful. Existing methods, such as estimating uncertainty using answer likelihoods or prompt-based confidence gener…

    Submitted 8 October, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2024 Main Conference

  39. arXiv:2407.07495  [pdf, other]

    cs.CL

    Bucket Pre-training is All You Need

    Authors: Hongtao Liu, Qiyao Peng, Qing Yang, Kai Liu, Hongyan Xu

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various natural language processing tasks. However, the conventional fixed-length data composition strategy for pretraining, which involves concatenating and splitting documents, can introduce noise and limit the model's ability to capture long-range dependencies. To address this, we first introduce three metrics for eva…

    Submitted 10 July, 2024; originally announced July 2024.

  40. arXiv:2407.07487  [pdf, other]

    cs.CL

    Review-LLM: Harnessing Large Language Models for Personalized Review Generation

    Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

    Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the ``polite'' ph…

    Submitted 10 July, 2024; originally announced July 2024.

  41. Nonrigid Reconstruction of Freehand Ultrasound without a Tracker

    Authors: Qi Li, Ziyi Shen, Qianye Yang, Dean C. Barratt, Matthew J. Clarkson, Tom Vercauteren, Yipeng Hu

    Abstract: Reconstructing 2D freehand Ultrasound (US) frames into 3D space without using a tracker has recently seen advances with deep learning. Predicting good frame-to-frame rigid transformations is often accepted as the learning objective, especially when the ground-truth labels from spatial tracking devices are inherently rigid transformations. Motivated by a) the observed nonrigid deformation due to so…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted at MICCAI 2024

  42. Properties of Besov-Lorentz spaces and application to Navier-Stokes equations

    Authors: Qixiang Yang, Hongwei Li

    Abstract: Inspired by Caffarelli-Kohn-Nirenberg, Fefferman and Lin, we try to investigate how to control the set of large value points for the strong solution of Navier-Stokes equations. Besov-Lorentz spaces have multiple indices which can reflect complex changes of the set of the large value points. Hence we consider some properties of Gauss flow, paraproduct flow and couple flow related to the Besov-Loren…

    Submitted 4 July, 2024; originally announced July 2024.

    MSC Class: 35Q30; 76D03; 42B35; 46E30

    Journal ref: SCIENCE CHINA Mathematics (2024)

  43. arXiv:2407.03885  [pdf, other]

    cs.CV eess.IV

    Perception-Guided Quality Metric of 3D Point Clouds Using Hybrid Strategy

    Authors: Yujie Zhang, Qi Yang, Yiling Xu, Shan Liu

    Abstract: Full-reference point cloud quality assessment (FR-PCQA) aims to infer the quality of distorted point clouds with available references. Most of the existing FR-PCQA metrics ignore the fact that the human visual system (HVS) dynamically tackles visual information according to different distortion levels (i.e., distortion detection for high-quality samples and appearance perception for low-quality sa…

    Submitted 27 September, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  44. arXiv:2407.03876  [pdf, other]

    cs.CR cs.CL

    Automated Progressive Red Teaming

    Authors: Bojian Jiang, Yi Jing, Tianhao Shen, Tong Wu, Qing Yang, Deyi Xiong

    Abstract: Ensuring the safety of large language models (LLMs) is paramount, yet identifying potential vulnerabilities is challenging. While manual red teaming is effective, it is time-consuming, costly and lacks scalability. Automated red teaming (ART) offers a more cost-effective alternative, automatically generating adversarial prompts to expose LLM vulnerabilities. However, in current ART efforts, a robu…

    Submitted 5 October, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  45. arXiv:2407.03440  [pdf, other]

    cs.SD cs.LG eess.AS

    Advanced Framework for Animal Sound Classification With Features Optimization

    Authors: Qiang Yang, Xiuying Chen, Changsheng Ma, Carlos M. Duarte, Xiangliang Zhang

    Abstract: The automatic classification of animal sounds presents an enduring challenge in bioacoustics, owing to the diverse statistical properties of sound signals, variations in recording equipment, and prevalent low Signal-to-Noise Ratio (SNR) conditions. Deep learning models like Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) have excelled in human speech recognition but have not…

    Submitted 3 July, 2024; originally announced July 2024.

  46. arXiv:2407.03390  [pdf, other]

    cond-mat.mes-hall physics.optics

    Observation of Co-propagating Chiral Zero Modes in Magnetic Photonic Crystals

    Authors: Zhongfu Li, Shaojie Ma, Shuwei Li, Oubo You, Yachao Liu, Qingdong Yang, Yuanjiang Xiang, Peiheng Zhou, Shuang Zhang

    Abstract: Topological singularities, such as Weyl points and Dirac points, can give rise to unidirectional propagation channels known as chiral zero modes (CZMs) when subject to a magnetic field. These CZMs are responsible for intriguing phenomena like the chiral anomaly in quantum systems. The propagation direction of each CZM is determined by both the applied magnetic field and the topological charge of t…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 6 pages, 5 figures

  47. arXiv:2407.00478  [pdf, other]

    cs.LG cs.AI

    Knowledge-Aware Parsimony Learning: A Perspective from Relational Graphs

    Authors: Quanming Yao, Yongqi Zhang, Yaqing Wang, Nan Yin, James Kwok, Qiang Yang

    Abstract: The scaling law, which involves the brute-force expansion of training datasets and learnable parameters, has become a prevalent strategy for developing more robust learning models. However, due to bottlenecks in data, computation, and trust, the sustainability of the scaling law is a serious concern for the future of deep learning. In this paper, we address this issue by developing next-generation…

    Submitted 10 October, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

  48. arXiv:2406.19703  [pdf, other]

    cs.CV

    Vision Transformer with Key-select Routing Attention for Single Image Dehazing

    Authors: Lihan Tong, Weijia Li, Qingxia Yang, Liyuan Chen, Peng Chen

    Abstract: We present Ksformer, utilizing Multi-scale Key-select Routing Attention (MKRA) for intelligent selection of key areas through multi-channel, multi-scale windows with a top-k operator, and Lightweight Frequency Processing Module (LFPM) to enhance high-frequency features, outperforming other dehazing methods in tests.

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 5 pages, 4 figures, IEICE Trans. Information and Systems

    Report number: Vol.E107-D,No.11,pp.-,Nov. 2024

    MSC Class: 68U10(Primary)

    ACM Class: I.4

  49. arXiv:2406.18862  [pdf, other]

    cs.SD eess.AS

    Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

    Authors: Peikun Chen, Sining Sun, Changhao Shan, Qing Yang, Lei Xie

    Abstract: Unified speech-text models like SpeechGPT, VioLA, and AudioPaLM have shown impressive performance across various speech-related tasks, especially in Automatic Speech Recognition (ASR). These models typically adopt a unified method to model discrete speech and text tokens, followed by training a decoder-only transformer. However, they are all designed for non-streaming ASR tasks, where the entire s…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted for Interspeech 2024

  50. arXiv:2406.17404  [pdf, other]

    cs.CL cs.LG

    Make Some Noise: Unlocking Language Model Parallel Inference Capability through Noisy Training

    Authors: Yixuan Wang, Xianzhen Luo, Fuxuan Wei, Yijun Liu, Qingfu Zhu, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: Existing speculative decoding methods typically require additional model structure and training processes to assist the model for draft token generation. This makes the migration of acceleration methods to the new model more costly and more demanding on device memory. To address this problem, we propose the Make Some Noise (MSN) training framework as a replacement for the supervised fine-tuning st…

    Submitted 5 October, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: EMNLP 2024, camera ready