Learning Multimodal Cues of Children's Uncertainty

Authors: Qi Cheng, Mert İnan, Rahma Mbarki, Grace Grmek, Theresa Choi, Yiming Sun, Kimele Persaud, Jenny Wang, Malihe Alikhani

Abstract: Understanding uncertainty plays a critical role in achieving common ground (Clark et al., 1983). This is especially important for multimodal AI systems that collaborate with users to solve a problem or guide the user through a challenging concept. In this work, for the first time, we present a dataset annotated in collaboration with developmental and cognitive psychologists for the purpose of studying nonverbal cues of uncertainty. We then present an analysis of the data, studying different roles of uncertainty and its relationship with task difficulty and performance. Lastly, we present a multimodal machine learning model that can predict uncertainty given a real-time video clip of a participant, which we find improves upon a baseline multimodal transformer model. This work informs research on cognitive coordination between human-human and human-AI and has broad implications for gesture understanding and generation. The anonymized version of our data and code will be publicly available upon the completion of the required consent forms and data sheets.

Submitted 17 October, 2024; originally announced October 2024.

Comments: SIGDIAL 2023

arXiv:2410.13529 [pdf, ps, other]

A Construction of Evolving $3$-threshold Secret Sharing Scheme with Perfect Security and Smaller Share Size

Authors: Qi Cheng, Hongru Cao, Sian-Jheng Lin

Abstract: The evolving $k$-threshold secret sharing scheme allows the dealer to distribute the secret to many participants such that only $k$ or more shares together can restore the secret. In contrast to the conventional secret sharing scheme, the evolving scheme allows the number of participants to be uncertain and even ever-growing. In this paper, we consider the evolving secret sharing scheme with $k=3$. First, we point out that the prior approach has security risks. To solve this issue, we then propose a new evolving $3$-threshold scheme with perfect security. Given an $\ell$-bit secret, the $t$-th share of the proposed scheme has $\lceil\log_2 t\rceil +O({\lceil \log_4 \log_2 t\rceil}^2)+\log_2 p(2\lceil \log_4 \log_2 t\rceil-1)$ bits, where $p$ is a prime. Compared with the prior result $2 \lfloor\log_2 t\rfloor+O(\lfloor\log_2 t\rfloor)+\ell$, the proposed scheme reduces the leading constant from $2$ to $1$. Finally, we propose a conventional $3$-threshold secret sharing scheme over a finite field. Based on this model of the revised scheme and the proposed conventional $3$-threshold scheme, we present a brand-new and more concise evolving $3$-threshold secret sharing scheme.
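
The conventional (non-evolving) 3-threshold idea mentioned in the abstract above can be illustrated with Shamir's classic polynomial scheme. This is a minimal sketch and is not the construction proposed in the paper: the prime `P`, the function names `make_shares` and `recover`, and the use of Python's `secrets` module are all assumptions made for illustration.

```python
import secrets

# Illustrative field size (assumption): the Mersenne prime 2^61 - 1.
P = 2**61 - 1

def make_shares(secret, n, k=3):
    """Split `secret` into n shares so that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def recover(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, n=6)
print(recover(shares[:3]) == 123456789)
```

Here the number of participants `n` is fixed in advance; the evolving setting described in the abstract removes that restriction, which this sketch does not attempt.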

Submitted 17 October, 2024; originally announced October 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.01144

arXiv:2410.08496 [pdf]

Detecting collagen by machine learning improved photoacoustic spectral analysis for breast cancer diagnostics: feasibility studies with murine models

Authors: Jiayan Li, Lu Bai, Yingna Chen, Junmei Cao, Jingtao Zhu, Wanxiang Zhi, Qian Cheng

Abstract: Collagen, a key structural component of the extracellular matrix, undergoes significant remodeling during carcinogenesis. However, the important role of collagen levels in breast cancer diagnostics still lacks effective in vivo detection techniques to provide a deeper understanding. This study presents photoacoustic spectral analysis improved by machine learning as a promising non-invasive diagnostic method, focusing on exploring collagen as a salient biomarker. Murine model experiments revealed more profound associations of collagen with other cancer components than in normal tissues. Moreover, an optimal set of feature wavelengths was identified by a genetic algorithm for enhanced diagnostic performance, among which 75% were from collagen-dominated absorption wavebands. Using optimal spectra, the diagnostic algorithm achieved 72% accuracy, 66% sensitivity, and 78% specificity, surpassing full-range spectra by 6%, 4%, and 8%, respectively. The proposed photoacoustic methods examine the feasibility of offering valuable biochemical insights into existing techniques, showing great potential for early-stage cancer detection.

Submitted 10 October, 2024; originally announced October 2024.

arXiv:2410.07576 [pdf]

Simplified radar architecture based on information metasurface

Authors: Si Ran Wang, Zhan Ye Chen, Shao Nan Chen, Jun Yan Dai, Jun Wei Zhang, Zhen Jie Qi, Li Jie Wu, Meng Ke Sun, Qun Yan Zhou, Hui Dong Li, Zhang Jie Luo, Qiang Cheng, Tie Jun Cui

Abstract: Modern radar typically employs a chain architecture that consists of radio-frequency (RF) and intermediate frequency (IF) units, baseband digital signal processor, and information display. However, this architecture often results in high costs, significant hardware demands, and integration challenges. Here we propose a simplified radar architecture based on space-time-coding (STC) information metasurfaces. With their powerful capabilities to generate multiple harmonic frequencies and customize their phases, the STC metasurfaces play a key role in chirp signal generation, transmission, and echo reception. Remarkably, the receiving STC metasurface can implement dechirp processing directly on the RF level and realize the digital information outputs, which are beneficial to lower the hardware requirement at the receiving end while potentially shortening the time needed for conventional digital processing. As a proof of concept, the proposed metasurface radar is tested in a series of experiments for target detection and range/speed measurement, yielding results comparable to those obtained by conventional methods. This study provides valuable inspiration for a new radar system paradigm to combine the RF front ends and signal processors on the information metasurface platform that offers essential functionalities while significantly reducing the system complexity and cost.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 25 pages, 10 figures

arXiv:2410.06115 [pdf, other]

A physics-based perspective for understanding and utilizing spatial resources of wireless channels

Authors: Hui Xu, Jun Wei Wu, Zhen Jie Qi, Hao Tian Wu, Rui Wen Shao, Qiang Cheng, Jieao Zhu, Linglong Dai, Tie Jun Cui

Abstract: To satisfy the increasing demands for transmission rates of wireless communications, it is necessary to use spatial resources of electromagnetic (EM) waves. In this context, EM information theory (EIT) has become a hot topic by integrating the theoretical framework of deterministic mathematics and stochastic statistics to explore the transmission mechanisms of continuous EM waves. However, the previous studies were primarily focused on frame analysis, with limited exploration of practical applications and a comprehensive understanding of its essential physical characteristics. In this paper, we present a three-dimensional (3-D) line-of-sight channel capacity formula that captures the vector EM physics and accommodates both near- and far-field scenes. Based on the rigorous mathematical equation and the physical mechanism of fast multipole expansion, a channel model is established, and the finite angular spectral bandwidth feature of scattered waves is revealed. To adapt to the feature of the channel, an optimization problem is formulated for determining the mode currents on the transmitter, aiming to obtain the optimal design of the precoder and combiner. We make comprehensive analyses to investigate the relationship among the spatial degree of freedom, noise, and transmitted power, thereby establishing a rigorous upper bound of channel capacity. A series of simulations are conducted to validate the theoretical model and numerical method. This work offers a novel perspective and methodology for understanding and leveraging EIT, and provides a theoretical foundation for the design and optimization of future wireless communications.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 31 pages, 8 figures

arXiv:2410.05541 [pdf]

Dilated space-and-wavelength selective crosspoint optical switch

Authors: Ziyao Zhang, Minjia Chen, Rui Ma, Bohao Sun, Adrian Wonfor, Richard Penty, Qixiang Cheng

Abstract: Photonic integrated switches that are both space and wavelength selective are a highly promising technology for data-intensive applications as they benefit from multi-dimensional manipulation of optical signals. However, scaling these switches normally poses stringent challenges such as increased fabrication complexity and control difficulties, due to the growing number of switching elements. In this work, we propose a novel dilated crosspoint topology, which efficiently handles both space and wavelength selective switching, while reducing the required switching element count by an order of magnitude compared to reported designs. To the best of our knowledge, our design requires the fewest switching elements for an equivalent number of routing paths, and it fully cancels the first-order in-band crosstalk. We demonstrate such an ultra-compact space-and-wavelength-selective switch (SWSS) at a scale of $4\times4\times4\lambda$ on the silicon-on-insulator (SOI) platform. Experimental results reveal that the switch achieves an insertion loss ranging from 2.3 dB to 8.6 dB and crosstalk levels between -35.3 dB and -59.7 dB. The add-drop microring-resonators (MRRs) are equipped with micro-heaters, exhibiting a rise and fall time of 46 μs and 0.33 μs, respectively. These performance characteristics highlight the switch's ultra-low element count and crosstalk with low insertion loss, making it a promising candidate for advanced data center applications.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.03404 [pdf]

Photoacoustic tracking of photo-magnetically powered nanoparticles for cancer therapy

Authors: Jiayan Li, Chang Xu, Yingna Chen, Junmei Cao, Wanli Ye, Yu Cheng, Qian Cheng

Abstract: The in vivo propulsion and monitoring of nanoparticles (NPs) have seen tremendous progress in the past decade. Developing functional NPs that can be efficiently manipulated inside the human body with a non-invasive tracking modality is critical to clinical translation. This study synthesized a photo-magnetically powered nanoparticle (PMN) with a Fe3O4 core and gold spiky surface. The Au-nanotips ensure PMNs have a strong light absorption in the second near-infrared (NIR) window and produce outstanding photoacoustic signals. The Bio-transmission electron microscopy and simulation results prove that the assembly of PMNs under a magnetic field further enhances the photothermal conversion in cells, contributing to the reduction of ambient viscosity. Photoacoustic imaging (PAI) realized real-time monitoring of PMN movements and revealed that laser plus magnetic coupling could improve intratumoral distribution and retention. The proposed methods exhibit excellent potential for the clinical research of cancer nanotherapies.

Submitted 4 October, 2024; originally announced October 2024.

arXiv:2410.03324 [pdf]

Longitudinal photoacoustic monitoring of collagen evolution modulated by cancer-associated fibroblasts: simulation and experiment studies

Authors: Jiayan Li, Lu Bai, Junmei Cao, Wenxiang Zhi, Qian Cheng

Abstract: Noninvasive in vivo detection of collagen facilitates the investigation of mechanisms by which cancer-associated fibroblasts (CAFs) regulate the extracellular matrix. This study explored the feasibility of photoacoustic spectrum analysis (PASA) in identifying longitudinal changes of collagen modulated by CAFs using simulations and experiment studies. Optical and acoustic simulations in tissues were performed based on the histological slides of maximum cross-sections of murine malignancies to verify the effectiveness of the photoacoustic (PA) detection system and the parameter "relative area of power spectrum density (APSD)". Experiments were conducted on three groups of mouse models with incremental ratios of CAFs and breast cancer cells at 3 continuous time points. The results showed that the system configuration and APSD were capable of reflecting the evolution of collagen during cancer growth. Furthermore, cancers receiving a high dose of CAFs exhibited a suppressed collagen level. The presented methods show great potential for clinical translation of PASA in the field of cancer therapies targeting CAFs.

Submitted 4 October, 2024; originally announced October 2024.

Comments: 4 pages, 5 figures, conference

arXiv:2410.00592 [pdf]

Ultra-low-crosstalk Silicon Switches Driven Thermally and Electrically

Authors: Peng Bao, Chunhui Yao, Chenxi Tan, Alan Yilun Yuan, Minjia Chen, Seb J. Savory, Richard Penty, Qixiang Cheng

Abstract: Silicon photonic switches are widely considered as a cost-effective solution for addressing the ever-growing data traffic in datacenter networks, as they offer unique advantages such as low power consumption, low latency, small footprint and high bandwidth. Despite extensive research efforts, crosstalk in large-scale photonic circuits still poses a threat to the signal integrity. In this paper, we present two designs of silicon Mach-Zehnder Interferometer (MZI) switches achieving ultra-low-crosstalk, driven thermally and electrically. Each switch fabric is optimized at both the device and circuit level to suppress crosstalk and reduce system complexity. Notably, for the first time to the best of our knowledge, we harness the inherent self-heating effect in a carrier-injection-based MZI switch to create a pair of phase shifters that offer arbitrary phase differences. Such a pair of phase shifters induces matched insertion loss at each arm, thus minimizing crosstalk. Experimentally, an ultra-low crosstalk ratio below -40 dB is demonstrated for both thermo-optic (T-O) and electro-optic (E-O) switches. The T-O switch exhibits an on-chip loss of less than 5 dB with a switching time of 500 microseconds, whereas the E-O switch achieves an on-chip loss as low as 8.5 dB with a switching time of under 100 ns. In addition, data transmission of a 50 Gb/s on-off keying signal is demonstrated with high fidelity on the E-O switch, showing the great potential of the proposed switch designs.

Submitted 1 October, 2024; originally announced October 2024.

Comments: 12 pages, 5 figures

arXiv:2409.17420 [pdf, other]

VibraForge: A Scalable Prototyping Toolkit For Creating Spatialized Vibrotactile Feedback Systems

Authors: Bingjian Huang, Siyi Ren, Yuewen Luo, Qilong Cheng, Hanfeng Cai, Yeqi Sang, Mauricio Sousa, Paul H. Dietz, Daniel Wigdor

Abstract: Spatialized vibrotactile feedback systems deliver tactile information by placing multiple vibrotactile actuators on the body. As increasing numbers of actuators are required to adequately convey information in complicated applications, haptic designers find it difficult to create such systems due to limited scalability of existing toolkits. We propose VibraForge, an open-source vibrotactile toolkit that supports up to 128 vibrotactile actuators. Each actuator is encapsulated within a self-contained vibration unit and driven by its own microcontroller. By leveraging a chain-connection method, each unit receives independent vibration commands from a control unit, with fine-grained control over intensity and frequency. We also designed a GUI Editor to expedite the authoring of spatial vibrotactile patterns. Technical evaluations show that vibration units reliably reproduce audio waveforms with low-latency and high-bandwidth data communication. Case studies of phonemic tactile display, virtual reality fitness training, and drone teleoperation demonstrate the potential usage of VibraForge within different domains.

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2408.16859 [pdf, other]

Comparative Analysis of Transfer Learning Models for Breast Cancer Classification

Authors: Sania Eskandari, Ali Eslamian, Qiang Cheng

Abstract: The classification of histopathological images is crucial for the early and precise detection of breast cancer. This study investigates the efficiency of deep learning models in distinguishing between Invasive Ductal Carcinoma (IDC) and non-IDC in histopathology slides. We conducted a thorough comparative examination of eight sophisticated models: ResNet-50, DenseNet-121, ResNeXt-50, Vision Transformer (ViT), GoogLeNet (Inception v3), EfficientNet, MobileNet, and SqueezeNet. This analysis was carried out using a large dataset of 277,524 image patches. Our research makes a substantial contribution to the field by offering a comprehensive assessment of the performance of each model. We particularly highlight the exceptional efficacy of attention-based mechanisms in the ViT model, which achieved a remarkable validation accuracy of 93%, surpassing conventional convolutional networks. This study highlights the promise of advanced machine learning approaches in clinical settings, offering improved precision as well as efficiency in breast cancer diagnosis.

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.15534 [pdf, other]

Computing optimal partition problems via Lagrange multiplier approach

Authors: Qing Cheng, Jing Guo, Dong Wang

Abstract: In this paper, we consider numerical approximations for the optimal partition problem using Lagrange multipliers. By rewriting it into constrained gradient flows, three- and four-step numerical schemes based on the Lagrange multiplier approach \cite{ChSh22,ChSh_II22} are proposed to solve the constrained gradient system. The numerical schemes proposed for the constrained gradient flows are orthogonality-preserving, norm-preserving, positivity-preserving and energy-dissipating. The proposed schemes are very efficient, since only linear Poisson equations are solved at each time step. Extensive numerical results in 2D and 3D for the optimal partition problem are presented to validate the effectiveness and accuracy of the proposed numerical schemes.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 25 pages, 15 figures

arXiv:2408.08182 [pdf, other]

Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment

Authors: Qiushuo Cheng, Catherine Morgan, Arindam Sikdar, Alessandro Masullo, Alan Whone, Majid Mirmehdi

Abstract: People with Parkinson's Disease (PD) often experience progressively worsening gait, including changes in how they turn around, as the disease progresses. Existing clinical rating tools are not capable of capturing hour-by-hour variations of PD symptoms, as they are confined to brief assessments within clinic settings. Measuring gait turning angles continuously and passively is a component step towards using gait characteristics as sensitive indicators of disease progression in PD. This paper presents a deep learning-based approach to automatically quantify turning angles by extracting 3D skeletons from videos and calculating the rotation of hip and knee joints. We utilise state-of-the-art human pose estimation models, Fastpose and Strided Transformer, on a total of 1386 turning video clips from 24 subjects (12 people with PD and 12 healthy control volunteers), trimmed from a PD dataset of unscripted free-living videos in a home-like setting (Turn-REMAP). We also curate a turning video dataset, Turn-H3.6M, from the public Human3.6M human pose benchmark with 3D ground truth, to further validate our method. Previous gait research has primarily taken place in clinics or laboratories evaluating scripted gait outcomes, but this work focuses on free-living home settings where complexities exist, such as baggy clothing and poor lighting. Due to difficulties in obtaining accurate ground truth data in a free-living setting, we quantise the angle into the nearest $45^\circ$ bin based on the manual labelling of expert clinicians. Our method achieves a turning calculation accuracy of 41.6%, a Mean Absolute Error (MAE) of 34.7°, and a weighted precision WPrec of 68.3% for Turn-REMAP. This is the first work to explore the use of single monocular camera data to quantify turns by PD patients in a home setting.
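
The quantisation step described in the abstract above (rounding a turning angle to the nearest 45° bin and scoring against clinician-labelled bins) might look roughly like the following. This is a sketch under stated assumptions, not the authors' evaluation code: the function names, the choice to compute MAE on binned predictions, and the sample angles are all hypothetical.

```python
import math

def quantise_angle(angle_deg, bin_deg=45.0):
    """Round an angle in degrees to the nearest multiple of bin_deg (ties round up)."""
    return bin_deg * math.floor(angle_deg / bin_deg + 0.5)

def evaluate(predicted_deg, labelled_bins_deg, bin_deg=45.0):
    """Return (accuracy, MAE in degrees) of binned predictions against labelled bins.

    Assumption: both metrics are computed after binning the predictions;
    the abstract does not specify this detail.
    """
    pred_bins = [quantise_angle(a, bin_deg) for a in predicted_deg]
    n = len(pred_bins)
    accuracy = sum(p == t for p, t in zip(pred_bins, labelled_bins_deg)) / n
    mae = sum(abs(p - t) for p, t in zip(pred_bins, labelled_bins_deg)) / n
    return accuracy, mae

# Hypothetical example values, not data from the paper.
acc, mae = evaluate([100, 170, 50, 200], [90, 180, 90, 180])
print(acc, mae)
```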

Submitted 24 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.04896 [pdf, other]

doi 10.3847/1538-4365/ad5953

A Method of Rapidly Deriving Late-type Contact Binary Parameters and Its Application in the Catalina Sky Survey

Authors: JinLiang Wang, Xu Ding, JiaJia Li, JianPing Xiong, Qiyuan Cheng, KaiFan Ji

Abstract: With the continuous development of large optical surveys, a large number of light curves of late-type contact binary systems (CBs) have been released. Deriving parameters for CBs using the WD program and the PHOEBE program poses a challenge. Therefore, this study developed a method for rapidly deriving light curves based on the Neural Networks (NN) model combined with the Hamiltonian Monte Carlo (HMC) algorithm (NNHMC). The neural network was employed to establish the mapping relationship between the parameters and the pregenerated light curves by the PHOEBE program, and the HMC algorithm was used to obtain the posterior distribution of the parameters. The NNHMC method was applied to a large contact binary sample from the Catalina Sky Survey, and a total of 19,104 late-type contact binary parameters were derived. Among them, 5172 have an inclination greater than 70 deg and a temperature difference less than 400 K. The obtained results were compared with the previous studies for 30 CBs, and there was an essentially consistent goodness-of-fit ($R^2$) distribution between them. The NNHMC method possesses the capability to simultaneously derive parameters for a vast number of targets. Furthermore, it can provide an extremely efficient tool for rapid derivation of parameters in future sky surveys involving large samples of CBs.

Submitted 9 August, 2024; originally announced August 2024.

Journal ref: The Astrophysical Journal Supplement Series. Published 2024 July 31

arXiv:2408.01901 [pdf, ps, other]

doi 10.1103/PhysRevB.110.014518

Field-free Josephson diode effect in altermagnet/normal metal/altermagnet junctions

Authors: Qiang Cheng, Yue Mao, Qing-Feng Sun

Abstract: The field-free and highly efficient diodes with the nonreciprocity of supercurrent are believed to be the core block of the superconducting computing devices without dissipation. In this paper, we propose a Josephson diode based upon altermagnets with the vanishing net macroscopic magnetization. The nonreciprocity of supercurrent can be realized without applying any external magnetic field or ferromagnetic exchange field, which can avoid the magnetic cross-talk between the basic elements of the devices. The high efficiency exceeding $40\%$ can be obtained and the efficiency shows the high stability when the structure parameters are changed. The diode efficiency is antisymmetric about the relative orientation angle of the superconducting leads, so that its sign can easily be inverted by adjusting the relative orientation angle. The symmetries satisfied by the current-phase difference relations and the diode efficiency are analyzed by considering the transformations of the junctions under the time-reversal, the spin-rotation and the mirror reflection operations. The high efficiency and the high stability of the Josephson diode effect in our junctions provide the possibility for the design of the field-free dissipationless diode devices.

Submitted 3 August, 2024; originally announced August 2024.

Comments: 13 pages, 6 figures

Journal ref: Physical Review B 110, 014518 (2024)

arXiv:2407.21371 [pdf, other]

Einstein Probe discovery of a super-soft outburst from CXOU J005245.0-722844: a rare BeWD binary in the Small Magellanic Cloud

Authors: A. Marino, H. Yang, F. Coti Zelati, N. Rea, S. Guillot, G. K. Jaisawal, C. Maitra, F. Haberl, E. Kuulkers, W. Yuan, H. Feng, L. Tao, C. Jin, H. Sun, W. Zhang, W. Chen, E. P. J. van den Heuvel, R. Soria, B. Zhang, S.-S. Weng, L. Ji, G. B. Zhang, X. Pan, Z. Lv, C. Zhang, et al. (10 additional authors not shown)

Abstract: On May 27, 2024, the Wide-field X-ray Telescope onboard the Einstein Probe (EP) mission detected enhanced X-ray emission from a new transient source in the Small Magellanic Cloud (SMC) during its commissioning phase. Prompt follow-up with the EP Follow-up X-ray Telescope, the Swift X-ray Telescope and NICER revealed a very soft, thermally emitting source (kT$\sim$0.1 keV at the outburst peak) with an X-ray luminosity of L$\sim$4$\times$10$^{38}$ erg s$^{-1}$, coincident with CXOU J005245.0-722844. This super-soft outburst faded very quickly, within about a week. Several emission lines and absorption edges were present in the X-ray spectrum, such as the Oxygen (0.57 keV) and Neon (0.92 keV) He-like emission lines, and deep Nitrogen (0.67 keV) and Oxygen (0.87 keV) absorption edges. The X-ray emission resembles typical nova outbursts from an accreting white dwarf (WD) in a binary system, despite the X-ray source being historically associated with an O9-B0e massive star exhibiting a 17.55-day periodicity in the optical band. The discovery of this super-soft outburst nails down CXOU J005245.0-722844 as a BeWD X-ray binary: an elusive evolutionary stage where two main-sequence massive stars have undergone a common envelope phase and experienced at least two episodes of mass transfer. In addition, the very short duration of the outburst and the presence of Ne features hint at a rather massive, i.e., close to the Chandrasekhar limit, Ne-O WD in the system.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 9 pages, 5 figures; submitted to ApJL

arXiv:2407.18626 [pdf, other]

Every Part Matters: Integrity Verification of Scientific Figures Based on Multimodal Large Language Models

Authors: Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu

Abstract: This paper tackles a key issue in the interpretation of scientific figures: the fine-grained alignment of text and figures. It advances beyond prior research that primarily dealt with straightforward, data-driven visualizations such as bar and pie charts and only offered a basic understanding of diagrams through captioning and classification. We introduce a novel task, Figure Integrity Verification, designed to evaluate the precision of technologies in aligning textual knowledge with visual elements in scientific figures. To support this, we develop a semi-automated method for constructing a large-scale dataset, Figure-seg, specifically designed for this task. Additionally, we propose an innovative framework, Every Part Matters (EPM), which leverages Multimodal Large Language Models (MLLMs) to not only incrementally improve the alignment and verification of text-figure integrity but also enhance integrity through analogical reasoning. Our comprehensive experiments show that these innovations substantially improve upon existing methods, allowing for more precise and thorough analysis of complex scientific figures. This progress not only enhances our understanding of multimodal technologies but also stimulates further research and practical applications across fields requiring the accurate interpretation of complex visual data.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 28 pages, 11 figures, under review

arXiv:2407.18172 [pdf]

Chip-scale sensor for spectroscopic metrology

Authors: Chunhui Yao, Wanlu Zhang, Peng Bao, Jie Ma, Wei Zhuo, Minjia Chen, Zhitian Shi, Jingwen Zhou, Yuxiao Ye, Liang Ming, Ting Yan, Richard Penty, Qixiang Cheng

Abstract: Miniaturized spectrometers hold great promise for in situ, in vitro, and even in vivo sensing applications. However, their size reduction imposes vital performance constraints in meeting the rigorous demands of spectroscopy, including fine resolution, high accuracy, and ultra-wide observation window. The prevailing view in the community holds that miniaturized spectrometers are most suitable for the coarse identification of signature peaks. In this paper, we present an integrated reconstructive spectrometer that enables near-infrared (NIR) spectroscopic metrology, and demonstrate a fully packaged sensor with auxiliary electronics. Such a sensor operates over a 520 nm bandwidth together with a resolution of less than 8 pm, which translates into a record-breaking bandwidth-to-resolution ratio of over 65,000. The classification of different types of solid substances and the concentration measurement of aqueous and organic solutions are performed, all achieving approximately 100% accuracy. Notably, the detection limit of our sensor matches that of the commercial benchtop counterparts, which is as low as 0.1% (i.e. 100 mg/dL) for identifying the concentration of glucose solution.
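
As a quick check of the figures quoted in this abstract, the arithmetic can be written out as below. This is a worked calculation from the stated numbers only, not something taken from the paper, and it assumes the 0.1% glucose figure is a weight-per-volume concentration:

$$
\frac{520\ \text{nm}}{8\ \text{pm}} = \frac{520 \times 10^{3}\ \text{pm}}{8\ \text{pm}} = 65{,}000,
\qquad
0.1\%\ (\text{w/v}) = \frac{0.1\ \text{g}}{100\ \text{mL}} = \frac{100\ \text{mg}}{1\ \text{dL}}.
$$

Because the stated resolution is finer than 8 pm, the ratio exceeds 65,000, which agrees with the "over 65,000" wording.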

Submitted 14 September, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.13757 [pdf, other]

Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models

Authors: Zhuo Chen, Jiawei Liu, Haotan Liu, Qikai Cheng, Fan Zhang, Wei Lu, Xiaozhong Liu

Abstract: Retrieval-Augmented Generation (RAG) is applied to solve hallucination problems and real-time constraints of large language models, but it also induces vulnerabilities against retrieval corruption attacks. Existing research mainly explores the unreliability of RAG in white-box and closed-domain QA tasks. In this paper, we aim to reveal the vulnerabilities of Retrieval-Enhanced Generative (RAG) models when faced with black-box attacks for opinion manipulation. We explore the impact of such attacks on user cognition and decision-making, providing new insight to enhance the reliability and security of RAG models. We manipulate the ranking results of the retrieval model in RAG with instruction and use these results as data to train a surrogate model. By employing adversarial retrieval attack methods to the surrogate model, black-box transfer attacks on RAG are further realized. Experiments conducted on opinion datasets across multiple topics show that the proposed attack strategy can significantly alter the opinion polarity of the content generated by RAG. This demonstrates the model's vulnerability and, more importantly, reveals the potential negative impact on user cognition and decision-making, making it easier to mislead users into accepting incorrect or biased information.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 10 pages, 3 figures, under review

arXiv:2407.12504 [pdf, other]

Case2Code: Learning Inductive Reasoning with Synthetic Data

Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool-using to solve challenging tasks step-by-step. In this paper, we hope to focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples or sequential transformations. However, collecting large-scale and diverse human-generated inductive data is challenging. We focus on data synthesis in the code domain and propose a Case2Code task by exploiting the expressiveness and correctness of programs. Specifically, we collect a diverse set of executable programs, synthesize input-output transformations for each program, and force LLMs to infer the underlying code implementations based on the synthetic I/O cases. We first evaluate representative LLMs on the synthesized Case2Code task and demonstrate that the Case-to-code induction is challenging for LLMs. Then, we synthesize large-scale Case2Code training samples to train LLMs to perform inductive reasoning. Experimental results show that such induction training not only benefits in-distribution Case2Code performance but also enhances various coding abilities of trained LLMs, demonstrating the great potential of learning inductive reasoning via synthetic data.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12105 [pdf, other]

AeroHaptix: A Wearable Vibrotactile Feedback System for Enhancing Collision Avoidance in UAV Teleoperation

Authors: Bingjian Huang, Zhecheng Wang, Qilong Cheng, Siyi Ren, Hanfeng Cai, Antonio Alvarez Valdivia, Karthik Mahadevan, Daniel Wigdor

Abstract: Haptic feedback enhances collision avoidance by providing directional obstacle information to operators in unmanned aerial vehicle (UAV) teleoperation. However, such feedback is often rendered via haptic joysticks, which are unfamiliar to UAV operators and limited to single-directional force feedback. Additionally, the direct coupling of the input device and the feedback method diminishes the operators' control authority and causes oscillatory movements. To overcome these limitations, we propose AeroHaptix, a wearable haptic feedback system that uses high-resolution vibrations to communicate multiple obstacle directions simultaneously. The vibrotactile actuators' layout was optimized based on a perceptual study to eliminate perceptual biases and achieve uniform spatial coverage. A novel rendering algorithm, MultiCBF, was adapted from control barrier functions to support multi-directional feedback. System evaluation showed that AeroHaptix effectively reduced collisions in complex environments, and operators reported significantly lower physical workload, improved situational awareness, and increased control authority.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.05406 [pdf, ps, other]

Complete minimal hypersurfaces in a hyperbolic space $H^{4}(-1)$

Authors: Qing-Ming Cheng, Yejuan Peng

Abstract: In this paper, we study $n$-dimensional complete minimal hypersurfaces in a hyperbolic space $H^{n+1}(-1)$ of constant curvature $-1$. We prove that a $3$-dimensional complete minimal hypersurface with constant scalar curvature in $H^{4}(-1)$ satisfies $S\leq \frac{21}{29}$ by making use of the Generalized Maximum Principle, where $S$ denotes the squared norm of the second fundamental form of the hypersurface.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 17 pages

arXiv:2407.00600 [pdf, other]

GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing

Authors: Yisong Xiao, Aishan Liu, QianJia Cheng, Zhenfei Yin, Siyuan Liang, Jiapeng Li, Jing Shao, Xianglong Liu, Dacheng Tao

Abstract: Large Vision-Language Models (LVLMs) have been widely adopted in various applications; however, they exhibit significant gender biases. Existing benchmarks primarily evaluate gender bias at the demographic group level, neglecting individual fairness, which emphasizes equal treatment of similar individuals. This research gap limits the detection of discriminatory behaviors, as individual fairness offers a more granular examination of biases that group fairness may overlook. For the first time, this paper introduces the GenderBias-\emph{VL} benchmark to evaluate occupation-related gender bias in LVLMs using counterfactual visual questions under individual fairness criteria. To construct this benchmark, we first utilize text-to-image diffusion models to generate occupation images and their gender counterfactuals. Subsequently, we generate corresponding textual occupation options by identifying stereotyped occupation pairs with high semantic similarity but opposite gender proportions in real-world statistics. This method enables the creation of large-scale visual question counterfactuals to expose biases in LVLMs, applicable in both multimodal and unimodal contexts through modifying gender attributes in specific modalities. Overall, our GenderBias-\emph{VL} benchmark comprises 34,581 visual question counterfactual pairs, covering 177 occupations. Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs (e.g., LLaVA) and state-of-the-art commercial APIs, including GPT-4o and Gemini-Pro. Our findings reveal widespread gender biases in existing LVLMs. Our benchmark offers: (1) a comprehensive dataset for occupation-related gender bias evaluation; (2) an up-to-date leaderboard on LVLM biases; and (3) a nuanced understanding of the biases presented by these models. The dataset and code are available at https://genderbiasvl.github.io/.

Submitted 30 June, 2024; originally announced July 2024.

Comments: 9 pages, 4 figures

arXiv:2406.15720 [pdf, other]

Scaling Laws for Fact Memorization of Large Language Models

Authors: Xingyu Lu, Xiaonan Li, Qinyuan Cheng, Kai Ding, Xuanjing Huang, Xipeng Qiu

Abstract: Fact knowledge memorization is crucial for Large Language Models (LLMs) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLMs' fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure can the LLM compatibly memorize them. This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders the memorization of low-frequency facts. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.15279 [pdf, other]

Cross-Modality Safety Alignment

Authors: Siyin Wang, Xingsong Ye, Qinyuan Cheng, Junwen Duan, Shimin Li, Jinlan Fu, Xipeng Qiu, Xuanjing Huang

Abstract: As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of human life, ensuring the safety and ethical alignment of such systems is paramount. Previous studies primarily focus on single-modality threats, which may not suffice given the integrated and complex nature of cross-modality interactions. We introduce a novel safety alignment challenge called Safe Inputs but Unsafe Output (SIUO) to evaluate cross-modality safety alignment. Specifically, it considers cases where single modalities are safe independently but could potentially lead to unsafe or unethical outputs when combined. To empirically investigate this problem, we developed the SIUO, a cross-modality benchmark encompassing 9 critical safety domains, such as self-harm, illegal activities, and privacy violations. Our findings reveal substantial safety vulnerabilities in both closed- and open-source LVLMs, such as GPT-4V and LLaVA, underscoring the inadequacy of current models to reliably interpret and respond to complex, real-world scenarios.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14870 [pdf, other]

A new flow dynamic approach for Wasserstein gradient flows

Authors: Qing Cheng, Qianqian Liu, Wenbin Chen, Jie Shen

Abstract: We develop in this paper a new regularized flow dynamic approach to construct efficient numerical schemes for Wasserstein gradient flows in Lagrangian coordinates. Instead of approximating the Wasserstein distance, which requires solving constrained minimization problems, we reformulate the problem using the Benamou-Brenier flow dynamic approach, leading to algorithms which only need to solve unconstrained minimization problems in $L^2$ distance. Our schemes automatically inherit some essential properties of Wasserstein gradient systems such as positivity preservation, mass conservation and energy dissipation. We present ample numerical simulations of Porous-Medium equations, Keller-Segel equations and Aggregation equations to validate the accuracy and stability of the proposed schemes. Compared to numerical schemes in Eulerian coordinates, our new schemes can capture sharp interfaces for various Wasserstein gradient flows using a relatively small number of unknowns.

Submitted 21 June, 2024; originally announced June 2024.

MSC Class: 65M06; 65M12; 35K65; 35A15

arXiv:2406.13990 [pdf, other]

Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation

Authors: Qin Zhu, Qingyuan Cheng, Runyu Peng, Xiaonan Li, Tengxiao Liu, Ru Peng, Xipeng Qiu, Xuanjing Huang

Abstract: The training process of large language models (LLMs) often involves varying degrees of test data contamination. Although current LLMs are achieving increasingly better performance on various benchmarks, their performance in practical applications does not always match their benchmark results. Leakage of benchmarks can prevent the accurate assessment of LLMs' true performance. However, constructing new benchmarks is costly, labor-intensive and still carries the risk of leakage. Therefore, in this paper, we ask the question, Can we reuse these leaked benchmarks for LLM evaluation? We propose Inference-Time Decontamination (ITD) to address this issue by detecting and rewriting leaked samples without altering their difficulties. ITD can mitigate performance inflation caused by memorizing leaked benchmarks. Our proof-of-concept experiments demonstrate that ITD reduces inflated accuracy by 22.9% on GSM8K and 19.0% on MMLU. On MMLU, using Inference-time Decontamination can lead to a decrease in the results of Phi3 and Mistral by 6.7% and 3.6% respectively. We hope that ITD can provide more truthful evaluation results for large language models.

Submitted 23 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13007 [pdf, other]

NTIRE 2024 Challenge on Night Photography Rendering

Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, et al. (25 additional authors not shown)

Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algorithms was also measured alongside the quality of their output. To evaluate the results, a sufficient number of viewers were asked to assess the visual quality of the proposed solutions, considering the subjective nature of the task. There were 2 nominations: quality and efficiency. Top 5 solutions in terms of output quality were sorted by evaluation time (see Fig. 1). The top ranking participants' solutions effectively represent the state-of-the-art in nighttime photography rendering. More results can be found at https://nightimaging.org.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 10 pages, 10 figures

arXiv:2406.12847 [pdf, other]

ChangeViT: Unleashing Plain Vision Transformers for Change Detection

Authors: Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, Qimin Cheng

Abstract: Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, our study uncovers ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to enhance the performance of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature integration ensures that ChangeViT excels in both detecting large-scale changes and capturing fine-grained details, providing comprehensive change detection across diverse scales. Without bells and whistles, ChangeViT achieves state-of-the-art performance on three popular high-resolution datasets (i.e., LEVIR-CD, WHU-CD, and CLCD) and one low-resolution dataset (i.e., OSCD), which underscores the unleashed potential of plain ViTs for change detection. Furthermore, thorough quantitative and qualitative analyses validate the efficacy of the introduced modules, solidifying the effectiveness of our approach. The source code is available at https://github.com/zhuduowang/ChangeViT.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12534 [pdf, other]

Unified Active Retrieval for Retrieval Augmented Generation

Authors: Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu

Abstract: In Retrieval-Augmented Generation (RAG), retrieval is not always helpful and applying it to every instruction is sub-optimal. Therefore, determining whether to retrieve is crucial for RAG, which is usually referred to as Active Retrieval. However, existing active retrieval methods face two challenges: 1. They usually rely on a single criterion, which struggles with handling various types of instructions. 2. They depend on specialized and highly differentiated procedures, and thus combining them makes the RAG system more complicated and leads to higher response latency. To address these challenges, we propose Unified Active Retrieval (UAR). UAR contains four orthogonal criteria and casts them into plug-and-play classification tasks, which achieves multifaceted retrieval timing judgements with negligible extra inference cost. We further introduce the Unified Active Retrieval Criteria (UAR-Criteria), designed to process diverse active retrieval scenarios through a standardized procedure. Experiments on four representative types of user instructions show that UAR significantly outperforms existing work on the retrieval timing judgement and the performance of downstream tasks, which shows the effectiveness of UAR and its helpfulness to downstream tasks.

Submitted 2 October, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted to Findings of EMNLP 2024, camera-ready version

arXiv:2406.11123 [pdf, other]

Embedded cylindrical and doughnut-shaped $λ$-hypersurfaces

Authors: Qing-Ming Cheng, Junqi Lai, Guoxin Wei

Abstract: In this paper, we construct, for $λ>0$, complete embedded and non-convex $λ$-hypersurfaces, which are diffeomorphic to a cylinder. Hence, one cannot expect that $λ$-hypersurfaces share a common conclusion on the planar domain conjecture, even though the planar domain conjecture of T. Ilmanen for self-shrinkers of mean curvature flow was solved affirmatively by Brendle \cite{B}. Furthermore, for a fixed $λ<0$ which may have small $|λ|$, we can construct two compact embedded $λ$-hypersurfaces which are diffeomorphic to $\mathbb{S}^{1}\times \mathbb{S}^{n-1}$, but they are not isometric to each other.

Submitted 16 June, 2024; originally announced June 2024.

Comments: Comments are welcome

arXiv:2406.05696 [pdf, other]

Two Power Allocation and Beamforming Strategies for Active IRS-aided Wireless Network via Machine Learning

Authors: Qiankun Cheng, Jiatong Bai, Baihua Shi, Wei Gao, Feng Shu

Abstract: This paper models an active intelligent reflecting surface (IRS)-assisted wireless communication network, which has the ability to adjust power between the base station (BS) and the IRS. We aim to maximize the signal-to-noise ratio of the user by jointly designing the power allocation (PA) factor, the active IRS phase shift matrix, and the beamforming vector of the BS, subject to a total power constraint. To tackle this non-convex problem, we alternately optimize these variables. First, the PA factor is designed via a polynomial regression method. Next, the BS beamforming vector and IRS phase shift matrix are obtained by Dinkelbach's transform and successive convex approximation methods. To reduce the high computational complexity of the above algorithm, we maximize the achievable rate (AR) and use a closed-form fractional programming method to transform the original problem into an equivalent form. Then, we address this problem by iteratively optimizing the auxiliary variables and the BS and IRS beamforming. Simulation results show that the proposed algorithms can effectively improve the AR performance compared to fixed PA strategies, passive IRS-aided schemes, and schemes without an IRS.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.04419 [pdf, other]

TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification

Authors: Md Atik Ahamed, Qiang Cheng

Abstract: Time series classification (TSC) on multivariate time series is a critical problem. We propose a novel multi-view approach integrating frequency-domain and time-domain features to provide complementary contexts for TSC. Our method fuses continuous wavelet transform spectral features with temporal convolutional or multilayer perceptron features. We leverage the Mamba state space model for efficient and scalable sequence modeling. We also introduce a novel tango scanning scheme to better model sequence relationships. Experiments on 10 standard benchmark datasets demonstrate our approach achieves an average 6.45% accuracy improvement over state-of-the-art TSC models.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04145 [pdf, other]

Every Answer Matters: Evaluating Commonsense with Probabilistic Measures

Authors: Qi Cheng, Michael Boratko, Pranay Kumar Yelugam, Tim O'Gorman, Nalini Singh, Andrew McCallum, Xiang Lorraine Li

Abstract: Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Commonsense is also inherently probabilistic with multiple correct answers. The purpose of "boiling water" could be making tea and cooking, but it also could be killing germs. Existing tasks do not capture the probabilistic nature of common sense. To this end, we present commonsense frame completion (CFC), a new generative task that evaluates common sense via multiple open-ended generations. We also propose a method of probabilistic evaluation that strongly correlates with human judgments. Humans drastically outperform strong language model baselines on our dataset, indicating this approach is both a challenging and useful evaluation of machine common sense.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ACL 2024 Camera Ready

arXiv:2405.19841 [pdf, other]

doi 10.17909/h37d-c176

The First Photometric Analysis of Two Low Mass Ratio Contact Binary Systems In TESS Survey

Authors: Qiyuan Cheng, Jianping XIong, Xu Ding, Kaifan Ji, Jiao Li, Chao Liu, Jiangdan Li, Jingxiao Luo, Xin Lyu, Zhanwen Han, Xuefei Chen

Abstract: Low mass-ratio (q) contact binary systems are progenitors of stellar mergers such as blue stragglers (BS) or fast-rotating FK Com stars. In this study, we present the first light curve analysis of two newly identified low mass-ratio contact binary systems, TIC 55007847 and TIC 63597006, that are identified from TESS. Both stars are classified as A-subtype contact binaries. We obtained the precise orbital periods for the two objects by using the O-C method, i.e. P=0.6117108 d for TIC 55007847 and P=0.7008995 d for TIC 63597006, respectively, and found an obvious periodic signal in the O-C curve of TIC 63597006. We suggest that the periodic signal comes from a third body. We further use the Markov Chain Monte Carlo (MCMC) method with PHOEBE to derive the photometric solutions for the two binaries. The photometric solution for this object shows that the contribution of the third body is about 6%. Our analysis revealed that TIC 55007847 has an extremely low mass ratio of q=0.08. By calculating the ratio of spin angular momentum to the orbital angular momentum Js/Jo, we found that TIC 55007847 is very close to the instability threshold with Js/Jo = 0.31, indicating that it may merge into a single, fast-rotating star in the future. For TIC 63597006, q=0.14 and Js/Jo=0.15. This object is in a relatively stable evolutionary status at present.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18458 [pdf]

Asymmetrical estimator for training encapsulated deep photonic neural networks

Authors: Yizhi Wang, Minjia Chen, Chunhui Yao, Jie Ma, Ting Yan, Richard Penty, Qixiang Cheng

Abstract: Scalable isomorphic physical neural networks (PNNs) are emerging NN acceleration paradigms for their high-bandwidth, in-propagation computation. Although backpropagation (BP)-based training is often the industry standard for its robustness and fast gradient convergences, existing BP-PNN training methods need to truncate the propagation of analogue signal at each layer and acquire accurate hidden neuron readouts for deep networks. This compromises the incentive of PNN for fast in-propagation processing. In addition, the required readouts introduce massive bottlenecks due to the conversions between the analogue-digital interfaces to shuttle information across. These factors limit both the time and energy efficiency during training. Here we introduce the asymmetrical training (AT) method, a BP-based method that can perform training on an encapsulated deep network, where the information propagation is maintained within the analogue domain until the output layer. AT's minimum information access bypasses the analogue-digital interface bottleneck wherever possible. For any deep network structure, AT offers significantly improved time and energy efficiency compared to existing BP-PNN methods, and scales well for large network sizes. We demonstrated AT's error-tolerant and calibration-free training for encapsulated integrated photonic deep networks to achieve near ideal BP performances. AT's well-behaved training is demonstrated repeatably across different datasets and network structures.

Submitted 15 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 21 pages, 6 figures

MSC Class: 78-05

arXiv:2405.13336 [pdf, other]

SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models

Authors: Qingrong Cheng, Xu Li, Xinghui Fu, Fei Xia, Zhongqian Sun

Abstract: The automated synthesis of high-quality 3D gestures from speech is of significant value in virtual humans and gaming. Previous methods focus on synthesizing gestures that are synchronized with speech rhythm, yet they frequently overlook the inclusion of semantic gestures. These are sparse and follow a long-tailed distribution across the gesture sequence, making them difficult to learn in an end-to-end manner. Moreover, generating gestures, rhythmically aligned with speech, faces a significant issue that cannot be generalized to in-the-wild speeches. To address these issues, we introduce SIGGesture, a novel diffusion-based approach for synthesizing realistic gestures that are of both high quality and semantically pertinent. Specifically, we firstly build a strong diffusion-based foundation model for rhythmical gesture synthesis by pre-training it on a collected large-scale dataset with pseudo labels. Secondly, we leverage the powerful generalization capabilities of Large Language Models (LLMs) to generate proper semantic gestures for the various speech content. Finally, we propose a semantic injection module to infuse semantic information into the synthesized results during diffusion reverse process. Extensive experiments demonstrate that the proposed SIGGesture significantly outperforms existing baselines and shows excellent generalization and controllability.

Submitted 22 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: 13 pages, SIGGRAPH Asia 2024

ACM Class: I.2.6

arXiv:2405.12939 [pdf, other]

Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models

Authors: Zhangyue Yin, Qiushi Sun, Qipeng Guo, Zhiyuan Zeng, Xiaonan Li, Tianxiang Sun, Cheng Chang, Qinyuan Cheng, Ding Wang, Xiaofeng Mou, Xipeng Qiu, XuanJing Huang

Abstract: Recent advancements in Chain-of-Thought prompting have facilitated significant breakthroughs for Large Language Models (LLMs) in complex reasoning tasks. Current research enhances the reasoning performance of LLMs by sampling multiple reasoning chains and ensembling based on the answer frequency. However, this approach fails in scenarios where the correct answers are in the minority. We identify this as a primary factor constraining the reasoning capabilities of LLMs, a limitation that cannot be resolved solely based on the predicted answers. To address this shortcoming, we introduce a hierarchical reasoning aggregation framework AoR (Aggregation of Reasoning), which selects answers based on the evaluation of reasoning chains. Additionally, AoR incorporates dynamic sampling, adjusting the number of reasoning chains in accordance with the complexity of the task. Experimental results on a series of complex reasoning tasks show that AoR outperforms prominent ensemble methods. Further analysis reveals that AoR not only adapts to various LLMs but also achieves a superior performance ceiling when compared to current methods.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 17 pages, 14 figures, accepted by LREC-COLING 2024

arXiv:2405.03415 [pdf, other]

Unique solvability and error analysis of the Lagrange multiplier approach for gradient flows

Authors: Qing Cheng, Jie Shen, Cheng Wang

Abstract: The unique solvability and error analysis of the original Lagrange multiplier approach proposed in [8] for gradient flows is studied in this paper. We identify a necessary and sufficient condition that must be satisfied for the nonlinear algebraic equation arising from the original Lagrange multiplier approach to admit a unique solution in the neighborhood of its exact solution, and propose a modified Lagrange multiplier approach so that the computation can continue even if the aforementioned condition is not satisfied. Using the Cahn-Hilliard equation as an example, we prove rigorously the unique solvability and establish optimal error estimates of a second-order Lagrange multiplier scheme assuming this condition and that the time step is sufficiently small. We also present numerical results to demonstrate that the modified Lagrange multiplier approach is much more robust and can use a much larger time step than the original Lagrange multiplier approach.

Submitted 6 May, 2024; originally announced May 2024.

MSC Class: 65M70; 65K15; 65N22

arXiv:2404.19534 [pdf, other]

MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.

Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

arXiv:2404.07108 [pdf, other]

From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications

Authors: Yongqiang Ma, Lizhi Qing, Jiawei Liu, Yangyang Kang, Yue Zhang, Wei Lu, Xiaozhong Liu, Qikai Cheng

Abstract: Evaluating large language models (LLMs) is fundamental, particularly in the context of practical applications. Conventional evaluation methods, typically designed primarily for LLM development, yield numerical scores that ignore the user experience. Therefore, our study shifts the focus from model-centered to human-centered evaluation in the context of AI-powered writing assistance applications. Our proposed metric, termed ``Revision Distance,'' utilizes LLMs to suggest revision edits that mimic the human writing process. It is determined by counting the revision edits generated by LLMs. Benefiting from the generated revision edit details, our metric can provide a self-explained text evaluation result in a human-understandable manner beyond the context-independent score. Our results show that for the easy-writing task, ``Revision Distance'' is consistent with established metrics (ROUGE, Bert-score, and GPT-score), but offers more insightful, detailed feedback and better distinguishes between texts. Moreover, in the context of challenging academic writing tasks, our metric still delivers reliable evaluations where other metrics tend to struggle. Furthermore, our metric also holds significant potential for scenarios lacking reference texts.

Submitted 10 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: 9 pages, 2 figures, under review

arXiv:2404.01624 [pdf]

Intelligent Optimization of Mine Environmental Damage Assessment and Repair Strategies Based on Deep Learning

Authors: Qishuo Cheng

Abstract: In recent decades, financial quantification has emerged and matured rapidly. For financial institutions such as funds, investment institutions are increasingly dissatisfied with the situation of passively constructing investment portfolios with average market returns, and are paying more and more attention to active quantitative strategy investment portfolios. This requires the introduction of active stock investment fund management models. Currently, in my country's stock fund investment market, there are many active quantitative investment strategies, and the algorithms used vary widely, such as SVM, random forest, RNN recurrent memory network, etc. This article focuses on this trend, using the emerging LSTM-GRU gate-controlled long short-term memory network model in the field of financial stock investment as a basis to build a set of active investment stock strategies, and combining it with SVM, which has been widely used in the field of quantitative stock investment, and comparing it with models such as RNN. Theoretically speaking, compared to SVM, which simply relies on kernel functions for high-order mapping and classification of data, neural network algorithms such as RNN and LSTM-GRU have better principles and are more suitable for processing financial stock data. Then, through multiple comparisons, it was found that the LSTM-GRU gate-controlled long short-term memory network has better accuracy.
By selecting the LSTM-GRU algorithm to construct a trading strategy based on the Shanghai and Shenzhen 300 Index constituent stocks, the parameters were adjusted and the neural layer connection was adjusted. Finally, it has significantly outperformed the benchmark index CSI 300 over the long term. The conclusion of this article is that the research results can provide certain quantitative strategy references for financial institutions to construct active stock investment portfolios.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.14970 [pdf]

doi 10.1016/j.scib.2024.03.052

Quantum spin driven Yu-Shiba-Rusinov multiplets and fermion-parity-preserving phase transition in K$_3$C$_{60}$

Authors: Shu-Ze Wang, Xue-Qing Yu, Li-Xuan Wei, Li Wang, Qiang-Jun Cheng, Kun Peng, Fang-Jun Cheng, Yu Liu, Fang-Sen Li, Xu-Cun Ma, Qi-Kun Xue, Can-Li Song

Abstract: Magnetic impurities in superconductors are of increasing interest due to emergent Yu-Shiba-Rusinov (YSR) states and Majorana zero modes for fault-tolerant quantum computation. However, a direct relationship between the YSR multiple states and magnetic anisotropy splitting of quantum impurity spins remains poorly characterized. By using scanning tunneling microscopy, we resolve systematically individual transition-metal (Fe, Cr and Ni) impurities induced YSR multiplets as well as their Zeeman effects in K$_3$C$_{60}$ superconductor. The YSR multiplets show identical $d$ orbital-like wave functions that are symmetry-mismatched to the threefold K$_3$C$_{60}$(111) host surface, breaking point-group symmetries of the spatial distribution of YSR bound states in real space. Remarkably, we identify an unprecedented fermion-parity-preserving quantum phase transition between ground states with opposite signs of the uniaxial magnetic anisotropy that can be manipulated by an external magnetic field. These findings can be readily understood in terms of anisotropy splitting of quantum impurity spins, and thus elucidate the intricate interplay between the magnetic anisotropy and YSR multiplets.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 38 pages, 4 figures in the main text

Journal ref: Science Bulletin 69, 1392 (2024)

arXiv:2403.09898 [pdf, other]

TimeMachine: A Time Series is Worth 4 Mambas for Long-term Forecasting

Authors: Md Atik Ahamed, Qiang Cheng

Abstract: Long-term time-series forecasting remains challenging due to the difficulty in capturing long-term dependencies, achieving linear scalability, and maintaining computational efficiency. We introduce TimeMachine, an innovative model that leverages Mamba, a state-space model, to capture long-term dependencies in multivariate time series data while maintaining linear scalability and small memory footprints. TimeMachine exploits the unique properties of time series data to produce salient contextual cues at multi-scales and leverages an innovative integrated quadruple-Mamba architecture to unify the handling of channel-mixing and channel-independence situations, thus enabling effective selection of contents for prediction against global and local contexts at different scales. Experimentally, TimeMachine achieves superior performance in prediction accuracy, scalability, and memory efficiency, as extensively validated using benchmark datasets. Code availability: https://github.com/Atik-Ahamed/TimeMachine

Submitted 22 August, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI-2024)

arXiv:2403.02866 [pdf]

Unlocking Electro-optic Resonant Phase Shifting for Multi-dimensional, Ultra-dynamic Photonic Switches

Authors: Lingzhi Luo, Rui Ma, Richard V. Penty, Qixiang Cheng

Abstract: Optical circuit switching is connection-oriented, being deterministic through the reservation of a complete wavelength channel or spatial path for a certain period. However, this comes at a trade-off against link dynamics, and overall capacity can thus be constrained by the time slot reservations, especially for switches with microsecond- to millisecond-scale reconfiguration times. For data-intensive applications, the communication patterns associated with random data sets typically yield short-lived flows. This situation calls for a new multi-dimensional switching paradigm that fully exploits not only the space and wavelength domains but also with nanosecond-scale reconfigurable capability in the time domain to enable ultra-dynamic links. In this work, we focus on the exploitation of micro-ring resonant phase shifters (RPSs) that are wavelength selective for optical switching in a single plane. By proposing an innovative analytical method with transmission circle chart, we fully unlock the power of RPS with nanosecond-scale reconfigurability and the capability to arbitrarily manipulate its phase and amplitude. Such a compact model offers fresh insights into designs with under and critically coupled RPSs beyond the commonly explored over-coupling condition. This creates not only versatile switch elements but also perfect absorbers for robust multi-wavelength operations.
The proposed device can bring about a breakthrough in the optical switching capacity that potentially addresses the challenges faced by modern data center networks, as well as other photonic circuits for high-throughput signal processing.

Submitted 12 October, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2403.02757 [pdf, other]

In-Memory Learning: A Declarative Learning Framework for Large Language Models

Authors: Bo Wang, Tianxiang Sun, Hang Yan, Siyin Wang, Qingyuan Cheng, Xipeng Qiu

Abstract: The exploration of whether agents can align with their environment without relying on human-labeled data presents an intriguing research topic. Drawing inspiration from the alignment process observed in intelligent organisms, where declarative memory plays a pivotal role in summarizing past experiences, we propose a novel learning framework. The agents adeptly distill insights from past experiences, refining and updating existing notes to enhance their performance in the environment. This entire process transpires within the memory components and is implemented through natural language, so we characterize this framework as In-memory Learning. We also delve into the key features of benchmarks designed to evaluate the self-improvement process. Through systematic experiments, we demonstrate the effectiveness of our framework and provide insights into this problem.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.02232 [pdf]

doi 10.62051/ijcsit.v2n1.01

Comprehensive evaluation of Mal-API-2019 dataset by machine learning in malware detection

Authors: Zhenglin Li, Haibei Zhu, Houze Liu, Jintong Song, Qishuo Cheng

Abstract: This study conducts a thorough examination of malware detection using machine learning techniques, focusing on the evaluation of various classification models using the Mal-API-2019 dataset. The aim is to advance cybersecurity capabilities by identifying and mitigating threats more effectively. Both ensemble and non-ensemble machine learning methods, such as Random Forest, XGBoost, K Nearest Neighbor (KNN), and Neural Networks, are explored. Special emphasis is placed on the importance of data pre-processing techniques, particularly TF-IDF representation and Principal Component Analysis, in improving model performance. Results indicate that ensemble methods, particularly Random Forest and XGBoost, exhibit superior accuracy, precision, and recall compared to others, highlighting their effectiveness in malware detection. The paper also discusses limitations and potential future directions, emphasizing the need for continuous adaptation to address the evolving nature of malware. This research contributes to ongoing discussions in cybersecurity and provides practical insights for developing more robust malware detection systems in the digital era.

Submitted 25 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Journal ref: International Journal of Computer Science and Information Technology, 2024, 2(1), 1-9

arXiv:2403.01209 [pdf, other]

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

Authors: Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

Abstract: This paper proposes a novel framework for multi-label image recognition without any training data, called data-free framework, which uses knowledge of a pre-trained Large Language Model (LLM) to learn prompts to adapt a pretrained Vision-Language Model (VLM) like CLIP to multilabel classification. By asking the LLM well-designed questions, we acquire comprehensive knowledge about characteristics and contexts of objects, which provides valuable text descriptions for learning prompts. Then we propose a hierarchical prompt learning method by taking the multi-label dependency into consideration, wherein a subset of category-specific prompt tokens are shared when the corresponding objects exhibit similar attributes or are more likely to co-occur. Benefiting from the remarkable alignment between visual and linguistic semantics of CLIP, the hierarchical prompts learned from text descriptions are applied to perform classification of images during inference. Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition. Extensive experiments on three public datasets (MS-COCO, VOC2007, and NUS-WIDE) demonstrate that our method achieves better results than the state-of-the-art methods, especially outperforming the zero-shot multi-label recognition methods by 4.7% in mAP on MS-COCO.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2403.00496 [pdf]

Benchmarking reconstructive spectrometer with multi-resonant cavities

Authors: Chunhui Yao, Kangning Xu, Tianhua Lin, Jie Ma, Chumeng Yao, Peng Bao, Zhitian Shi, Richard Penty, Qixiang Cheng

Abstract: Recent years have seen the rapid development of miniaturized reconstructive spectrometers (RSs), yet they still confront a range of technical challenges, such as bandwidth/resolution ratio, sensing speed, and/or power efficiency. Reported RS designs often suffer from insufficient decorrelation between sampling channels, which results in limited compressive sampling efficiency, in essence, due to inadequate engineering of sampling responses. This in turn leads to poor spectral-pixel-to-channel ratios (SPCRs), typically restricted to single digits. So far, a general guideline for manipulating RS sampling responses for the effectiveness of spectral information acquisition has been lacking. In this study, we shed light on a fundamental parameter from the compressive sensing theory - the average mutual correlation coefficient v - and provide insight into how it serves as a critical benchmark in RS design with regard to the SPCR and reconstruction accuracy. To this end, we propose a novel RS design with multi-resonant cavities, consisting of a series of partial reflective interfaces. Such multi-cavity configuration offers an expansive parameter space, facilitating the superlative optimization of sampling matrices with minimized v. As a proof-of-concept demonstration, a single-shot, dual-band RS is implemented on a SiN platform, tailored for capturing signature spectral shapes across different wavelength regions, with customized photonic crystal nanobeam mirrors.
Experimentally, the device demonstrates an overall operation bandwidth of 270 nm and a <0.5 nm resolution with only 15 sampling channels per band, leading to a record high SPCR of 18.0. Moreover, the proposed multi-cavity design can be readily adapted to various photonic platforms. For instance, we showcase that by employing multi-layer coatings, an ultra-broadband RS can be optimized to exhibit a 700 nm bandwidth with an SPCR of over 100.

Submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.17194 [pdf]

The Random Forest Model for Analyzing and Forecasting the US Stock Market in the Context of Smart Finance

Authors: Jiajian Zheng, Duan Xin, Qishuo Cheng, Miao Tian, Le Yang

Abstract: The stock market is a crucial component of the financial market, playing a vital role in wealth accumulation for investors, financing costs for listed companies, and the stable development of the national macroeconomy. Significant fluctuations in the stock market can damage the interests of stock investors and cause an imbalance in the industrial structure, which can interfere with the macro-level development of the national economy. The prediction of stock price trends is a popular research topic in academia. Predicting the three trends of stock prices (rising, sideways, and falling) can assist investors in making informed decisions about buying, holding, or selling stocks. Establishing an effective forecasting model for predicting these trends is of substantial practical importance. This paper evaluates the predictive performance of random forest models combined with artificial intelligence on a test set of four stocks using optimal parameters. The evaluation considers both predictive accuracy and time efficiency.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 10 pages, 8 figures

Showing 1–50 of 319 results for author: Cheng, Q