Diffusion Transformer Policy

Authors: Zhi Hou, Tianyi Zhang, Yuwen Xiong, Hengjun Pu, Chengyang Zhao, Ronglei Tong, Yu Qiao, Jifeng Dai, Yuntao Chen

Abstract: Recent large visual-language action models pretrained on diverse robot datasets have demonstrated the potential for generalizing to new environments with a few in-domain data. However, those approaches usually predict discretized or continuous actions by a small action head, which limits the ability in handling diverse action spaces. In contrast, we model the continuous action with a large multi-modal diffusion transformer, dubbed as Diffusion Transformer Policy, in which we directly denoise action chunks by a large transformer model rather than a small action head. By leveraging the scaling capability of transformers, the proposed approach can effectively model continuous end-effector actions across large diverse robot datasets, and achieve better generalization performance. Extensive experiments demonstrate Diffusion Transformer Policy pretrained on diverse robot data can generalize to different embodiments, including simulation environments like Maniskill2 and Calvin, as well as the real-world Franka arm. Specifically, without bells and whistles, the proposed approach achieves state-of-the-art performance with only a single third-view camera stream in the Calvin novel task setting (ABC->D), improving the average number of tasks completed in a row of 5 to 3.6, and the pretraining stage significantly facilitates the success sequence length on the Calvin by over 1.2. The code will be publicly available.

Submitted 21 October, 2024; originally announced October 2024.

Comments: Preprint

arXiv:2410.15885 [pdf, other]

How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making?

Authors: Zuojin Tang, Bin Hu, Chenyang Zhao, De Ma, Gang Pan, Bin Liu

Abstract: Existing large pre-trained models typically map text input to text output in an end-to-end manner, such as ChatGPT, or map a segment of text input to a hierarchy of action decisions, such as OpenVLA. However, humans can simultaneously generate text and actions when receiving specific input signals. For example, a driver can make precise driving decisions while conversing with a friend in the passenger seat. Motivated by this observation, we consider the following question in this work: is it possible to construct a pre-trained model that can provide both language interaction and precise decision-making capabilities in dynamic open scenarios? We provide a definitive answer to this question by developing a new model architecture termed Visual Language Action model for Chatting and Decision Making (VLA4CD), and further demonstrating its performance in challenging autonomous driving tasks. Specifically, we leverage LoRA to fine-tune a pre-trained LLM with data of multiple modalities covering language, visual, and action. Unlike the existing LoRA operations used for LLM fine-tuning, we have designed new computational modules and training cost functions for VLA4CD. These designs enable VLA4CD to provide continuous-valued action decisions while outputting text responses. In contrast, existing LLMs can only output text responses, and current VLA models can only output action decisions. Moreover, these VLA models handle action data by discretizing and then tokenizing the discretized actions, a method unsuitable for complex decision-making tasks involving high-dimensional continuous-valued action vectors, such as autonomous driving. The experimental results on CARLA validate that: (1) our proposed model construction method is effective; (2) compared to the SOTA VLA model, VLA4CD can provide more accurate real-time decision-making while retaining the text interaction capability inherent to LLMs.

Submitted 21 October, 2024; originally announced October 2024.

arXiv:2410.15529 [pdf, other]

Measurement of gas properties for the ion-TPC of N$ν$DEx experiment

Authors: Tianyu Liang, Meiqiang Zhan, Hulin Wang, Xianglun Wei, Dongliang Zhang, Jun Liu, Chengui Lu, Qiang Hu, Yichen Yang, Chaosong Gao, Le Xiao, Xiangming Sun, Feng Liu, Chengxin Zhao, Hao Qiu, Kai Chen

Abstract: In the N$ν$DEx collaboration, a high-pressure gas TPC is being developed to search for the neutrinoless double beta decay. The use of electronegative $\mathrm{^{82}SeF_{6}}$ gas mandates an ion-TPC. The reconstruction of $z$ coordinate is to be realized exploiting the feature of multiple species of charge carriers. As the initial stage of the development, we studied the properties of the $\mathrm{SF_{6}}$ gas, which is non-toxic and has similar molecular structure to $\mathrm{SeF_{6}}$. In the paper we present the measurement of drift velocities and mobilities of the majority and minority negative charge carriers found in $\mathrm{SF_{6}}$ at a pressure of 750 Torr, slightly higher than the local atmospheric pressure. The reduced fields range between 3.0 and 5.5 Td. It was performed using a laser beam to ionize the gas inside a small TPC, with a drift length of 3.7 cm. A customized charge sensitive amplifier was developed to read out the anode signals induced by the slowly drifting ions. The reconstruction of $z$ coordinate using the difference in the velocities of the two carriers was also demonstrated.

Submitted 20 October, 2024; originally announced October 2024.

Comments: 10 pages, 8 figures

arXiv:2410.13955 [pdf, other]

A multi-detector neutral helium atom microscope

Authors: Chenyang Zhao, Sam M Lambrick, Nick A von Jeinsen, Yanke Yuan, Xiaolong Zhang, Aleksandar Radić, David J Ward, John Ellis, Andrew P Jardine

Abstract: Scanning helium microscopy (SHeM) is an emerging technique that uses a beam of neutral atoms to image and analyse surfaces. The low energies ($\sim$64 meV) and completely non-destructive nature of the probe particles provide exceptional sensitivity for studying delicate samples and thin devices, including 2D materials. To date, around five such instruments have been constructed and are described in the literature. All represent the first attempts at SHeM construction in different laboratories, and use a single detection device. Here, we describe our second generation microscope, which is the first to offer multi-detector capabilities. The new instrument builds on recent research into SHeM optimisation and incorporates many improved design features over our previous instrument. We present measurements that highlight some of the unique capabilities the instrument provides, including 3D surface profiling, alternative imaging modes, and simultaneous acquisition of images from a mixed species beam.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2410.11046 [pdf]

SGUQ: Staged Graph Convolution Neural Network for Alzheimer's Disease Diagnosis using Multi-Omics Data

Authors: Liang Tao, Yixin Xie, Jeffrey D Deng, Hui Shen, Hong-Wen Deng, Weihua Zhou, Chen Zhao

Abstract: Alzheimer's disease (AD) is a chronic neurodegenerative disorder and the leading cause of dementia, significantly impacting cost, mortality, and burden worldwide. The advent of high-throughput omics technologies, such as genomics, transcriptomics, proteomics, and epigenomics, has revolutionized the molecular understanding of AD. Conventional AI approaches typically require the completion of all omics data at the outset to achieve optimal AD diagnosis, which is inefficient and may be unnecessary. To reduce the clinical cost and improve the accuracy of AD diagnosis using multi-omics data, we propose a novel staged graph convolutional network with uncertainty quantification (SGUQ). SGUQ begins with mRNA and progressively incorporates DNA methylation and miRNA data only when necessary, reducing overall costs and exposure to harmful tests. Experimental results indicate that 46.23% of the samples can be reliably predicted using only single-modal omics data (mRNA), while an additional 16.04% of the samples can achieve reliable predictions when combining two omics data types (mRNA + DNA methylation). In addition, the proposed staged SGUQ achieved an accuracy of 0.858 on the ROSMAP dataset, which outperformed existing methods significantly. The proposed SGUQ can not only be applied to AD diagnosis using multi-omics data but also has the potential for clinical decision-making using multi-viewed data. Our implementation is publicly available at https://github.com/chenzhao2023/multiomicsuncertainty.

Submitted 14 October, 2024; originally announced October 2024.

Comments: 20 pages, 2 figures

arXiv:2410.10934 [pdf, other]

Agent-as-a-Judge: Evaluate Agents with Agents

Authors: Mingchen Zhuge, Changsheng Zhao, Dylan Ashley, Wenyi Wang, Dmitrii Khizbullin, Yunyang Xiong, Zechun Liu, Ernie Chang, Raghuraman Krishnamoorthi, Yuandong Tian, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber

Abstract: Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour. To address this, we introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We apply the Agent-as-a-Judge to the task of code generation. To overcome issues with existing benchmarks and provide a proof-of-concept testbed for Agent-as-a-Judge, we present DevAI, a new benchmark of 55 realistic automated AI development tasks. It includes rich manual annotations, like a total of 365 hierarchical user requirements. We benchmark three of the popular agentic systems using Agent-as-a-Judge and find it dramatically outperforms LLM-as-a-Judge and is as reliable as our human evaluation baseline. Altogether, we believe that Agent-as-a-Judge marks a concrete step forward for modern agentic systems -- by providing rich and reliable reward signals necessary for dynamic and scalable self-improvement.

Submitted 16 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: The project can be found at https://github.com/metauto-ai/agent-as-a-judge. The dataset is released at https://huggingface.co/DEVAI-benchmark

arXiv:2410.10198 [pdf, other]

Regions of Level $\ell$ of Catalan/Semiorder-Type Arrangements

Authors: Yanru Chen, Suijie Wang, Jinxing Yang, Chengdong Zhao

Abstract: By establishing a labeled Dyck path model for the regions of $\mathcal{C}_{n,A}$ and $\mathcal{C}_{n,A}^*$, this paper explores several enumerative problems related to the number of regions of level $\ell$, denoted as $r_{\ell}(\mathcal{C}_{n,A})$ and $r_{\ell}(\mathcal{C}_{n,A}^*)$, which include: (1) proving a Stirling convolution relation between $r_{\ell}(\mathcal{C}_{n,A})$ and $r_{\ell}(\mathcal{C}_{n,A}^*)$, refining a result by Stanley and Postnikov; (2) showing that the sequences $\left(r_{\ell}(\mathcal{C}_{n,A})\right)_{n\geq 0}$ and $\left(r_{\ell}(\mathcal{C}_{n,A}^*)\right)_{n\geq 0}$ exhibit properties of binomial type in the sense of Rota; (3) establishing the transformational significance of $r_{\ell}(\mathcal{C}_{n,A})$ and $r_{\ell}(\mathcal{C}_{n,A}^*)$ under Stanley's ESA framework: they can be viewed as transition matrices from binomial coefficients to their characteristic polynomials respectively. Further, we present two applications of the theories and methods: first, inspired by a question from Deshpande, Menon, and Sarkar, we provide a hyperplane arrangement counting interpretation of the two-parameter generalization of Fuss--Catalan numbers, which is closely related to the number of regions of level $\ell$ in the $m$-Catalan arrangement. Second, using labeled Dyck paths to depict the number of regions in the $m$-Catalan arrangement, we algorithmically provide the inverse mapping of the Fu, Wang, and Zhu mapping.

Submitted 15 October, 2024; v1 submitted 14 October, 2024; originally announced October 2024.

Comments: 38 pages, 16 figures

MSC Class: 05B35; 52C35

arXiv:2410.09793 [pdf, other]

Energy Bands of Incommensurate Systems

Authors: Xin-Yu Guo, Jin-Rong Chen, Chen Zhao, Miao Liang, Ying-Hai Wu, Jin-Hua Gao, X. C. Xie

Abstract: Energy band theory is a fundamental cornerstone of condensed matter physics. According to conventional wisdom, discrete translational symmetry is mandatory for defining energy bands. Here, we illustrate that, in fact, the concept of energy band can be generalized to incommensurate systems lacking such symmetry, thus transcending the traditional paradigm of energy band. The validity of our theory is verified by extensive numerical calculations in the celebrated Aubry-André-Harper model and a two-dimensional incommensurate model of graphene. Building upon the proposed concept of incommensurate energy bands, we further develop a theory of angle-resolved photoemission spectroscopy (ARPES) for incommensurate systems, providing a clear physical picture for the incommensurate ARPES spectra. Our work establishes a comprehensive energy band theory for incommensurate systems.

Submitted 13 October, 2024; originally announced October 2024.

Comments: 8 pages, 3 figures

arXiv:2410.09151 [pdf, other]

A search using GEO600 for gravitational waves coincident with fast radio bursts from SGR 1935+2154

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1758 additional authors not shown)

Abstract: The magnetar SGR 1935+2154 is the only known Galactic source of fast radio bursts (FRBs). FRBs from SGR 1935+2154 were first detected by CHIME/FRB and STARE2 in 2020 April, after the conclusion of the LIGO, Virgo, and KAGRA Collaborations' O3 observing run. Here we analyze four periods of gravitational wave (GW) data from the GEO600 detector coincident with four periods of FRB activity detected by CHIME/FRB, as well as X-ray glitches and X-ray bursts detected by NICER and NuSTAR close to the time of one of the FRBs. We do not detect any significant GW emission from any of the events. Instead, using a short-duration GW search (for bursts $\leq$ 1 s) we derive 50% (90%) upper limits of $10^{48}$ ($10^{49}$) erg for GWs at 300 Hz and $10^{49}$ ($10^{50}$) erg at 2 kHz, and constrain the GW-to-radio energy ratio to $\leq 10^{14} - 10^{16}$. We also derive upper limits from a long-duration search for bursts with durations between 1 and 10 s. These represent the strictest upper limits on concurrent GW emission from FRBs.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages of text including references, 4 figures, 5 tables

Report number: LIGO-P2400192

arXiv:2410.06534 [pdf, other]

EEG-estimated functional connectivity, and not behavior, differentiates Parkinson's patients from health controls during the Simon conflict task

Authors: Xiaoxiao Sun, Chongkun Zhao, Sharath Koorathota, Paul Sajda

Abstract: Neural biomarkers that can classify or predict disease are of broad interest to the neurological and psychiatric communities. Such biomarkers can be informative of disease state or treatment efficacy, even before there are changes in symptoms and/or behavior. This work investigates EEG-estimated functional connectivity (FC) as a Parkinson's Disease (PD) biomarker. Specifically, we investigate FC mediated via neural oscillations and consider such activity during the Simon conflict task. This task yields sensory-motor conflict, and one might expect differences in behavior between PD patients and healthy controls (HCs). In addition to considering spatially focused approaches, such as FC, as a biomarker, we also consider temporal biomarkers, which are more sensitive to ongoing changes in neural activity. We find that FC, estimated from delta (1-4 Hz) and theta (4-7 Hz) oscillations, yields spatial FC patterns significantly better at distinguishing PD from HC than temporal features or behavior. This study reinforces that FC in spectral bands is informative of differences in brain-wide processes and can serve as a biomarker distinguishing normal brain function from that seen in disease.

Submitted 9 October, 2024; originally announced October 2024.

Comments: This work is accepted at IEEE EMBC 2024. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information

arXiv:2410.05640 [pdf, ps, other]

Non-dense orbits on topological dynamical systems

Authors: Cao Zhao, Jiao Yang, Xiaoyao Zhou

Abstract: Let $(X,d,T)$ be a topological dynamical system with the specification property. We consider the non-dense orbit set $E(z_0)$ and show that for any non-transitive point $z_0\in X$, this set $E(z_0)$ is empty or carries full topological pressure.

Submitted 7 October, 2024; originally announced October 2024.

arXiv:2410.04342 [pdf, other]

Accelerating Inference of Networks in the Frequency Domain

Authors: Chenqiu Zhao, Guanfang Dong, Anup Basu

Abstract: It has been demonstrated that networks' parameters can be significantly reduced in the frequency domain with a very small decrease in accuracy. However, given the cost of frequency transforms, the computational complexity is not significantly decreased. In this work, we propose performing network inference in the frequency domain to speed up networks whose frequency parameters are sparse. In particular, we propose a frequency inference chain that is dual to the network inference in the spatial domain. In order to handle the non-linear layers, we make a compromise to apply non-linear operations on frequency data directly, which works effectively. Enabled by the frequency inference chain and the strategy for non-linear layers, the proposed approach completes the entire inference in the frequency domain. Unlike previous approaches which require extra frequency or inverse transforms for all layers, the proposed approach only needs the frequency transform and its inverse once at the beginning and once at the end of a network. Comparisons with state-of-the-art methods demonstrate that the proposed approach significantly improves accuracy in the case of a high speedup ratio (over 100x). The source code is available at https://github.com/guanfangdong/FreqNet-Infer.

Submitted 5 October, 2024; originally announced October 2024.

Comments: accepted by ACM Multimedia Asia 2024

arXiv:2410.04232 [pdf, other]

Be There, Be Together, Be Streamed! AR Scenic Live-Streaming for an Interactive and Collective Experience

Authors: Zeyu Huang, Zuyu Xu, Yuanhao Zhang, Chengzhong Liu, Yanwei Zhao, Chuhan Shi, Jason Chen Zhao, Xiaojuan Ma

Abstract: Scenic Live-Streaming (SLS), capturing real-world scenic sites from fixed cameras without streamers, combines scene immersion and the social and real-time characteristics of live-streaming into a unique experience. However, existing SLS affords limited audience interactions to engage them in a collective experience compared to many other live-streaming genres. It is also difficult for SLS to recreate important but intangible constituents of in-person trip experiences, such as cultural activities. To offer a more interactive, engaging, and meaningful experience, we propose ARSLS (Augmented Reality Scenic Live-Streaming). Culturally grounded AR objects with awareness of the live-streamed environment can be overlaid over camera views to provide additional interactive features while maintaining consistency with the live-streamed scene. To explore the design space of this new medium, we developed an ARSLS prototype for a famous landscape in China. A preliminary study (N=15) provided initial insights for ARSLS design.

Submitted 5 October, 2024; originally announced October 2024.

Comments: 4 pages, 2 figures, to appear in the adjunct proceedings of ISMAR 2024 and the ISMAR 2024 conference

arXiv:2410.03083 [pdf, other]

Scaling Parameter-Constrained Language Models with Quality Data

Authors: Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra

Abstract: Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over $200$ models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated the estimated constants yield +0.83 Pearson correlation with true accuracies, and analyzed it in scenarios involving widely-used data techniques such as data sampling and synthesis which aim to improve data quality.

Submitted 3 October, 2024; originally announced October 2024.

Comments: Accepted to EMNLP 2024 Industry Track, 18 pages, 9 figures, 4 tables

arXiv:2410.00768 [pdf, other]

High Mobility SiGe/Ge 2DHG Heterostructure Quantum Wells for Semiconductor Hole Spin Qubits

Authors: Zhenzhen Kong, Zonghu Li, Yuchen Zhou, Gang Cao, Hai-Ou Li, Jiale Su, Yiwen Zhang, Jinbiao Liu, Guo-Ping Guo, Junfeng Li, Jun Luo, Chao Zhao, Tianchun Ye, Guilei Wang

Abstract: Strong spin-orbit coupling and relatively weak hyperfine interactions make germanium hole spin qubits a promising candidate for semiconductor quantum processors. The two-dimensional hole gas structure of strained Ge quantum wells serves as the primary material platform for hole spin qubits. A low-disorder material environment is essential for this process. In this work, we fabricated a Ge/SiGe heterojunction with a 60 nm buried quantum well layer on a Si substrate using reduced pressure chemical vapor deposition technology. At a temperature of 16 mK, when the carrier density is $1.87\times10^{11}$ cm$^{-2}$, we obtained a mobility as high as $308.64\times10^{4}$ cm$^{2}$/Vs. Concurrently, double quantum dot and planar germanium coupling with microwave cavities were also successfully achieved. This fully demonstrates that this structure can be used for the preparation of higher-performance hole spin qubits.

Submitted 1 October, 2024; originally announced October 2024.

arXiv:2409.20558 [pdf, other]

Uni$^2$Det: Unified and Universal Framework for Prompt-Guided Multi-dataset 3D Detection

Authors: Yubin Wang, Zhikang Zou, Xiaoqing Ye, Xiao Tan, Errui Ding, Cairong Zhao

Abstract: We present Uni$^2$Det, a brand new framework for unified and universal multi-dataset training on 3D detection, enabling robust performance across diverse domains and generalization to unseen domains. Due to substantial disparities in data distribution and variations in taxonomy across diverse domains, training such a detector by simply merging datasets poses a significant challenge. Motivated by this observation, we introduce multi-stage prompting modules for multi-dataset 3D detection, which leverages prompts based on the characteristics of corresponding datasets to mitigate existing differences. This elegant design facilitates seamless plug-and-play integration within various advanced 3D detection frameworks in a unified manner, while also allowing straightforward adaptation for universal applicability across datasets. Experiments are conducted across multiple dataset consolidation scenarios involving KITTI, Waymo, and nuScenes, demonstrating that our Uni$^2$Det outperforms existing methods by a large margin in multi-dataset training. Notably, results on zero-shot cross-dataset transfer validate the generalization capability of our proposed method.

Submitted 30 September, 2024; originally announced September 2024.

Comments: 13 pages, 5 figures, 6 tables

arXiv:2409.20461 [pdf, other]

Helium atom micro-diffraction as a characterisation tool for 2D materials

Authors: Nick von Jeinsen, Aleksandar Radic, Ke Wang, Chenyang Zhao, Vivian Perez, Yiru Zhu, Manish Chhowalla, Andrew Jardine, David Ward, Sam Lambrick

Abstract: We present helium atom micro-diffraction as an ideal technique for characterization of 2D materials due to its ultimate surface sensitivity combined with sub-micron spatial resolution. Thermal energy neutral helium scatters from the valence electron density, 2-3 Å above the ionic cores of a surface, making the technique ideal for studying 2D materials, where other approaches can struggle due to small interaction cross-sections with few-layer samples. Sub-micron spatial resolution is a key development in neutral atom scattering to allow measurements from device-scale samples. We present measurements of monolayer-substrate interactions, thermal expansion coefficients, the electron-phonon coupling constant and vacancy-type defect density on monolayer MoS2. We also discuss extensions to the presented methods which can be immediately implemented on existing instruments to perform spatial mapping of these material properties.

Submitted 30 September, 2024; originally announced September 2024.

Comments: Draft version, 11 pages, 6 figures, 2 tables

arXiv:2409.19624 [pdf, other]

Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection

Authors: Yuhang Ma, Wenting Xu, Chaoyi Zhao, Keqiang Sun, Qinfeng Jin, Zeng Zhao, Changjie Fan, Zhipeng Hu

Abstract: Recent advances in text-to-image diffusion models have spurred significant interest in continuous story image generation. In this paper, we introduce Storynizor, a model capable of generating coherent stories with strong inter-frame character consistency, effective foreground-background separation, and diverse pose variation. The core innovation of Storynizor lies in its key modules: ID-Synchronizer and ID-Injector. The ID-Synchronizer employs an auto-mask self-attention module and a mask perceptual loss across inter-frame images to improve the consistency of character generation, vividly representing their postures and backgrounds. The ID-Injector utilizes a Shuffling Reference Strategy (SRS) to integrate ID features into specific locations, enhancing ID-based consistent character generation. Additionally, to facilitate the training of Storynizor, we have curated a novel dataset called StoryDB comprising 100,000 images. This dataset contains single and multiple-character sets in diverse environments, layouts, and gestures with detailed descriptions. Experimental results indicate that Storynizor demonstrates superior coherent story generation with high-fidelity character consistency, flexible postures, and vivid backgrounds compared to other character-specific methods.

Submitted 29 September, 2024; originally announced September 2024.

arXiv:2409.17795 [pdf, other]

Physics-driven complex relaxation for multi-body systems of SPH method

Authors: Chenxi Zhao, Yongchuan Yu, Oskar J. Haidn, Xiangyu Hu

Abstract: In the smoothed particle hydrodynamics (SPH) method, the characteristics of a target particle are interpolated based on the information from its neighboring particles. Consequently, a uniform initial distribution of particles significantly enhances the accuracy of SPH calculations. This aspect is particularly critical in Eulerian SPH, where particles are stationary throughout the simulation. To address this, we introduce a physics-driven complex relaxation method for multi-body systems. Through a series of two-dimensional and three-dimensional case studies, we demonstrate that this method is capable of achieving a globally uniform particle distribution, especially at the interfaces between contacting bodies, and ensuring improved zero-order consistency. Moreover, the effectiveness and reliability of the complex relaxation method in enhancing the accuracy of physical simulations are further validated.

Submitted 26 September, 2024; originally announced September 2024.

Comments: 38 pages and 25 figures

arXiv:2409.16682 [pdf, other]

SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA

Authors: Siyue Zhang, Anh Tuan Luu, Chen Zhao

Abstract: Text-to-SQL parsing and end-to-end question answering (E2E TQA) are the two main approaches to the Table-based Question Answering task. Despite success on multiple benchmarks, they have yet to be compared and their synergy remains unexplored. In this paper, we identify different strengths and weaknesses through evaluating state-of-the-art models on benchmark datasets: Text-to-SQL demonstrates superiority in handling questions involving arithmetic operations and long tables; E2E TQA excels in addressing ambiguous questions, non-standard table schema, and complex table contents. To combine both strengths, we propose a Synergistic Table-based Question Answering approach that integrates different models via answer selection, which is agnostic to model type. Further experiments validate that ensembling models by either a feature-based or an LLM-based answer selector significantly improves the performance over individual models.

Submitted 29 September, 2024; v1 submitted 25 September, 2024; originally announced September 2024.

Comments: EMNLP 2024

arXiv:2409.16332 [pdf]

SOWAHA as a Cancer Suppressor Gene Influence Metabolic Reprogramming

Authors: Xiaohong Yi, Xianwen Zhang, Claire H. Zhao, Yuhui Chen, Lijun Huang, Hua Zhong, Yumei Wang

Abstract: SOWAHA is a protein-coding gene, also known as ANKRD43. Studies have indicated that SOWAHA can serve as a prognostic biomarker in colorectal cancer and pancreatic cancer. However, there are few reports about SOWAHA in other types of cancer and the specific mechanism of action of SOWAHA in cancer is also not clear. Based on National Center for Biotechnology Information (NCBI), The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression Project (GTEx), cBioPortal, Human Protein Atlas (HPA), etc., we adopted bioinformatics methods to uncover the potential tumor genomic features of SOWAHA, including the correlation with prognosis, gene mutation, immune cell infiltration, and DNA methylation in different tumors, and evaluated the association with tumor heterogeneity, stemness, chemokines, chemokine receptors, and immunomodulators in pan-cancer. Besides, we knocked down SOWAHA in SW620 cells and performed RNA-seq analysis, then we conducted functional enrichment to uncover the biological significance of the gene set. SOWAHA has early diagnostic potential, and low expression of SOWAHA was associated with poor prognosis in GBMLGG, PAAD, READ, etc. SOWAHA is associated with most tumor immune-infiltrating cells in pan-cancer. SOWAHA correlates with DNA methylation, tumor heterogeneity, and stemness in many epithelial carcinomas. Furthermore, SOWAHA is involved in many enzyme activities and metabolic pathways, mainly metabolic programming pathways in cancer.
Additionally, we identified two potential transcription factors of SOWAHA, TBX4 and FOXP2, which are dysregulated in SW620 cells. Besides, the cell proliferation and viability in siSOWAHA groups are better than in siNC groups. SOWAHA is identified as a suppressor gene, and its role in the progression of colorectal cancer is primarily mediated through metabolic reprogramming mechanisms.

Submitted 2 October, 2024; v1 submitted 24 September, 2024; originally announced September 2024.

Comments: 37 pages, 17 figures

arXiv:2409.16280 [pdf, other]

MonoFormer: One Transformer for Both Diffusion and Autoregression

Authors: Chuyang Zhao, Yuxing Song, Wenhao Wang, Haocheng Feng, Errui Ding, Yifan Sun, Xinyan Xiao, Jingdong Wang

Abstract: Most existing multimodality methods use separate backbones for autoregression-based discrete text generation and diffusion-based continuous visual generation, or the same backbone by discretizing the visual data to use autoregression for both text and visual generation. In this paper, we propose to study a simple idea: share one transformer for both autoregression and diffusion. The feasibility comes from two main aspects: (i) Transformer is successfully applied to diffusion for visual generation, and (ii) transformer training for autoregression and diffusion is very similar, and the difference merely lies in that diffusion uses bidirectional attention mask and autoregression uses causal attention mask. Experimental results show that our approach achieves comparable image generation performance to current state-of-the-art methods as well as maintains the text generation capability. The project is publicly available at https://monoformer.github.io/.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.16226 [pdf, other]

Liquid sloshing behaviours in an elastic tank and suppression effect of baffles

Authors: Chenxi Zhao, Yan Wu, Yongchuan Yu, Oskar J. Haidn, Xiangyu Hu

Abstract: In this paper, a fluid-structure interaction (FSI) framework based on the smoothed particle hydrodynamics (SPH) method is employed to investigate the forces and deformations experienced by LNG tanks during liquid sloshing. As a Lagrangian approach, the SPH method offers the advantage of accurately modelling free-surface flow. The fluid phase consisting of water and air is modelled as a multi-phase system for getting closer to real transport situations. Additionally, the application of FSI within a single framework reduces data transfer discrepancies between fluid dynamics and solid mechanics. To validate the reliability of the numerical methodology, the simulation results about the free surface elevation and wave profiles are compared with experimental data. Subsequently, ring baffles and vertical baffles are introduced separately. While the degree of force acting on the tanks is assessed, the anti-sloshing effectiveness of baffles on sloshing suppression and the variations in stress and strain distributions are evaluated. Further, to compare the influence of the material properties of baffles on sloshing phenomena, the rigid baffle and elastic baffle with different Young's moduli are immersed in the liquid. The results indicate that in this LNG tank configuration, the closer the baffle properties align with rigidity, the more effective the sloshing inhibition.

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.15750 [pdf, other]

The Roles of Generative Artificial Intelligence in Internet of Electric Vehicles

Authors: Hanwen Zhang, Dusit Niyato, Wei Zhang, Changyuan Zhao, Hongyang Du, Abbas Jamalipour, Sumei Sun, Yiyang Pei

Abstract: With the advancement of generative artificial intelligence (GenAI) models, their capability to generate content is seeing significant enhancement, leading to widespread applications in the field of data generation and forecasting. Furthermore, GenAI has strong capabilities in data modeling and analysis, which enhances Internet of electric vehicles (IoEV) applications in various aspects. In this paper, we investigate and survey applications of GenAI in the IoEV. Specifically, we categorize GenAI for IoEV into four different layers, namely the EV battery layer, the individual electric vehicle (EV) layer, the smart grid with EV layer, and the security layer. We first introduce various GenAI techniques used in each layer of IoEV applications. Subsequently, public datasets available for training the GenAI models are summarized. Finally, we provide recommendations for future directions. This survey not only categorizes the applications of GenAI in IoEV across different layers but also serves as a valuable resource for researchers and practitioners by highlighting the design and implementation challenges within each layer. Furthermore, it provides a roadmap for future research directions, enabling the development of more robust and efficient IoEV systems through the integration of advanced GenAI techniques.

Submitted 24 September, 2024; originally announced September 2024.

Comments: 25 Pages

arXiv:2409.15540 [pdf, ps, other]

On $p(x)$-Laplacian equations in $\mathbb{R}^{N}$ with nonlinearity sublinear at zero

Authors: Shibo Liu, Chunshan Zhao

Abstract: Let $p,q$ be functions on $\mathbb{R}^{N}$ satisfying $1\ll q\ll p\ll N$. We consider $p(x)$-Laplacian problems of the form \[ \left\{ \begin{array}{l} -\Delta_{p(x)}u+V(x)\vert u\vert ^{p(x)-2}u=\lambda\vert u\vert ^{q(x)-2}u+g(x,u)\text{,}\\ u\in W^{1,p(x)}(\mathbb{R}^{N})\text{.} \end{array} \right. \] To apply variational methods, we introduce a subspace $X$ of $W^{1,p(x)}(\mathbb{R}^N)$ as our working space. Compact embedding from $X$ into $L^{q(x)}(\mathbb{R}^N)$ is proved; this enables us to get a nontrivial solution of the problem, and two sequences of solutions going to $\infty$ and $0$ respectively when $g(x,\cdot)$ is odd.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 14 pages

MSC Class: Primary 35J60; Secondary 35D05

arXiv:2409.14922 [pdf, ps, other]

Hybrid Beamforming and Waveform Design for Over-the-air Integrated Signal

Authors: Chonghao Zhao

Abstract: Future wireless communications are expected to provide new use scenarios with emerging techniques. This paper focuses on the vehicle-to-everything (V2X) network, where vehicles should cooperatively implement information obtaining, data sharing, and information postprocessing. Conventionally, the above three operations are considered in different layers or separated waveforms, leading to unavoidable interference and inefficient resource management when the number of devices becomes large. In this paper, we exploit hybrid beamforming to design a cost-effective over-the-air integrated framework, and further consider the integrated waveform design problem with the constant-modulus constraint (CMC) and similarity constraint (SC). To solve these non-convex problems, an alternating optimization (AO) approach is proposed to jointly optimize the digital precoder as well as the hybrid combiner, using the successive convex approximation (SCA) and Riemannian conjugate gradient (RCG) algorithm. Additionally, we use the semidefinite relaxation (SDR) method to handle the practical waveform design problem. Numerical results demonstrate the effectiveness of the proposed hybrid beamforming and waveform design.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14827 [pdf, other]

AIM 2024 Challenge on Video Saliency Prediction: Methods and Results

Authors: Andrey Moskalenko, Alexey Bryncev, Dmitry Vatolin, Radu Timofte, Gen Zhan, Li Yang, Yunlong Tang, Yiting Liao, Jiongzhi Lin, Baitao Huang, Morteza Moradi, Mohammad Moradi, Francesco Rundo, Concetto Spampinato, Ali Borji, Simone Palazzo, Yuxin Zhu, Yinan Sun, Huiyu Duan, Yuqin Cao, Ziheng Jia, Qiang Hu, Xiongkuo Min, Guangtao Zhai, Hao Fang, et al. (8 additional authors not shown)

Abstract: This paper reviews the Challenge on Video Saliency Prediction at AIM 2024. The goal of the participants was to develop a method for predicting accurate saliency maps for the provided set of video sequences. Saliency maps are widely exploited in various applications, including video compression, quality assessment, visual perception studies, the advertising industry, etc. For this competition, a previously unused large-scale audio-visual mouse saliency (AViMoS) dataset of 1500 videos with more than 70 observers per video was collected using crowdsourced mouse tracking. The dataset collection methodology has been validated using conventional eye-tracking data and has shown high consistency. Over 30 teams registered in the challenge, and there are 7 teams that submitted the results in the final phase. The final phase solutions were tested and ranked by commonly used quality metrics on a private test subset. The results of this evaluation and the descriptions of the solutions are presented in this report. All data, including the private test subset, is made publicly available on the challenge homepage - https://challenges.videoprocessing.ai/challenges/video-saliency-prediction.html.

Submitted 23 September, 2024; originally announced September 2024.

Comments: ECCVW 2024

ACM Class: I.4.6; I.2.10

arXiv:2409.14772 [pdf, other]

Domino-like magnetic phase transition induced by a bias voltage in FeRh thin film

Authors: Huiliang Wu, Jianbo Wang, Chenbo Zhao, Qingfeng Zhan, Jiangtao Xue, Senfu Zhang, Jinwu Wei, Xiangqian Wang, Qingfang Liu

Abstract: The first-order magnetic phase transition (MPT) usually occurs over a very wide magnetic field range of about tens of thousands of oersted, which hinders its applications. In this work, we induce a domino-like MPT by introducing a bias voltage in FeRh thin film and thus realize a large narrowing of the transition magnetic field range from $6\times10^{4}$ Oe to lower than $2\times10^{3}$ Oe at room temperature. Furthermore, the critical condition and phase diagram for domino-like MPTs are obtained in theory and our experiments support them well. Our work not only benefits the studies and applications of MPT-based devices but is also significant for applications of phase transition systems with resistance change.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.14705 [pdf, other]

Target-Aware Language Modeling via Granular Data Sampling

Authors: Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, Vikas Chandra

Abstract: Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources. However, there are instances where we desire a model that excels in specific areas without markedly compromising performance in other areas. A cost-effective and straightforward approach is sampling with low-dimensional data features, which allows selecting large-scale pretraining data for domain-specific use cases. In this work, we revisit importance sampling with n-gram features consisting of multi-granular tokens, which strikes a good balance between sentence compression and representation capabilities. We observed the sampled data to have a high correlation with the target downstream task performance while preserving its effectiveness on other tasks. This leads to the proposed data sampling paradigm where language models can be pretrained more efficiently on selected documents. On eight benchmarks we demonstrate that, with $\sim$1% of the data, pretrained models perform on par with the full RefinedWeb data and outperform randomly selected samples for model sizes ranging from 125M to 1.5B.

Submitted 23 September, 2024; originally announced September 2024.

Comments: Accepted to EMNLP 2024 Main Conference, 9 pages, 6 figures, 3 tables

arXiv:2409.14441 [pdf, other]

BUPTCMCC-6G-CMG+: A GBSM-Based ISAC Channel Model Simulator

Authors: Changsheng Zhao, Yuxiang Zhang, Heng Wang, Lei Tian, Jianhua Zhang, Hanyuan Jiang

Abstract: Integrated Sensing and Communication (ISAC) is one of the key technologies in 6G, and related research and standardization efforts are progressing vigorously. Wireless channel simulation is the cornerstone for the evaluation and optimization of wireless communication technologies. This paper proposes a design and implementation method for an ISAC channel simulation based on a Geometry-Based Stochastic Model (GBSM) simulation framework. First, we introduce the progress of 3GPP ISAC channel standardization and the key topics of discussion. Second, addressing the current lack of a standardized ISAC channel simulation framework, we propose a cascaded ISAC channel simulation framework based on GBSM, leveraging our team's related measurements, analyses, and proposal results. Based on this framework, we develop and design the ISAC channel simulator BUPTCMCC-6G-CMG+. Finally, we analyze and validate the simulation platform results, and provide some prospects for future ISAC testing research combined with channel simulators.

Submitted 17 October, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

Comments: 12 pages, 5 figures, 2 tables

arXiv:2409.14365 [pdf, other]

D3RoMa: Disparity Diffusion-based Depth Sensing for Material-Agnostic Robotic Manipulation

Authors: Songlin Wei, Haoran Geng, Jiayi Chen, Congyue Deng, Wenbo Cui, Chengyang Zhao, Xiaomeng Fang, Leonidas Guibas, He Wang

Abstract: Depth sensing is an important problem for 3D vision-based robotics. Yet, a real-world active stereo or ToF depth camera often produces noisy and incomplete depth, which bottlenecks robot performance. In this work, we propose D3RoMa, a learning-based depth estimation framework on stereo image pairs that predicts clean and accurate depth in diverse indoor scenes, even in the most challenging scenarios with translucent or specular surfaces where classical depth sensing completely fails. Key to our method is that we unify depth estimation and restoration into an image-to-image translation problem by predicting the disparity map with a denoising diffusion probabilistic model. At inference time, we further incorporate a left-right consistency constraint as classifier guidance to the diffusion process. Our framework combines recently advanced learning-based approaches and geometric constraints from traditional stereo vision. For model training, we create a large scene-level synthetic dataset with diverse transparent and specular objects to compensate for existing tabletop datasets. The trained model can be directly applied to real-world in-the-wild scenes and achieve state-of-the-art performance in multiple public depth estimation benchmarks. Further experiments in real environments show that accurate depth prediction significantly improves robotic manipulation in various scenarios.

Submitted 24 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.14031 [pdf, other]

Signal Detection in Near-field Communication with Unknown Noise Characteristics: A Diffusion Model Method

Authors: Changyuan Zhao, Jiacheng Wang, Ruichen Zhang, Dusit Niyato, Dong In Kim, Hongyang Du

Abstract: In this letter, we present a diffusion model method for signal detection in near-field communication with unknown noise characteristics. We consider an uplink transmission of a near-field MIMO communication system consisting of multiple mobile terminals and one base station with multiple antennas. Then, we propose a Maximum Likelihood Estimation Diffusion Detector (MLEDD) aiming at learning the distribution of unknown noise. To this end, we define an error function via Bayes' theorem to detect the source signal. Moreover, we present an implementation of the proposed framework. The performance of the proposed method in terms of bit error rate shows that it outperforms the MLE detector, Detection Network (DetNet), and Maximum Normalizing Flow Estimate method (MANFE) across different signal-to-noise ratios and noise distributions. Especially when the noise distribution is intractable, diffusion, as a state-of-the-art probability model, has the best distribution learning ability compared to other models. These results affirm that this framework can effectively detect signals in near-field scenarios.

Submitted 21 September, 2024; originally announced September 2024.

Comments: 5 pages, 3 figures

arXiv:2409.13763 [pdf]

Can thermal nonreciprocity improve the radiative cooling efficiency?

Authors: Mengqi Liu, Shenghao Jin, Chenglong Zhou, Boxiang Wang, Changying Zhao, Cheng-Wei Qiu

Abstract: Can thermal nonreciprocity improve the radiative cooling efficiency? Probably not.

Submitted 17 September, 2024; originally announced September 2024.

Comments: 12 pages, 3 figures

arXiv:2409.12814 [pdf]

GeSn 320 $\times$ 256 Focal Plane Array for Silicon-Based Short-wave Infrared Imaging

Authors: Guoyin Xu, Hui Cong, Yue Li, Zhengjie Wu, Fenghe Fu, Ping Chen, Chao Zhao, Chi Xu, Chunlai Xue

Abstract: Short-wave infrared (SWIR) imaging arrays have demonstrated great potential in applications spanning from military to civilian consumer electronics. However, the current focal plane arrays (FPAs), which are based on compound semiconductors, have limited applications in civilian circumstances due to elevated manufacturing costs and prolonged fabrication cycle time. To address this, a high-performance 320 $\times$ 256 focal plane array based on group-IV semiconductors has been designed and manufactured on a Si substrate using a complementary metal-oxide semiconductor (CMOS) compatible fabrication process. The optical absorption layer is composed of GeSn alloy, whose bandgap could be tailored by choosing the appropriate Sn concentration. In this work, a 10% Sn concentration was employed, yielding a response cutoff wavelength of 2308 nm for the Si-based photodetector, which was measured at 298 K. Moreover, a specific detectivity of 9.7 $\times$ 10$^{11}$ cm$\cdot$ Hz$^{1/2}$ $\cdot$ W$^{-1}$ has been achieved at 77 K, surpassing all previously reported GeSn devices, and rivals commercial extended InGaAs photodetectors. With the help of read-out circuits (ROIC), SWIR images have been successfully captured for the first time by using Si-based GeSn FPA. This work demonstrates the potential of group IV imaging arrays for various applications in the commercial SWIR imaging field.

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2409.11927 [pdf, other]

A Revised Spin of the Black Hole in GRS 1716-249 with a New Distance

Authors: S. J. Zhao, L. Tao, Q. Q. Yin, S. N. Zhang, R. C. Ma, P. P. Li, Q. C. Zhao, M. Y. Ge, L. Zhang, J. L. Qu, S. Zhang, X. Ma, Y. Huang, J. Q. Peng, Y. X. Xiao

Abstract: GRS 1716-249 is a stellar-mass black hole in a low-mass X-ray binary that underwent a giant outburst in 2016/17. In this paper we use simultaneous observations of Insight-HXMT and NuSTAR to determine its basic parameters. The observations were performed during the softest part of the outburst, and the spectra show clear thermal disk emission and reflection features. We have fitted the X-ray energy spectra using the joint fitting method of the continuum and reflection components with the kerrbb2+ relxill model. Since there is a possibility that the distance to this source was previously underestimated, we use the latest distance parameter of 6.9 kpc in our study, in contrast to previous work in which the distance was set at 2.4 kpc. Through spectral fitting with the black hole mass fixed at 6.4 $M_{\rm \odot}$, we observe a strong dependence of the derived spin on the distance: $a_{*}=0.972_{-0.005}^{+0.004}$ at an assumed distance of 2.4 kpc and $a_{*}=0.464_{-0.007}^{+0.016}$ at an assumed distance of 6.9 kpc, at a confidence level of 90%. If the uncertainties in the distance and black hole mass are considered, there will be a wider range of spin with $a_{*}$ < 0.78. The fitting results with the new distance indicate that GRS 1716-249 harbors a moderate spin black hole with an inclined ($i\sim 40-50^{\circ}$) accretion disk around it. Additionally, we have also found that solely using the method of the reflection component fitting but ignoring the constraints on the spin from the accretion disk component will result in an extremely high spin.

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.11234 [pdf, other]

STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Authors: Jianbo Ma, Chuanming Tang, Fei Wu, Can Zhao, Jianlin Zhang, Zhiyong Xu

Abstract: Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target reidentification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues in modelling object relationships, especially for challenging tracking conditions such as object deformation and blurring. To address these issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in a sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embedding based on adjacent frame cooperation. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate our STCMOT sets a new state-of-the-art performance in MOTA and IDF1 metrics. The source codes are released at https://github.com/ydhcg-BoBo/STCMOT.

Submitted 17 September, 2024; originally announced September 2024.

arXiv:2409.11169 [pdf, other]

MAISI: Medical AI for Synthetic Imaging

Authors: Pengfei Guo, Can Zhao, Dong Yang, Ziyue Xu, Vishwesh Nath, Yucheng Tang, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

Abstract: Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovative approach using the diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI leverages the foundation volume compression network and the latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 x 512 x 768) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can process organ segmentation, including 127 anatomical structures, as additional conditions and enables the generation of accurately annotated synthetic images that can be used for various downstream tasks. Our experiment results show that MAISI's capabilities in generating realistic, anatomically accurate images for diverse regions and conditions reveal its promising potential to mitigate challenges using synthetic data.

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.09250 [pdf, ps, other]

Optimal Adaptive Control of Linear Stochastic Systems with Quadratic Cost Function

Authors: Nian Liu, Cheng Zhao, Shaolin Tan, Jinhu Lü

Abstract: In this paper, we consider the adaptive linear quadratic Gaussian control problem, where both the linear transformation matrix of the state $A$ and the control gain matrix $B$ are unknown. The proposed adaptive optimal control only assumes that $(A, B)$ is stabilizable and $(A, Q^{1/2})$ is detectable, where $Q$ is the weighting matrix of the state in the quadratic cost function. This condition significantly weakens the classic assumptions used in the literature. To tackle this problem, a weighted least squares algorithm is modified by using random regularization method, which can ensure uniform stabilizability and uniform detectability of the family of estimated models. At the same time, a diminishing excitation is incorporated into the design of the proposed adaptive control to guarantee strong consistency of the desired components of the estimates. Finally, by utilizing this family of estimates, even if not all components of them converge to the true values, it is demonstrated that a certainty equivalence control with such a diminishing excitation is optimal for an ergodic quadratic cost function.

Submitted 13 September, 2024; originally announced September 2024.

arXiv:2409.05493 [pdf, other]

DexDiff: Towards Extrinsic Dexterity Manipulation of Ungraspable Objects in Unrestricted Environments

Authors: Chengzhong Ma, Houxue Yang, Hanbo Zhang, Zeyang Liu, Chao Zhao, Jian Tang, Xuguang Lan, Nanning Zheng

Abstract: Grasping large and flat objects (e.g. a book or a pan) is often regarded as an ungraspable task, which poses significant challenges due to the unreachable grasping poses. Previous works leverage Extrinsic Dexterity like walls or table edges to grasp such objects. However, they are limited to task-specific policies and lack task planning to find pre-grasp conditions. This makes it difficult to adapt to various environments and extrinsic dexterity constraints. Therefore, we present DexDiff, a robust robotic manipulation method for long-horizon planning with extrinsic dexterity. Specifically, we utilize a vision-language model (VLM) to perceive the environmental state and generate high-level task plans, followed by a goal-conditioned action diffusion (GCAD) model to predict the sequence of low-level actions. This model learns the low-level policy from offline data with the cumulative reward guided by high-level planning as the goal condition, which allows for improved prediction of robot actions. Experimental results demonstrate that our method not only effectively performs ungraspable tasks but also generalizes to previously unseen objects. It outperforms baselines by a 47% higher success rate in simulation and facilitates efficient deployment and manipulation in real-world scenarios.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.04754 [pdf, ps, other]

Minimal extension property of direct images

Authors: Chen Zhao

Abstract: Given a projective morphism $f:X\to Y$ from a complex space to a complex manifold, we prove the Griffiths semi-positivity and minimal extension property of the direct image sheaf $f_\ast(\mathscr{F})$. Here, $\mathscr{F}$ is a coherent sheaf on $X$, which consists of the Grauert-Riemenschneider dualizing sheaf, a multiplier ideal sheaf, and a variation of Hodge structure (or more generally, a tame harmonic bundle).

Submitted 7 September, 2024; originally announced September 2024.

Comments: Comments are welcome

arXiv:2409.03853 [pdf, other]

doi 10.1145/3678575

Users' Perspectives on Multimodal Menstrual Tracking Using Consumer Health Devices

Authors: Georgianna Lin, Brenna Li, Helen Li, Chloe Zhao, Khai N Truong, Alex Mariakakis

Abstract: Previous menstrual health literature highlights a variety of signals not included in existing menstrual trackers because they are either difficult to gather or are not typically associated with menstrual health. Since it has become increasingly convenient to collect biomarkers through wearables and other consumer-grade devices, our work examines how people incorporate unconventional signals (e.g., blood glucose levels, heart rate) into their understanding of menstrual health. In this paper, we describe a three-month-long study on fifty participants' experiences as they tracked their health using physiological sensors and daily diaries. We analyzed their experiences with both conventional and unconventional menstrual health signals through surveys and interviews conducted throughout the study. We delve into the various aspects of menstrual health that participants sought to affirm using unconventional signals, explore how these signals influenced their daily behaviors, and examine how multimodal menstrual tracking expanded their scope of menstrual health. Finally, we provide design recommendations for future multimodal menstrual trackers.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 25 pages, 4 figures, 2 tables. The paper was accepted by IMWUT/Ubicomp 2024

arXiv:2409.02877 [pdf, other]

Configurable Foundation Models: Building LLMs from a Modular Perspective

Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing for inference with part of modules and dynamic assembly of modules to tackle complex tasks, such as mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module, designating the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitation of configurable foundation models. We first formalize modules into emergent bricks - functional neuron partitions that emerge during the pre-training phase, and customized bricks - bricks constructed via additional post-training to improve the capabilities and knowledge of LLMs. Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis on widely-used LLMs. We find that the FFN layers follow modular patterns with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and inspire the future creation of more efficient and scalable foundational models.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02132 [pdf, other]

Recognition of Schrodinger cat state based on CNN

Authors: Tao Zhang, Chaoying Zhao

Abstract: We applied convolutional neural networks to the classification of cat states and coherent states. Initially, we generated datasets of Schrodinger cat states and coherent states from nonlinear processes and preprocessed these datasets. Subsequently, we constructed both LeNet and ResNet network architectures, adjusting parameters such as convolution kernels and strides to optimal values. We then trained both LeNet and ResNet on the training sets. The loss function values indicated that ResNet performs better in classifying cat states and coherent states. Finally, we evaluated the trained models on the test sets, achieving an accuracy of 97.5% for LeNet and 100% for ResNet. We evaluated cat states and coherent states with different α, demonstrating a certain degree of generalization capability. The results show that LeNet may mistakenly recognize coherent states as cat states without coherent features, while ResNet provides a feasible solution to the problem of mistakenly recognizing cat states and coherent states by traditional neural networks.

Submitted 2 September, 2024; originally announced September 2024.

Comments: 6 pages, 5 figures

arXiv:2409.01822 [pdf, other]

Capillary-driven migration of droplets on conical fibers

Authors: Yixiao Mao, Chengxi Zhao, Kai Mu, Kai Li, Ting Si

Abstract: A droplet placed on a hydrophilic conical fiber tends to move toward the end of larger radii due to capillary action. Experimental investigations are performed to explore the dynamics of droplets with varying viscosities and volumes on different fibers at the microscale. Droplets are found to accelerate initially and subsequently decelerate during migration. A dynamic model is developed to capture dynamics of the droplet migration, addressing the limitations of previous equilibrium-based scaling laws. Both experimental results and theoretical predictions indicate that droplets on more divergent fibers experience a longer acceleration phase. Additionally, gravitational effects are pronounced on fibers with small cone angles, exerting a substantial influence on droplet migration even below the capillary scale. Moreover, droplets move more slowly on dry fibers compared to those prewetted with the same liquid, primarily attributed to the increased friction. The experiments reveal the formation of a residual liquid film after droplet migration on dry fibers, leading to considerable volume loss in the droplets. To encompass the intricacies of migration on dry fibers, the model is refined to incorporate a higher friction coefficient and variable droplet volumes, providing a more comprehensive depiction of the underlying physics.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.01557 [pdf, other]

TASL-Net: Tri-Attention Selective Learning Network for Intelligent Diagnosis of Bimodal Ultrasound Video

Authors: Chengqian Zhao, Zhao Yao, Zhaoyu Hu, Yuanxin Xie, Yafang Zhang, Yuanyuan Wang, Shuo Li, Jianhua Zhou, Jianqiao Zhou, Yin Wang, Jinhua Yu

Abstract: In the intelligent diagnosis of bimodal (gray-scale and contrast-enhanced) ultrasound videos, medical domain knowledge such as the way sonographers browse videos, the particular areas they emphasize, and the features they pay special attention to, plays a decisive role in facilitating precise diagnosis. Embedding medical knowledge into the deep learning network can not only enhance performance but also boost clinical confidence and reliability of the network. However, it is an intractable challenge to automatically focus on these person- and disease-specific features in videos and to enable networks to encode bimodal information comprehensively and efficiently. This paper proposes a novel Tri-Attention Selective Learning Network (TASL-Net) to tackle this challenge and automatically embed three types of diagnostic attention of sonographers into a mutual transformer framework for intelligent diagnosis of bimodal ultrasound videos. Firstly, a time-intensity-curve-based video selector is designed to mimic the temporal attention of sonographers, thus removing a large amount of redundant information while improving computational efficiency of TASL-Net. Then, to introduce the spatial attention of the sonographers for contrast-enhanced video analysis, we propose the earliest-enhanced position detector based on structural similarity variation, on which the TASL-Net is made to focus on the differences of perfusion variation inside and outside the lesion. Finally, by proposing a mutual encoding strategy that combines convolution and transformer, TASL-Net possesses bimodal attention to structure features on gray-scale videos and to perfusion variations on contrast-enhanced videos. These modules work collaboratively and contribute to superior performance. We conduct a detailed experimental validation of TASL-Net's performance on three datasets, including lung, breast, and liver.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.00060 [pdf, other]

Understanding Literary Texts by LLMs: A Case Study of Ancient Chinese Poetry

Authors: Cheng Zhao, Bin Wang, Zhen Wang

Abstract: The birth and rapid development of large language models (LLMs) have caused quite a stir in the field of literature. Once considered unattainable, AI's role in literary creation is increasingly becoming a reality. In genres such as poetry, jokes, and short stories, numerous AI tools have emerged, offering refreshing new perspectives. However, it's difficult to further improve the quality of these works. This is primarily because understanding and appreciating a good literary work involves a considerable threshold, such as knowledge of literary theory, aesthetic sensibility, interdisciplinary knowledge. Therefore, authoritative data in this area is quite lacking. Additionally, evaluating literary works is often complex and hard to fully quantify, which directly hinders the further development of AI creation. To address this issue, this paper attempts to explore the mysteries of literary texts from the perspective of LLMs, using ancient Chinese poetry as an example for experimentation. First, we collected a variety of ancient poems from different sources and had experts annotate a small portion of them. Then, we designed a range of comprehension metrics based on LLMs to evaluate all these poems. Finally, we analyzed the correlations and differences between various poem collections to identify literary patterns. Through our experiments, we observed a series of enlightening phenomena that provide technical support for the future development of high-level literary creation based on LLMs.

Submitted 11 September, 2024; v1 submitted 22 August, 2024; originally announced September 2024.

arXiv:2408.17224 [pdf, other]

Hadronic cross section measurements with the DAMPE space mission using 20GeV-10TeV cosmic-ray protons and $^4$He

Authors: F. Alemanno, Q. An, P. Azzarello, F. C. T. Barbato, P. Bernardini, X. J. Bi, I. Cagnoli, M. S. Cai, E. Casilli, E. Catanzani, J. Chang, D. Y. Chen, J. L. Chen, Z. F. Chen, P. Coppin, M. Y. Cui, T. S. Cui, Y. X. Cui, H. T. Dai, A. De Benedittis, I. De Mitri, F. de Palma, A. Di Giovanni, Q. Ding, T. K. Dong, et al. (126 additional authors not shown)

Abstract: Precise direct cosmic-ray (CR) measurements provide an important probe to study the energetic particle sources in our Galaxy, and the interstellar environment through which these particles propagate. Uncertainties on hadronic models, ion-nucleon cross sections in particular, are currently the limiting factor towards obtaining more accurate CR ion flux measurements with calorimetric space-based experiments. We present an energy-dependent measurement of the inelastic cross section of protons and helium-4 nuclei (alpha particles) on a Bi$_4$Ge$_3$O$_{12}$ target, using 88 months of data collected by the DAMPE space mission. The kinetic energy range per nucleon of the measurement points ranges from 18 GeV to 9 TeV for protons, and from 5 GeV/n to 3 TeV/n for helium-4 nuclei. Our results lead to a significant improvement of the CR flux normalisation. In the case of helium-4, these results correspond to the first cross section measurements on a heavy target material at energies above 10 GeV/n.

Submitted 30 August, 2024; originally announced August 2024.

Comments: 17 pages, submitted to PRD

arXiv:2408.17167 [pdf]

Highly Efficient and Stable Perovskite Solar Cells via MultiFunctional Curcumin Modified Buried Interface

Authors: Xianhu Wu, Jieyu Bi, Guanglei Cu, Nian Liu, Gaojie Xia, Jilong Sun, Jiaxin Jiang, Ning Lu, Ping Li, Chunyi Zhao, Zewen Zuo, Min Gu

Abstract: The buried interface between the electron transport layer and the perovskite layer suffers from severe interface defects and imperfect energy level alignment. To address this issue, this study employs a multifunctional organic molecule, curcumin, to modify the interface between SnO2 and the perovskite layer. The functional groups on curcumin effectively passivate the defects on both sides of the interface, reducing -OH and oxygen vacancy defects on the SnO2 surface and passivating uncoordinated Pb2+ in the perovskite layer. This results in a more compatible energy level alignment and lower defect density at the interface, enhancing carrier transport across it. Consequently, the devices based on curcumin achieve an impressive champion power conversion efficiency (PCE) of 24.46%, compared to 22.03% for control devices. This work demonstrates a simple, green, hydrophobic, and efficient molecular modification method for the buried interface, laying the foundation for the development of high-performance and stable perovskite solar cells.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.16365 [pdf, ps, other]

Protograph-Based Batched Network Codes

Authors: Mingyang Zhu, Ming Jiang, Chunming Zhao

Abstract: Batched network codes (BNCs) are a low-complexity solution for communication through networks with packet loss. Although their belief propagation (BP) performance is proved to approach capacity in the asymptotic regime, there is no evidence indicating that their BP performance is as good as expected in the finite-length regime. In this paper, we propose a protograph-based construction for BNCs, referred to as protograph-based BNCs (P-BNCs), which significantly differs from existing BNCs in three aspects: 1) Unlike traditional constructions where the degree of variable nodes is random, P-BNCs have a highly structured Tanner graph with specified degree distributions for both variable nodes and check nodes. 2) Traditional BNCs use a fixed degree distribution to generate all batches, making their performance highly sensitive to channel conditions, but P-BNCs achieve good performance under varying channel conditions due to their rate-compatible structures. 3) The construction of P-BNCs takes into account joint BP decoding with a sparse precode, whereas traditional constructions typically do not consider a precode, or assume the presence of a precode that can recover a certain fraction of erasures. Thanks to these three improvements, P-BNCs not only have higher achievable rates under varying channel conditions, but more importantly, their finite-length BP performance is significantly improved.

Submitted 29 August, 2024; originally announced August 2024.

Comments: submitted to IEEE for possible publication

arXiv:2408.15664 [pdf, other]

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

Authors: Lean Wang, Huazuo Gao, Chenggang Zhao, Xu Sun, Damai Dai

Abstract: For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients during training, we propose Loss-Free Balancing, featured by an auxiliary-loss-free load balancing strategy. To be specific, before the top-K routing decision, Loss-Free Balancing will first apply an expert-wise bias to the routing scores of each expert. By dynamically updating the bias of each expert according to its recent load, Loss-Free Balancing can consistently maintain a balanced distribution of expert load. In addition, since Loss-Free Balancing does not produce any interference gradients, it also elevates the upper bound of model performance gained from MoE training. We validate the performance of Loss-Free Balancing on MoE models with up to 3B parameters trained on up to 200B tokens. Experimental results show that Loss-Free Balancing achieves both better performance and better load balance compared with traditional auxiliary-loss-controlled load balancing strategies.

Submitted 28 August, 2024; originally announced August 2024.

Showing 1–50 of 1,783 results for author: Zhao, C