Topology automaton and conformal dimension of post-critical-finite self-similar sets

Authors: Hui Rao, Zhi-Ying Wen, Qihan Yuan, Yuan Zhang

Abstract: In this paper, we use a class of finite state automata, called topology automaton, to study the metric classification of a special class of post-critically finite self-similar sets. As an application, we prove that the conformal dimension of post-critically finite self-similar dendrites and fractal gasket with connected component is 1.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 38 pages, 11 figures, 27 references

arXiv:2303.01211 [pdf, other]

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

Abstract: In this paper, we propose a novel self-distillation method for fake speech detection (FSD), which can significantly improve the performance of FSD without increasing the model complexity. For FSD, some fine-grained information is very important, such as spectrogram defects, mute segments, and so on, which are often perceived by shallow networks. However, shallow networks are noisy and cannot capture this information well. To address this problem, we propose using the deepest network to instruct the shallow networks and thereby enhance them. Specifically, the networks of FSD are divided into several segments, the deepest network being used as the teacher model, and all shallow networks become multiple student models by adding classifiers. Meanwhile, the distillation path between the deepest network feature and shallow network features is used to reduce the feature difference. A series of experimental results on the ASVspoof 2019 LA and PA datasets show the effectiveness of the proposed method, with significant improvements compared to the baseline.

Submitted 2 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.13524 [pdf, other]

doi 10.1117/1.APN.2.5.056007

Efficient reference-less transmission matrix retrieval for a multimode fiber using fast Fourier transform

Authors: Jingshan Zhong, Zhong Wen, Quanzhi Li, Qilin Deng, Qing Yang

Abstract: The transmission matrix (TM) linearly maps the incident and transmitted complex fields, and has been used widely due to its ability to characterize scattering media. It is computationally demanding to reconstruct the TM from intensity images measured by a reference-less experimental setup. Removing the reference beam for interference gains the advantage of a simple experimental setup. However, the long computational time still limits its practical application. We propose an efficient reference-less TM retrieval method for multimode fiber (MMF). Our method adopts a data acquisition scheme which employs a Fourier transform matrix in the design of the incident fields. We develop a nonlinear optimization algorithm to solve the TM retrieval problem in a parallel manner. The data acquisition scheme allows the algorithm to be implemented with fast Fourier transform (FFT), and hence achieves great efficiency improvement. Further, our method acquires intensity images at a defocus plane and corrects the error of relative phase offset of the TM recovered from the intensity images measured at one fixed plane. We validate the proposed TM retrieval method with both simulations and experiments. By using FFT, our TM retrieval algorithm achieves a 1200x speed-up in computational time, and recovers a $2286 \times 8192$ TM of a 0.22 NA and $50\,\mu\mathrm{m}$ diameter MMF in 124.9 seconds on a computer with 32 CPU cores. With the advantages of efficiency and the correction of phase offset, our method paves the way for the application of reference-less TM retrieval in real practice.

Submitted 1 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.13087 [pdf, other]

Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation

Authors: Zhifa Ke, Junyu Zhang, Zaiwen Wen

Abstract: In this paper, a Gauss-Newton Temporal Difference (GNTD) learning method is proposed to solve the Q-learning problem with nonlinear function approximation. In each iteration, our method takes one Gauss-Newton (GN) step to optimize a variant of Mean-Squared Bellman Error (MSBE), where target networks are adopted to avoid double sampling. Inexact GN steps are analyzed so that one can safely and efficiently compute the GN updates by cheap matrix iterations. Under mild conditions, non-asymptotic finite-sample convergence to the globally optimal Q function is derived for various nonlinear function approximations. In particular, for neural network parameterization with ReLU activation, GNTD achieves an improved sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-1})$, as opposed to the $\mathcal{O}(\varepsilon^{-2})$ sample complexity of the existing neural TD methods. An $\tilde{\mathcal{O}}(\varepsilon^{-1.5})$ sample complexity of GNTD is also established for general smooth function approximations. We validate our method via extensive experiments in several RL benchmarks, where GNTD exhibits both higher rewards and faster convergence than TD-type methods.

Submitted 31 March, 2024; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.12400 [pdf, other]

Towards Stable Test-Time Adaptation in Dynamic Wild World

Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Mingkui Tan

Abstract: Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, which are quite common in practice. In this paper, we investigate the reasons for this instability and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaptation and result in collapsed trivial solutions, i.e., assigning the same class label for all samples. To address the above collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove partial noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably than prior methods and is computationally efficient under the above wild test scenarios.

Submitted 23 February, 2023; originally announced February 2023.

Comments: accepted by International Conference on Learning Representations (ICLR) 2023 as Notable-Top-5%; 27 pages, 10 figures, 18 tables

arXiv:2302.12375 [pdf, other]

doi 10.1016/j.cma.2023.115965

Isogeometric analysis using G-spline surfaces with arbitrary unstructured quadrilateral layout

Authors: Zuowei Wen, Md. Sadman Faruque, Xin Li, Xiaodong Wei, Hugo Casquero

Abstract: G-splines are a generalization of B-splines that deals with extraordinary points by imposing G^1 constraints across their spoke edges, thus obtaining a continuous tangent plane throughout the surface. Using the isoparametric concept and the Bubnov-Galerkin method to solve partial differential equations with G-splines results in discretizations with global C^1 continuity in physical space. Extraordinary points (EPs) are required to represent manifold surfaces with arbitrary topological genus. In this work, we allow both interior and boundary EPs and there are no limitations regarding how close EPs can be from each other. Reaching this level of flexibility is necessary so that splines with EPs can become mainstream in the design-through-analysis cycle of complex thin-walled structures. To the authors' knowledge, the two EP constructions based on imposing G^1 constraints proposed in this work are the first two EP constructions used in isogeometric analysis (IGA) that combine the following distinctive characteristics: (1) Only vertex-based control points are used and they behave as geometric shape handles, (2) any control point of the control net can potentially be an EP, (3) global C^1 continuity in physical space is obtained without introducing singularities, (4) faces around EPs are not split into multiple elements, and (5) good surface quality is attained. The studies of convergence and surface quality performed in this paper suggest that G-splines are more suitable for IGA than EP constructions based on the D-patch framework. Finally, we have represented the stiffener, the inner part, and the outer part of a B-pillar with G-spline surfaces and solved eigenvalue problems using both Kirchhoff-Love and Reissner-Mindlin shell theories. The results are compared with bilinear quadrilateral meshes and excellent agreement is found between G-splines and conventional finite elements.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.12046 [pdf, ps, other]

Observation of Q-switched and continuous wave regimes with mode-hopping in Er-doped fiber lasers incorporating a dynamic population grating

Authors: Zengrun Wen, Xiulin Fan, Kaile Wang, Weiming Wang, Song Gao, Wenjing Hao, Yuanmei Gao, Yangjian Cai, Liren Zheng

Abstract: Dynamic population gratings (DPGs) in rare-earth doped fibers are prevalent devices in fiber lasers for the production of single-longitudinal-mode emission, Q-switched pulses, and wavelength self-sweeping regimes. This study presents a transition from Q-switched state to continuous wave (CW) state, accompanying irregular mode-hopping, in an erbium-doped fiber laser with a heavily-doped DPG centered at 1549.95 nm. Our results demonstrate that the transition between these two states can be achieved by adjusting the pump power. The repetition frequency of the Q-switched pulse increases monotonically with the increasing pump power, while the pulse duration initially narrows and then expands because the reduced peak intensity weakens the nonlinear effect. Additionally, modulation peaks are evident on both the Q-switched pulse train and the CW background, which are induced by the irregular mode-hopping caused by the DPG. Furthermore, we observe that the central wavelength fluctuates within a range of 0.05 nm. These results provide valuable insight into the DPG effect in heavily-doped fibers.

Submitted 23 February, 2023; originally announced February 2023.

arXiv:2302.09205 [pdf, other]

Approximate Thompson Sampling via Epistemic Neural Networks

Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Abstract: Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we demonstrate that the epinet, a small additive network that estimates uncertainty, matches the performance of large ensembles at orders of magnitude lower computational cost. This enables effective application of TS with computation that scales gracefully to complex environments.

Submitted 17 February, 2023; originally announced February 2023.

arXiv:2302.05010 [pdf, other]

doi 10.1088/1674-4527/acc155

Constraints on dark energy from the CSST galaxy clusters

Authors: Yufei Zhang, Mingjing Chen, Zhonglue Wen, Wenjuan Fang

Abstract: We study the potential of the galaxy cluster sample expected from the China Space Station Telescope (CSST) survey to constrain dark energy properties. By modelling the distribution of observed cluster mass for a given true mass to be log-normal and adopting a selection threshold in the observed mass $M_{200m} \geq 0.836 \times 10^{14} h^{-1}M_{\odot}$, we find about $4.1 \times 10^{5}$ clusters in the redshift range $0 \leq z \leq 1.5$ can be detected by the CSST. We construct the Fisher matrix for the cluster number counts from CSST, and forecast constraints on dark energy parameters for models with constant ($w_0$CDM) and time dependent ($w_0w_a$CDM) equation of state. In the self-calibration scheme, the dark energy equation of state parameter $w_0$ of the $w_0$CDM model can be constrained to $\Delta w_0 = 0.036$. If $w_a$ is added as a free parameter, we obtain $\Delta w_0 = 0.077$ and $\Delta w_a = 0.39$ for the $w_0w_a$CDM model, with a Figure of Merit for ($w_0,w_a$) of 68.99. If we had perfect knowledge of the observable-mass scaling relation ("known SR" scheme), we would obtain $\Delta w_0 = 0.012$ for the $w_0$CDM model, and $\Delta w_0 = 0.062$ and $\Delta w_a = 0.24$ for the $w_0w_a$CDM model. The dark energy Figure of Merit of ($w_0,w_a$) increases to 343.25. By extending the maximum redshift of the clusters from $z_{max} \sim 1.5$ to $z_{max} \sim 2$, the dark energy Figure of Merit for ($w_0,w_a$) increases to 89.72 (self-calibration scheme) and 610.97 ("known SR" scheme), improved by a factor of $\sim 1.30$ and $\sim 1.78$, respectively. We find that the impact of clusters' redshift uncertainty on the dark energy constraints is negligible as long as the redshift error of clusters is smaller than 0.01, achievable by CSST. We also find that the bias in logarithm mass must be calibrated to be $0.30$ or better to avoid significant dark energy parameter bias.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 19 pages, 5 figures, 4 tables. Accepted for publication in Research in Astronomy and Astrophysics

Journal ref: Res. Astron. Astrophys. 23 045011 (2023)

arXiv:2302.04580 [pdf, other]

doi 10.24963/ijcai.2022/591

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

Abstract: Writing a survey paper on one research topic usually needs to cover the salient content from numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing the structureless summary covering a few input documents. Meanwhile, previous structured summary generation works focus on summarizing a single document into a multi-section summary. These existing datasets and methods cannot meet the requirements of summarizing numerous academic papers into a structured summary. To deal with the scarcity of available data, we propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents. To organize the diverse content from dozens of input documents and ensure the efficiency of processing long text sequences, we propose a summarization method named category-based alignment and sparse transformer (CAST). The experimental results show that our CAST method outperforms various advanced summarization methods.

Submitted 9 February, 2023; originally announced February 2023.

Comments: IJCAI 2022

ACM Class: I.2.7; I.7

arXiv:2302.03815 [pdf, other]

Long Text and Multi-Table Summarization: Dataset and Method

Authors: Shuaiqi Liu, Jiannong Cao, Ruosong Yang, Zhiyuan Wen

Abstract: Automatic document summarization aims to produce a concise summary covering the input document's salient information. Within a report document, the salient information can be scattered in the textual and non-textual content. However, existing document summarization datasets and methods usually focus on the text and filter out the non-textual content. Missing tabular data can limit produced summaries' informativeness, especially when summaries require covering quantitative descriptions of critical metrics in tables. Existing datasets and methods cannot meet the requirements of summarizing long text and multiple tables in each report. To deal with the scarcity of available data, we propose FINDSum, the first large-scale dataset for long text and multi-table summarization. Built on 21,125 annual reports from 3,794 companies, it has two subsets for summarizing each company's results of operations and liquidity. To summarize the long text and dozens of tables in each report, we present three types of summarization methods. Besides, we propose a set of evaluation metrics to assess the usage of numerical information in produced summaries. Dataset analyses and experimental results indicate the importance of jointly considering input textual and tabular data when summarizing report documents.

Submitted 7 February, 2023; originally announced February 2023.

Comments: EMNLP 2022 Findings

ACM Class: I.2.7; I.7

arXiv:2302.03773 [pdf, other]

What Matters In The Structured Pruning of Generative Language Models?

Authors: Michael Santacroce, Zixin Wen, Yelong Shen, Yuanzhi Li

Abstract: Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to and efficacy for generative language models is heavily under-explored. In this paper we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning on the feed-forward layers in GPT-type models. Unexpectedly, random pruning results in performance that is comparable to the best established methods, across multiple natural language generation tasks. To understand these results, we provide a framework for measuring neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancies. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2302.03319 [pdf, ps, other]

Leveraging Demonstrations to Improve Online Learning: Quality Matters

Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

Abstract: We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrapping and show substantial empirical regret reduction through experiments.

Submitted 17 May, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: Accepted at ICML 2023

arXiv:2301.10013 [pdf]

doi 10.1016/j.mtphys.2024.101450

Ultra-soft Thermal Diodes Enabled by Dual-Alkane-Based Phase Change Composites

Authors: Yunsong Pang, Junhong Li, Zhibin Wen, Ting Liang, Shan Gao, Dezhao Huang, Rong Sun, Jianbin Xu, Tengfei Luo, Xiaoliang Zeng

Abstract: A thermal diode, a type of device that allows heat to flow preferentially in one direction, can be employed in many thermal applications. However, if the mechanical compliance of the thermal diode is poor, which prevents its intimate contact with heat source or sink surfaces, the thermal rectification performance cannot be used to its full extent. In this work, we introduce a heterojunction thermal diode made of a phase change material (PCM) consisting of dual alkanes (hexadecane and paraffin wax) and polyurethane. The fabricated thermal diode exhibits an ultra-soft mechanical feature, with a low elastic modulus of 0.4 kPa and larger than 300% elongation until failure: the best values reported to date for thermal diodes. The measured thermal rectification factor is as high as 1.42, in line with the theoretical model prediction. Molecular dynamics simulations reveal that the thermal rectification mechanism of the PCM-based thermal diode originates from the crystal-amorphous phase transition of the hexadecane terminal as the temperature bias flips. Therefore, the heat flow in the forward direction is greater than the flux in the reverse direction. A series of experiments and finite element analyses are employed to verify the feasibility of thermal diodes for applications. Our results demonstrate that the fabricated thermal diode can potentially be used in building envelopes to help with temperature regulation and thus reduce energy consumption for space cooling or heating.

Submitted 11 January, 2023; originally announced January 2023.

Journal ref: Materials Today Physics (2024): 101450

arXiv:2301.06290 [pdf, ps, other]

All possible orders less than 1 of transcendental entire solutions of linear difference equations with polynomial coefficients

Authors: Katsuya Ishizaki, Zhi-Tao Wen

Abstract: In this paper, we study all possible orders which are less than 1 of transcendental entire solutions of linear difference equations \begin{equation} P_m(z)\Delta^m f(z)+\cdots+P_1(z)\Delta f(z)+P_0(z)f(z)=0,\tag{+} \end{equation} where $P_j(z)$ are polynomials for $j=0,\ldots,m$. Firstly, we give the condition on existence of transcendental entire solutions of order less than 1 of difference equations (+). Secondly, we give a list of all possible orders which are less than 1 of transcendental entire solutions of difference equations (+). Moreover, the maximum number of distinct orders which are less than 1 of transcendental entire solutions of difference equations (+) is shown. In addition, for any given rational number $0<\rho<1$, we can construct a linear difference equation with polynomial coefficients which has a transcendental entire solution of order $\rho$. Finally, some examples are given to illustrate our main theorems.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2301.03801 [pdf, other]

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Authors: Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Jianhua Tao

Abstract: Text-to-speech (TTS) and voice conversion (VC) are two different tasks, both aiming at generating high-quality speech from different input modalities. Due to their similarity, this paper proposes UnifySpeech, which brings TTS and VC into a unified framework for the first time. The model is based on the assumption that speech can be decoupled into three independent components: content information, speaker information, and prosody information. Both TTS and VC can be regarded as mining these three parts of information from the input and completing the reconstruction of speech. For TTS, the speech content information is derived from the text, while in VC it is derived from the source speech, so all the remaining units are shared between the two tasks except for the speech content extraction module. We apply vector quantization and domain constraints to bridge the gap between the content domains of TTS and VC. Objective and subjective evaluations show that by combining the two tasks, TTS obtains better speaker modeling ability while VC gains an impressive speech content decoupling capability.

Submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.01548 [pdf, ps, other]

doi 10.1088/1674-4527/acb251

Three New Spiral Galaxies with Active Nuclei Producing Double Radio Lobes

Authors: Xuyang Gao, Zhongsheng Yuan, Jinlin Han, Zhonglue Wen, Susu Shan

Abstract: Double radio lobes are generally believed to be produced by active nuclei of elliptical galaxies. However, several double-lobed radio sources have been solidly found to be associated with spiral galaxies. By cross-matching $\sim9\times10^5$ spiral galaxies selected from the Sloan Digital Sky Survey DR8 data with the full 1.4-GHz radio source catalogs of NRAO VLA Sky Survey and Faint Images of Radio Sky at Twenty-centimeters, we identify three new spiral galaxies: J0326$-$0623, J1110+0321 and J1134+3046 that produce double radio lobes, and five double-lobed spirals previously known. By combining the newly discovered and all the other known cases in literature, we confirm the relation that more massive spirals could produce more powerful large-scale radio jets. We find that most of these spiral galaxies are located in a galaxy group or a poor cluster, in which the environment is denser than in the field, and about half of them are the central brightest galaxies in their parent system. We therefore suggest that the environment is one of the key factors for a spiral to produce double radio lobes.

Submitted 16 February, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: typos corrected, accepted for publication in RAA

arXiv:2212.10191 [pdf, other]

Emotion Selectable End-to-End Text-based Speech Editing

Authors: Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen, Chu Yuan Zhang

Abstract: Text-based speech editing allows users to edit speech by intuitively cutting, copying, and pasting text to speed up the process of editing speech. In previous work, CampNet (context-aware mask prediction network) was proposed to realize text-based speech editing, significantly improving the quality of edited speech. This paper aims at a new task: adding emotional effect to the edited speech during text-based speech editing to make the generated speech more expressive. To achieve this task, we propose Emo-CampNet (emotion CampNet), which can provide the option of emotional attributes for the generated speech in text-based speech editing and has the one-shot ability to edit unseen speakers' speech. Firstly, we propose an end-to-end emotion-selectable text-based speech editing model. The key idea of the model is to control the emotion of generated speech by introducing additional emotion attributes based on the context-aware mask prediction network. Secondly, to prevent the emotion of the generated speech from being interfered with by the emotional components in the original speech, a neutral content generator is proposed to remove the emotion from the original speech, which is optimized by the generative adversarial framework. Thirdly, two data augmentation methods are proposed to enrich the emotional and pronunciation information in the training set, which can enable the model to edit the unseen speaker's speech. The experimental results show that 1) Emo-CampNet can effectively control the emotion of the generated speech in the process of text-based speech editing and can edit unseen speakers' speech; 2) detailed ablation experiments further prove the effectiveness of emotional selectivity and data augmentation methods. The demo page is available at https://hairuo55.github.io/Emo-CampNet/

Submitted 20 December, 2022; originally announced December 2022.

Comments: Under review, 12 pages, 11 figures, demo page is available at https://hairuo55.github.io/Emo-CampNet/

arXiv:2212.09970 [pdf, other]

Data Augmentation on Graphs: A Technical Survey

Authors: Jiajun Zhou, Chenxuan Xie, Shengbo Gong, Zhenyu Wen, Xiangyu Zhao, Qi Xuan, Xiaoniu Yang

Abstract: In recent years, graph representation learning has achieved remarkable success while suffering from low-quality data problems. As a mature technology to improve data quality in computer vision, data augmentation has also attracted increasing attention in the graph domain. To advance research in this emerging direction, this survey provides a comprehensive review and summary of existing graph data augmentation (GDAug) techniques. Specifically, this survey first provides an overview of various feasible taxonomies and categorizes existing GDAug studies based on multi-scale graph elements. Subsequently, for each type of GDAug technique, this survey formalizes a standardized technical definition, discusses the technical details, and provides a schematic illustration. The survey also reviews domain-specific graph data augmentation techniques, including those for heterogeneous graphs, temporal graphs, spatio-temporal graphs, and hypergraphs. In addition, this survey provides a summary of available evaluation metrics and design guidelines for graph data augmentation. Lastly, it outlines the applications of GDAug at both the data and model levels, discusses open issues in the field, and looks forward to future directions. The latest advances in GDAug are summarized in GitHub.

Submitted 21 June, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: Version 2. Under review

arXiv:2212.09450 [pdf, other]

doi 10.1145/3580305.3599249

Accelerating Antimicrobial Peptide Discovery with Latent Structure

Authors: Danqing Wang, Zeyu Wen, Fei Ye, Lei Li, Hao Zhou

Abstract: Antimicrobial peptides (AMPs) are promising therapeutic approaches against drug-resistant pathogens. Recently, deep generative models are used to discover new AMPs. However, previous studies mainly focus on peptide sequence attributes and do not consider crucial structure information. In this paper, we propose a latent sequence-structure model for designing AMPs (LSSAMP). LSSAMP exploits multi-scale vector quantization in the latent space to represent secondary structures (e.g. alpha helix and beta sheet). By sampling in the latent space, LSSAMP can simultaneously generate peptides with ideal sequence attributes and secondary structures. Experimental results show that the peptides generated by LSSAMP have a high probability of antimicrobial activity. Our wet laboratory experiments verified that two of the 21 candidates exhibit strong antimicrobial activity. The code is released at https://github.com/dqwang122/LSSAMP.

Submitted 20 August, 2023; v1 submitted 28 November, 2022; originally announced December 2022.

Comments: KDD 2023

arXiv:2212.08283 [pdf, other]

doi 10.3390/robotics12040114

SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering

Authors: Feiqi Cao, Siwen Luo, Felipe Nunez, Zean Wen, Josiah Poon, Caren Han

Abstract: Most TextVQA approaches focus on the integration of objects, scene texts and question words by a simple transformer encoder, but this fails to capture the semantic relations between different modalities. The paper proposes a Scene Graph based co-Attention Network (SceneGATE) for TextVQA, which reveals the semantic relations among the objects, Optical Character Recognition (OCR) tokens and the question words. It is achieved by a TextVQA-based scene graph that discovers the underlying semantics of an image. We created a guided-attention module to capture the intra-modal interplay between the language and the vision as a guidance for inter-modal interactions. To explicitly teach the relations between the two modalities, we proposed and integrated two attention modules, namely a scene graph-based semantic relation-aware attention and a positional relation-aware attention. We conducted extensive experiments on two benchmark datasets, Text-VQA and ST-VQA. It is shown that our SceneGATE method outperformed existing ones because of the scene graph and its attention modules.

Submitted 7 August, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: Published in Robotics (Q1, SCI indexed Journal): https://www.mdpi.com/2218-6581/12/4/114

arXiv:2212.07632 [pdf, other]

Reinforcement Learning in Credit Scoring and Underwriting

Authors: Seksan Kiatsupaibul, Pakawan Chansiripas, Pojtanut Manopanjasiri, Kantapong Visantavarakul, Zheng Wen

Abstract: This paper proposes a novel reinforcement learning (RL) framework for credit underwriting that tackles ungeneralizable contextual challenges. We adapt RL principles for credit scoring, incorporating action space renewal and multi-choice actions. Our work demonstrates that the traditional underwriting approach aligns with the RL greedy strategy. We introduce two new RL-based credit underwriting algorithms to enable more informed decision-making. Simulations show these new approaches outperform the traditional method in scenarios where the data aligns with the model. However, complex situations highlight model limitations, emphasizing the importance of powerful machine learning models for optimal performance. Future research directions include exploring more sophisticated models alongside efficient exploration mechanisms.

Submitted 26 June, 2024; v1 submitted 15 December, 2022; originally announced December 2022.

arXiv:2212.05193 [pdf, ps, other]

doi 10.1093/mnras/stac3654

Individual pulse emission from the diffuse drifter PSR J1401$-$6357 using the ultrawideband receiver on the Parkes radio telescope

Authors: J. L. Chen, Z. G. Wen, X. F. Duan, D. L. He, N. Wang, H. G. Wang, R. Yuen, J. P. Yuan, W. M. Yan, Z. Wang, C. B. Lv, H. Wang, S. R. Cui

Abstract: In this study, we report on a detailed single pulse analysis of the radio emission from the pulsar J1401$-$6357 (B1358$-$63) based on data observed with the ultrawideband low-frequency receiver on the Parkes radio telescope. In addition to a weak leading component, the integrated pulse profile features a single-humped structure with a slight asymmetry. The frequency evolution of the pulse profile is studied. Well-defined nulls, with an estimated nulling fraction greater than 2\%, are present across the whole frequency band. No emission is detected with significance above $3\sigma$ in the average pulse profile integrated over all null pulses. Using fluctuation spectral analysis, we reveal the existence of temporal-dependent subpulse drifting in this pulsar for the first time. A clear double-peaked feature is present at exactly the alias border across the whole frequency band, which suggests that the apparent drift sense changes during the observation. Our observations provide further confirmation that the phenomena of pulse nulling and subpulse drifting are independent of observing frequency, which suggests that they invoke changes on the global magnetospheric scale.

Submitted 9 December, 2022; originally announced December 2022.

Comments: 10 pages, 13 figures

arXiv:2212.04259 [pdf, other]

doi 10.1109/IPDPS53621.2022.00066

Fast Parallel Bayesian Network Structure Learning

Authors: Jiantong Jiang, Zeyi Wen, Ajmal Mian

Abstract: Bayesian networks (BNs) are a widely used graphical model in machine learning for representing knowledge with uncertainty. The mainstream BN structure learning methods require performing a large number of conditional independence (CI) tests. The learning process is very time-consuming, especially for high-dimensional problems, which hinders the adoption of BNs to more applications. Existing works attempt to accelerate the learning process with parallelism, but face issues including load unbalancing, costly atomic operations and dominant parallel overhead. In this paper, we propose a fast solution named Fast-BNS on multi-core CPUs to enhance the efficiency of the BN structure learning. Fast-BNS is powered by a series of efficiency optimizations including (i) designing a dynamic work pool to monitor the processing of edges and to better schedule the workloads among threads, (ii) grouping the CI tests of the edges with the same endpoints to reduce the number of unnecessary CI tests, (iii) using a cache-friendly data storage to improve the memory efficiency, and (iv) generating the conditioning sets on-the-fly to avoid extra memory consumption. A comprehensive experimental study shows that the sequential version of Fast-BNS is up to 50 times faster than its counterpart, and the parallel version of Fast-BNS achieves 4.8 to 24.5 times speedup over the state-of-the-art multi-threaded solution. Moreover, Fast-BNS has a good scalability to the network size as well as sample size. Fast-BNS source code is freely available at https://github.com/jjiantong/FastBN.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2212.04241 [pdf, ps, other]

doi 10.1145/3572848.3577476

Fast Parallel Exact Inference on Bayesian Networks: Poster

Authors: Jiantong Jiang, Zeyi Wen, Atif Mansoor, Ajmal Mian

Abstract: Bayesian networks (BNs) are attractive, because they are graphical and interpretable machine learning models. However, exact inference on BNs is time-consuming, especially for complex problems. To improve the efficiency, we propose a fast BN exact inference solution named Fast-BNI on multi-core CPUs. Fast-BNI enhances the efficiency of exact inference through hybrid parallelism that tightly integrates coarse- and fine-grained parallelism. We also propose techniques to further simplify the bottleneck operations of BN exact inference. Fast-BNI source code is freely available at https://github.com/jjiantong/FastBN.

Submitted 8 December, 2022; originally announced December 2022.

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.01127 [pdf, other]

On the local convergence of the semismooth Newton method for composite optimization

Authors: Jiang Hu, Tonghua Tian, Shaohua Pan, Zaiwen Wen

Abstract: In this paper, we consider a large class of nonlinear equations derived from first-order type methods for solving composite optimization problems. Traditional approaches to establishing superlinear convergence rates of semismooth Newton-type methods for solving nonlinear equations usually postulate either nonsingularity of the B-Jacobian or smoothness of the equation. We investigate the feasibility of both conditions. For the nonsingularity condition, we present equivalent characterizations in broad generality, and illustrate that they are easy-to-check criteria for some examples. For the smoothness condition, we show that it holds locally for a large class of residual mappings derived from composite optimization problems. Furthermore, we investigate a relaxed version of the smoothness condition - smoothness restricted to certain active manifolds. We present a conceptual algorithm utilizing such structures and prove that it has a superlinear convergence rate.

Submitted 28 July, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 29 pages

arXiv:2211.00286 [pdf, other]

Strategies for Optimizing End-to-End Artificial Intelligence Pipelines on Intel Xeon Processors

Authors: Meena Arunachalam, Vrushabh Sanghavi, Yi A Yao, Yi A Zhou, Lifeng A Wang, Zongru Wen, Niroop Ammbashankar, Ning W Wang, Fahim Mohammad

Abstract: End-to-end (E2E) artificial intelligence (AI) pipelines are composed of several stages including data preprocessing, data ingestion, defining and training the model, hyperparameter optimization, deployment, inference, postprocessing, followed by downstream analyses. To obtain an efficient E2E workflow, it is required to optimize almost all the stages of the pipeline. Intel Xeon processors come with large memory capacities, bundled with AI acceleration (e.g., Intel Deep Learning Boost), are well suited to run multiple instances of training and inference pipelines in parallel, and have a low total cost of ownership (TCO). To showcase the performance on Xeon processors, we applied comprehensive optimization strategies coupled with software and hardware acceleration on a variety of E2E pipelines in the areas of Computer Vision, NLP, Recommendation systems, etc. We were able to achieve a performance improvement ranging from 1.8x to 81.7x across different E2E pipelines. In this paper, we highlight the optimization strategies we adopted to achieve this performance on Intel Xeon processors with a set of eight different E2E pipelines.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 10 pages, 11 figures, 3 tables

Journal ref: The Parallel Universe Magazine, Issue 48, 2022

arXiv:2210.11429 [pdf]

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS

Authors: Chunyu Qiang, Jianhua Tao, Ruibo Fu, Zhengqi Wen, Jiangyan Yi, Tao Wang, Shiming Wang

Abstract: Current end-to-end code-switching Text-to-Speech (TTS) can already generate high-quality speech in two languages within the same utterance from single-speaker bilingual corpora. When the speakers of the bilingual corpora are different, the naturalness and consistency of the code-switching TTS will be poor. The cross-lingual embedding layers structure we proposed makes similar syllables in different languages relevant, thus improving the naturalness and consistency of generated speech. In end-to-end code-switching TTS, there exists a problem of prosody instability when synthesizing paragraph text. The text enhancement method we proposed makes the input contain prosodic information and sentence-level context information, thus improving the prosody stability of paragraph text. Experimental results demonstrate the effectiveness of the proposed methods in naturalness, consistency, and prosody stability. In addition to Mandarin and English, we also apply these methods to Shanghainese and Cantonese corpora, showing that the methods we proposed can be extended to other languages to build end-to-end code-switching TTS systems.

Submitted 20 October, 2022; originally announced October 2022.

Comments: accepted in ISCSLP 2021

arXiv:2210.09926 [pdf, other]

RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction

Authors: Zhoujin Tian, Chaozhuo Li, Shuo Ren, Zhiqiang Zuo, Zengxuan Wen, Xinyue Hu, Xiao Han, Haizhen Huang, Denvy Deng, Qi Zhang, Xing Xie

Abstract: Bilingual lexicon induction induces the word translations by aligning independently trained word embeddings in two languages. Existing approaches generally focus on minimizing the distances between words in the aligned pairs, while suffering from low discriminative capability to distinguish the relative orders between positive and negative candidates. In addition, the mapping function is globally shared by all words, whose performance might be hindered by the deviations in the distributions of different languages. In this work, we propose a novel ranking-oriented induction model RAPO to learn a personalized mapping function for each word. RAPO is capable of enjoying the merits from the unique characteristics of a single word and the cross-language isomorphism simultaneously. Extensive experimental results on public datasets including both rich-resource and low-resource languages demonstrate the superiority of our proposal. Our code is publicly available at https://github.com/Jlfj345wf/RAPO.

Submitted 18 October, 2022; originally announced October 2022.

Comments: 9 pages, accepted by EMNLP 2022

arXiv:2210.07198 [pdf, other]

Towards Trustworthy Automatic Diagnosis Systems by Emulating Doctors' Reasoning with Deep Reinforcement Learning

Authors: Arsene Fansi Tchango, Rishab Goel, Julien Martel, Zhi Wen, Gaetan Marceau Caron, Joumana Ghosn

Abstract: The automation of the medical evidence acquisition and diagnosis process has recently attracted increasing attention in order to reduce the workload of doctors and democratize access to medical care. However, most works proposed in the machine learning literature focus solely on improving the prediction accuracy of a patient's pathology. We argue that this objective is insufficient to ensure doctors' acceptability of such systems. In their initial interaction with patients, doctors do not only focus on identifying the pathology a patient is suffering from; they instead generate a differential diagnosis (in the form of a short list of plausible diseases) because the medical evidence collected from patients is often insufficient to establish a final diagnosis. Moreover, doctors explicitly explore severe pathologies before potentially ruling them out from the differential, especially in acute care settings. Finally, for doctors to trust a system's recommendations, they need to understand how the gathered evidence led to the predicted diseases. In particular, interactions between a system and a patient need to emulate the reasoning of doctors. We therefore propose to model the evidence acquisition and automatic diagnosis tasks using a deep reinforcement learning framework that considers three essential aspects of a doctor's reasoning, namely generating a differential diagnosis using an exploration-confirmation approach while prioritizing severe pathologies. We propose metrics for evaluating interaction quality based on these three aspects. We show that our approach performs better than existing models while maintaining competitive pathology prediction accuracy.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Camera ready. NeurIPS 2022

arXiv:2209.12606 [pdf, ps, other]

Some Sharp Error Bounds for Multivariate Linear Interpolation and Extrapolation

Authors: Liyuan Cao, Zaiwen Wen, Ya-xiang Yuan

Abstract: We study in this paper the function approximation error of linear interpolation and extrapolation. Several upper bounds are presented along with the conditions under which they are sharp. All results are under the assumptions that the function has Lipschitz continuous gradient and is interpolated on an affinely independent sample set. Errors for quadratic functions and errors of bivariate linear extrapolation are analyzed in depth.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 18 pages, 3 figures

MSC Class: 41A05; 41A10; 41A80; 46N10
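
As a rough illustration of the kind of bound the abstract above refers to, the sketch below checks the classical one-dimensional case only: if f' is Lipschitz with constant L, the linear interpolant p of f on [a, b] satisfies |f(x) - p(x)| <= L (b - a)^2 / 8, and the bound is attained by a quadratic. This is the textbook univariate result, not the paper's multivariate bounds; the function names and the grid-based error estimate are assumptions made for illustration.

```python
import math


def linear_interpolant(f, a, b):
    """Return the line through (a, f(a)) and (b, f(b))."""
    fa, fb = f(a), f(b)

    def p(x):
        t = (x - a) / (b - a)
        return (1 - t) * fa + t * fb

    return p


def max_interpolation_error(f, a, b, samples=10001):
    """Estimate max |f - p| on [a, b] by sampling a uniform grid."""
    p = linear_interpolant(f, a, b)
    xs = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - p(x)) for x in xs)


def classical_bound(lipschitz_const, a, b):
    """Classical 1D bound L * (b - a)^2 / 8 for Lipschitz f'."""
    return lipschitz_const * (b - a) ** 2 / 8


# f(x) = x^2 / 2 has f'(x) = x, so L = 1; the bound is attained at the midpoint.
err_quadratic = max_interpolation_error(lambda x: 0.5 * x * x, 0.0, 2.0)
bound_quadratic = classical_bound(1.0, 0.0, 2.0)

# sin has |f''| <= 1, so L = 1 again; here the bound is not attained.
err_sin = max_interpolation_error(math.sin, 0.0, 1.0)
bound_sin = classical_bound(1.0, 0.0, 1.0)
```

The quadratic case shows why such bounds are called sharp: the grid estimate matches the bound, whereas for sin the estimate falls strictly below it.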

arXiv:2209.11615 [pdf, other]

Robust Domain Adaptation for Machine Reading Comprehension

Authors: Liang Jiang, Zhenyu Huang, Jia Liu, Zujie Wen, Xi Peng

Abstract: Most domain adaptation methods for machine reading comprehension (MRC) use a pre-trained question-answer (QA) construction model to generate pseudo QA pairs for MRC transfer. Such a process will inevitably introduce mismatched pairs (i.e., noisy correspondence) due to i) the unavailable QA pairs in target documents, and ii) the domain shift when applying the QA construction model to the target domain. Undoubtedly, the noisy correspondence will degrade the performance of MRC, which however is neglected by existing works. To solve such an untouched problem, we propose to construct QA pairs by additionally using the dialogue related to the documents, as well as a new domain adaptation method for MRC. Specifically, we propose the Robust Domain Adaptation for Machine Reading Comprehension (RMRC) method, which consists of an answer extractor (AE), a question selector (QS), and an MRC model. RMRC filters out the irrelevant answers by estimating the correlation to the document via the AE, and extracts the questions by fusing the candidate questions in multiple rounds of dialogue chats via the QS. With the extracted QA pairs, MRC is fine-tuned and provides the feedback to optimize the QS through a novel reinforced self-training method. Thanks to the optimization of the QS, our method will greatly alleviate the noisy correspondence problem caused by the domain shift. To the best of our knowledge, this could be the first study to reveal the influence of noisy correspondence in domain adaptation MRC models and show a feasible way to achieve robustness to mismatched pairs. Extensive experiments on three datasets demonstrate the effectiveness of our method.

Submitted 23 September, 2022; originally announced September 2022.

arXiv:2209.00204 [pdf, ps, other]

doi 10.1093/mnras/stac2492

The alignment between brightest cluster galaxies and host clusters

Authors: Z. S. Yuan, Z. L. Wen

Abstract: The alignment between brightest cluster galaxies (BCGs) and host clusters can reveal the mystery of formation and evolution for galaxy clusters. We measure cluster orientations in optical based on the projected distribution of member galaxies and in X-ray by fitting the morphology of intra-cluster medium (ICM). Cluster orientations determined in the two wavelengths are generally consistent. The orientation alignment between BCGs and host clusters is confirmed and is more significant than in previous works. We find that BCGs are more aligned with cluster orientations measured in X-ray than those from optical data. Clusters with a brighter BCG generally show a stronger alignment. We argue that the detected redshift evolution of the alignment is probably caused by observational bias rather than intrinsic evolution. The alignment is not related to the ellipticity of BCGs, or to the richness, ellipticity and dynamical state of host clusters. The strong alignment between BCGs and morphology of ICMs may be the consequence of the co-evolution between the central massive galaxy and host clusters.

Submitted 31 August, 2022; originally announced September 2022.

Comments: 8 pages, 8 figures, 3 tables, accepted for publication in MNRAS

arXiv:2208.14196 [pdf, other]

A Unified Primal-Dual Algorithm Framework for Inequality Constrained Problems

Authors: Zhenyuan Zhu, Fan Chen, Junyu Zhang, Zaiwen Wen

Abstract: In this paper, we propose a unified primal-dual algorithm framework based on the augmented Lagrangian function for composite convex problems with conic inequality constraints. The new framework is highly versatile. First, it not only covers many existing algorithms such as PDHG, Chambolle-Pock (CP), GDA, OGDA and linearized ALM, but also guides us to design a new efficient algorithm called Simi-OGDA (SOGDA). Second, it enables us to study the role of the augmented penalty term in the convergence analysis. Interestingly, a properly selected penalty not only improves the numerical performance of the above methods, but also theoretically enables the convergence of algorithms like PDHG and SOGDA. Under properly designed step sizes and penalty term, our unified framework preserves the $\mathcal{O}(1/N)$ ergodic convergence while not requiring any prior knowledge about the magnitude of the optimal Lagrangian multiplier. Linear convergence rate for affine equality constrained problem is also obtained given appropriate conditions. Finally, numerical experiments on linear programming, $\ell_1$ minimization problem, and multi-block basis pursuit problem demonstrate the efficiency of our methods.

Submitted 30 August, 2022; originally announced August 2022.

MSC Class: 90C25; 90C46; 90C47; 90C60

arXiv:2208.11609 [pdf, other]

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

Authors: Ziwei Luo, Youwei Li, Lei Yu, Qi Wu, Zhihong Wen, Haoqiang Fan, Shuaicheng Liu

Abstract: Deep learning-based single image super-resolution (SISR) approaches have drawn much attention and achieved remarkable success on modern advanced GPUs. However, most state-of-the-art methods require a huge number of parameters, memories, and computational resources, which usually show inferior inference times when applying them to current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform a reliable super-resolution in real-time. The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3x dataset, and the comparison with other efficient SR methods demonstrated that the NCNet can achieve high fidelity SR results while using fewer inference times. Our codes and pretrained models are publicly available at https://github.com/Algolzw/NCNet.

Submitted 24 August, 2022; originally announced August 2022.

Comments: AIM & Mobile AI 2022

arXiv:2208.07529 [pdf, other]

Understanding the Challenges of Team-Based Live Streaming for First-person Shooter Games

Authors: Jiaye Li, Minghao Li, Zikai Alex Wen, Wei Cai

Abstract: First-person shooter (FPS) game tournaments take place across the globe. A growing number of people choose to watch FPS games online instead of attending the game events in person. However, live streaming might miss critical highlight moments in the game, including kills and tactics. We identify how and why the live streaming team fails to capture highlight moments to reduce such live streaming mistakes. We named such mistakes jarring observations. We conducted a field study of live streaming competitions of Game For Peace, a popular FPS mobile game, to summarize five typical jarring observations and identify three primary reasons that caused the issues. We further studied how to improve the live streaming system to prevent jarring observations from happening by doing semi-structured interviews with two professional streaming teams for Game For Peace. The study showed that a better system should (1) add a new sub-team role to share the director's responsibility of managing observers; (2) provide interfaces customized for three roles of live streamers in the team; (3) abstract more geographical info; (4) predict the priority of observation targets; and (5) provide non-verbal interfaces for sync-up between sub-teams. Our work provides insights for esports streaming system researchers and developers to improve the system for a smoother audience experience.

Submitted 16 August, 2022; originally announced August 2022.

Comments: Accepted by The IEEE CTSoc International Conference on Games Entertainment & Media 2022 (GEM 2022)

arXiv:2208.07491 [pdf, other]

HetVis: A Visual Analysis Approach for Identifying Data Heterogeneity in Horizontal Federated Learning

Authors: Xumeng Wang, Wei Chen, Jiazhi Xia, Zhen Wen, Rongchen Zhu, Tobias Schreck

Abstract: Horizontal federated learning (HFL) enables distributed clients to train a shared model and keep their data privacy. In training high-quality HFL models, the data heterogeneity among clients is one of the major concerns. However, due to the security issue and the complexity of deep learning models, it is challenging to investigate data heterogeneity across different clients. To address this issue, based on a requirement analysis we developed a visual analytics tool, HetVis, for participating clients to explore data heterogeneity. We identify data heterogeneity through comparing prediction behaviors of the global federated model and the stand-alone model trained with local data. Then, a context-aware clustering of the inconsistent records is done, to provide a summary of data heterogeneity. Combining with the proposed comparison techniques, we develop a novel set of visualizations to identify heterogeneity issues in HFL. We designed three case studies to introduce how HetVis can assist client analysts in understanding different types of heterogeneity issues. Expert reviews and a comparative study demonstrate the effectiveness of HetVis.

Submitted 15 August, 2022; originally announced August 2022.

Comments: Accepted by IEEE VIS 2022

arXiv:2208.06155 [pdf, other]

What Features Influence Impact Feel? A Study of Impact Feedback in Action Games

Authors: Zhonghao Lin, Haihan Duan, Zikai Alex Wen, Wei Cai

Abstract: Making the hit effect satisfy players is a long-standing problem faced by action game designers. However, no research systematically analyzed which game design elements affect such game feel. There is not even a term to describe it. So, we propose to use impact feel to describe the player's feeling when receiving juicy impact feedback. After collecting players' comments on action games from Steam's top seller list, we trained a natural language processing (NLP) model to rank action games with their performance on impact feel. We presented a 19-feature framework of impact feedback design and examined it in the top eight and last eight games. We listed an inventory of the usage of features and found that hit stop, sound coherence, and camera control may strongly influence players' impact feel. A lack of dedicated design on one of these three features may ruin players' impact feel. Our findings may become an evaluation metric for future studies.

Submitted 22 August, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

Comments: Accepted by The IEEE CTSoc International Conference on Games Entertainment & Media 2022 (GEM 2022)

arXiv:2208.02759 [pdf, other]

New Differential Privacy Communication Pipeline and Design Framework

Authors: Jingyu Jia, Zikai Alex Wen, Zheli Liu, Changyu Dong

Abstract: Organizations started to adopt differential privacy (DP) techniques hoping to persuade more users to share personal data with them. However, many users do not understand DP techniques, thus may not be willing to share. Previous research suggested that the design of DP mechanism communication could influence users' willingness to share data. Based on the prior work, we propose a new communication pipeline that starts by asking users about their privacy concerns and then provides a customized DP mechanism and communication. We also propose a design framework that systemically explores effective communication designs ranging from a text-based high-level description to a step-by-step interactive storyboard. Based on the framework, we created 17 designs and recruited five people to evaluate. Our user study showed that text-based descriptions have the highest clarity in all scenarios, while the step-by-step interactive storyboards have the potential to persuade users to trust central DP. Our future work will optimize the design and conduct a large-scale efficacy study.

Submitted 4 August, 2022; originally announced August 2022.

Comments: poster

Journal ref: The Eighteenth Symposium on Usable Privacy and Security (SOUPS 2022)

arXiv:2208.01214 [pdf, other]

doi 10.1145/3552466.3556526

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Authors: Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Abstract: Recently, pioneer research works have proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance, and showing that different subbands have different contributions to audio deepfake detection. However, this lacks an explanation of the specific information in the subband, and these features also lose information such as phase. Inspired by the mechanism of synthetic speech, the fundamental frequency (F0) information is used to improve the quality of synthetic speech, while the F0 of synthetic speech is still too average, which differs significantly from that of real speech. It is expected that F0 can be used as important information to discriminate between bonafide and fake speech, while this information cannot be used directly due to the irregular distribution of F0. Instead, the frequency band containing most of F0 is selected as the input feature. Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately. Finally, the results of F0, real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.07874 [pdf, other]

Model-Aware Contrastive Learning: Towards Escaping the Dilemmas

Authors: Zizheng Huang, Haoxing Chen, Ziqi Wen, Chao Zhang, Huaxiong Li, Bo Wang, Chunlin Chen

Abstract: Contrastive learning (CL) continuously achieves significant breakthroughs across multiple domains. However, the most common InfoNCE-based methods suffer from some dilemmas, such as \textit{uniformity-tolerance dilemma} (UTD) and \textit{gradient reduction}, both of which are related to a $\mathcal{P}_{ij}$ term. It has been identified that UTD can lead to unexpected performance degradation. We argue that the fixity of temperature is to blame for UTD. To tackle this challenge, we enrich the CL loss family by presenting a Model-Aware Contrastive Learning (MACL) strategy, whose temperature is adaptive to the magnitude of alignment that reflects the basic confidence of the instance discrimination task, then enables CL loss to adjust the penalty strength for hard negatives adaptively. Regarding another dilemma, the gradient reduction issue, we derive the limits of an involved gradient scaling factor, which allows us to explain from a unified perspective why some recent approaches are effective with fewer negative samples, and summarily present a gradient reweighting to escape this dilemma. Extensive remarkable empirical results in vision, sentence, and graph modality validate our approach's general improvement for representation learning and downstream tasks.

Submitted 11 June, 2023; v1 submitted 16 July, 2022; originally announced July 2022.

Journal ref: ICML2023

arXiv:2207.07287 [pdf, other]

Riemannian Natural Gradient Methods

Authors: Jiang Hu, Ruicheng Ao, Anthony Man-Cho So, Minghan Yang, Zaiwen Wen

Abstract: This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic -- rate of convergence. We then prove that the Riemannian Jacobian stability condition will be satisfied by a two-layer fully connected neural network with batch normalization with high probability, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2207.06147 [pdf, other]

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

Authors: Fan Chen, Junyu Zhang, Zaiwen Wen

Abstract: As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, there still lacks some essential understanding of the offline CMDP problems, in terms of both the algorithm design and the information theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $Ω\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|,|\mathcal{S}|+I\right\} C^*}{(1-γ)^3ε^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-γ)^{-1})$ factor. Comprehensive discussion on how to deal with the unknown constant $C^*$ and the potential asynchronous structure on the offline dataset are also included.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.03096 [pdf]

doi 10.1038/s41566-023-01240-x

Single multimode fiber for in vivo light-field encoded nano-imaging

Authors: Zhong Wen, Zhenyu Dong, Chenlei Pang, Clemens F. Kaminski, Qilin Deng, Jinggang Xu, Liqiang Wang, Songguo Liu, Jianbin Tang, Wei Chen, Xu Liu, Qing Yang

Abstract: Super-resolution microscopy normally requiring complex and cumbersome optics is not applicable for in situ imaging through a narrow channel. Here, we demonstrate single hair-thin multimode fiber (MMF) endoscope (less than 250 $μm$) for in vivo light-field nano-imaging, which is called spatial-frequency tracking adaptive beacon light-field encoded nano-endoscopy (STABLE nano-endoscopy) that enables three-dimensional (3D) subcellular-scale imaging. Spatial-frequency tracking provides up to $10^3$ Hz disorder tracking that ensures stable imaging in long-haul MMFs (up to 200 m) under various conditions. Full-vector modulation and fluorescence emission difference are combined to enhance the imaging signal-to-noise ratio two times and to improve the resolution to sub-diffraction-limited 250 nm ($λ/3NA$). STABLE nano-endoscopy and white-light endoscopy (WLE) are integrated to achieve cross-scale in vivo imaging inside the lumen. This high-resolution and robust observation in a minimally invasive manner paves the way to gain a deeper understanding of the disease mechanisms and to bridge clinical and biological sciences.

Submitted 7 July, 2022; originally announced July 2022.

Comments: 14 pages, 5 figures

Journal ref: Published online: 03 July 2023, Nature Photonics

arXiv:2207.00137 [pdf, other]

Robustness of Epinets against Distributional Shifts

Authors: Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

Abstract: Recent work introduced the epinet as a new approach to uncertainty modeling in deep learning. An epinet is a small neural network added to traditional neural networks, which, together, can produce predictive distributions. In particular, using an epinet can greatly improve the quality of joint predictions across multiple inputs, a measure of how well a neural network knows what it does not know. In this paper, we examine whether epinets can offer similar advantages under distributional shifts. We find that, across ImageNet-A/O/C, epinets generally improve robustness metrics. Moreover, these improvements are more significant than those afforded by even very large ensembles at orders of magnitude lower computational costs. However, these improvements are relatively small compared to the outstanding issues in distributionally-robust deep learning. Epinets may be a useful tool in the toolbox, but they are far from the complete solution.

Submitted 30 June, 2022; originally announced July 2022.

arXiv:2206.03633 [pdf, other]

Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

Authors: Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

Abstract: In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail with regards to the importance of various ingredients of these approaches. In this paper, we aim to address the benefits of two ingredients -- prior functions and bootstrapping -- which have come into question. We show that prior functions can significantly improve an ensemble agent's joint predictions across inputs and that bootstrapping affords additional benefits if the signal-to-noise ratio varies across inputs. Our claims are justified by both theoretical and experimental results.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.03091 [pdf, ps, other]

doi 10.3847/1538-4357/ac75d1

The discovery of a rotating radio transient J1918$-$0449 with intriguing emission properties with the five hundred meter aperture spherical radio telescope

Authors: J. L. Chen, Z. G. Wen, J. P. Yuan, N. Wang, D. Li, H. G. Wang, W. M. Yan, R. Yuen, P. Wang, Z. Wang, W. W. Zhu, J. R. Niu, C. C. Miao, M. Y. Xue, B. P. Gong

Abstract: In this study, we report on a detailed single pulse analysis of the radio emission from a rotating radio transient (RRAT) J1918$-$0449 which is the first RRAT discovered with the five hundred meter aperture spherical radio telescope (FAST). The sensitive observations were carried out on 30 April 2021 using the FAST with a central frequency of 1250 MHz and a short time resolution of 49.152 $μ$s, which forms a reliable basis to probe single pulse emission properties in detail. The source was successively observed for around 2 hours. A total of 83 dispersed bursts with significance above 6$σ$ are detected over 1.8 hours. The source's DM and rotational period are determined to be 116.1$\pm$0.4 pc cm$^{-3}$ and 2479.21$\pm$0.03 ms, respectively. The share of registered pulses from the total number of observed periods is 3.12%. No underlying emission is detected in the averaged off pulse profile. For bursts with fluence larger than 10 Jy ms, the pulse energy follows a power-law distribution with an index of $-3.1\pm0.4$, suggesting the existence of bright pulse emission. We find that the distribution of time between subsequent pulses is consistent with a stationary Poisson process and find no evidence of clustering over the 1.8 h observations, giving a mean burst rate of one burst every 66 s. Close inspection of the detected bright pulses reveals that 21 pulses exhibit well-defined quasi-periodicities. The subpulse drifting is present in non-successive rotations with periodicity of $2.51\pm0.06$ periods. Finally, possible physical mechanisms are discussed.

Submitted 7 June, 2022; originally announced June 2022.

Comments: 11 pages, 11 figures

arXiv:2205.09148 [pdf, other]

DDXPlus: A New Dataset For Automatic Medical Diagnosis

Authors: Arsene Fansi Tchango, Rishab Goel, Zhi Wen, Julien Martel, Joumana Ghosn

Abstract: There has been a rapidly growing interest in Automatic Symptom Detection (ASD) and Automatic Diagnosis (AD) systems in the machine learning research literature, aiming to assist doctors in telemedicine services. These systems are designed to interact with patients, collect evidence about their symptoms and relevant antecedents, and possibly make predictions about the underlying diseases. Doctors would review the interactions, including the evidence and the predictions, collect if necessary additional information from patients, before deciding on next steps. Despite recent progress in this area, an important piece of doctors' interactions with patients is missing in the design of these systems, namely the differential diagnosis. Its absence is largely due to the lack of datasets that include such information for models to train on. In this work, we present a large-scale synthetic dataset of roughly 1.3 million patients that includes a differential diagnosis, along with the ground truth pathology, symptoms and antecedents for each patient. Unlike existing datasets which only contain binary symptoms and antecedents, this dataset also contains categorical and multi-choice symptoms and antecedents useful for efficient data collection. Moreover, some symptoms are organized in a hierarchy, making it possible to design systems able to interact with patients in a logical way. As a proof-of-concept, we extend two existing AD and ASD systems to incorporate the differential diagnosis, and provide empirical evidence that using differentials as training signals is essential for the efficiency of such systems or for helping doctors better understand the reasoning of those systems.

Submitted 13 October, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: Camera ready. NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2205.06226 [pdf, other]

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning

Authors: Zixin Wen, Yuanzhi Li

Abstract: Recently the surprising discovery of the Bootstrap Your Own Latent (BYOL) method by Grill et al. shows the negative term in contrastive loss can be removed if we add the so-called prediction head to the network. This initiated the research of non-contrastive self-supervised learning. It is mysterious why even when there exist trivial collapsed global optimal solutions, neural networks trained by (stochastic) gradient descent can still learn competitive representations. This phenomenon is a typical example of implicit bias in deep learning and remains little understood. In this work, we present our empirical and theoretical discoveries on non-contrastive self-supervised learning. Empirically, we find that when the prediction head is initialized as an identity matrix with only its off-diagonal entries being trainable, the network can learn competitive representations even though the trivial optima still exist in the training objective. Theoretically, we present a framework to understand the behavior of the trainable, but identity-initialized prediction head. Under a simple setting, we characterized the substitution effect and acceleration effect of the prediction head. The substitution effect happens when learning the stronger features in some neurons can substitute for learning these features in other neurons through updating the prediction head. And the acceleration effect happens when the substituted features can accelerate the learning of other weaker features to prevent them from being ignored. These two effects enable the neural networks to learn all the features rather than focus only on learning the stronger features, which is likely the cause of the dimensional collapse phenomenon. To the best of our knowledge, this is also the first end-to-end optimization guarantee for non-contrastive methods using nonlinear neural networks with a trainable prediction head and normalization.

Submitted 15 January, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

Comments: 88 pages, comments welcome

Showing 151–200 of 528 results for author: Wen, Z