Acceleration of positive muons by a radio-frequency cavity

Authors: S. Aritome, K. Futatsukawa, H. Hara, K. Hayasaka, Y. Ibaraki, T. Ichikawa, T. Iijima, H. Iinuma, Y. Ikedo, Y. Imai, K. Inami, K. Ishida, S. Kamal, S. Kamioka, N. Kawamura, M. Kimura, A. Koda, S. Koji, K. Kojima, A. Kondo, Y. Kondo, M. Kuzuba, R. Matsushita, T. Mibe, Y. Miyamoto, et al. (29 additional authors not shown)

Abstract: Acceleration of positive muons from thermal energy to $100~$keV has been demonstrated. Thermal muons were generated by resonant multi-photon ionization of muonium atoms emitted from a sheet of laser-ablated aerogel. The thermal muons were first electrostatically accelerated to $5.7~$keV, followed by further acceleration to $100~$keV using a radio-frequency quadrupole. The transverse normalized emittances of the accelerated muons in the horizontal and vertical planes were $0.85 \pm 0.25 ~\rm{(stat.)}~^{+0.22}_{-0.13} ~\rm{(syst.)}~π~$mm$\cdot$mrad and $0.32\pm 0.03~\rm{(stat.)} ^{+0.05}_{-0.02} ~\rm{(syst.)}~π~$mm$\cdot$mrad, respectively. The measured emittance values demonstrated phase space reduction by a factor of $2.0\times 10^2$ (horizontal) and $4.1\times 10^2$ (vertical), allowing good acceleration efficiency. These results pave the way to realize the first-ever muon accelerator for a variety of applications in particle physics, material science, and other fields.

Submitted 15 October, 2024; originally announced October 2024.

arXiv:2410.08707 [pdf, other]

The nature of low-luminosity AGNs discovered by JWST at $5<z<6$ based on clustering analysis: ancestors of quasars at $z\lesssim3$?

Authors: Junya Arita, Nobunari Kashikawa, Masafusa Onoue, Takehiro Yoshioka, Yoshihiro Takeda, Hiroki Hoshi, Shunta Shimizu

Abstract: The James Webb Space Telescope (JWST) has discovered many faint AGNs at high-$z$ by detecting their broad Balmer lines. However, their high number density, lack of X-ray emission, and overly high black hole masses with respect to their host stellar masses suggest that they are a distinct population from general type-1 quasars. Here, we present a clustering analysis of 28 low-luminosity broad-line AGNs found by JWST (JWST AGNs) at $5<z<6$ based on cross-correlation analysis with 679 photometrically-selected galaxies to characterize their host dark matter halo (DMH) masses. From angular and projected cross-correlation functions, we find that their typical DMH mass is $\log (M_\mathrm{halo}/h^{-1}\mathrm{M}_\odot) = 11.61_{-0.24}^{+0.19}$ and $11.72_{-0.20}^{+0.17}$, respectively. This result implies that the host DMHs of these AGNs are $\sim1$ dex smaller than those of luminous quasars. The DMHs of JWST AGNs at $5<z<6$ are predicted to grow to $10^{12\text{--}13}\,h^{-1}\mathrm{M}_\odot$, a typical mass for quasars at $z\lesssim3$. Applying the empirical stellar-to-halo mass ratio to the measured DMH mass, their host stellar mass is evaluated as $\log(M_*/\mathrm{M}_\odot)=9.72_{-0.39}^{+0.31}$ and $9.90_{-0.33}^{+0.27}$, which are higher than some of those estimated by the SED fitting. We also evaluate their duty cycle as $f_\mathrm{duty}=0.36_{-0.14}^{+0.18}$ per cent, namely $\sim7\times10^6$ yr as the lifetime of JWST AGNs. While we cannot exclude the possibility that JWST AGNs are simply low-mass type-1 quasars, these results suggest that JWST AGNs are a different population from type-1 quasars, and may be the ancestors of quasars at $z\lesssim3$.

Submitted 11 October, 2024; originally announced October 2024.

Comments: 12 pages, 7 figures, submitted to MNRAS

arXiv:2407.14219 [pdf, other]

Probing instantaneous quantum circuit refrigeration in the quantum regime

Authors: Shuji Nakamura, Teruaki Yoshioka, Sergei Lemziakov, Dmitrii Lvov, Hiroto Mukai, Akiyoshi Tomonaga, Shintaro Takada, Yuma Okazaki, Nobu-Hisa Kaneko, Jukka Pekola, Jaw-Shen Tsai

Abstract: Recent advancements in circuit quantum electrodynamics have enabled precise manipulation and detection of the single energy quantum in quantum systems. A quantum circuit refrigerator (QCR) is capable of electrically cooling the excited population of quantum systems, such as superconducting resonators and qubits, through photon-assisted tunneling of quasi-particles within a superconductor-insulator-normal metal junction. In this study, we demonstrated instantaneous QCR in the quantum regime. We performed a time-resolved measurement of the QCR-induced cooling of the photon number inside the superconducting resonator by harnessing a qubit as a photon detector. From the enhanced photon loss rate of the resonator estimated from the amount of the AC Stark shift, the QCR was shown to have a cooling power of approximately 300 aW. Furthermore, even below the single energy quantum, the QCR can reduce the number of photons inside the resonator from thermal equilibrium with a 100 ns pulse. Numerical calculations based on the Lindblad master equation successfully reproduced these experimental results.

Submitted 13 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: 15 pages, 9 figures, and 1 table

arXiv:2407.11277 [pdf, other]

doi 10.21437/Interspeech.2024-225

Target conversation extraction: Source separation using turn-taking dynamics

Authors: Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Abstract: Extracting the speech of participants in a conversation amidst interfering speakers and noise presents a challenging problem. In this paper, we introduce the novel task of target conversation extraction, where the goal is to extract the audio of a target conversation based on the speaker embedding of one of its participants. To accomplish this, we propose leveraging temporal patterns inherent in human conversations, particularly turn-taking dynamics, which uniquely characterize speakers engaged in conversation and distinguish them from interfering speakers and noise. Using neural networks, we show the feasibility of our approach on English and Mandarin conversation datasets. In the presence of interfering speakers, our results show an 8.19 dB improvement in signal-to-noise ratio for 2-speaker conversations and a 7.92 dB improvement for 2-4-speaker conversations. Code and dataset are available at https://github.com/chentuochao/Target-Conversation-Extraction.

Submitted 29 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted by Interspeech 2024

arXiv:2407.11055 [pdf, other]

Knowledge boosting during low-latency inference

Authors: Vidya Srinivas, Malek Itani, Tuochao Chen, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Abstract: Models for low-latency, streaming applications could benefit from the knowledge capacity of larger models, but edge devices cannot run these models due to resource constraints. A possible solution is to transfer hints during inference from a large model running remotely to a small model running on-device. However, this incurs a communication delay that breaks real-time requirements and does not guarantee that both models will operate on the same data at the same time. We propose knowledge boosting, a novel technique that allows a large model to operate on time-delayed input during inference, while still boosting small model performance. Using a streaming neural network that processes 8 ms chunks, we evaluate different speech separation and enhancement tasks with communication delays of up to six chunks or 48 ms. Our results show larger gains where the performance gap between the small and large models is wide, demonstrating a promising method for large-small model collaboration for low-latency applications. Code, dataset, and audio samples are available at https://knowledgeboosting.cs.washington.edu/.

Submitted 25 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted by Interspeech 2024

arXiv:2406.15401 [pdf, ps, other]

Circular polarization measurement for individual gamma rays in capture reactions with intense pulsed neutrons

Authors: S. Endo, R. Abe, H. Fujioka, T. Ino, O. Iwamoto, N. Iwamoto, S. Kawamura, A. Kimura, M. Kitaguchi, R. Kobayashi, S. Nakamura, T. Oku, T. Okudaira, M. Okuizumi, M. Omer, G. Rovira, T. Shima, H. M. Shimizu, T. Shizuma, Y. Taira, S. Takada, S. Takahashi, H. Yoshikawa, T. Yoshioka, H. Zen

Abstract: Measurements of the circular polarization of $γ$-rays emitted from neutron capture reactions provide valuable information for nuclear physics studies. The spin and parity of excited states can be determined by measuring the circular polarization from polarized neutron capture reactions. Furthermore, the $γ$-ray circular polarization in a neutron capture resonance is crucial for studying the enhancement effect of parity nonconservation in compound nuclei. The $γ$-ray circular polarization can be measured using a polarimeter based on magnetic Compton scattering. A polarimeter was constructed, and its performance indicators were evaluated using a circularly polarized $γ$-ray beam. Furthermore, as a demonstration, the $γ$-ray circular polarization was measured in $^{32}$S($\vec{\textrm{n}}$,$γ$)$^{33}$S reactions with polarized neutrons.

Submitted 7 May, 2024; originally announced June 2024.

Comments: 10 pages, 13 figures

arXiv:2405.06289 [pdf, other]

doi 10.1145/3613904.3642057

Look Once to Hear: Target Speech Hearing with Noisy Examples

Authors: Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota

Abstract: In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing that ignores all interfering speech and noise except the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is, however, not well aligned with the hearable application domain, since obtaining a clean example is challenging in real-world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing human auditory perception with artificial intelligence. We provide code and data at: https://github.com/vb000/LookOnceToHear.
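
As a rough reading of the real-time figures quoted in the abstract (the symbols $t_{\text{proc}}$ and $t_{\text{chunk}}$ are labels introduced here, not notation from the paper), the processing time per chunk can be compared with the chunk duration:

```latex
\frac{t_{\text{proc}}}{t_{\text{chunk}}} = \frac{6.24~\text{ms}}{8~\text{ms}} = 0.78,
\qquad
t_{\text{chunk}} - t_{\text{proc}} = 8~\text{ms} - 6.24~\text{ms} = 1.76~\text{ms}.
```

A ratio below 1 is what allows chunks to be processed as fast as they arrive; the remaining 1.76 ms per chunk is headroom on the embedded CPU under this simple accounting, which ignores any buffering or I/O overhead not stated in the abstract.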

Submitted 29 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

Comments: Best paper honorable mention at CHI 2024

arXiv:2404.18387 [pdf, ps, other]

Quantum entanglement in a pure state of strongly correlated quantum impurity systems

Authors: Yunori Nishikawa, Tomoki Yoshioka

Abstract: We consider quantum entanglement in strongly correlated quantum impurity systems for states manifesting interesting properties such as the multi-level Kondo effect and the dual nature between itinerancy and localization. For this purpose, we set up a system consisting of one or two quantum impurities arbitrarily selected from the system as a subsystem, and investigate quantum entanglement with its environmental system. We reduce the pure state of interest as described above to the subsystem, and formulate quantum informative quantities such as entanglement entropy, mutual information and relative entropy. We apply them to the single impurity Anderson model where the most basic Kondo effect is manifested, and obtain new insights into the Kondo effect there. The obtained results suggest that the method proposed here is promising for elucidating the quantum entanglement of pure states in various quantum impurity systems.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.16381 [pdf, ps, other]

Abstracting Effect Systems for Algebraic Effect Handlers

Authors: Takuma Yoshioka, Taro Sekiyama, Atsushi Igarashi

Abstract: Many effect systems for algebraic effect handlers are designed to guarantee that all invoked effects are handled adequately. However, respective researchers have developed their own effect systems that differ in how to represent the collections of effects that may happen. This situation results in blurring what is required for the representation and manipulation of effect collections in a safe effect system. In this work, we present a language ${λ_{\mathrm{EA}}}$ equipped with an effect system that abstracts the existing effect systems for algebraic effect handlers. The effect system of ${λ_{\mathrm{EA}}}$ is parameterized over effect algebras, which abstract the representation and manipulation of effect collections in safe effect systems. We prove the type-and-effect safety of ${λ_{\mathrm{EA}}}$ by assuming that a given effect algebra meets certain properties called safety conditions. As a result, we can obtain the safety properties of a concrete effect system by proving that an effect algebra corresponding to the concrete system meets the safety conditions. We also show that effect algebras meeting the safety conditions are expressive enough to accommodate some existing effect systems, each of which represents effect collections in a different style. Our framework can also differentiate the safety aspects of the effect collections of the existing effect systems. To this end, we extend ${λ_{\mathrm{EA}}}$ and the safety conditions to lift coercions and type-erasure semantics, propose other effect algebras including ones for which no effect system has been studied in the literature, and compare which effect algebra is safe and which is not for the extensions.

Submitted 25 April, 2024; originally announced April 2024.

ACM Class: D.3.1; D.3.2; D.3.3; F.3.3

arXiv:2404.09841 [pdf, other]

Anatomy of Industrial Scale Multilingual ASR

Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed description of our model architecture, consisting of a full-context 600M-parameter Conformer encoder pre-trained with BEST-RQ and an RNN-T decoder fine-tuned jointly with the encoder. Our extensive evaluation demonstrates competitive word error rates (WERs) against larger and more computationally expensive models, such as Whisper large and Canary-1B. Furthermore, our architectural choices yield several key advantages, including an improved code-switching capability, a 5x inference speedup compared to an optimized Whisper baseline, a 30% reduction in hallucination rate on speech data, and a 90% reduction in ambient noise compared to Whisper, along with significantly improved time-stamp accuracy. Throughout this work, we adopt a system-centric approach to analyzing various aspects of fully-fledged ASR models to gain practically relevant insights useful for real-world services operating at scale.

Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.04632 [pdf, other]

Software Compensation for Highly Granular Calorimeters using Machine Learning

Authors: S. Lai, J. Utehs, A. Wilhahn, O. Bach, E. Brianne, A. Ebrahimi, K. Gadow, P. Göttlicher, O. Hartbrich, D. Heuchel, A. Irles, K. Krüger, J. Kvasnicka, S. Lu, C. Neubüser, A. Provenza, M. Reinecke, F. Sefkow, S. Schuwalow, M. De Silva, Y. Sudo, H. L. Tran, E. Buhmann, E. Garutti, S. Huck, et al. (39 additional authors not shown)

Abstract: A neural network for software compensation was developed for the highly granular CALICE Analogue Hadronic Calorimeter (AHCAL). The neural network uses spatial and temporal event information from the AHCAL and energy information, which is expected to improve sensitivity to shower development and the neutron fraction of the hadron shower. The neural network method produced a depth-dependent energy weighting and a time-dependent threshold for enhancing energy deposits consistent with the timescale of evaporation neutrons. It was also observed to learn an energy weighting indicative of longitudinal leakage correction. In addition, the method produced a linear detector response and outperformed a published control method regarding resolution for every particle energy studied.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.18876 [pdf, ps, other]

Transverse asymmetry of individual $γ$-rays in the $^{139}$La($\vec{n}$, $γ$)$^{140}$La reaction

Authors: M. Okuizumi, C. J. Auton, S. Endo, H. Fujioka, K. Hirota, T. Ino, K. Ishizaki, A. Kimura, M. Kitaguchi, J. Koga, S. Makise, Y. Niinomi, T. Oku, T. Okudaira, K. Sakai, T. Shima, H. M. Shimizu, H. Tada, S. Takada, S. Takahashi, Y. Tani, T. Yamamoto, H. Yoshikawa, T. Yoshioka

Abstract: The enhancement of the parity-violating asymmetry in the vicinity of $p$-wave compound nuclear resonances was observed for a variety of medium-heavy nuclei. The enhanced parity-violating asymmetry can be understood using the $s$-$p$ mixing model. The $s$-$p$ mixing model predicts several neutron energy-dependent angular correlations between the neutron momentum $\vec{k}_n$, neutron spin $\vec{σ}_n$, $γ$-ray momentum $\vec{k}_γ$, and $γ$-ray polarization $\vec{σ}_γ$ in the $(n,γ)$ reaction. In this paper, the improved value of the transverse asymmetry of $γ$-ray emissions, corresponding to a correlation term $\vec{σ}_n\cdot(\vec{k}_n\times\vec{k}_γ)$ in the $^{139}\mathrm{La}(\vec{n},γ)^{140}\mathrm{La}$ reaction, and the transverse asymmetries in the transitions to several low excited states of $^{140}\mathrm{La}$ are reported.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 5 pages, 5 figures, 2 tables

arXiv:2401.11920 [pdf, other]

The quality assurance test of the SliT ASIC for the J-PARC muon $g-2$/EDM experiment

Authors: Takashi Yamanaka, Yoichi Fujita, Eitaro Hamada, Tetsuichi Kishishita, Tsutomu Mibe, Yutaro Sato, Yoshiaki Seino, Masayoshi Shoji, Taikain Suehara, Manobu M. Tanaka, Junji Tojo, Keisuke Umebayashi, Tamaki Yoshioka

Abstract: The SliT ASIC is a readout chip for the silicon strip detector to be used at the J-PARC muon $g-2$/EDM experiment. The production version, SliT128D, was designed and its mass production was finished. A quality assurance test method for bare SliT128D chips was developed to provide a sufficient number of chips for the experiment. The quality assurance test of the SliT128D chips was performed and 5735 chips were inspected. No defect was observed in 84.3% of the chips. Accepting a few channels with poor time walk performance out of 128 channels per chip, more than 90% yield can be achieved, which is sufficient to construct the whole detector.
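
As a quick check of the scale these percentages imply (the symbol names are labels introduced here, and the rounded chip counts are derived from the quoted figures rather than reported in the abstract):

```latex
N_{\text{defect-free}} \approx 0.843 \times 5735 \approx 4.83 \times 10^{3},
\qquad
N_{\text{accepted}} > 0.90 \times 5735 \approx 5.16 \times 10^{3}.
```

Under this reading, relaxing the time-walk criterion for a few channels per chip recovers on the order of a few hundred additional usable chips.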

Submitted 22 January, 2024; originally announced January 2024.

Comments: 5 pages, 8 figures

arXiv:2312.12959 [pdf, other]

Performance of the Fully-equipped Spin Flip Chopper For Neutron Lifetime Experiment at J-PARC

Authors: K. Mishima, G. Ichikawa, Y. Fuwa, T. Hasegawa, M. Hino, R. Hosokawa, T. Ino, Y. Iwashita, M. Kitaguchi, S. Matsuzaki, T. Mogi, H. Okabe, T. Oku, T. Okudaira, Y. Seki, H. E. Shimizu, H. M. Shimizu, S. Takahashi, M. Tanida, S. Yamashita, M. Yokohashi, T. Yoshioka

Abstract: To solve the "neutron lifetime puzzle," where measured neutron lifetimes differ depending on the measurement methods, an experiment with a pulsed neutron beam at J-PARC is in progress. In this experiment, neutrons are bunched into 40-cm lengths using a spin flip chopper (SFC), where the statistical sensitivity was limited by the aperture size of the SFC. The SFC comprises three sets of magnetic supermirrors and two resonant spin flippers. In this paper, we discuss an upgrade to enlarge the apertures of the SFC. With this upgrade, the statistics per unit time of the neutron lifetime experiment increased by a factor of 2.8, while maintaining a signal-to-noise ratio of 250-400 comparable to the previous one. Consequently, the time required to reach a precision of 1 s in the neutron lifetime experiment was reduced from 590 to 170 days. This improvement in statistics will also contribute to the reduction of systematic uncertainties, such as background evaluation, fostering further advancements in the neutron lifetime experiments at J-PARC.

Submitted 31 July, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 33 pages, 22 figures

arXiv:2312.06115 [pdf, ps, other]

High sensitivity of a future search for P-odd/T-odd interactions on the 0.75 eV $p$-wave resonance in $\vec{n}+^{139}\vec{\rm La}$ forward transmission determined using pulsed neutron beam

Authors: R. Nakabe, C. J. Auton, S. Endo, H. Fujioka, V. Gudkov, K. Hirota, I. Ide, T. Ino, M. Ishikado, W. Kambara, S. Kawamura, A. Kimura, M. Kitaguchi, R. Kobayashi, T. Okamura, T. Oku, T. Okudaira, M. Okuizumi, J. G. Otero Munoz, J. D. Parker, K. Sakai, T. Shima, H. M. Shimizu, T. Shinohara, W. M. Snow, et al. (5 additional authors not shown)

Abstract: Neutron transmission experiments can offer a new type of highly sensitive search for time-reversal invariance violating (TRIV) effects in nucleon-nucleon interactions via the same enhancement mechanism observed for large parity violating (PV) effects in neutron-induced compound nuclear processes. In these compound processes, the TRIV cross-section is given as the product of the PV cross-section, a spin-factor $κ$, and a ratio of TRIV and PV matrix elements. We determined $κ$ to be $0.59\pm0.05$ for $^{139}$La+$n$ using both $(n, γ)$ spectroscopy and ($\vec{n}+^{139}\vec{\rm La}$) transmission. This result quantifies for the first time the high sensitivity of the $^{139}$La 0.75 eV $p$-wave resonance in a future search for P-odd/T-odd interactions in ($\vec{n}+^{139}\vec{\rm La}$) forward transmission.
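
The product relation described in the abstract can be sketched as follows. The symbols are assumptions introduced for illustration, not notation taken from the abstract: $Δσ_{\mathrm{TRIV}}$ and $Δσ_{\mathrm{PV}}$ stand for the TRIV and PV cross-sections, and $W_T$ and $W$ for the TRIV and PV matrix elements.

```latex
Δσ_{\mathrm{TRIV}} = κ \, \frac{W_T}{W} \, Δσ_{\mathrm{PV}},
\qquad
κ = 0.59 \pm 0.05 \quad \text{for } ^{139}\mathrm{La}+n .
```

Read this way, a $κ$ of order unity means the TRIV cross-section at this resonance is not strongly suppressed by the spin factor relative to the PV cross-section, so its size is set mainly by the ratio of matrix elements.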

Submitted 10 December, 2023; originally announced December 2023.

arXiv:2312.04710 [pdf, ps, other]

doi 10.1109/QCE57702.2023.00041

Experimental Demonstration of Fermionic QAOA with One-Dimensional Cyclic Driver Hamiltonian

Authors: Takuya Yoshioka, Keita Sasada, Yuichiro Nakano, Keisuke Fujii

Abstract: The quantum approximate optimization algorithm (QAOA) has attracted much attention as an algorithm that has the potential to efficiently solve combinatorial optimization problems. Among its variants, a fermionic QAOA (FQAOA) for solving constrained optimization problems has been developed [Yoshioka, Sasada, Nakano, and Fujii, Phys. Rev. Research vol. 5, 023071, 2023]. In this algorithm, the constraints are essentially imposed as fermion number conservation at arbitrary approximation level. We take the portfolio optimization problem as an application example and propose a new driver Hamiltonian on a one-dimensional cyclic lattice. Our FQAOA with the new driver Hamiltonian reduces the number of gate operations in quantum circuits. Experiments on a trapped-ion quantum computer using 16 qubits on Amazon Braket demonstrate that the proposed driver Hamiltonian effectively suppresses noise effects compared to the previous FQAOA.

Submitted 7 December, 2023; originally announced December 2023.

Comments: published in 2023 IEEE International Conference on Quantum Computing and Engineering (QCE)

Journal ref: 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), Bellevue, WA, USA, 2023, pp. 300-306

arXiv:2311.00320 [pdf, other]

doi 10.1145/3586183.3606779

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

Authors: Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota

Abstract: Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We introduce semantic hearing, a novel capability for hearable devices that enables them to, in real-time, focus on, or ignore, specific sounds from real-world environments, while also preserving the spatial cues. To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use. Results show that our system can operate with 20 sound classes and that our transformer-based network has a runtime of 6.56 ms on a connected smartphone. In-the-wild evaluation with participants in previously unseen indoor and outdoor scenarios shows that our proof-of-concept system can extract the target sounds and generalize to preserve the spatial cues in its binaural output. Project page with code: https://semantichearing.cs.washington.edu

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2309.12521 [pdf, other]

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Authors: Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu

Abstract: Target-Speaker Voice Activity Detection (TS-VAD) utilizes a set of speaker profiles alongside an input audio signal to perform speaker diarization. While its superiority over conventional methods has been demonstrated, the method can suffer from errors in speaker profiles, as those profiles are typically obtained by running a traditional clustering-based diarization method over the input signal. This paper proposes an extension to TS-VAD, called Profile-Error-Tolerant TS-VAD (PET-TSVAD), which is robust to such speaker profile errors. This is achieved by employing transformer-based TS-VAD that can handle a variable number of speakers and further introducing a set of additional pseudo-speaker profiles to handle speakers undetected during the first pass diarization. During training, we use speaker profiles estimated by multiple different clustering algorithms to reduce the mismatch between the training and testing conditions regarding speaker profiles. Experimental results show that PET-TSVAD consistently outperforms the existing TS-VAD method on both the VoxConverse and DIHARD-I datasets.

Submitted 3 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Submission for ICASSP 2024

arXiv:2309.08905 [pdf, ps, other]

Spin dependence in the $p$-wave resonance of ${^{139}\vec{\rm{La}}+\vec{n}}$

Authors: T. Okudaira, R. Nakabe, S. Endo, H. Fujioka, V. Gudkov, I. Ide, T. Ino, M. Ishikado, W. Kambara, S. Kawamura, R. Kobayashi, M. Kitaguchi, T. Okamura, T. Oku, J. G. Otero Munoz, J. D. Parker, K. Sakai, T. Shima, H. M. Shimizu, T. Shinohara, W. M. Snow, S. Takada, Y. Tsuchikawa, R. Takahashi, S. Takahashi , et al. (2 additional authors not shown)

Abstract: We measured the spin dependence in a neutron-induced $p$-wave resonance by using a polarized epithermal neutron beam and a polarized nuclear target. Our study focuses on the 0.75~eV $p$-wave resonance state of $^{139}$La+$n$, where largely enhanced parity violation has been observed. We determined the partial neutron width of the $p$-wave resonance by measuring the spin dependence of the neutron absorption cross section between polarized $^{139}\rm{La}$ and polarized neutrons. Our findings serve as a foundation for the quantitative study of the enhancement effect of the discrete symmetry violations caused by mixing between partial amplitudes in the compound nuclei.

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.08131 [pdf, other]

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Authors: Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li

Abstract: Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). T-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a single token stream with $\langle \text{cc}\rangle$ symbols interspersed. However, the use of a naive neural transducer architecture significantly constrained its applicability for text-only adaptation. To overcome this limitation, we propose a novel t-SOT model structure that incorporates the idea of factorized neural transducers (FNT). The proposed method separates a language model (LM) from the transducer's predictor and handles the unnatural token order resulting from the use of $\langle \text{cc}\rangle$ symbols in t-SOT. We achieve this by maintaining multiple hidden states and introducing special handling of the $\langle \text{cc}\rangle$ tokens within the LM. The proposed t-SOT FNT model achieves comparable performance to the original t-SOT model while retaining the ability to reduce word error rate (WER) on both single and multi-talker datasets through text-only adaptation.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 5 pages, 2 figures, submitted to ICASSP2024

arXiv:2309.08007 [pdf, ps, other]

DiariST: Streaming Speech Translation with Speaker Diarization

Authors: Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

Abstract: End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we propose DiariST, the first streaming ST and SD solution. It is built upon a neural transducer-based streaming ST system and integrates token-level serialized output training and t-vector, which were originally developed for multi-talker speech recognition. Due to the absence of evaluation benchmarks in this area, we develop a new evaluation dataset, DiariST-AliMeeting, by translating the reference Chinese transcriptions of the AliMeeting corpus into English. We also propose new metrics, called speaker-agnostic BLEU and speaker-attributed BLEU, to measure the ST quality while taking SD accuracy into account. Our system achieves a strong ST and SD capability compared to offline systems based on Whisper, while performing streaming inference for overlapping speech. To facilitate the research in this new direction, we release the evaluation data, the offline baseline systems, and the evaluation code.

Submitted 22 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024

arXiv:2308.13666 [pdf, other]

A Joint Fermi-GBM and Swift-BAT Analysis of Gravitational-Wave Candidates from the Third Gravitational-wave Observing Run

Authors: C. Fletcher, J. Wood, R. Hamburg, P. Veres, C. M. Hui, E. Bissaldi, M. S. Briggs, E. Burns, W. H. Cleveland, M. M. Giles, A. Goldstein, B. A. Hristov, D. Kocevski, S. Lesage, B. Mailyan, C. Malacaria, S. Poolakkil, A. von Kienlin, C. A. Wilson-Hodge, The Fermi Gamma-ray Burst Monitor Team, M. Crnogorčević, J. DeLaunay, A. Tohuvavohu, R. Caputo, S. B. Cenko , et al. (1674 additional authors not shown)

Abstract: We present Fermi Gamma-ray Burst Monitor (Fermi-GBM) and Swift Burst Alert Telescope (Swift-BAT) searches for gamma-ray/X-ray counterparts to gravitational wave (GW) candidate events identified during the third observing run of the Advanced LIGO and Advanced Virgo detectors. Using Fermi-GBM on-board triggers and sub-threshold gamma-ray burst (GRB) candidates found in the Fermi-GBM ground analyses, the Targeted Search and the Untargeted Search, we investigate whether there are any coincident GRBs associated with the GWs. We also search the Swift-BAT rate data around the GW times to determine whether a GRB counterpart is present. No counterparts are found. Using both the Fermi-GBM Targeted Search and the Swift-BAT search, we calculate flux upper limits and present joint upper limits on the gamma-ray luminosity of each GW. Given these limits, we constrain theoretical models for the emission of gamma-rays from binary black hole mergers.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.06873 [pdf, other]

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Authors: Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka

Abstract: Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generation tasks involving transforming input speech and processing audio captured in adverse acoustic conditions. This paper introduces SpeechX, a versatile speech generation model capable of zero-shot TTS and various speech transformation tasks, dealing with both clean and noisy signals. SpeechX combines neural codec language modeling with multi-task learning using task-dependent prompting, enabling unified and extensible modeling and providing a consistent way for leveraging textual input in speech enhancement and transformation tasks. Experimental results show SpeechX's efficacy in various tasks, including zero-shot TTS, noise suppression, target speaker extraction, speech removal, and speech editing with or without background noise, achieving comparable or superior performance to specialized models across tasks. See https://aka.ms/speechx for demo samples.

Submitted 25 June, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: To appear in TASLP. See https://aka.ms/speechx for demo samples

arXiv:2307.02531 [pdf, other]

Subaru High-$z$ Exploration of Low-Luminosity Quasars (SHELLQs). XVIII. The Dark Matter Halo Mass of Quasars at $z\sim6$

Authors: Junya Arita, Nobunari Kashikawa, Yoshiki Matsuoka, Wanqiu He, Kei Ito, Yongming Liang, Rikako Ishimoto, Takehiro Yoshioka, Yoshihiro Takeda, Kazushi Iwasawa, Masafusa Onoue, Yoshiki Toba, Masatoshi Imanishi

Abstract: We present, for the first time, dark matter halo (DMH) mass measurement of quasars at $z\sim6$ based on a clustering analysis of 107 quasars. Spectroscopically identified quasars are homogeneously extracted from the HSC-SSP wide layer over $891\,\mathrm{deg^2}$. We evaluate the clustering strength by three different auto-correlation functions: projected correlation function, angular correlation function, and redshift-space correlation function. The DMH mass of quasars at $z\sim6$ is evaluated as $5.0_{-4.0}^{+7.4}\times10^{12}\,h^{-1}M_\odot$ with the bias parameter $b=20.8\pm8.7$ by the projected correlation function. The other two estimators agree with these values, though each uncertainty is large. The DMH mass of quasars is found to be nearly constant $\sim10^{12.5}\,h^{-1}M_\odot$ throughout cosmic time, suggesting that there is a characteristic DMH mass where quasars are always activated. As a result, quasars appear in the most massive halos at $z \sim 6$, but in less extreme halos thereafter. The DMH mass does not appear to exceed the upper limit of $10^{13}\,h^{-1}M_\odot$, which suggests that most quasars reside in DMHs with $M_\mathrm{halo}<10^{13}\,h^{-1}M_\odot$ across most of the cosmic time. Our results supporting a significant increasing bias with redshift are consistent with the bias evolution model with inefficient AGN feedback at $z\sim6$. The duty cycle ($f_\mathrm{duty}$) is estimated as $0.019\pm0.008$ by assuming that DMHs in some mass interval can host a quasar. The average stellar mass is evaluated from stellar-to-halo mass ratio as $M_*=6.5_{-5.2}^{+9.6}\times10^{10}\,h^{-1}M_\odot$, which is found to be consistent with [C II] observational results.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 22 pages, 8 figures, accepted for publication in ApJ

arXiv:2306.10212 [pdf, other]

Active Initialization Experiment of Superconducting Qubit Using Quantum-circuit Refrigerator

Authors: Teruaki Yoshioka, Hiroto Mukai, Akiyoshi Tomonaga, Shintaro Takada, Yuma Okazaki, Nobu-Hisa Kaneko, Shuji Nakamura, Jaw-Shen Tsai

Abstract: The initialization of superconducting qubits is one of the essential techniques for the realization of quantum computation. In previous research, initialization above 99\% fidelity has been achieved at 280 ns. Here, we demonstrate the rapid initialization of a superconducting qubit with a quantum-circuit refrigerator (QCR). Photon-assisted tunneling of quasiparticles in the QCR can temporally increase the relaxation time of photons inside the resonator and helps release energy from the qubit to the environment. Experiments using this protocol have shown that 99\% of initialization time is reduced to 180 ns. This initialization time depends strongly on the relaxation rate of the resonator, and faster initialization is possible by reducing the resistance of the QCR, which limits the ON/OFF ratio, and by strengthening the coupling between the QCR and the resonator.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.18747 [pdf, other]

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

Authors: Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng

Abstract: State-of-the-art large-scale universal speech models (USMs) show a decent automatic speech recognition (ASR) performance across multiple domains and languages. However, it remains a challenge for these models to recognize overlapped speech, which is often seen in meeting conversations. We propose an approach to adapt USMs for multi-talker ASR. We first develop an enhanced version of serialized output training to jointly perform multi-talker ASR and utterance timestamp prediction. That is, we predict the ASR hypotheses for all speakers, count the speakers, and estimate the utterance timestamps at the same time. We further introduce a lightweight adapter module to maintain the multilingual property of the USMs even when we perform the adaptation with only a single language. Experimental results obtained using the AMI and AliMeeting corpora show that our proposed approach effectively transfers the USMs to a strong multilingual multi-talker ASR model with timestamp prediction capability.

Submitted 30 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech 2023

arXiv:2305.13738 [pdf, other]

i-Code Studio: A Configurable and Composable Framework for Integrative AI

Authors: Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang

Abstract: Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific requirements. The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering. We also demonstrate how to quickly build a multimodal agent based on the i-Code Studio that can communicate and personalize for users.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.12311 [pdf, other]

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

Authors: Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.

Submitted 20 May, 2023; originally announced May 2023.

arXiv:2304.08393 [pdf, other]

Search for gravitational-lensing signatures in the full third observing run of the LIGO-Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin , et al. (1670 additional authors not shown)

Abstract: Gravitational lensing by massive objects along the line of sight to the source causes distortions of gravitational wave-signals; such distortions may reveal information about fundamental physics, cosmology and astrophysics. In this work, we have extended the search for lensing signatures to all binary black hole events from the third observing run of the LIGO--Virgo network. We search for repeated signals from strong lensing by 1) performing targeted searches for subthreshold signals, 2) calculating the degree of overlap amongst the intrinsic parameters and sky location of pairs of signals, 3) comparing the similarities of the spectrograms amongst pairs of signals, and 4) performing dual-signal Bayesian analysis that takes into account selection effects and astrophysical knowledge. We also search for distortions to the gravitational waveform caused by 1) frequency-independent phase shifts in strongly lensed images, and 2) frequency-dependent modulation of the amplitude and phase due to point masses. None of these searches yields significant evidence for lensing. Finally, we use the non-detection of gravitational-wave lensing to constrain the lensing rate based on the latest merger-rate estimates and the fraction of dark matter composed of compact objects.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages, 11 figures

Report number: LIGO-P2200031

arXiv:2303.08372 [pdf, other]

Target Sound Extraction with Variable Cross-modality Clues

Authors: Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng

Abstract: Automatic target sound extraction (TSE) is a machine learning approach to mimic the human auditory perception capability of attending to a sound source of interest from a mixture of sources. It often uses a model conditioned on a fixed form of target sound clues, such as a sound class label, which limits the ways in which users can interact with the model to specify the target sounds. To leverage variable number of clues cross modalities available in the inference phase, including a video, a sound event class, and a text caption, we propose a unified transformer-based TSE model architecture, where a multi-clue attention module integrates all the clues across the modalities. Since there is no off-the-shelf benchmark to evaluate our proposed approach, we build a dataset based on public corpora, Audioset and AudioCaps. Experimental results for seen and unseen target-sound evaluation sets show that our proposed TSE model can effectively deal with a varying number of clues which improves the TSE performance and robustness against partially compromised clues.

Submitted 15 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.12369 [pdf, other]

Factual Consistency Oriented Speech Recognition

Authors: Naoyuki Kanda, Takuya Yoshioka, Yang Liu

Abstract: This paper presents a novel optimization framework for automatic speech recognition (ASR) with the aim of reducing hallucinations produced by an ASR model. The proposed framework optimizes the ASR model to maximize an expected factual consistency score between ASR hypotheses and ground-truth transcriptions, where the factual consistency score is computed by a separately trained estimator. Experimental results using the AMI meeting corpus and the VoxPopuli corpus show that the ASR model trained with the proposed framework generates ASR hypotheses that have significantly higher consistency scores with ground-truth transcriptions while maintaining the word error rates close to those of cross entropy-trained ASR models. Furthermore, it is shown that training the ASR models with the proposed framework improves the speech summarization quality as measured by the factual consistency of meeting conversation summaries generated by a large language model.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 5 pages, 1 figure, 3 tables

arXiv:2301.10756 [pdf, ps, other]

doi 10.1103/PhysRevResearch.5.023071

Fermionic Quantum Approximate Optimization Algorithm

Authors: Takuya Yoshioka, Keita Sasada, Yuichiro Nakano, Keisuke Fujii

Abstract: Quantum computers are expected to accelerate solving combinatorial optimization problems, including algorithms such as Grover adaptive search and quantum approximate optimization algorithm (QAOA). However, many combinatorial optimization problems involve constraints which, when imposed as soft constraints in the cost function, can negatively impact the performance of the optimization algorithm. In this paper, we propose fermionic quantum approximate optimization algorithm (FQAOA) for solving combinatorial optimization problems with constraints. Specifically FQAOA tackle the constrains issue by using fermion particle number preservation to intrinsically impose them throughout QAOA. We provide a systematic guideline for designing the driver Hamiltonian for a given problem Hamiltonian with constraints. The initial state can be chosen to be a superposition of states satisfying the constraint and the ground state of the driver Hamiltonian. This property is important since FQAOA reduced to quantum adiabatic computation in the large limit of circuit depth p and improved performance, even for shallow circuits with optimizing the parameters starting from the fixed-angle determined by Trotterized quantum adiabatic evolution. We perform an extensive numerical simulation and demonstrates that proposed FQAOA provides substantial performance advantage against existing approaches in portfolio optimization problems. Furthermore, the Hamiltonian design guideline is useful not only for QAOA, but also Grover adaptive search and quantum phase estimation to solve combinatorial optimization problems with constraints. Since software tools for fermionic systems have been developed in quantum computational chemistry both for noisy intermediate-scale quantum computers and fault-tolerant quantum computers, FQAOA allows us to apply these tools for constrained combinatorial optimization problems.

Submitted 30 April, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Accepted for publication in Physical Review Research on March 29, 2023. 16 pages, 8 figures

Journal ref: Physical Review Research 5, 023071 (2023)

arXiv:2212.10889 [pdf, ps, other]

doi 10.1103/PhysRevC.107.054602

Angular distribution of $γ$-rays from a neutron-induced $p$-wave resonance of $^{132}$Xe

Authors: T. Okudaira, Y. Tani, S. Endo, J. Doskow, H. Fujioka, K. Hirota, K. Kameda, A. Kimura, M. Kitaguchi, M. Luxnat, K. Sakai, D. Schaper, T. Shima, H. M. Shimizu, W. M. Snow, S. Takada, T. Yamamoto, H. Yoshikawa, T. Yoshioka

Abstract: A neutron-energy dependent angular distribution was measured for individual $γ$-rays from the 3.2 eV $p$-wave resonance of $^{131}$Xe+$n$, that shows enhanced parity violation owing to a mixing between $s$- and $p$-wave amplitudes. The $γ$-ray transitions from the $p$-wave resonance were identified, and the angular distribution with respect to the neutron momentum was evaluated as a function of the neutron energy for 7132 keV $γ$-rays, which correspond to a transition to the 1807 keV excited state of $^{132}$Xe. The angular distribution is considered to originate from the interference between $s$- and $p$-wave amplitudes, and will provide a basis for a quantitative understanding of the enhancement mechanism of the fundamental parity violation in compound nuclei.

Submitted 21 December, 2022; originally announced December 2022.

arXiv:2212.01477 [pdf, other]

doi 10.1093/mnras/stad3120

Search for subsolar-mass black hole binaries in the second part of Advanced LIGO's and Advanced Virgo's third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin , et al. (1680 additional authors not shown)

Abstract: We describe a search for gravitational waves from compact binaries with at least one component with mass 0.2 $M_\odot$ -- $1.0 M_\odot$ and mass ratio $q \geq 0.1$ in Advanced LIGO and Advanced Virgo data collected between 1 November 2019, 15:00 UTC and 27 March 2020, 17:00 UTC. No signals were detected. The most significant candidate has a false alarm rate of 0.2 $\mathrm{yr}^{-1}$. We estimate the sensitivity of our search over the entirety of Advanced LIGO's and Advanced Virgo's third observing run, and present the most stringent limits to date on the merger rate of binary black holes with at least one subsolar-mass component. We use the upper limits to constrain two fiducial scenarios that could produce subsolar-mass black holes: primordial black holes (PBH) and a model of dissipative dark matter. The PBH model uses recent prescriptions for the merger rate of PBH binaries that include a rate suppression factor to effectively account for PBH early binary disruptions. If the PBHs are monochromatically distributed, we can exclude a dark matter fraction in PBHs $f_\mathrm{PBH} \gtrsim 0.6$ (at 90% confidence) in the probed subsolar-mass range. However, if we allow for broad PBH mass distributions we are unable to rule out $f_\mathrm{PBH} = 1$. For the dissipative model, where the dark matter has chemistry that allows a small fraction to cool and collapse into black holes, we find an upper bound $f_{\mathrm{DBH}} < 10^{-5}$ on the fraction of atomic dark matter collapsed into black holes.

Submitted 26 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: https://dcc.ligo.org/P2200139

arXiv:2211.09988 [pdf, ps, other]

Exploring WavLM on Speech Enhancement

Authors: Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Abstract: There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success. Especially, WavLM showed state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work, we design and conduct a series of experiments with three resource conditions by combining WavLM and two high-quality speech enhancement systems. Also, we propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost the downstream enhancement performance. The experiments on the DNS challenge dataset and a simulation dataset show that the WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially for low fine-tuning resources. For the high fine-tuning resource condition, only the word error rate is substantially improved.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE SLT 2022

arXiv:2211.06493 [pdf, other]

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

Authors: Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka

Abstract: Employing a monaural speech separation (SS) model as a front-end for automatic speech recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model improves the SS performance, it also requires a higher computational cost. Second, an SS model that is more optimized for handling overlapped speech is likely to introduce more processing artifacts in non-overlapped-speech regions. In this paper, we address these trade-offs with a sparsely-gated mixture-of-experts (MoE) architecture. Comprehensive evaluation results obtained using both simulated and real meeting recordings show that our proposed sparsely-gated MoE SS model achieves superior separation capabilities with less speech distortion, while involving only a marginal run-time cost increase.

Submitted 30 May, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

arXiv:2211.05564 [pdf, other]

Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition

Authors: Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang

Abstract: Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline single-talker applications, with limited investigations in multi-talker cases, especially for streaming scenarios. In this paper, we investigate SSL for streaming multi-talker speech recognition, which generates transcriptions of overlapping speakers in a streaming fashion. We first observe that conventional SSL techniques do not work well on this task due to the poor representation of overlapping speech. We then propose a novel SSL training objective, referred to as bi-label masked speech prediction, which explicitly preserves representations of all speakers in overlapping speech. We investigate various aspects of the proposed system including data configuration and quantizer selection. The proposed SSL setup achieves substantially better word error rates on the LibriSpeechMix dataset.

Submitted 10 November, 2022; originally announced November 2022.

Comments: submitted to ICASSP 2023

arXiv:2211.05172 [pdf, other]

Speech separation with large-scale self-supervised learning

Authors: Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Abstract: Self-supervised learning (SSL) methods such as WavLM have shown promising speech separation (SS) results in small-scale simulation-based experiments. In this work, we extend the exploration of the SSL-based SS by massively scaling up both the pre-training data (more than 300K hours) and fine-tuning data (10K hours). We also investigate various techniques to efficiently integrate the pre-trained model with the SS network under a limited computation budget, including a low frame rate SSL model training setup and a fine-tuning scheme using only the part of the pre-trained model. Compared with a supervised baseline and the WavLM-based SS model using feature embeddings obtained with the previously released 94K hours trained WavLM, our proposed model obtains 15.9% and 11.2% of relative word error rate (WER) reductions, respectively, for a simulated far-field speech mixture test set. For conversation transcription on real meeting recordings using continuous speech separation, the proposed model achieves 6.8% and 10.6% of relative WER reductions over the purely supervised baseline on AMI and ICSI evaluation sets, respectively, while reducing the computational cost by 38%.

Submitted 25 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.02944 [pdf, other]

Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation

Authors: Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka

Abstract: Personalized speech enhancement (PSE) models achieve promising results compared with unconditional speech enhancement models due to their ability to remove interfering speech in addition to background noise. Unlike unconditional speech enhancement, causal PSE models may occasionally remove the target speech by mistake. The PSE models also tend to leak interfering speech when the target speaker is silent for an extended period. We show that existing PSE methods suffer from a trade-off between speech over-suppression and interference leakage by addressing one problem at the expense of the other. We propose a new PSE model training framework using cross-task knowledge distillation to mitigate this trade-off. Specifically, we utilize a personalized voice activity detector (pVAD) during training to exclude the non-target speech frames that are wrongly identified as containing the target speaker with hard or soft classification. This prevents the PSE model from being too aggressive while still allowing the model to learn to suppress the input speech when it is likely to be spoken by interfering speakers. Comprehensive evaluation results are presented, covering various PSE usage scenarios.

Submitted 5 November, 2022; originally announced November 2022.

Comments: Submitted to ICASSP 2023

arXiv:2211.02773 [pdf, other]

Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation

Authors: Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang

Abstract: Personalized speech enhancement (PSE) is a real-time SE approach utilizing a speaker embedding of a target person to remove background noise, reverberation, and interfering voices. To deploy a PSE model for full duplex communications, the model must be combined with acoustic echo cancellation (AEC), although such a combination has been less explored. This paper proposes a series of methods that are applicable to various model architectures to develop efficient causal models that can handle the tasks of PSE, AEC, and joint PSE-AEC. We present extensive evaluation results using both simulated data and real recordings, covering various acoustic conditions and evaluation metrics. The results show the effectiveness of the proposed methods for two different model architectures. Our best joint PSE-AEC model comes close to the expert models optimized for individual tasks of PSE and AEC in their respective scenarios and significantly outperforms the expert models for the combined PSE-AEC task.

Submitted 25 May, 2023; v1 submitted 4 November, 2022; originally announced November 2022.

Comments: Accepted to Interspeech 2023

arXiv:2211.02250 [pdf, other]

Real-Time Target Sound Extraction

Authors: Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota

Abstract: We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner while also leveraging the generalization performance of transformer-based architectures. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. We provide code, dataset, and audio samples: https://waveformer.cs.washington.edu/.

Submitted 19 April, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: ICASSP 2023 camera-ready

arXiv:2210.15807 [pdf, ps, other]

doi 10.1103/PhysRevC.106.064601

Measurement of the transverse asymmetry of $γ$-rays in the $^{117}$Sn(n,$γ$)$^{118}$Sn reaction

Authors: S. Endo, T. Okudaira, R. Abe, H. Fujioka, K. Hirota, A. Kimura, M. Kitaguchi, T. Oku, K. Sakai, T. Shima, H. M. Shimizu, S. Takada, S. Takahashi, T. Yamamoto, H. Yoshikawa, T. Yoshioka

Abstract: Largely enhanced parity-violating effects observed in compound resonances induced by epithermal neutrons are currently attributed to the mixing of parity-unfavored partial amplitudes in the entrance channel of the compound states. Furthermore, it is proposed that the same mechanism that enhances the parity-violation also enhances the breaking of time-reversal-invariance in the compound nucleus. The entrance-channel mixing induces energy-dependent spin-angular correlations of individual $γ$-rays emitted from the compound nuclear state. For a detailed study of the mixing model, a $γ$-ray yield in the reaction of $^{117}$Sn(n,$γ$)$^{118}$Sn was measured using the pulsed beam of polarized epithermal neutrons and Ge detectors. An angular dependence of asymmetric $γ$-ray yields for the orientation of the neutron polarization was observed.

Submitted 27 October, 2022; originally announced October 2022.

Comments: 7 pages, 10 figures

arXiv:2210.15715 [pdf, ps, other]

Simulating realistic speech overlaps improves multi-talker ASR

Authors: Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Abstract: Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human transcriptions, a naïve simulation of multi-talker speech by randomly mixing multiple utterances was conventionally used for model training. In this work, we propose an improved technique to simulate multi-talker overlapping speech with realistic speech overlaps, where an arbitrary pattern of speech overlaps is represented by a sequence of discrete tokens. With this representation, speech overlapping patterns can be learned from real conversations based on a statistical language model, such as N-gram, which can be then used to generate multi-talker speech for training. In our experiments, multi-talker ASR models trained with the proposed method show consistent improvement on the word error rates across multiple datasets.

Submitted 17 November, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: v2: fix minor typo

arXiv:2210.10931 [pdf, other]

Search for gravitational-wave transients associated with magnetar bursts in Advanced LIGO and Advanced Virgo data from the third observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, K. Agatsuma, N. Aggarwal, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Allocca, P. A. Altin , et al. (1645 additional authors not shown)

Abstract: Gravitational waves are expected to be produced from neutron star oscillations associated with magnetar giant flares and short bursts. We present the results of a search for short-duration (milliseconds to seconds) and long-duration ($\sim$ 100 s) transient gravitational waves from 13 magnetar short bursts observed during Advanced LIGO, Advanced Virgo and KAGRA's third observation run. These 13 bursts come from two magnetars, SGR 1935$+$2154 and Swift J1818.0$-$1607. We also include three other electromagnetic burst events detected by Fermi GBM which were identified as likely coming from one or more magnetars, but they have no association with a known magnetar. No magnetar giant flares were detected during the analysis period. We find no evidence of gravitational waves associated with any of these 16 bursts. We place upper bounds on the root-sum-square of the integrated gravitational-wave strain that reach $2.2 \times 10^{-23}$ $/\sqrt{\text{Hz}}$ at 100 Hz for the short-duration search and $8.7 \times 10^{-23}$ $/\sqrt{\text{Hz}}$ at $450$ Hz for the long-duration search, given a detection efficiency of 50%. For a ringdown signal at 1590 Hz targeted by the short-duration search the limit is set to $1.8 \times 10^{-22}$ $/\sqrt{\text{Hz}}$. Using the estimated distance to each magnetar, we derive upper bounds on the emitted gravitational-wave energy of $3.2 \times 10^{43}$ erg ($7.3 \times 10^{43}$ erg) for SGR 1935$+$2154 and $8.2 \times 10^{42}$ erg ($2.8 \times 10^{43}$ erg) for Swift J1818.0$-$1607, for the short-duration (long-duration) search. Assuming isotropic emission of electromagnetic radiation of the burst fluences, we constrain the ratio of gravitational-wave energy to electromagnetic energy for bursts from SGR 1935$+$2154 with available fluence information. The lowest of these ratios is $3 \times 10^3$.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 30 pages with appendices, 5 figures, 10 tables

Report number: LIGO-P2100387

arXiv:2210.05934 [pdf, other]

Input optics systems of the KAGRA detector during O3GK

Authors: T. Akutsu, M. Ando, K. Arai, Y. Arai, S. Araki, A. Araya, N. Aritomi, H. Asada, Y. Aso, S. Bae, Y. Bae, L. Baiotti, R. Bajpai, M. A. Barton, K. Cannon, Z. Cao, E. Capocasa, M. Chan, C. Chen, K. Chen, Y. Chen, C-I. Chiang, H. Chu, Y-K. Chu, S. Eguchi , et al. (228 additional authors not shown)

Abstract: KAGRA, the underground and cryogenic gravitational-wave detector, was operated for its solo observation from February 25th to March 10th, 2020, and its first joint observation with the GEO 600 detector from April 7th -- 21st, 2020 (O3GK). This study presents an overview of the input optics systems of the KAGRA detector, which consist of various optical systems, such as a laser source, its intensity and frequency stabilization systems, modulators, a Faraday isolator, mode-matching telescopes, and a high-power beam dump. These optics were successfully delivered to the KAGRA interferometer and operated stably during the observations. The laser frequency noise was observed to limit the detector sensitivity above a few kHz, whereas the laser intensity did not significantly limit the detector sensitivity.

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2209.04974 [pdf, other]

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

Authors: Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Abstract: This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on independently developed two recent technologies; array-geometry-agnostic continuous speech separation, or VarArray, and streaming multi-talker ASR based on token-level serialized output training (t-SOT). To combine the best of both technologies, we newly design a t-SOT-based ASR model that generates a serialized multi-talker transcription based on two separated speech signals from VarArray. We also propose a pre-training scheme for such an ASR model where we simulate VarArray's output signals based on monaural single-talker ASR training data. Conversation transcription experiments using the AMI meeting corpus show that the system based on the proposed framework significantly outperforms conventional ones. Our system achieves the state-of-the-art word error rates of 13.7% and 15.5% for the AMI development and evaluation sets, respectively, in the multiple-distant-microphone setting while retaining the streaming inference capability.

Submitted 3 October, 2022; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: 6 pages, 2 figure, 3 tables, v2: Appendix A has been added

arXiv:2209.02863 [pdf]

doi 10.3847/2041-8213/aca1b0

Model-based cross-correlation search for gravitational waves from the low-mass X-ray binary Scorpius X-1 in LIGO O3 data

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin , et al. (1670 additional authors not shown)

Abstract: We present the results of a model-based search for continuous gravitational waves from the low-mass X-ray binary Scorpius X-1 using LIGO detector data from the third observing run of Advanced LIGO, Advanced Virgo and KAGRA. This is a semicoherent search which uses details of the signal model to coherently combine data separated by less than a specified coherence time, which can be adjusted to balance sensitivity with computing cost. The search covered a range of gravitational-wave frequencies from 25Hz to 1600Hz, as well as ranges in orbital speed, frequency and phase determined from observational constraints. No significant detection candidates were found, and upper limits were set as a function of frequency. The most stringent limits, between 100Hz and 200Hz, correspond to an amplitude h0 of about 1e-25 when marginalized isotropically over the unknown inclination angle of the neutron star's rotation axis, or less than 4e-26 assuming the optimal orientation. The sensitivity of this search is now probing amplitudes predicted by models of torque balance equilibrium. For the usual conservative model assuming accretion at the surface of the neutron star, our isotropically-marginalized upper limits are close to the predicted amplitude from about 70Hz to 100Hz; the limits assuming the neutron star spin is aligned with the most likely orbital angular momentum are below the conservative torque balance predictions from 40Hz to 200Hz. Assuming a broader range of accretion models, our direct limits on gravitational-wave amplitude delve into the relevant parameter space over a wide range of frequencies, to 500Hz or more.

Submitted 2 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 19 pages, Open Access Journal PDF

Report number: LIGO-P2100110-v13

Journal ref: The Astrophysical Journal Letters, 941, L30 (2022)

arXiv:2208.13085 [pdf, other]

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Authors: Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu

Abstract: This paper describes a speaker diarization model based on target speaker voice activity detection (TS-VAD) using transformers. To overcome the original TS-VAD model's drawback of being unable to handle an arbitrary number of speakers, we investigate model architectures that use input tensors with variable-length time and speaker dimensions. Transformer layers are applied to the speaker axis to make the model output insensitive to the order of the speaker profiles provided to the TS-VAD model. Time-wise sequential layers are interspersed between these speaker-wise transformer layers to allow the temporal and cross-speaker correlations of the input speech signal to be captured. We also extend a diarization model based on end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA) by replacing its dot-product-based speaker detection layer with the transformer-based TS-VAD. Experimental results on VoxConverse show that using the transformers for the cross-speaker modeling reduces the diarization error rate (DER) of TS-VAD by 11.3%, achieving a new state-of-the-art (SOTA) DER of 4.57%. Also, our extended EEND-EDA reduces DER by 6.9% on the CALLHOME dataset relative to the original EEND-EDA with a similar model size, achieving a new SOTA DER of 11.18% under a widely used training data setting.

Submitted 25 September, 2022; v1 submitted 27 August, 2022; originally announced August 2022.

arXiv:2207.06291 [pdf, other]

doi 10.1088/1748-0221/18/03/P03035

Description and stability of a RPC-based calorimeter in electromagnetic and hadronic shower environments

Authors: D. Boumediene, V. Francais, J. Apostolakis, G. Folger, A. Ribon, E. Sicking, K. Goto, K. Kawagoe, M. Kuhara, T. Suehara, T. Yoshioka, A. Pingault, M. Tytgat, G. Garillot, G. Grenier, T. Kurca, I. Laktineh, B. Liu, B. Li, L. Mirabito, E. Calvo Alamillo, C. Carrillo, M. C. Fouz, H. Garcia Cabrera, J. Marin , et al. (14 additional authors not shown)

Abstract: The CALICE Semi-Digital Hadron Calorimeter technological prototype completed in 2011 is a sampling calorimeter using Glass Resistive Plate Chamber (GRPC) detectors as the active medium. This technology is one of the two options proposed for the hadron calorimeter of the International Large Detector for the International Linear Collider. The prototype was exposed in 2015 to beams of muons, electrons, and pions of different energies at the CERN Super Proton Synchrotron. The use of this technology for future experiments requires a reliable simulation of its response that can predict its performance. GEANT4 combined with a digitization algorithm was used to simulate the prototype. It describes the full path of the signal: showering, gas avalanches, charge induction, and hit triggering. The simulation was tuned using muon tracks and electromagnetic showers for accounting for detector inhomogeneity and tested on hadronic showers collected in the test beam. This publication describes developments of the digitization algorithm. It is used to predict the stability of the detector performance against various changes in the data-taking conditions, including temperature, pressure, magnetic field, GRPC width variations, and gas mixture variations. These predictions are confronted with test beam data and provide an attempt to explain the detector properties. The data-taking conditions such as temperature and potential detector inhomogeneities affect energy density measurements but have a small impact on detector efficiency.

Submitted 21 March, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: Version published in JINST

Report number: CALICE-PUB-2022-02

Journal ref: 2023 JINST 18 P03035

arXiv:2207.05098 [pdf, other]

doi 10.1093/mnras/stac1972

The physical origin for spatially large scatter of IGM opacity at the end of reionization: the IGM Ly$α$ opacity-galaxy density relation

Authors: Rikako Ishimoto, Nobunari Kashikawa, Daichi Kashino, Kei Ito, Yongming Liang, Zheng Cai, Takehiro Yoshioka, Katsuya Okoshi, Toru Misawa, Masafusa Onoue, Yoshihiro Takeda, Hisakazu Uchiyama

Abstract: The large opacity fluctuations in the $z > 5.5$ Ly$α$ forest may indicate inhomogeneous progress of reionization. To explain the observed large scatter of the effective Ly$α$ optical depth ($τ_{\rm eff}$) of the intergalactic medium (IGM), fluctuation of UV background ($Γ$ model) or the IGM gas temperature ($T$ model) have been proposed, which predict opposite correlations between $τ_{\rm eff}$ and galaxy density. In order to address which model can explain the large scatter of $τ_{\rm eff}$, we search for Ly$α$ emitters (LAEs) around two (J1137+3549 and J1602+4228) quasar sightlines with $τ_{\rm eff}\sim3$ and J1630+4012 sightline with $τ_{\rm eff}\sim5.5$. Using a narrowband imaging with Subaru/Hyper Suprime-Cam, we draw LAE density maps to explore their spatial distributions. Overdensities are found within 20 $h^{-1}$Mpc of the quasar sightlines in the low $τ_{\rm eff}$ regions, while a deficit of LAEs is found in the high $τ_{\rm eff}$ region. Although the $τ_{\rm eff}$ of the three quasar sightlines are neither high nor low enough to clearly distinguish the two models, these observed $τ_{\rm eff}$-galaxy density relations all consistently support the $Γ$ model rather than the $T$ model in the three fields, along with the previous studies. The observed overdensities near the low $τ_{\rm eff}$ sightlines may suggest that the relic temperature fluctuation does not affect reionization that much. Otherwise, these overdensities could be attributed to other factors besides the reionization process, such as the nature of LAEs as poor tracers of underlying large-scale structures.

Submitted 11 July, 2022; originally announced July 2022.

Comments: 13 pages, 14 figures, accepted for publication in MNRAS

Showing 1–50 of 211 results for author: Yoshioka, T