-
Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Authors:
Seong-Gyun Leem,
Daniel Fulford,
Jukka-Pekka Onnela,
David Gard,
Carlos Busso
Abstract:
Speech emotion recognition (SER) systems often struggle in real-world environments, where ambient noise severely degrades their performance. This paper explores a novel approach that exploits prior knowledge of testing environments to maximize SER performance under noisy conditions. To address this task, we propose a text-guided, environment-aware training where an SER model is trained with contaminated speech samples and their paired noise description. We use a pre-trained text encoder to extract the text-based environment embedding and then fuse it to a transformer-based SER model during training and inference. We demonstrate the effectiveness of our approach through our experiment with the MSP-Podcast corpus and real-world additive noise samples collected from the Freesound repository. Our experiment indicates that the text-based environment descriptions processed by a large language model (LLM) produce representations that improve the noise-robustness of the SER system. In addition, our proposed approach with an LLM yields better performance than our environment-agnostic baselines, especially in low signal-to-noise ratio (SNR) conditions. When testing at a -5 dB SNR level, our proposed method shows better performance than our best baseline model by 31.8% (arousal), 23.5% (dominance), and 9.5% (valence).
Submitted 24 July, 2024;
originally announced July 2024.
-
A Comprehensive Survey of Foundation Models in Medicine
Authors:
Wasif Khan,
Seowung Leem,
Kyle B. See,
Joshua K. Wong,
Shaoting Zhang,
Ruogu Fang
Abstract:
Foundation models (FMs) are large-scale deep-learning models trained on extensive datasets using self-supervised techniques. These models serve as a base for various downstream tasks, including healthcare. FMs have been adopted with great success across various domains within healthcare, including natural language processing (NLP), computer vision, graph learning, biology, and omics. Existing healthcare-based surveys have not yet included all of these domains. Therefore, this survey provides a comprehensive overview of FMs in healthcare. We focus on the history, learning strategies, flagship models, applications, and challenges of FMs. We explore how FMs such as the BERT and GPT families are reshaping various healthcare domains, including clinical large language models, medical image analysis, and omics data. Furthermore, we provide a detailed taxonomy of healthcare applications facilitated by FMs, such as clinical NLP, medical computer vision, graph learning, and other biology-related tasks. Despite the promising opportunities FMs provide, they also have several associated challenges, which are explained in detail. We also outline potential future directions to provide researchers and practitioners with insights into the potential and limitations of FMs in healthcare to advance their deployment and mitigate associated risks.
Submitted 15 June, 2024;
originally announced June 2024.
-
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Authors:
Saebom Leem,
Hyunseok Seo
Abstract:
Vision Transformer (ViT) is one of the most widely used models in the computer vision field with its great performance on various tasks. In order to fully utilize the ViT-based architecture in various applications, proper visualization methods with a decent localization performance are necessary, but these methods employed in CNN-based models are still not available in ViT due to its unique structure. In this work, we propose an attention-guided visualization method applied to ViT that provides a high-level semantic explanation for its decision. Our method selectively aggregates the gradients directly propagated from the classification output to each self-attention, collecting the contribution of image features extracted from each location of the input image. These gradients are additionally guided by the normalized self-attention scores, which are the pairwise patch correlation scores. They are used to supplement the gradients on the patch-level context information efficiently detected by the self-attention mechanism. This approach provides elaborate high-level semantic explanations with great localization performance using only the class labels. As a result, our method outperforms the previous leading explainability methods of ViT in the weakly-supervised localization task and presents great capability in capturing the full instances of the target class object. Meanwhile, our method provides a visualization that faithfully explains the model, which is demonstrated in the perturbation comparison test.
Submitted 6 February, 2024;
originally announced February 2024.
-
Solving norm equations in global function fields
Authors:
Sumin Leem,
Michael Jacobson,
Renate Scheidler
Abstract:
We present two new algorithms for solving norm equations over global function fields with at least one infinite place of degree 1 and no wild ramification. The first of these is a substantial improvement of a method due to Gaál and Pohst, while the second approach uses index calculus techniques and is significantly faster asymptotically and in practice. Both algorithms incorporate compact representations of field elements which results in a significant gain in performance compared to the Gaál-Pohst approach. We provide Magma implementations, analyze the complexity of all three algorithms under varying asymptotics on the field parameters, and provide empirical data on their performance.
Submitted 29 January, 2024;
originally announced January 2024.
-
Versatile audio-visual learning for emotion recognition
Authors:
Lucas Goncalves,
Seong-Gyun Leem,
Wei-Cheng Lin,
Berrak Sisman,
Carlos Busso
Abstract:
Most current audio-visual emotion recognition models lack the flexibility needed for deployment in practical applications. We envision a multimodal system that works even when only one modality is available and can be implemented interchangeably for either predicting emotional attributes or recognizing categorical emotions. Achieving such flexibility in a multimodal emotion recognition system is difficult due to the inherent challenges in accurately interpreting and integrating varied data sources. It is also a challenge to robustly handle missing or partial information while allowing direct switch between regression or classification tasks. This study proposes a versatile audio-visual learning (VAVL) framework for handling unimodal and multimodal systems for emotion regression or emotion classification tasks. We implement an audio-visual framework that can be trained even when audio and visual paired data is not available for part of the training set (i.e., audio only or only video is present). We achieve this effective representation learning with audio-visual shared layers, residual connections over shared layers, and a unimodal reconstruction task. Our experimental results reveal that our architecture significantly outperforms strong baselines on the CREMA-D, MSP-IMPROV, and CMU-MOSEI corpora. Notably, VAVL attains a new state-of-the-art performance in the emotional attribute prediction task on the MSP-IMPROV corpus.
Submitted 30 July, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
-
High resolution angle resolved photoemission studies on quasi-particle dynamics in graphite
Authors:
C. S. Leem,
B. J. Kim,
Chul Kim,
S. R. Park,
Min-Kook Kim,
S. Johnston,
T. Ohta,
A. Bostwick,
Hyoung Joon Choi,
T. Devereaux,
E. Rotenberg,
C. Kim
Abstract:
We obtained the spectral function of the graphite H point using high resolution angle resolved photoelectron spectroscopy (ARPES). The extracted width of the spectral function (inverse of the photo-hole lifetime) near the H point is approximately proportional to the energy as expected from the linearly increasing density of states (DOS) near the Fermi energy. This is well accounted for by our electron-phonon coupling theory considering the peculiar electronic DOS near the Fermi level. We also investigated the temperature dependence of the peak widths both experimentally and theoretically. The upper bound for the electron-phonon coupling parameter is ~0.23, nearly the same value as previously reported at the K point. Our analysis of temperature dependent ARPES data at K shows that the phonon mode of graphite that is dominant in electron-phonon coupling has a much higher energy scale than 125 K.
Submitted 3 March, 2009;
originally announced March 2009.
-
Angle-resolved photoemission spectroscopy of electron-doped cuprate superconductors: Isotropic electron-phonon coupling
Authors:
Seung Ryong Park,
D. J. Song,
C. S. Leem,
Chul Kim,
C. Kim,
B. J. Kim,
H. Eisaki
Abstract:
We have performed high resolution angle resolved photoemission (ARPES) studies on electron doped cuprate superconductors Sm2-xCexCuO4 (x=0.10, 0.15, 0.18), Nd2-xCexCuO4 (x=0.15) and Eu2-xCexCuO4 (x=0.15). Imaginary parts of the electron removal self energy show step-like features due to an electron-bosonic mode coupling. The step-like feature is seen along both nodal and anti-nodal directions but at energies of 50 and 70 meV, respectively, independent of the doping and rare earth element. Such energy scales can be understood as being due to preferential coupling to half- and full-breathing mode phonons, revealing the phononic origin of the kink structures. Estimated electron-phonon coupling constant lambda from the self energy is roughly independent of the doping and momentum. The isotropic nature of lambda is discussed in comparison with the hole doped case where a strong anisotropy exists.
Submitted 5 August, 2008;
originally announced August 2008.
-
Novel Jeff = 1/2 Mott State Induced by Relativistic Spin-Orbit Coupling in Sr2IrO4
Authors:
B. J. Kim,
Hosub Jin,
S. J. Moon,
J. -Y. Kim,
B. -G. Park,
C. S. Leem,
Jaeju Yu,
T. W. Noh,
C. Kim,
S. -J. Oh,
J. -H. Park,
V. Durairaj,
G. Cao,
E. Rotenberg
Abstract:
We investigated the electronic structure of the 5d transition-metal oxide Sr2IrO4 using angle-resolved photoemission, optical conductivity, and x-ray absorption measurements and first-principles band calculations. The system was found to be well described by novel effective total angular momentum Jeff states, in which relativistic spin-orbit (SO) coupling is fully taken into account under a large crystal field. Despite the delocalized Ir 5d states, the Jeff states form such narrow bands that even a small correlation energy leads to the Jeff = 1/2 Mott ground state with unique electronic and magnetic behaviors, suggesting a new class of Jeff quantum spin driven correlated-electron phenomena.
Submitted 20 March, 2008;
originally announced March 2008.
-
Effect of linear density of states on the quasi-particle dynamics and small electron-phonon coupling in graphite
Authors:
C. S. Leem,
B. J. Kim,
Chul Kim,
S. R. Park,
T. Ohta,
A. Bostwick,
E. Rotenberg,
H. -D. Kim,
M. K. Kim,
H. J. Choi,
C. Kim
Abstract:
We obtained the spectral function of very high quality natural graphite single crystals using angle resolved photoelectron spectroscopy (ARPES). A clear separation of non-bonding and bonding bands and asymmetric lineshape are observed. The asymmetric lineshapes are well accounted for by the finite photoelectron escape depth and the band structure. The extracted width of the spectral function (inverse of the photohole life time) near the K point is, beyond the maximum phonon energy, approximately proportional to the energy as expected from the linear density of states near the Fermi energy. The upper bound for the electron-phonon coupling constant is about 0.2, a much smaller value than the previously reported one.
Submitted 25 August, 2007;
originally announced August 2007.
-
Electron Removal Self Energy and its application to Ca2CuO2Cl2
Authors:
Chul Kim,
S. R. Park,
C. S. Leem,
D. J. Song,
H. U. Jin,
H. -D. Kim,
F. Ronning,
C. Kim
Abstract:
We propose using the self energy defined for the electron removal Green's function. Starting from the electron removal Green's function, we obtained expressions for the removal self energy Sigma^ER (k,omega) that are applicable for non-quasiparticle photoemission spectral functions from a single band system. Our method does not assume momentum independence and produces the self energy in the full k-omega space. The method is applied to the angle resolved photoemission from Ca_2CuO_2Cl_2 and the result is found to be compatible with the self energy value from the peak width of sharp features. The self energy is found to be only weakly k-dependent. In addition, the Im Sigma shows a maximum at around 1 eV where the high energy kink is located.
Submitted 9 August, 2007;
originally announced August 2007.
-
Electronic Structure of Electron-doped Sm1.86Ce0.14CuO4: Strong `Pseudo-Gap' Effects, Nodeless Gap and Signatures of Short Range Order
Authors:
S. R. Park,
Y. S. Roh,
Y. K. Yoon,
C. S. Leem,
J. H. Kim,
B. J. Kim,
H. Koh,
H. Eisaki,
N. P. Armitage,
C. Kim
Abstract:
Angle resolved photoemission (ARPES) data from the electron doped cuprate superconductor Sm$_{1.86}$Ce$_{0.14}$CuO$_4$ shows a much stronger pseudo-gap or "hot-spot" effect than that observed in other optimally doped $n$-type cuprates. Importantly, these effects are strong enough to drive the zone-diagonal states below the chemical potential, implying that d-wave superconductivity in this compound would be of a novel "nodeless" gap variety. The gross features of the Fermi surface topology and low energy electronic structure are found to be well described by reconstruction of bands by a $\sqrt{2}\times\sqrt{2}$ order. Comparison of the ARPES and optical data from the $same$ sample shows that the pseudo-gap energy observed in optical data is consistent with the inter-band transition energy of the model, allowing us to have a unified picture of pseudo-gap effects. However, the high energy electronic structure is found to be inconsistent with such a scenario. We show that a number of these model inconsistencies can be resolved by considering a short range ordering or inhomogeneous state.
Submitted 17 December, 2006;
originally announced December 2006.