Showing 1–16 of 16 results for author: Dumpala, S H

  1. arXiv:2410.13030  [pdf, other]

    cs.CV cs.CL cs.LG

    Sensitivity of Generative VLMs to Semantically and Lexically Altered Prompts

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite the significant influx of prompt-tuning techniques for generative vision-language models (VLMs), it remains unclear how sensitive these models are to lexical and semantic alterations in prompts. In this paper, we evaluate the ability of generative VLMs to understand lexical and semantic changes in text using the SugarCrepe++ dataset. We analyze the sensitivity of VLMs to lexical alteration…

    Submitted 16 October, 2024; originally announced October 2024.

  2. arXiv:2407.06342  [pdf, other]

    eess.AS

    XANE Background Acoustic Embeddings: Ablation and Clustering Analysis

    Authors: Dushyant Sharma, James Fosburgh, Sri Harsha Dumpala, Chandramouli Shama Sastri, Stanislav Yu. Kruchinin, Patrick A. Naylor

    Abstract: We explore the recently proposed explainable acoustic neural embedding (XANE) system that models the background acoustics of a speech signal in a non-intrusive manner. The XANE embeddings are used to estimate specific parameters related to the background acoustic properties of the signal which allows the embeddings to be explainable in terms of those parameters. We perform ablation studies on the…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.05199

  3. arXiv:2406.17229  [pdf, other]

    cs.SD cs.LG eess.AS

    Self-Supervised Embeddings for Detecting Individual Symptoms of Depression

    Authors: Sri Harsha Dumpala, Katerina Dikaios, Abraham Nunes, Frank Rudzicz, Rudolf Uher, Sageev Oore

    Abstract: Depression, a prevalent mental health disorder impacting millions globally, demands reliable assessment systems. Unlike previous studies that focus solely on either detecting depression or predicting its severity, our work identifies individual symptoms of depression while also predicting its severity using speech input. We leverage self-supervised learning (SSL)-based speech models to better util…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  4. arXiv:2406.16000  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Predicting Individual Depression Symptoms from Acoustic Features During Speech

    Authors: Sebastian Rodriguez, Sri Harsha Dumpala, Katerina Dikaios, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Current automatic depression detection systems provide predictions directly without relying on the individual symptoms/items of depression as denoted in the clinical depression rating scales. In contrast, clinicians assess each item in the depression rating scale in a clinical setting, thus implicitly providing a more detailed rationale for a depression diagnosis. In this work, we make a first ste…

    Submitted 22 June, 2024; originally announced June 2024.

  5. arXiv:2406.11171  [pdf, other]

    cs.CV cs.CL cs.LG

    SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed using different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics is n…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: Added the dataset link to the abstract

    MSC Class: 68T45; 68T50 ACM Class: I.2.7; I.2.10

  6. arXiv:2406.05199  [pdf, other]

    eess.AS cs.SD

    XANE: eXplainable Acoustic Neural Embeddings

    Authors: Sri Harsha Dumpala, Dushyant Sharma, Chandramouli Shama Sastri, Stanislav Kruchinin, James Fosburgh, Patrick A. Naylor

    Abstract: We present a novel method for extracting neural embeddings that model the background acoustics of a speech signal. The extracted embeddings are used to estimate specific parameters related to the background acoustic properties of the signal in a non-intrusive manner, which allows the embeddings to be explainable in terms of those parameters. We illustrate the value of these embeddings by performin…

    Submitted 7 June, 2024; originally announced June 2024.

  7. arXiv:2404.16365  [pdf, other]

    cs.CL cs.AI

    VISLA Benchmark: Evaluating Embedding Sensitivity to Semantic and Lexical Alterations

    Authors: Sri Harsha Dumpala, Aman Jaiswal, Chandramouli Sastry, Evangelos Milios, Sageev Oore, Hassan Sajjad

    Abstract: Despite their remarkable successes, state-of-the-art language models face challenges in grasping certain important semantic details. This paper introduces the VISLA (Variance and Invariance to Semantic and Lexical Alterations) benchmark, designed to evaluate the semantic and lexical understanding of language models. VISLA presents a 3-way semantic (in)equivalence task with a triplet of sentences a…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.05071  [pdf, other]

    cs.LG cs.SD eess.AS

    Test-Time Training for Depression Detection

    Authors: Sri Harsha Dumpala, Chandramouli Shama Sastry, Rudolf Uher, Sageev Oore

    Abstract: Previous works on depression detection use datasets collected in similar environments to train and test the models. In practice, however, the train and test distributions cannot be guaranteed to be identical. Distribution shifts can be introduced due to variations such as recording environment (e.g., background noise) and demographics (e.g., gender, age, etc.). Such distributional shifts can surpri…

    Submitted 7 April, 2024; originally announced April 2024.

  9. arXiv:2309.10930  [pdf, other]

    cs.SD cs.LG eess.AS

    Test-Time Training for Speech

    Authors: Sri Harsha Dumpala, Chandramouli Sastry, Sageev Oore

    Abstract: In this paper, we study the application of Test-Time Training (TTT) as a solution to handling distribution shifts in speech applications. In particular, we introduce distribution-shifts to the test datasets of standard speech-classification tasks -- for example, speaker-identification and emotion-detection -- and explore how Test-Time Training (TTT) can help adjust to the distribution-shift. In ou…

    Submitted 28 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  10. arXiv:2306.09192  [pdf, other]

    cs.CV cs.LG

    DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers

    Authors: Chandramouli Sastry, Sri Harsha Dumpala, Sageev Oore

    Abstract: We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step. Using both ResNet-50 and Vision Transformer architectures, we comprehensively evaluate classifiers tra…

    Submitted 28 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: Shorter version of this work was accepted in the CVPR 2024 Workshop on Synthetic Data for Computer Vision

  11. arXiv:2108.01043  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Musical Speech: A Transformer-based Composition Tool

    Authors: Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Dani Oore, Sageev Oore

    Abstract: In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our p…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: NeurIPS 2020 Demonstration Track; extended for PMLR

  12. arXiv:2107.13969  [pdf, other]

    cs.CY cs.LG cs.SD eess.AS

    Significance of Speaker Embeddings and Temporal Context for Depression Detection

    Authors: Sri Harsha Dumpala, Sebastian Rodriguez, Sheri Rempel, Rudolf Uher, Sageev Oore

    Abstract: Depression detection from speech has attracted a lot of attention in recent years. However, the significance of speaker-specific information in depression detection has not yet been explored. In this work, we analyze the significance of speaker embeddings for the task of depression detection from speech. Experimental results show that the speaker embeddings provide important cues to achieve state-…

    Submitted 24 July, 2021; originally announced July 2021.

  13. arXiv:1912.11151  [pdf, other]

    eess.AS cs.CL cs.SD

    A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications

    Authors: Sri Harsha Dumpala, Imran Sheikh, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Naturally introduced perturbations in the audio signal, caused by emotional and physical states of the speaker, can significantly degrade the performance of Automatic Speech Recognition (ASR) systems. In this paper, we propose a front-end based on Cycle-Consistent Generative Adversarial Network (CycleGAN) which transforms naturally perturbed speech into normal speech, and hence improves the robustness…

    Submitted 18 December, 2019; originally announced December 2019.

    Comments: 7 pages, 3 figures, ICASSP-2019

  14. arXiv:1712.05608  [pdf, other]

    cs.CL cs.SD eess.AS

    A Novel Approach for Effective Learning in Low Resourced Scenarios

    Authors: Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Deep learning based discriminative methods, being the state-of-the-art machine learning techniques, are ill-suited for learning from lower amounts of data. In this paper, we propose a novel framework, called simultaneous two sample learning (s2sL), to effectively learn the class discriminative characteristics, even from a very low amount of data. In s2sL, more than one sample (here, two samples) are…

    Submitted 15 December, 2017; originally announced December 2017.

    Comments: Presented at NIPS 2017 Machine Learning for Audio Signal Processing (ML4Audio) Workshop, Dec. 2017

  15. arXiv:1705.09289  [pdf, other]

    cs.SD

    Improved I-vector-based Speaker Recognition for Utterances with Speaker Generated Non-speech sounds

    Authors: Sri Harsha Dumpala, Ashish Panda, Sunil Kumar Kopparapu

    Abstract: Conversational speech not only contains several variants of neutral speech but is also prominently interlaced with several speaker generated non-speech sounds such as laughter and breath. A robust speaker recognition system should be capable of recognizing a speaker irrespective of these variations in his speech. An understanding of whether the speaker-specific information represented by these var…

    Submitted 25 May, 2017; originally announced May 2017.

  16. arXiv:1704.07055  [pdf, other]

    cs.LG cs.NE

    k-FFNN: A priori knowledge infused Feed-forward Neural Networks

    Authors: Sri Harsha Dumpala, Rupayan Chakraborty, Sunil Kumar Kopparapu

    Abstract: Recurrent neural networks (RNNs) are being extensively used over feed-forward neural networks (FFNNs) because of their inherent capability to capture temporal relationships that exist in sequential data such as speech. This aspect of RNNs is advantageous especially when there is no a priori knowledge about the temporal correlations within the data. However, RNNs require a large amount of data to lea…

    Submitted 24 April, 2017; originally announced April 2017.