Showing 1–20 of 20 results for author: Behera, S R

  1. arXiv:2410.12947  [pdf, other]

    eess.AS cs.SD

    Multi-View Multi-Task Modeling with Speech Foundation Models for Speech Forensic Tasks

    Authors: Orchid Chetia Phukan, Devyani Koshal, Swarup Ranjan Behera, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Speech forensic tasks (SFTs), such as automatic speaker recognition (ASR), speech emotion recognition (SER), gender recognition (GR), and age estimation (AE), find use in different security and biometric applications. Previous works have applied various techniques, with recent studies focusing on applying speech foundation models (SFMs) for improved performance. However, most prior efforts have ce…

    Submitted 16 October, 2024; originally announced October 2024.

    MSC Class: 68T45 ACM Class: I.2.7

  2. arXiv:2410.12645  [pdf, other]

    eess.AS eess.SP

    Beyond Speech and More: Investigating the Emergent Ability of Speech Foundation Models for Classifying Physiological Time-Series Signals

    Authors: Orchid Chetia Phukan, Swarup Ranjan Behera, Girish, Mohd Mujtaba Akhtar, Arun Balaji Buduru, Rajesh Sharma

    Abstract: Despite being trained exclusively on speech data, speech foundation models (SFMs) like Whisper have shown impressive performance in non-speech tasks such as audio classification. This is partly because speech shares some common traits with audio, enabling SFMs to transfer effectively. In this study, we push the boundaries by evaluating SFMs on a more challenging out-of-domain (OOD) task: classifyi…

    Submitted 16 October, 2024; originally announced October 2024.

    MSC Class: 68T45 ACM Class: I.2.7

  3. arXiv:2410.12567  [pdf, other]

    eess.AS cs.SD

    SeQuiFi: Mitigating Catastrophic Forgetting in Speech Emotion Recognition with Sequential Class-Finetuning

    Authors: Sarthak Jain, Orchid Chetia Phukan, Swarup Ranjan Behera, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this work, we introduce SeQuiFi, a novel approach for mitigating catastrophic forgetting (CF) in speech emotion recognition (SER). SeQuiFi adopts a sequential class-finetuning strategy, where the model is fine-tuned incrementally on one emotion class at a time, preserving and enhancing retention for each class. While various state-of-the-art (SOTA) methods, such as regularization-based, memory-…

    Submitted 16 October, 2024; originally announced October 2024.

    MSC Class: 68T45 ACM Class: I.2.7

  4. arXiv:2409.15767  [pdf, other]

    eess.AS cs.SD

    Representation Loss Minimization with Randomized Selection Strategy for Efficient Environmental Fake Audio Detection

    Authors: Orchid Chetia Phukan, Girish, Mohd Mujtaba Akhtar, Swarup Ranjan Behera, Nitin Choudhury, Arun Balaji Buduru, Rajesh Sharma, S. R Mahadeva Prasanna

    Abstract: The adaptation of foundation models has significantly advanced environmental audio deepfake detection (EADD), a rapidly growing area of research. These models are typically fine-tuned or utilized in their frozen states for downstream tasks. However, the dimensionality of their representations can substantially lead to a high parameter count of downstream models, leading to higher computational dem…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

    MSC Class: 68T45 ACM Class: I.2.7

  5. arXiv:2409.14312  [pdf, other]

    eess.AS cs.SD

    Avengers Assemble: Amalgamation of Non-Semantic Features for Depression Detection

    Authors: Orchid Chetia Phukan, Swarup Ranjan Behera, Shubham Singh, Muskaan Singh, Vandana Rajan, Arun Balaji Buduru, Rajesh Sharma, S. R. Mahadeva Prasanna

    Abstract: In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs-extracted from pre-trained models (PTMs) designed for non-semantic tasks such as paralinguistic speech processing (TRILLsson), speaker recognition (x-ve…

    Submitted 22 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

    MSC Class: 68T45 ACM Class: I.2.7

  6. arXiv:2409.14221  [pdf, other]

    eess.AS cs.SD

    Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition

    Authors: Orchid Chetia Phukan, Mohd Mujtaba Akhtar, Girish, Swarup Ranjan Behera, Sishir Kalita, Arun Balaji Buduru, Rajesh Sharma, S. R Mahadeva Prasanna

    Abstract: In this study, we investigate multimodal foundation models (MFMs) for emotion recognition from non-verbal sounds. We hypothesize that MFMs, with their joint pre-training across multiple modalities, will be more effective in non-verbal sounds emotion recognition (NVER) by better interpreting and differentiating subtle emotional cues that may be ambiguous in audio-only foundation models (AFMs). To v…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

    MSC Class: 68T45 ACM Class: I.2.7

  7. arXiv:2409.14131  [pdf, other]

    eess.AS cs.LG cs.SD

    Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models

    Authors: Orchid Chetia Phukan, Sarthak Jain, Swarup Ranjan Behera, Arun Balaji Buduru, Rajesh Sharma, S. R Mahadeva Prasanna

    Abstract: In this study, for the first time, we extensively investigate whether music foundation models (MFMs) or speech foundation models (SFMs) work better for singing voice deepfake detection (SVDD), which has recently attracted attention in the research community. For this, we perform a comprehensive comparative study of state-of-the-art (SOTA) MFMs (MERT variants and music2vec) and SFMs (pre-trained fo…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

    MSC Class: 68T45 ACM Class: I.2.7

  8. arXiv:2408.13530  [pdf, ps, other]

    math.AP

    Homogeneous Dirichlet problem for degenerate parabolic-hyperbolic PDE driven by Levy noise

    Authors: Soumya Ranjan Behera, Ananta K Majee

    Abstract: In this article, we study the homogeneous Dirichlet problem for a degenerate parabolic-hyperbolic PDE perturbed by Levy noise. In particular, we develop the well-posedness theory of entropy solution based on Kružkov's semi-entropy formulation. In comparison to the pioneering work by Bauzet et al. (J. Funct. Anal. 266, (2014), 2503-2545), concerning the existence and uniqueness of entropy soluti…

    Submitted 24 August, 2024; originally announced August 2024.

  9. arXiv:2408.13528  [pdf, ps, other]

    math.AP math.PR

    Renormalized stochastic entropy solution for degenerate parabolic-hyperbolic equations with Levy noise

    Authors: Soumya Ranjan Behera, Ananta K Majee

    Abstract: In this article, we establish the well-posedness theory for renormalized entropy solutions of a degenerate parabolic-hyperbolic PDE perturbed by a multiplicative Levy noise with general L1-data on the unbounded domain. By using a suitable approximation procedure based on the vanishing viscosity technique and bounded data, we prove the existence of a renormalized entropy solution to the underlying…

    Submitted 24 August, 2024; originally announced August 2024.

  10. arXiv:2406.09156  [pdf, other]

    cs.LG cs.CV cs.MM cs.SD eess.AS

    Towards Multilingual Audio-Visual Question Answering

    Authors: Orchid Chetia Phukan, Priyabrata Mallick, Swarup Ranjan Behera, Aalekhya Satya Narayani, Arun Balaji Buduru, Rajesh Sharma

    Abstract: In this paper, we work towards extending Audio-Visual Question Answering (AVQA) to multilingual settings. Existing AVQA research has predominantly revolved around English and replicating it for addressing AVQA in other languages requires a substantial allocation of resources. As a scalable solution, we leverage machine translation and present two multilingual AVQA datasets for eight languages crea…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

    MSC Class: 68T45

  11. arXiv:2406.07676  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation

    Authors: Swarup Ranjan Behera, Abhishek Dhiman, Karthik Gowda, Aalekhya Satya Narayani

    Abstract: Audio classification models, particularly the Audio Spectrogram Transformer (AST), play a crucial role in efficient audio analysis. However, optimizing their efficiency without compromising accuracy remains a challenge. In this paper, we introduce FastAST, a framework that integrates Token Merging (ToMe) into the AST framework. FastAST enhances inference speed without requiring extensive retrainin…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

    MSC Class: 68T10

  12. arXiv:2404.03012  [pdf, other]

    cs.LG

    Spectral Clustering in Convex and Constrained Settings

    Authors: Swarup Ranjan Behera, Vijaya V. Saradhi

    Abstract: Spectral clustering methods have gained widespread recognition for their effectiveness in clustering high-dimensional data. Among these techniques, constrained spectral clustering has emerged as a prominent approach, demonstrating enhanced performance by integrating pairwise constraints. However, the application of such constraints to semidefinite spectral clustering, a variant that leverages semi…

    Submitted 3 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  13. arXiv:2404.00030  [pdf, other]

    cs.HC cs.LG

    Visualization of Unstructured Sports Data -- An Example of Cricket Short Text Commentary

    Authors: Swarup Ranjan Behera, Vijaya V Saradhi

    Abstract: Sports visualization focuses on the use of structured data, such as box-score data and tracking data. Unstructured data sources pertaining to sports are available in various places such as blogs, social media posts, and online news articles. Sports visualization methods have either not fully exploited the information present in these sources or the proposed visualizations through the use of these sourc…

    Submitted 22 March, 2024; originally announced April 2024.

    ACM Class: I.2.7

  14. Estimating the link budget of satellite-based Quantum Key Distribution (QKD) for uplink transmission through the atmosphere

    Authors: Satya Ranjan Behera, Urbasi Sinha

    Abstract: Satellite-based quantum communications including quantum key distribution (QKD) represent one of the most promising approaches toward global-scale quantum communications. To determine the viability of transmitting quantum signals through the atmosphere, it is essential to conduct atmospheric simulations for both uplink and downlink quantum communications. In the case of the uplink scenario, the in…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: 15 pages main text, 11 pages of appendices

    Journal ref: EPJ Quantum Technology Volume 11, article number 66, (2024)

  15. arXiv:2312.17343  [pdf, other]

    cs.CL cs.AI cs.LG cs.MM cs.SD eess.AS

    AQUALLM: Audio Question Answering Data Generation Using Large Language Models

    Authors: Swarup Ranjan Behera, Krishna Mohan Injeti, Jaya Sai Kiran Patibandla, Praveen Kumar Pokala, Balakrishna Reddy Pailla

    Abstract: Audio Question Answering (AQA) constitutes a pivotal task in which machines analyze both audio signals and natural language questions to produce precise natural language answers. The significance of possessing high-quality, diverse, and extensive AQA datasets cannot be overstated when aiming for the precision of an AQA system. While there has been notable focus on developing accurate and efficient…

    Submitted 28 December, 2023; originally announced December 2023.

    ACM Class: I.2.7

  16. arXiv:2311.06818  [pdf, other]

    cs.LG cs.CL

    Cricket Player Profiling: Unraveling Strengths and Weaknesses Using Text Commentary Data

    Authors: Swarup Ranjan Behera, Vijaya V. Saradhi

    Abstract: Devising player-specific strategies in cricket necessitates a meticulous understanding of each player's unique strengths and weaknesses. Nevertheless, the absence of a definitive computational approach to extract such insights from cricket players poses a significant challenge. This paper seeks to address this gap by establishing computational models designed to extract the rules governing player…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: The initial work was published in the ICMLA 2019 conference

    ACM Class: I.2.7

  17. arXiv:2310.02115  [pdf, other]

    quant-ph

    Daytime and Nighttime QKD over an atmospheric free space channel with passive polarisation bases compensation

    Authors: Saumya Ranjan Behera, Melvee George, Urbasi Sinha

    Abstract: Quantum Communication (QC) represents a promising futuristic technology, revolutionizing secure communication. Photon-based Quantum Key Distribution (QKD) is the most widely explored area in QC research, utilizing the polarisation degree of freedom of photons for both fibre and free-space communication. In this work, we investigate and mitigate the challenges posed by fibre birefringence and atmos…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 11 pages, 8 figures

  18. arXiv:2306.04294  [pdf, ps, other]

    math.PR math.AP

    Stochastic Fractional Conservation Laws: Large deviation principle, Central limit theorem and Moderate deviation principle

    Authors: Soumya Ranjan Behera, Ananta K. Majee

    Abstract: In this article, we establish the Freidlin-Wentzell type large deviation principle and central limit theorem for stochastic fractional conservation laws with small multiplicative noise in kinetic formulation framework. The weak convergence method and doubling variables method play a crucial role. As a consequence, we also establish moderate deviation principle for the underlying problem.

    Submitted 7 June, 2023; originally announced June 2023.

  19. arXiv:2212.12846   

    math.NA math.AP

    On rate of convergence of finite difference scheme for degenerate parabolic-hyperbolic PDE with Levy noise

    Authors: Soumya Ranjan Behera, Ananta K. Majee

    Abstract: In this article, we consider a semi discrete finite difference scheme for a degenerate parabolic-hyperbolic PDE driven by Lévy noise in one space dimension. Using bounded variation estimations and a variant of classical Kružkov's doubling of variable approach, we prove that expected value of the $L^1$-difference between the unique entropy solution and approximate solution converges at a rate of…

    Submitted 20 December, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

    Comments: We found an error in Lemma 3.5, which is used in the subsequent analysis to establish the rate of convergence. Since the error is not fixable, we would like to withdraw the article

  20. arXiv:2212.02041  [pdf, other]

    math.NA math.AP math.PR

    Convergence of an operator splitting scheme for fractional conservation laws with Levy noise

    Authors: Soumya Ranjan Behera, Ananta K. Majee

    Abstract: In this paper, we are concerned with an operator splitting scheme for linear fractional and fractional degenerate stochastic conservation laws driven by multiplicative Levy noise. More specifically, using a variant of classical Kruzkov's doubling of variable approach, we show that the approximate solutions generated by the splitting scheme converge to the unique stochastic entropy solution of the…

    Submitted 11 March, 2023; v1 submitted 5 December, 2022; originally announced December 2022.