Showing 1–37 of 37 results for author: Goldwater, S

  1. arXiv:2406.09200  [pdf, other]

    cs.CL

    Orthogonality and isotropy of speaker and phonetic information in self-supervised speech representations

    Authors: Mukhtar Mohamed, Oli Danyi Liu, Hao Tang, Sharon Goldwater

    Abstract: Self-supervised speech representations can hugely benefit downstream speech technologies, yet the properties that make them useful are still poorly understood. Two candidate properties related to the geometry of the representation space have been hypothesized to correlate well with downstream tasks: (1) the degree of orthogonality between the subspaces spanned by the speaker centroids and phone ce…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech

  2. arXiv:2405.11282  [pdf, other]

    cs.CL cs.AI

    Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets

    Authors: Amr Keleg, Walid Magdy, Sharon Goldwater

    Abstract: On annotating multi-dialect Arabic datasets, it is common to randomly assign the samples across a pool of native Arabic speakers. Recent analyses recommended routing dialectal samples to native speakers of their respective dialects to build higher-quality datasets. However, automatically identifying the dialect of samples is hard. Moreover, the pool of annotators who are native speakers of specifi…

    Submitted 6 June, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024 - Main (camera-ready version)

  3. arXiv:2405.08237  [pdf, other]

    cs.CL cs.SD eess.AS

    A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

    Authors: Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater

    Abstract: Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal processing. In this study, we simulated similar analyses with representations extracted from a computational model that was trained on unlabelled speech wi…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted to CogSci 2024

  4. arXiv:2310.13747  [pdf, other]

    cs.CL

    ALDi: Quantifying the Arabic Level of Dialectness of Text

    Authors: Amr Keleg, Sharon Goldwater, Walid Magdy

    Abstract: Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this variation, previous work in Arabic NLP has focused on Dialect Identification (DI) on the sentence or the token level. However, DI treats the task as binary, whereas we a…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  5. arXiv:2306.02153  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

    Authors: Ramon Sanabria, Ondrej Klejch, Hao Tang, Sharon Goldwater

    Abstract: Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a pre-trained self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competit…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  6. arXiv:2305.12464  [pdf, other]

    cs.CL cs.SD

    Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

    Authors: Oli Liu, Hao Tang, Sharon Goldwater

    Abstract: Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored. We hypothesize that they are encoded in orthogonal subspaces, a property that lends itself to simple disentanglement. Applying principal component analysis to representations of two predictive coding models, we iden…

    Submitted 11 December, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  7. arXiv:2302.12165  [pdf, other]

    cs.CL

    Prosodic features improve sentence segmentation and parsing

    Authors: Elizabeth Nielsen, Sharon Goldwater, Mark Steedman

    Abstract: Parsing spoken dialogue presents challenges that parsing text does not, including a lack of clear sentence boundaries. We know from previous work that prosody helps in parsing single sentences (Tran et al. 2018), but we want to show the effect of prosody on parsing speech that isn't segmented into sentences. In experiments on the English Switchboard corpus, we find prosody helps our model both wit…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2105.12667

  8. arXiv:2210.16043  [pdf, other]

    cs.CL cs.SD eess.AS

    Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models

    Authors: Ramon Sanabria, Hao Tang, Sharon Goldwater

    Abstract: Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken word segments. In this work, we study several pre-trained models and pooling methods for constructing AWEs with self-supervised representations. Owing…

    Submitted 14 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE ICASSP 2023

  9. arXiv:2109.10952  [pdf, other]

    cs.CL

    Cross-linguistically Consistent Semantic and Syntactic Annotation of Child-directed Speech

    Authors: Ida Szubert, Omri Abend, Nathan Schneider, Samuel Gibbon, Louis Mahon, Sharon Goldwater, Mark Steedman

    Abstract: This paper proposes a methodology for constructing such corpora of child directed speech (CDS) paired with sentential logical forms, and uses this method to create two such corpora, in English and Hebrew. The approach enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing. Specifically, the approach involves two step…

    Submitted 14 March, 2024; v1 submitted 22 September, 2021; originally announced September 2021.

  10. arXiv:2109.10107  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Difficulty of Segmenting Words with Attention

    Authors: Ramon Sanabria, Hao Tang, Sharon Goldwater

    Abstract: Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention can be used to locate and segment the words. We show, however, that even on monolingual data this approach is brittle. In our experiments with differ…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Accepted at the "Workshop on Insights from Negative Results in NLP" (EMNLP 2021)

  11. arXiv:2105.12667

    cs.CL

    Prosodic segmentation for parsing spoken dialogue

    Authors: Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

    Abstract: Parsing spoken dialogue poses unique difficulties, including disfluencies and unmarked boundaries between sentence-like units. Previous work has shown that prosody can help with parsing disfluent speech (Tran et al. 2018), but has assumed that the input to the parser is already segmented into sentence-like units (SUs), which isn't true in existing speech applications. We investigate how prosody af…

    Submitted 12 October, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: This paper has been retracted -- do not cite. An error occurred in the preprocessing of the pitch and intensity features that this model used. This error means that it can no longer be concluded that prosody is as helpful for finding sentence boundaries and parsing as asserted in this paper

  12. arXiv:2105.05887  [pdf, other]

    cs.CL

    Black or White but never neutral: How readers perceive identity from yellow or skin-toned emoji

    Authors: Alexander Robertson, Walid Magdy, Sharon Goldwater

    Abstract: Research in sociology and linguistics shows that people use language not only to express their own identity but to understand the identity of others. Recent work established a connection between expression of identity and emoji usage on social media, through use of emoji skin tone modifiers. Motivated by that finding, this work asks if, as with language, readers are sensitive to such acts of self-…

    Submitted 11 October, 2021; v1 submitted 12 May, 2021; originally announced May 2021.

    Journal ref: ACM Conference On Computer-supported Cooperative Work And Social Computing 2021

  13. arXiv:2105.03160  [pdf, other]

    cs.CL

    Identity Signals in Emoji Do not Influence Perception of Factual Truth on Twitter

    Authors: Alexander Robertson, Walid Magdy, Sharon Goldwater

    Abstract: Prior work has shown that Twitter users use skin-toned emoji as an act of self-representation to express their racial/ethnic identity. We test whether this signal of identity can influence readers' perceptions about the content of a post containing that signal. In a large scale (n=944) pre-registered controlled experiment, we manipulate the presence of skin-toned emoji and profile photos in a task…

    Submitted 7 May, 2021; originally announced May 2021.

    Journal ref: International Workshop on Emoji Understanding and Applications in Social Media 2021

  14. arXiv:2101.11332  [pdf, other]

    cs.CL

    A phonetic model of non-native spoken word processing

    Authors: Yevgen Matusevych, Herman Kamper, Thomas Schatz, Naomi H. Feldman, Sharon Goldwater

    Abstract: Non-native speakers show difficulties with spoken word processing. Many studies attribute these difficulties to imprecise phonological encoding of words in the lexical memory. We test an alternative hypothesis: that some of these difficulties can arise from the non-native speakers' phonetic perception. We train a computational model of phonetic learning, which has no access to phonology, on either…

    Submitted 11 March, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted for publication in Proceedings of EACL-2021. 11 pages, 5 figures, 2 tables

  15. arXiv:2010.10921  [pdf, ps, other]

    cs.CL

    LemMED: Fast and Effective Neural Morphological Analysis with Short Context Windows

    Authors: Aibek Makazhanov, Sharon Goldwater, Adam Lopez

    Abstract: We present LemMED, a character-level encoder-decoder for contextual morphological analysis (combined lemmatization and tagging). LemMED extends and is named after two other attention-based models, namely Lematus, a contextual lemmatizer, and MED, a morphological (re)inflection model. Our approach does not require training separate lemmatization and tagging models, nor does it need additional resou…

    Submitted 21 October, 2020; originally announced October 2020.

  16. arXiv:2008.02888  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating computational models of infant phonetic learning across languages

    Authors: Yevgen Matusevych, Thomas Schatz, Herman Kamper, Naomi H. Feldman, Sharon Goldwater

    Abstract: In the first year of life, infants' speech perception becomes attuned to the sounds of their native language. Many accounts of this early phonetic learning exist, but computational models predicting the attunement patterns observed in infants from the speech input they hear have been lacking. A recent study presented the first such model, drawing on algorithms proposed for unsupervised learning fr…

    Submitted 6 August, 2020; originally announced August 2020.

    Comments: 7 pages, 1 figure

    Journal ref: 2020. In S. Denison, M. Mack, Y. Xu, and B. Armstrong (Eds.), Proceedings of the 42nd Annual Conference of the Cognitive Science Society (pp. 571-577). Austin, TX: Cognitive Science Society

  17. arXiv:2006.02295  [pdf, other]

    cs.CL cs.SD eess.AS

    Improved acoustic word embeddings for zero-resource languages using multilingual transfer

    Authors: Herman Kamper, Yevgen Matusevych, Sharon Goldwater

    Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we…

    Submitted 5 February, 2021; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: 11 pages, 7 figures, 8 tables. arXiv admin note: text overlap with arXiv:2002.02109. Submitted to the IEEE Transactions on Audio, Speech and Language Processing

  18. Inflecting when there's no majority: Limitations of encoder-decoder neural networks as cognitive models for German plurals

    Authors: Kate McCurdy, Sharon Goldwater, Adam Lopez

    Abstract: Can artificial neural networks learn to represent inflectional morphology and generalize to new words as human speakers do? Kirov and Cotterell (2018) argue that the answer is yes: modern Encoder-Decoder (ED) architectures learn human-like behavior when inflecting English verbs, such as extending the regular past tense form -(e)d to novel words. However, their work does not address the criticism r…

    Submitted 18 May, 2020; originally announced May 2020.

    Comments: To appear at ACL 2020

  19. arXiv:2004.14846  [pdf, other]

    cs.CL cs.SD eess.AS

    The role of context in neural pitch accent detection in English

    Authors: Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

    Abstract: Prosody is a rich information source in natural language, serving as a marker for phenomena such as contrast. In order to make this information available to downstream tasks, we need a way to detect prosodic events in speech. We propose a new model for pitch accent detection, inspired by the work of Stehwien et al. (2018), who presented a CNN-based model for this task. Our model makes greater use…

    Submitted 12 October, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Journal ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing

  20. arXiv:2004.01647  [pdf, other]

    cs.CL

    Analyzing autoencoder-based acoustic word embeddings

    Authors: Yevgen Matusevych, Herman Kamper, Sharon Goldwater

    Abstract: Recent studies have introduced methods for learning acoustic word embeddings (AWEs)---fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been evaluated quantitatively in their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downs…

    Submitted 3 April, 2020; originally announced April 2020.

    Comments: 6 pages, 7 figures, accepted to BAICS workshop (ICLR2020)

  21. arXiv:2002.02109  [pdf, other]

    cs.CL eess.AS

    Multilingual acoustic word embedding models for processing zero-resource languages

    Authors: Herman Kamper, Yevgen Matusevych, Sharon Goldwater

    Abstract: Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. In settings where unlabelled speech is the only available resource, such embeddings can be used in "zero-resource" speech search, indexing and discovery systems. Here we propose to train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to u…

    Submitted 21 February, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: 5 pages, 4 figures, 1 table; accepted to ICASSP 2020. arXiv admin note: text overlap with arXiv:1811.00403

  22. arXiv:1910.10762  [pdf, other]

    cs.CL eess.AS

    Analyzing ASR pretraining for low-resource speech-to-text translation

    Authors: Mihaela C. Stoian, Sameer Bansal, Sharon Goldwater

    Abstract: Previous work has shown that for low-resource source languages, automatic speech-to-text translation (AST) can be improved by pretraining an end-to-end model on automatic speech recognition (ASR) data from a high-resource language. However, it is not clear what factors --e.g., language relatedness or size of the pretraining data-- yield the biggest improvements, or whether pretraining can be effec…

    Submitted 9 February, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: Accepted at ICASSP 2020

  23. arXiv:1908.11425  [pdf, other]

    cs.CL

    Cross-lingual topic prediction for speech using translations

    Authors: Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

    Abstract: Given a large amount of unannotated speech in a low-resource language, can we classify the speech utterances by topic? We consider this question in the setting where a small amount of speech in the low-resource language is paired with text translations in a high-resource language. We develop an effective cross-lingual topic classifier by training on just 20 hours of translated speech, using a rece…

    Submitted 29 March, 2020; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: Accepted to ICASSP 2020

  24. arXiv:1906.01280  [pdf, other]

    cs.CL

    Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection

    Authors: Maria Corkery, Yevgen Matusevych, Sharon Goldwater

    Abstract: The cognitive mechanisms needed to account for the English past tense have long been a subject of debate in linguistics and cognitive science. Neural network models were proposed early on, but were shown to have clear flaws. Recently, however, Kirov and Cotterell (2018) showed that modern encoder-decoder (ED) models overcome many of these flaws. They also presented evidence that ED models demonstr…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL 2019

  25. arXiv:1904.01464  [pdf, other]

    cs.CL

    Training Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text

    Authors: Toms Bergmanis, Sharon Goldwater

    Abstract: Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. Using context can help, both for unseen and ambiguous words. Yet most context-sensitive approaches require full lemma-annotated sentences for training, which may be scarce or unavailable in low-resource languages. In addition (as shown here), in a low-resource setting, a lemmatize…

    Submitted 1 July, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

    Comments: Published in NAACL 2019

  26. Multilingual and Unsupervised Subword Modeling for Zero-Resource Languages

    Authors: Enno Hermann, Herman Kamper, Sharon Goldwater

    Abstract: Subword modeling for zero-resource languages aims to learn low-level representations of speech audio without using transcriptions or other resources from the target language (such as text corpora or pronunciation dictionaries). A good representation should capture phonetic content and abstract away from other types of variability, such as speaker differences and channel noise. Previous work in thi…

    Submitted 7 April, 2020; v1 submitted 9 November, 2018; originally announced November 2018.

    Comments: 17 pages, 6 figures, 7 tables. Accepted for publication in Computer Speech and Language. arXiv admin note: text overlap with arXiv:1803.08863

  27. arXiv:1809.01431  [pdf, other]

    cs.CL

    Pre-training on high-resource speech recognition improves low-resource speech-to-text translation

    Authors: Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

    Abstract: We present a simple approach to improve direct speech-to-text translation (ST) when the source language is low-resource: we pre-train the model on a high-resource automatic speech recognition (ASR) task, and then fine-tune its parameters for ST. We demonstrate that our approach is effective by pre-training on 300 hours of English ASR data to improve Spanish-English ST from 10.8 to 20.2 BLEU when o…

    Submitted 27 February, 2019; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: Accepted for publication in NAACL 2019

  28. arXiv:1804.02545  [pdf, other]

    cs.CL

    Evaluating historical text normalization systems: How well do they generalize?

    Authors: Alexander Robertson, Sharon Goldwater

    Abstract: We highlight several issues in the evaluation of historical text normalization systems that make it hard to tell how well these systems would actually work in practice---i.e., for new datasets or languages; in comparison to more naïve systems; or as a preprocessing step for downstream NLP tools. We illustrate these issues and exemplify our proposed evaluation practices by comparing two neural mode…

    Submitted 13 April, 2018; v1 submitted 7 April, 2018; originally announced April 2018.

    Comments: Accepted to NAACL 2018

  29. arXiv:1803.10738  [pdf, other]

    cs.SI

    Self-Representation on Twitter Using Emoji Skin Color Modifiers

    Authors: Alexander Robertson, Walid Magdy, Sharon Goldwater

    Abstract: Since 2015, it has been possible to modify certain emoji with a skin tone. The five different skin tones were introduced with the aim of representing more human diversity, but some commentators feared they might be used as a way to negatively represent other users/groups. This paper presents a quantitative analysis of the use of skin tone modifiers on emoji on Twitter, showing that users with dark…

    Submitted 28 March, 2018; originally announced March 2018.

    Comments: Accepted in ICWSM 2018

    Journal ref: Robertson A., W. Magdy, S. Goldwater. Self-Representation on Twitter Using Emoji Skin Color Modifiers. ICWSM 2018

  30. arXiv:1803.09164  [pdf, other]

    cs.CL

    Low-Resource Speech-to-Text Translation

    Authors: Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater

    Abstract: Speech-to-text translation has many potential applications for low-resource languages, but the typical approach of cascading speech recognition with machine translation is often impossible, since the transcripts needed to train a speech recognizer are usually not available for low-resource languages. Recent work has found that neural encoder-decoder models can learn to directly translate foreign s…

    Submitted 18 June, 2018; v1 submitted 24 March, 2018; originally announced March 2018.

    Comments: Added references; results remain unchanged. Accepted to Interspeech 2018

  31. Multilingual bottleneck features for subword modeling in zero-resource languages

    Authors: Enno Hermann, Sharon Goldwater

    Abstract: How can we effectively develop speech technology for languages where no transcribed data is available? Many existing approaches use no annotated resources at all, yet it makes sense to leverage information from large annotated corpora in other languages, for example in the form of multilingual bottleneck features (BNFs) obtained from a supervised speech recognition system. In this work, we evaluat…

    Submitted 18 June, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: 5 pages, 2 figures, 4 tables; accepted at Interspeech 2018

    Journal ref: Proc. Interspeech 2018, 2668-2672

  32. arXiv:1703.08135  [pdf, other]

    cs.CL cs.LG

    An embedded segmental K-means model for unsupervised segmentation and clustering of speech

    Authors: Herman Kamper, Karen Livescu, Sharon Goldwater

    Abstract: Unsupervised segmentation and clustering of unlabelled speech are core problems in zero-resource speech processing. Most approaches lie at methodological extremes: some use probabilistic Bayesian models with convergence guarantees, while others opt for more efficient heuristic techniques. Despite competitive performance in previous work, the full Bayesian approach is difficult to scale to large sp…

    Submitted 5 September, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

    Comments: 8 pages, 3 figures, 3 tables; accepted to ASRU 2017

  33. arXiv:1702.03856  [pdf, other]

    cs.CL

    Towards speech-to-text translation without speech recognition

    Authors: Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

    Abstract: We explore the problem of translating speech to text in low-resource scenarios where neither automatic speech recognition (ASR) nor machine translation (MT) are available, but we have training data in the form of audio paired with text translations. We present the first system for this problem applied to a realistic multi-speaker dataset, the CALLHOME Spanish-English speech translation corpus. Our…

    Submitted 13 February, 2017; originally announced February 2017.

    Comments: To appear in EACL 2017 (short papers)

  34. arXiv:1609.06530  [pdf, other]

    cs.CL

    Weakly supervised spoken term discovery using cross-lingual side information

    Authors: Sameer Bansal, Herman Kamper, Sharon Goldwater, Adam Lopez

    Abstract: Recent work on unsupervised term discovery (UTD) aims to identify and cluster repeated word-like units from audio alone. These systems are promising for some very low-resource languages where transcribed audio is unavailable, or where no written form of the language exists. However, in some cases it may still be feasible (e.g., through crowdsourcing) to obtain (possibly noisy) text translations of…

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: 5 pages, 4 figures, submitted for ICASSP 2017

  35. A segmental framework for fully-unsupervised large-vocabulary speech recognition

    Authors: Herman Kamper, Aren Jansen, Sharon Goldwater

    Abstract: Zero-resource speech technology is a growing research area that aims to develop methods for speech processing in the absence of transcriptions, lexicons, or language modelling text. Early term discovery systems focused on identifying isolated recurring patterns in a corpus, while more recent full-coverage systems attempt to completely segment and cluster the audio into word-like units---effectivel…

    Submitted 16 September, 2017; v1 submitted 22 June, 2016; originally announced June 2016.

    Comments: 15 pages, 6 figures, 8 tables

    Journal ref: Comput. Speech Lang. 46 (2017) 154-174

  36. Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

    Authors: Herman Kamper, Aren Jansen, Sharon Goldwater

    Abstract: In settings where only unlabelled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modelling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model th…

    Submitted 9 March, 2016; originally announced March 2016.

    Comments: 11 pages, 8 figures; Accepted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing

    Journal ref: IEEE/ACM Trans. Audio, Speech, Language Process. 24 (2016) 669-679

  37. arXiv:cs/0006021  [pdf, ps, other]

    cs.CL

    Compiling Language Models from a Linguistically Motivated Unification Grammar

    Authors: Manny Rayner, Beth Ann Hockey, Frankie James, Elizabeth O. Bratt, Sharon Goldwater, Mark Gawron

    Abstract: Systems now exist which are able to compile unification grammars into language models that can be included in a speech recognizer, but it is so far unclear whether non-trivial linguistically principled grammars can be used for this purpose. We describe a series of experiments which investigate the question empirically, by incrementally constructing a grammar and discovering what problems emerge…

    Submitted 9 June, 2000; originally announced June 2000.

    Comments: To be published in COLING 2000

    ACM Class: I.2.7