-
Impact of Stop Sets on Stopping Active Learning for Text Classification
Abstract: Active learning is an increasingly important branch of machine learning and a powerful technique for natural language processing. The main advantage of active learning is its potential to reduce the amount of labeled data needed to learn high-performing models. A vital aspect of an effective active learning algorithm is the determination of when to stop obtaining additional labeled data. Several l…
Submitted 2 April, 2022; v1 submitted 8 January, 2022; originally announced January 2022.
Comments: 8 pages, 3 tables, 1 figure; published in Proceedings of the IEEE 16th International Conference on Semantic Computing (ICSC), pages 25-32, January 2022. IEEE
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2022 IEEE 16th International Conference on Semantic Computing (ICSC), pages 25-32, January 2022. IEEE
-
Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning
Abstract: When creating text classification systems, one of the major bottlenecks is the annotation of training data. Active learning has been proposed to address this bottleneck using stopping methods to minimize the cost of data annotation. An important capability for improving the utility of stopping methods is to effectively forecast the performance of the text classification models. Forecasting can be…
Submitted 11 April, 2020; v1 submitted 20 January, 2020; originally announced January 2020.
Comments: 8 pages, 9 figures, 2 tables; published in Proceedings of the IEEE 14th International Conference on Semantic Computing (ICSC), San Diego, CA, USA, pages 77-84, February 2020
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pages 77-84, San Diego, CA, USA, February 2020. IEEE
-
Electric Switching of the Charge-Density-Wave and Normal Metallic Phases in Tantalum Disulfide Thin-Film Devices
Abstract: We report on switching among three charge-density-wave phases - commensurate, nearly commensurate, incommensurate - and the high-temperature normal metallic phase in thin-film 1T-TaS2 devices induced by application of an in-plane electric field. The electric switching among all phases has been achieved over a wide temperature range, from 77 K to 400 K. The low-frequency electronic noise spectrosco…
Submitted 14 March, 2019; originally announced March 2019.
Comments: 32 pages, 7 figures
Journal ref: ACS Nano, 13, 7231 (2019)
-
arXiv:1901.09126
The Use of Unlabeled Data versus Labeled Data for Stopping Active Learning for Text Classification
Abstract: Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique to reduce the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled…
Submitted 22 April, 2019; v1 submitted 25 January, 2019; originally announced January 2019.
Comments: 8 pages, 4 figures, 3 tables; published in Proceedings of the IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, pages 287-294, January 2019
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), pages 287-294, Newport Beach, CA, USA, January 2019. IEEE
-
arXiv:1901.09118
Stopping Active Learning based on Predicted Change of F Measure for Text Classification
Abstract: During active learning, an effective stopping method allows users to limit the number of annotations, which is cost effective. In this paper, a new stopping method called Predicted Change of F Measure will be introduced that attempts to provide the users an estimate of how much performance of the model is changing at each iteration. This stopping method can be applied with any base learner. This m…
Submitted 22 April, 2019; v1 submitted 25 January, 2019; originally announced January 2019.
Comments: 8 pages, 12 tables; published in Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), Newport Beach, CA, USA, pages 47-54, January 2019
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), pages 47-54, Newport Beach, CA, USA, January 2019. IEEE
-
Low-Frequency Noise Spectroscopy of Charge-Density-Wave Phase Transitions in Vertical Quasi-2D Devices
Abstract: We report results regarding the electron transport in vertical quasi-2D layered 1T-TaS2 charge-density-wave devices. The low-frequency noise spectroscopy was used as a tool to study changes in the cross-plane electrical characteristics of the quasi-2D material below room temperature. The noise spectral density revealed strong peaks - changing by more than an order-of-magnitude - at the temperature…
Submitted 5 January, 2019; originally announced January 2019.
Comments: 16 pages; 5 figures
Journal ref: Applied Physics Express, 12, 037001 (2019)
-
Proton-Irradiation-Immune Electronics Implemented with Two-Dimensional Charge-Density-Wave Devices
Abstract: Proton radiation damage is an important failure mechanism for electronic devices in near-Earth orbits, deep space and high energy physics facilities. Protons can cause ionizing damage and atomic displacements, resulting in device degradation and malfunction. Shielding of electronics increases the weight and cost of the systems but does not eliminate destructive single events produced by energetic…
Submitted 2 January, 2019; originally announced January 2019.
Comments: 18 pages, 2 display items
Journal ref: Nanoscale, 11, 8380 - 8386 (2019)
-
Anomalous Characteristics of the Generation - Recombination Noise in Quasi-One-Dimensional Van der Waals Nanoribbons
Abstract: We describe the low-frequency current fluctuations, i.e. electronic noise, in quasi-one-dimensional ZrTe3 van der Waals nanoribbons, which have recently attracted attention owing to their extraordinary high current carrying capacity. Whereas the low-frequency noise spectral density reveals 1/f behavior near room temperature, it is dominated by the Lorentzian bulges of the generation - recombinatio…
Submitted 28 August, 2018; originally announced August 2018.
Comments: 22 pages; 7 figures
Journal ref: Nanoscale, 10, 42, 19749 (2018)
-
Low-Frequency Noise and Sliding of the Charge Density Waves in Two-Dimensional Materials
Abstract: There has been a recent renewal of interest in charge-density-wave (CDW) phenomena, primarily driven by the emergence of two-dimensional (2D) layered CDW materials, such as 1T-TaS2, characterized by very high transition temperatures to CDW phases. In the extensively studied classical bulk CDW materials with quasi-1D crystal structure, the charge carrier transport exhibits intriguing sliding behavi…
Submitted 7 February, 2018; originally announced February 2018.
Comments: 18 pages; 3 figures
Journal ref: Nano Letters, 18, 3630 (2018)
-
Impact of Batch Size on Stopping Active Learning for Text Classification
Abstract: When using active learning, smaller batch sizes are typically more efficient from a learning efficiency perspective. However, in practice due to speed and human annotator considerations, the use of larger batch sizes is necessary. While past work has shown that larger batch sizes decrease learning efficiency from a learning curve perspective, it remains an open question how batch size impacts meth…
Submitted 16 May, 2018; v1 submitted 24 January, 2018; originally announced January 2018.
Comments: 2 pages, 1 table; published in Proceedings of the IEEE 12th International Conference on Semantic Computing (ICSC 2018), Laguna Hills, CA, USA, pages 306-307, January 2018
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2018 IEEE 12th International Conference on Semantic Computing (ICSC), pages 306-307, Laguna Hills, CA, USA, January 2018. IEEE
-
Support Vector Machine Active Learning Algorithms with Query-by-Committee versus Closest-to-Hyperplane Selection
Abstract: This paper investigates and evaluates support vector machine active learning algorithms for use with imbalanced datasets, which commonly arise in many applications such as information extraction applications. Algorithms based on closest-to-hyperplane selection and query-by-committee selection are combined with methods for addressing imbalance such as positive amplification based on prevalence stat…
Submitted 16 May, 2018; v1 submitted 24 January, 2018; originally announced January 2018.
Comments: 8 pages, 7 figures, 3 tables; published in Proceedings of the IEEE 12th International Conference on Semantic Computing (ICSC 2018), Laguna Hills, CA, USA, pages 148-155, January 2018
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2018 IEEE 12th International Conference on Semantic Computing (ICSC), pages 148-155, Laguna Hills, CA, USA, January 2018. IEEE
-
Total Ionizing Dose Effects on Threshold Switching in 1T-Tantalum Disulfide Charge-Density-Wave Devices
Abstract: The 1T polytype of TaS2 exhibits voltage-triggered threshold switching as a result of a phase transition from nearly commensurate to incommensurate charge density wave states. Threshold switching, persistent above room temperature, can be utilized in a variety of electronic devices, e.g., voltage controlled oscillators. We evaluated the total-ionizing-dose response of thin film 1T-TaS2 at doses up…
Submitted 18 October, 2017; originally announced December 2017.
Comments: 4 pages; 4 figures
Journal ref: IEEE Electron Device Letters, 38, 1724 (2017)
-
Acquisition of Translation Lexicons for Historically Unwritten Languages via Bridging Loanwords
Abstract: With the advent of informal electronic communications such as social media, colloquial languages that were historically unwritten are being written for the first time in heavily code-switched environments. We present a method for inducing portions of translation lexicons through the use of expert knowledge in these settings where there are approximately zero resources available other than a langua…
Submitted 20 August, 2017; v1 submitted 5 June, 2017; originally announced June 2017.
Comments: 5 pages, 1 figure, 1 table; published in the Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 21-25, Vancouver, Canada, August 2017
ACM Class: I.2.7
Journal ref: In Proceedings of the 10th Workshop on Building and Using Comparable Corpora, pages 21-25, Vancouver, Canada, August 2017. Association for Computational Linguistics
-
Using Global Constraints and Reranking to Improve Cognates Detection
Abstract: Global constraints and reranking have not been used in cognates detection research to date. We propose methods for using global constraints by performing rescoring of the score matrices produced by state of the art cognates detection systems. Using global constraints to perform rescoring is complementary to state of the art methods for performing cognates detection and results in significant perfo…
Submitted 19 August, 2017; v1 submitted 24 April, 2017; originally announced April 2017.
Comments: 10 pages, 6 figures, 6 tables; published in the Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017
ACM Class: I.2.6; I.2.7; I.5.1; I.5.4
Journal ref: In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1983-1992, Vancouver, Canada, July 2017. Association for Computational Linguistics
-
Filtering Tweets for Social Unrest
Abstract: Since the events of the Arab Spring, there has been increased interest in using social media to anticipate social unrest. While efforts have been made toward automated unrest prediction, we focus on filtering the vast volume of tweets to identify tweets relevant to unrest, which can be provided to downstream users for further analysis. We train a supervised classifier that is able to label Arabic…
Submitted 1 April, 2017; v1 submitted 20 February, 2017; originally announced February 2017.
Comments: 7 pages, 8 figures, 3 tables; published in Proceedings of the 2017 IEEE 11th International Conference on Semantic Computing (ICSC), San Diego, CA, USA, pages 17-23, January 2017
ACM Class: H.3.3; I.2.6; I.2.7; I.5.4
Journal ref: In Proceedings of the 2017 IEEE 11th International Conference on Semantic Computing (ICSC), pages 17-23, San Diego, CA, USA, January 2017. IEEE
-
Low-Frequency Electronic Noise in Exfoliated Quasi-1D TaSe3 van Der Waals Nanowires
Abstract: We report results of investigation of the low-frequency electronic excess noise in quasi-1D nanowires of TaSe3 capped with quasi-2D h-BN layers. Semi-metallic TaSe3 is a quasi-1D van der Waals material with exceptionally high breakdown current density. It was found that TaSe3 nanowires have lower levels of the normalized noise spectral density, compared to carbon nanotubes and graphene. The temper…
Submitted 16 October, 2016; originally announced October 2016.
Comments: 22 pages; 6 figures
Journal ref: Nano Letters, 17, 377 (2017)
-
Breakdown Current Density in BN-Capped Quasi-1D TaSe3 Metallic Nanowires: Prospects of Interconnect Applications
Abstract: We report results of investigation of the current-carrying capacity of nanowires made from the quasi-1D van der Waals metal tantalum triselenide capped with quasi-2D boron nitride. The chemical vapor transport method followed by chemical and mechanical exfoliation were used to fabricate mm-long TaSe3 wires with lateral dimensions in the 20 to 70 nm range. Electrical measurements establish that TaS…
Submitted 11 April, 2016; originally announced April 2016.
Comments: 22 pages, 6 figures
Journal ref: Nanoscale, 8, 15774 (2016)
-
Data Cleaning for XML Electronic Dictionaries via Statistical Anomaly Detection
Abstract: Many important forms of data are stored digitally in XML format. Errors can occur in the textual content of the data in the fields of the XML. Fixing these errors manually is time-consuming and expensive, especially for large amounts of data. There is increasing interest in the research, development, and use of automated techniques for assisting with data cleaning. Electronic dictionaries are an i…
Submitted 11 April, 2016; v1 submitted 25 February, 2016; originally announced February 2016.
Comments: 8 pages, 4 figures, 5 tables; published in Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, pages 79-86, February 2016
ACM Class: I.5.1; I.5.4; G.3; I.2.7; I.2.6
Journal ref: In Proceedings of the 2016 IEEE Tenth International Conference on Semantic Computing (ICSC), pages 79-86, Laguna Hills, CA, USA, February 2016. IEEE
-
Translation Memory Retrieval Methods
Abstract: Translation Memory (TM) systems are one of the most widely used translation technologies. An important part of TM systems is the matching algorithm that determines what translations get retrieved from the bank of available translations to assist the human translator. Although detailed accounts of the matching algorithms used in commercial systems can't be found in the literature, it is widely beli…
Submitted 21 May, 2015; originally announced May 2015.
Comments: 9 pages, 6 tables, 3 figures; appeared in Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, April 2014
ACM Class: I.2.7
Journal ref: In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 202-210, Gothenburg, Sweden, April 2014. Association for Computational Linguistics
-
Analysis of Stopping Active Learning based on Stabilizing Predictions
Abstract: Within the natural language processing (NLP) community, active learning has been widely investigated and applied in order to alleviate the annotation bottleneck faced by developers of new NLP systems and technologies. This paper presents the first theoretical analysis of stopping active learning based on stabilizing predictions (SP). The analysis has revealed three elements that are central to the…
Submitted 23 April, 2015; originally announced April 2015.
Comments: 10 pages, 8 tables; appeared in Proceedings of the Seventeenth Conference on Computational Natural Language Learning, August 2013
ACM Class: I.5.1; I.5.4; G.3; I.2.7; I.2.6
Journal ref: In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 10-19, Sofia, Bulgaria, August 2013. Association for Computational Linguistics
-
Statistical modality tagging from rule-based annotations and crowdsourcing
Abstract: We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatic…
Submitted 3 March, 2015; originally announced March 2015.
Comments: 8 pages, 6 tables; appeared in Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, July 2012
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 57-64, Jeju, Republic of Korea, July 2012. Association for Computational Linguistics
-
Use of Modality and Negation in Semantically-Informed Syntactic MT
Abstract: This paper describes the resource- and system-building efforts of an eight-week Johns Hopkins University Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation (SIMT). We describe a new modality/negation (MN) annotation scheme, the creation of a (publicly available) MN lexicon, and two automated MN tagge…
Submitted 5 February, 2015; originally announced February 2015.
Comments: 28 pages, 13 figures, 2 tables; appeared in Computational Linguistics, 38(2):411-438, 2012
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: Computational Linguistics, 38(2):411-438, 2012
-
Annotating Cognates and Etymological Origin in Turkic Languages
Abstract: Turkic languages exhibit extensive and diverse etymological relationships among lexical items. These relationships make the Turkic languages promising for exploring automated translation lexicon induction by leveraging cognate and other etymological information. However, due to the extent and diversity of the types of relationships between words, it is not clear how to annotate such information. I…
Submitted 13 January, 2015; originally announced January 2015.
Comments: 5 pages, 8 tables; appeared in Proceedings of the First Workshop on Language Resources and Technologies for Turkic Languages at the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 47-51, Istanbul, Turkey, May 2012. European Language Resources Association
ACM Class: I.2.7
Journal ref: In Proceedings of the First Workshop on Language Resources and Technologies for Turkic Languages at LREC'12, pages 47-51, Istanbul, Turkey, May 2012. European Language Resources Association
-
Rapid Adaptation of POS Tagging for Domain Specific Uses
Abstract: Part-of-speech (POS) tagging is a fundamental component for performing natural language tasks such as parsing, information extraction, and question answering. When POS taggers are trained in one domain and applied in significantly different domains, their performance can degrade dramatically. We present a methodology for rapid adaptation of POS taggers to new domains. Our technique is unsupervised…
Submitted 31 October, 2014; originally announced November 2014.
Comments: 2 pages, 2 tables; appeared in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, June 2006
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pages 118-119, New York, New York, June 2006. Association for Computational Linguistics
-
A random forest system combination approach for error detection in digital dictionaries
Abstract: When digitizing a print bilingual dictionary, whether via optical character recognition or manual entry, it is inevitable that errors are introduced into the electronic version that is created. We investigate automating the process of detecting errors in an XML representation of a digitized print dictionary using a hybrid approach that combines rule-based, feature-based, and language model-based m…
Submitted 30 October, 2014; originally announced October 2014.
Comments: 9 pages, 7 figures, 10 tables; appeared in Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, April 2012
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, pages 78-86, Avignon, France, April 2012. Association for Computational Linguistics
-
Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling
Abstract: Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating elements to represent the structure within each lexical entry, in the form of an XML tree. In many cases, dictionaries are published that have errors and inconsi…
Submitted 29 October, 2014; originally announced October 2014.
Comments: 6 pages, 2 figures, 11 tables; appeared in Proceedings of Electronic Lexicography in the 21st Century (eLex), November 2011
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of Electronic Lexicography in the 21st Century (eLex), pages 227-232, Bled, Slovenia, November 2011. Trojina Institute for Applied Slovene Studies
-
Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language
Abstract: We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data. Modifications to the structure and underlying text of the lexicographic data are expressed in a simple, interpreted programming language. Dictionary Manipulation Language (DML) commands identify nodes by unique identifiers, and manipulations are performed using simple commands such as…
Submitted 28 October, 2014; originally announced October 2014.
Comments: 5 pages, 3 figures, 1 table; appeared in Proceedings of Electronic Lexicography in the 21st Century (eLex), November 2011
Journal ref: In Proceedings of Electronic Lexicography in the 21st Century (eLex), pages 297-301, Bled, Slovenia, November 2011. Trojina Institute for Applied Slovene Studies
-
Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation
Abstract: We explore how to improve machine translation systems by adding more translation data in situations where we already have substantial resources. The main challenge is how to buck the trend of diminishing returns that is commonly encountered. We present an active learning-style data solicitation algorithm to meet this challenge. We test it, gathering annotations via Amazon Mechanical Turk, and find…
Submitted 21 October, 2014; originally announced October 2014.
Comments: 11 pages, 14 figures; appeared in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 2010
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 854-864, Uppsala, Sweden, July 2010. Association for Computational Linguistics
-
Using Mechanical Turk to Build Machine Translation Evaluation Sets
Abstract: Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and more and more domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT test sets cheaply. We find that MTurk can be used to make test sets much cheaper than professi…
Submitted 20 October, 2014; originally announced October 2014.
Comments: 4 pages, 2 tables; appeared in Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, June 2010
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 208-211, Los Angeles, California, June 2010. Association for Computational Linguistics
-
A Modality Lexicon and its use in Automatic Tagging
Abstract: This paper describes our resource-building results for an eight-week JHU Human Language Technology Center of Excellence Summer Camp for Applied Language Exploration (SCALE-2009) on Semantically-Informed Machine Translation. Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotat…
Submitted 17 October, 2014; originally announced October 2014.
Comments: 6 pages, 5 figures; appeared in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), May 2010
ACM Class: I.2.7
Journal ref: In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), pages 1402-1407, Valletta, Malta, May 2010. European Language Resources Association
-
Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach
Abstract: We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation. Semantically enriched syntactic tags assigned to the target-language training texts improved translation quality. The resulting system significantly outperformed a linguistically naive baseline model (Hiero), and reached the highest scores yet reporte…
Submitted 24 September, 2014; originally announced September 2014.
Comments: 10 pages, 7 figures, 3 tables; appeared in Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (AMTA), October 2010
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the Ninth Conference of the Association for Machine Translation in the Americas (AMTA), Denver, Colorado, October 2010
-
A Method for Stopping Active Learning Based on Stabilizing Predictions and the Need for User-Adjustable Stopping
Abstract: A survey of existing methods for stopping active learning (AL) reveals the needs for methods that are: more widely applicable; more aggressive in saving annotations; and more stable across changing datasets. A new method for stopping AL based on stabilizing predictions is presented that addresses these needs. Furthermore, stopping methods are required to handle a broad range of different annotatio…
Submitted 17 September, 2014; originally announced September 2014.
Comments: 9 pages, 3 figures, 5 tables; appeared in Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), June 2009
ACM Class: I.2.6; I.2.7; I.5.1; I.5.4; G.3
Journal ref: In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009), pages 39-47, Boulder, Colorado, June 2009. Association for Computational Linguistics
-
arXiv:1409.4835
Taking into Account the Differences between Actively and Passively Acquired Data: The Case of Active Learning with Support Vector Machines for Imbalanced Datasets
Abstract: Actively sampled data can have very different characteristics than passively sampled data. Therefore, it's promising to investigate using different inference procedures during AL than are used during passive learning (PL). This general idea is explored in detail for the focused case of AL with cost-weighted SVMs for imbalanced data, a situation that arises for many HLT tasks. The key idea behind t…
Submitted 16 September, 2014; originally announced September 2014.
Comments: 4 pages, 5 figures; appeared in Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, pages 137-140, Boulder, Colorado, June 2009. Association for Computational Linguistics
ACM Class: I.2.6; I.2.7; I.5.1; I.5.4
Journal ref: Proceedings of HLT: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Short Papers, pages 137-140, Boulder, Colorado, June 2009. Association for Computational Linguistics
-
An Approach to Reducing Annotation Costs for BioNLP
Abstract: There is a broad range of BioNLP tasks for which active learning (AL) can significantly reduce annotation costs and a specific AL algorithm we have developed is particularly effective in reducing annotation costs for these tasks. We have previously developed an AL algorithm called ClosestInitPA that works best with tasks that have the following characteristics: redundancy in training material, bur…
Submitted 12 September, 2014; originally announced September 2014.
Comments: 2 pages, 1 figure, 5 tables; appeared in Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing at ACL (Association for Computational Linguistics) 2008
ACM Class: I.2.7; I.2.6; I.5.1; I.5.4
Journal ref: In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, pages 104-105, Columbus, Ohio, June 2008. Association for Computational Linguistics