
Effective active learning strategies for the use of large-margin classifiers in semantic annotation: an optimal parameter discovery perspective. (English) Zbl 1304.68148

Summary: Classical supervised machine learning techniques have been explored for semantically annotating unstructured textual data, such as consumers' comments archived at social media websites, to extract business intelligence. However, these techniques often require a large number of manually labeled training examples to produce accurate annotations. Several active learning approaches based on probabilistic sequence models have been explored to minimize the number of labeled training examples for semantic annotation tasks. Recent research has shown that large-margin classifiers are viable alternatives for automated semantic annotation, given their strong generalization capabilities and their ability to process high-dimensional data. However, the existing active learning methods designed for probabilistic sequence models cannot be easily adapted and applied to large-margin classifiers. The main contribution of this paper is the development of novel active learning methods for large-margin classifiers to fill this research gap. In particular, we propose an innovative perspective that treats active learning as a search for optimal parameters for large-margin classifiers. A rigorous evaluation involving two benchmark tests and an empirical test based on real-world data extracted from Amazon.com reveals that the proposed active learning methods can train effective classifiers with significantly fewer training examples while achieving similar annotation performance, compared to a typical state-of-the-art classifier that only uses several labeled training examples. More specifically, one of our proposed active learning methods can reduce the number of training examples by 19.74% at the 68% level of \(F_{1}\) when compared to the best baseline method, as evaluated on the Amazon data set.
Our research opens the door to the application of intelligent semantic annotation techniques to support real-world applications such as automatically analyzing consumer comments for customer relationship management.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
