End-user feature labeling: supervised and semi-supervised approaches based on locally-weighted logistic regression. (English) Zbl 1334.68181

Summary: When intelligent interfaces, such as intelligent desktop assistants, email classifiers, and recommender systems, customize themselves to a particular end user, such customizations can decrease productivity and increase frustration due to inaccurate predictions, especially in the early stages when training data is limited. The end user can improve the learning algorithm by tediously labeling a substantial amount of additional training data, but this takes time and is too ad hoc to target a particular area of inaccuracy. To solve this problem, we propose new supervised and semi-supervised learning algorithms based on locally-weighted logistic regression for feature labeling by end users, enabling them to point out which features are important for a class rather than provide new training instances.
We first evaluate our algorithms against other feature labeling algorithms under idealized conditions, using feature labels generated by an oracle. A further contribution is an evaluation of feature labeling algorithms under real-world conditions, using feature labels harvested from actual end users in our user study. Ours is the first statistical user study of feature labeling to involve a large number of end users (43 participants), none of whom had any background in machine learning.
Our supervised and semi-supervised algorithms were among the best performers when compared to other feature labeling algorithms in the idealized setting, and they were also robust to the poor-quality feature labels provided by ordinary end users in our study. We also performed an analysis to investigate the relative gains from incorporating the different sources of knowledge available: the labeled training set, the feature labels, and the unlabeled data. Together, our results strongly suggest that feature labeling by end users is both viable and effective for allowing end users to improve the learning algorithm behind their customized applications.
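The approach builds on locally-weighted logistic regression (LWLR), in which training instances are weighted by a kernel centered on a query point before a logistic model is fit, so the fitted model is local to that point. The following is a minimal NumPy sketch of plain LWLR only; it does not reproduce the authors' feature-labeling extensions, and the Gaussian kernel, bandwidth `tau`, regularizer `lam`, step size `lr`, and gradient-ascent fitting loop are all illustrative choices.

```python
import numpy as np

def locally_weighted_logreg(X, y, x_query, tau=1.0, lam=1e-3, n_iter=200, lr=0.5):
    """Fit a logistic regression whose training instances are weighted by a
    Gaussian kernel centered on x_query, then return P(y=1 | x_query).
    Illustrative sketch: gradient ascent on the weighted, L2-regularized
    log-likelihood (not the authors' feature-labeling algorithm)."""
    # Instance weights: training points near the query matter more.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))

    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))           # current predictions
        grad = Xb.T @ (w * (y - p)) - lam * beta        # weighted gradient
        beta += lr * grad / len(y)
    xq = np.append(x_query, 1.0)
    return 1.0 / (1.0 + np.exp(-xq @ beta))
```

Because the kernel downweights distant training points, predictions adapt to the neighborhood of each query; the paper's contribution is to additionally fold end-user feature labels (and, in the semi-supervised variant, unlabeled data) into this locally-weighted framework.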

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

References:

[1] Attenberg, J.; Melville, P.; Provost, F., Guided feature labeling for budget-sensitive learning under extreme class imbalance, (Proceedings of the ICML 2010 Workshop on Budgeted Learning (2010))
[2] Attenberg, J.; Melville, P.; Provost, F., A unified approach to active dual supervision for labeling features and examples, (Proceedings of the European Conference on Machine Learning (2010))
[3] Bengio, Y.; Delalleau, O.; Le Roux, N., The curse of highly variable functions for local kernel machines, (Advances in Neural Information Processing Systems, vol. 18 (2006), MIT Press)
[4] Blum, A.; Mitchell, T., Combining labeled and unlabeled data with co-training, (Proceedings of the Eleventh Annual Conference on Computational Learning Theory (1998), ACM Press), 92-100
[5] Chang, M.-W.; Ratinov, L.; Roth, D., Guiding semi-supervision with constraint-driven learning, (Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (2007)), 280-287
[6] Cleveland, W.; Devlin, S., Locally-weighted regression: An approach to regression analysis by local fitting, J. Am. Stat. Assoc., 83, 403, 596-610 (1988) · Zbl 1248.62054
[7] Cohn, D. A.; Ghahramani, Z.; Jordan, M. I., Active learning with statistical models, J. Artif. Intell. Res., 4, 129-145 (1996) · Zbl 0900.68366
[8] Craven, M.; Freitag, D.; McCallum, A.; Mitchell, T.; Nigam, K.; Quek, C. Y., Learning to extract symbolic knowledge from the World Wide Web, (Proceedings of AAAI (1998)), 509-516
[9] Deng, K., Omega: On-line memory-based general purpose system classifier (1998), Carnegie Mellon University: Carnegie Mellon University Pittsburgh, PA, Ph.D. thesis
[10] Druck, G.; Mann, G.; McCallum, A., Learning from labeled features using generalized expectation criteria, (Proceedings of SIGIR (2008), ACM Press), 595-602
[11] Ganchev, K.; Graça, J.; Gillenwater, J.; Taskar, B., Posterior regularization for structured latent variable models, J. Mach. Learn. Res., 11, 2001-2049 (2010) · Zbl 1242.68223
[12] Graça, J.; Ganchev, K.; Taskar, B., Expectation maximization and posterior constraints, Adv. Neural Inf. Process. Syst., 20, 569-576 (2008)
[13] Hastie, T.; Tibshirani, R.; Friedman, J. H., The Elements of Statistical Learning (2003), Springer
[14] Kulesza, T.; Wong, W.-K.; Stumpf, S.; Perona, S.; White, S.; Burnett, M.; Oberst, I.; Ko, A., Fixing the program my computer learned: Barriers for end users, challenges for the machine, (Proceedings of IUI (2009), ACM Press), 187-196
[15] Kulesza, T.; Stumpf, S.; Burnett, M.; Wong, W.-K.; Riche, Y.; Moore, T.; Oberst, I.; Shinsel, A.; McIntosh, K., Explanatory debugging: supporting end-user debugging of machine-learned programs, (IEEE Symposium on Visual Languages and Human-Centric Computing. IEEE Symposium on Visual Languages and Human-Centric Computing, Madrid, Spain (September 2010))
[16] Lang, K., Newsweeder: Learning to filter netnews, (Proceedings of ICML (1995)), 331-339
[17] Lewis, D., Reuters-21578, available at
[18] Lewis, D.; Yang, Y.; Rose, T.; Li, F., RCV1: A new benchmark collection for text categorization research, J. Mach. Learn. Res., 5, 361-397 (2004)
[19] Liang, P.; Jordan, M. I.; Klein, D., Learning from measurements in exponential families, (Proceedings of the 26th International Conference on Machine Learning (2009), ACM Press), 641-648
[20] Liu, B.; Li, X.; Lee, W. S.; Yu, P. S., Text classification by labeling words, (Proceedings of the 19th National Conference on Artificial Intelligence (2004), AAAI Press), 425-430
[21] Liu, H.; Singh, P., ConceptNet—A practical commonsense reasoning tool-kit, BT Technol. J., 22, 4, 211-226 (2004)
[22] McCallum, A.; Mann, G.; Druck, G., Generalized expectation criteria (2007), University of Massachusetts: University of Massachusetts Amherst, MA, Technical report UM-CS-2007-60
[23] McCallum, A., MALLET: A machine learning for language toolkit (2002)
[24] McCallum, A.; Rosenfeld, R.; Mitchell, T.; Ng, A., Improving text classification by shrinkage in a hierarchy of classes, (Proceedings of ICML (1998))
[25] Melville, P.; Gryc, W.; Lawrence, R. D., Sentiment analysis of blogs by combining lexical knowledge with text classification, (Proceedings of the International Conference on Knowledge Discovery and Data Mining (KDD) (2009), ACM Press), 1275-1284
[26] Nocedal, J., Updating quasi-Newton matrices with limited storage, Math. Comput., 35, 773-782 (1980) · Zbl 0464.65037
[27] Pang, B.; Lee, L., A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts, (Proceedings of the ACL (2004))
[28] Raghavan, H.; Madani, O.; Jones, R., Active learning with feedback on both features and instances, J. Mach. Learn. Res., 7, 1655-1686 (2006) · Zbl 1222.68283
[29] Raghavan, H.; Allan, J., An interactive algorithm for asking and incorporating feature feedback into support vector machines, (Proceedings of SIGIR (2007), ACM Press), 79-86
[30] Roth, D.; Small, K., Interactive feature space construction using semantic information, (Proceedings of CoNLL (2009)), 66-74
[31] Settles, B., Active learning literature survey (2009), University of Wisconsin-Madison: University of Wisconsin-Madison Madison, WI, Technical report 1648
[32] Settles, B., Closing the loop: Fast, interactive semi-supervised annotation with queries on features and instances, (Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (2011)), 1467-1478
[33] Sindhwani, V.; Melville, P.; Lawrence, R., Uncertainty sampling and transductive experimental design for active dual supervision, (Proceedings of the 26th International Conference on Machine Learning (2009)), 953-960
[34] Speer, R.; Havasi, C.; Lieberman, H., AnalogySpace: Reducing the dimensionality of common sense knowledge, (Proceedings of AAAI (2008))
[35] Stumpf, S.; Rajaram, V.; Li, L.; Wong, W.-K.; Burnett, M.; Dietterich, T.; Sullivan, E.; Herlocker, J., Interacting meaningfully with machine learning systems: Three experiments, Int. J. Hum.-Comput. Stud., 67, 8, 639-662 (2009)
[36] Wong, W.-K.; Oberst, I.; Das, S.; Moore, T.; Stumpf, S.; McIntosh, K.; Burnett, M., End-user feature labeling: A locally-weighted regression approach, (Proceedings of the ACM International Conference on Intelligent User Interfaces (2011), ACM Press), 115-124
[37] Wong, W.-K.; Oberst, I.; Das, S.; Moore, T.; Stumpf, S.; McIntosh, K.; Burnett, M., End-user feature labeling via locally weighted logistic regression, (Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, Special Track on New Scientific and Technical Advances in Research (2011), AAAI Press)
[38] Wu, X.; Srihari, R., Incorporating prior knowledge with weighted margin support vector machines, (Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2004), ACM Press), 326-333
[39] Zhou, D.; Bousquet, O.; Lal, T. N.; Weston, J.; Schölkopf, B., Learning with local and global consistency, (NIPS (2004), MIT Press), 321-328
[41] Zhu, X.; Ghahramani, Z.; Lafferty, J. D., Semi-supervised learning using Gaussian fields and harmonic functions, (ICML (2003), AAAI Press), 912-919
[42] Zhu, X.; Goldberg, A. B., Introduction to Semi-Supervised Learning (2009), Morgan & Claypool Publishers · Zbl 1209.68435