Abstract
Because its matrix factors are non-negative, Non-negative Matrix Factorization (NMF) is well suited to transforming the original high-dimensional Terms-Documents matrix into a lower-dimensional semantic Concepts-Documents matrix in text categorization. Because all NMF algorithms are iterative, the NMF matrix factors must be initialized. In this paper, we propose a clustering-based method for initializing NMF that clusters the term vectors instead of the document vectors used in previous research.
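To make the idea concrete, the following is a minimal sketch of one way a term-vector clustering could seed NMF. It is an illustration under stated assumptions, not the procedure of the paper, whose details the abstract does not give. The assumptions are: X is a non-negative Terms-Documents matrix (terms in rows), the term vectors (rows of X) are grouped with plain k-means, the cluster centroids seed the Concepts-Documents factor H, cluster memberships seed the Terms-Concepts factor W, and the factors are then refined with the standard multiplicative updates for the Frobenius objective. All function names and the parameters `eps` and `tiny` are hypothetical.

```python
import numpy as np


def kmeans_rows(X, k, n_iter=20, seed=0):
    """Plain Lloyd k-means over the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows chosen at random (requires k <= number of rows).
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def init_from_term_clusters(X, k, eps=1e-3, seed=0):
    """Assumed seeding scheme: cluster the term vectors (rows of X).

    W0 (terms x concepts) holds 1 for a term's own cluster and eps elsewhere;
    H0 (concepts x documents) holds the cluster centroids, shifted by eps so
    that no entry starts at exactly zero.
    """
    labels, centroids = kmeans_rows(X, k, seed=seed)
    W0 = np.full((X.shape[0], k), eps)
    W0[np.arange(X.shape[0]), labels] = 1.0
    H0 = centroids + eps
    return W0, H0


def nmf_multiplicative(X, W, H, n_iter=200, tiny=1e-9):
    """Multiplicative updates minimizing ||X - W H||_F; updates W and H in place."""
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + tiny)
        W *= (X @ H.T) / (W @ H @ H.T + tiny)
    return W, H


# Usage on a small random non-negative matrix (12 terms, 8 documents, 3 concepts).
X_demo = np.random.default_rng(42).random((12, 8))
W_demo, H_demo = init_from_term_clusters(X_demo, k=3)
W_demo, H_demo = nmf_multiplicative(X_demo, W_demo, H_demo)
print("reconstruction error:", np.linalg.norm(X_demo - W_demo @ H_demo))
```

Seeding H from the centroids of term clusters mirrors the shape of the target: each centroid is a vector over documents, which is exactly one row of a Concepts-Documents matrix. A document-vector variant would instead cluster the columns of X and use the centroids as columns of W.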
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nam, L.N.H., Quoc, H.B. (2017). The Clustering-Based Initialization for Non-negative Matrix Factorization in the Feature Transformation of the High-Dimensional Text Categorization System: A Viewpoint of Term Vectors. In: Kamps, J., Tsakonas, G., Manolopoulos, Y., Iliadis, L., Karydis, I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450. Springer, Cham. https://doi.org/10.1007/978-3-319-67008-9_40
DOI: https://doi.org/10.1007/978-3-319-67008-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67007-2
Online ISBN: 978-3-319-67008-9
eBook Packages: Computer Science, Computer Science (R0)