
A hierarchical neural-network-based document representation approach for text classification. (English) Zbl 1427.68281

Summary: Document representation is widely used in practical applications, for example, sentiment classification, text retrieval, and text classification. Previous work is mainly based either on statistics or on neural networks, which suffer from data sparsity and poor model interpretability, respectively. In this paper, we propose a general framework for document representation with a hierarchical architecture. In particular, we incorporate the hierarchical architecture into three traditional neural-network models for document representation, resulting in three hierarchical neural representation models for document classification: TextHFT, TextHRNN, and TextHCNN. Comprehensive experiments on two public datasets, Yelp 2016 and Amazon Reviews (Electronics), show that our proposals with the hierarchical architecture outperform the corresponding neural-network models for document classification, yielding a significant accuracy improvement ranging from 4.65% to 35.08% at a comparable (or substantially lower) time cost. In addition, we find that long documents benefit more from the hierarchical architecture than short ones, as the accuracy improvement on long documents is greater than that on short documents.
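The hierarchical architecture the summary describes composes a document representation in two stages: a word-level encoder turns each sentence into a sentence vector, and a sentence-level encoder turns the sequence of sentence vectors into a document vector that feeds a classifier. Below is a minimal PyTorch sketch of this two-level scheme in the spirit of TextHRNN; the class name, the choice of BiGRU encoders, and all dimensions are illustrative assumptions, not the authors' released code.

```python
# Two-level hierarchical document classifier: word-level encoder builds
# sentence vectors, sentence-level encoder builds the document vector.
# Names and hyperparameters are illustrative, not from the paper.
import torch
import torch.nn as nn


class HierarchicalRNNClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=50, num_classes=5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Word-level BiGRU: encodes each sentence independently.
        self.word_rnn = nn.GRU(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Sentence-level BiGRU: encodes the sequence of sentence vectors.
        self.sent_rnn = nn.GRU(2 * hidden_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, docs):
        # docs: (batch, num_sentences, num_words) of word indices.
        b, s, w = docs.shape
        words = self.embedding(docs.view(b * s, w))   # (b*s, w, embed_dim)
        _, h = self.word_rnn(words)                   # h: (2, b*s, hidden_dim)
        sent_vecs = torch.cat([h[0], h[1]], dim=-1)   # (b*s, 2*hidden_dim)
        sent_seq = sent_vecs.view(b, s, -1)           # (b, s, 2*hidden_dim)
        _, h = self.sent_rnn(sent_seq)                # h: (2, b, hidden_dim)
        doc_vec = torch.cat([h[0], h[1]], dim=-1)     # (b, 2*hidden_dim)
        return self.classifier(doc_vec)               # (b, num_classes)


if __name__ == "__main__":
    model = HierarchicalRNNClassifier(vocab_size=10000)
    batch = torch.randint(1, 10000, (4, 6, 20))  # 4 docs, 6 sentences, 20 words
    print(model(batch).shape)                    # torch.Size([4, 5])
```

TextHFT and TextHCNN would follow the same two-level outline, swapping the GRU encoders for fastText-style averaging or convolutional encoders at one or both levels.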

MSC:

68T10 Pattern recognition, speech recognition
Full Text: DOI

References:

[1] Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E., Hierarchical attention networks for document classification, Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
[2] Mikolov, T.; Chen, K.; Corrado, G.; Dean, J., Efficient estimation of word representations in vector space, Proceedings of International Conference on Learning Representations
[3] Pennington, J.; Socher, R.; Manning, C. D., GloVe: global vectors for word representation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
[4] Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J., Distributed representations of words and phrases and their compositionality, Proceedings of Advances in Neural Information Processing Systems 26
[5] Le, Q.; Mikolov, T., Distributed representations of sentences and documents, Proceedings of the 31st International Conference on Machine Learning, ICML 2014
[6] Zhao, Z.; Liu, T.; Hou, X.; Li, B.; Du, X., Distributed text representation with weighting scheme guidance for sentiment analysis, Lecture Notes in Computer Science, 9931, 41-52 (2016) · doi:10.1007/978-3-319-45814-4_4
[7] Tang, D., Sentiment-specific representation learning for document-level sentiment analysis, Proceedings of the 8th ACM International Conference on Web Search and Data Mining, WSDM 2015 · doi:10.1145/2684822.2697035
[8] Isbell, C. L., Sparse multi-level representations for retrieval, Journal of Computing Information Science in Engineering, 8, 3, 603-616 (1998)
[9] Wang, M.; Liu, M.; Feng, S.; Wang, D.; Zhang, Y., A novel calibrated label ranking based method for multiple emotions detection in Chinese microblogs, Natural Language Processing and Chinese Computing, 238-250 (2014), Heidelberg, Germany · doi:10.1007/978-3-662-45924-9_22
[10] Joachims, T., Text categorization with support vector machines: learning with many relevant features, Proceedings of the European Conference on Machine Learning
[11] Zhang, X.; Zhao, J.; LeCun, Y., Character-level convolutional networks for text classification, Proceedings of the 29th Annual Conference on Neural Information Processing Systems
[12] Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T., Bag of tricks for efficient text classification, Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017
[13] Kim, Y., Convolutional neural networks for sentence classification, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
[14] Liu, P.; Qiu, X.; Huang, X., Recurrent neural network for text classification with multi-task learning, Proceedings of the 25th International Joint Conference on Artificial Intelligence
[15] Lai, S.; Xu, L.; Liu, K.; Zhao, J., Recurrent convolutional neural networks for text classification, Proceedings of the AAAI Conference on Artificial Intelligence
[16] Xu, R.; Chen, T.; Xia, Y.; Lu, Q.; Liu, B.; Wang, X., Word embedding composition for data imbalances in sentiment and emotion classification, Cognitive Computation, 7, 2, 226-240 (2015) · doi:10.1007/s12559-015-9319-y
[17] Chen, Y.-W.; Zhou, Q.; Luo, W.; Du, J.-X., Classification of Chinese texts based on recognition of semantic topics, Cognitive Computation, 8, 1, 114-124 (2016) · doi:10.1007/s12559-015-9346-8
[18] Lewis, D. D., Evaluation of phrasal and clustered representations on a text categorization task, Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
[19] Post, M.; Bergsma, S., Explicit and implicit syntactic features for text classification, Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics
[20] Bengio, Y.; Ducharme, R.; Vincent, P.; Jauvin, C., A neural probabilistic language model, Journal of Machine Learning Research, 3, 1137-1155 (2003) · Zbl 1061.68157 · doi:10.1162/153244303322533223
[21] Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P., Natural language processing (almost) from scratch, Journal of Machine Learning Research, 12, 2493-2537 (2011) · Zbl 1280.68161
[22] Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D., The Stanford CoreNLP natural language processing toolkit, Proceedings of the Meeting of the Association for Computational Linguistics: System Demonstrations
[23] Lai, S.; Liu, K.; He, S.; Zhao, J., How to generate a good word embedding, IEEE Intelligent Systems, 31, 6, 5-14 (2016) · doi:10.1109/MIS.2016.45
[24] Glorot, X.; Bengio, Y., Understanding the difficulty of training deep feedforward neural networks, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics
[25] Pascanu, R.; Mikolov, T.; Bengio, Y., On the difficulty of training recurrent neural networks, Proceedings of the 30th International Conference on Machine Learning, ICML 2013
[26] Ioffe, S.; Szegedy, C., Batch normalization: accelerating deep network training by reducing internal covariate shift, Proceedings of the 32nd International Conference on Machine Learning (ICML ’15)
[27] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15, 1, 1929-1958 (2014) · Zbl 1318.68153