Abstract
Annotating Vietnamese dependency trees by hand is difficult, so this paper proposes a method that combines the MST algorithm with an improved Nivre algorithm to build a Vietnamese dependency treebank. The method exploits the properties of collaborative training (co-training). First, we build a small set of annotated samples. Second, we use these samples to train two weak learners on two fully redundant views. The two learners then annotate a large number of unlabeled samples for each other. Next, we select the high-confidence samples for retraining and build a dependency parsing system. Finally, we run tenfold cross-validation on 5,000 manually annotated Vietnamese sentences and obtain an accuracy of 76.33%. The experimental results show that the proposed method can exploit unlabeled corpora to effectively improve the quality of the dependency treebank.
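The co-training loop outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `OffsetParser` is a hypothetical toy learner standing in for the MST and improved Nivre parsers, the two feature functions stand in for the two views, and the `threshold` and `rounds` parameters are assumptions. A sentence is a tuple of POS tags, a tree is a tuple of head indices, and a root word points to itself.

```python
from collections import Counter, defaultdict


class OffsetParser:
    """Toy learner (NOT MST or Nivre): predicts head offsets from one feature view."""

    def __init__(self, feature):
        self.feature = feature              # maps (tags, i) to a hashable key
        self.counts = defaultdict(Counter)  # key -> Counter of head offsets

    def fit(self, treebank):
        self.counts = defaultdict(Counter)
        for tags, heads in treebank:
            for i, head in enumerate(heads):
                self.counts[self.feature(tags, i)][head - i] += 1
        return self

    def parse(self, tags):
        """Return (heads, confidence); confidence is the mean vote share.

        The toy model does not guarantee a well-formed tree.
        """
        if not tags:
            return (), 0.0
        heads, shares = [], []
        for i in range(len(tags)):
            votes = self.counts.get(self.feature(tags, i))
            if not votes:                   # unseen feature: no evidence
                heads.append(i)
                shares.append(0.0)
                continue
            offset, n = votes.most_common(1)[0]
            heads.append(i + offset)
            shares.append(n / sum(votes.values()))
        return tuple(heads), sum(shares) / len(shares)


def co_train(parser_a, parser_b, seed, unlabeled, threshold=0.9, rounds=5):
    """Each parser labels unlabeled sentences for the other parser."""
    train_a, train_b = list(seed), list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        parser_a.fit(train_a)
        parser_b.fit(train_b)
        remaining = []
        for tags in pool:
            heads_a, conf_a = parser_a.parse(tags)
            heads_b, conf_b = parser_b.parse(tags)
            used = False
            if conf_a >= threshold:         # A is confident: teach B
                train_b.append((tags, heads_a))
                used = True
            if conf_b >= threshold:         # B is confident: teach A
                train_a.append((tags, heads_b))
                used = True
            if not used:
                remaining.append(tags)
        if len(remaining) == len(pool):     # nothing new was labeled
            break
        pool = remaining
    parser_a.fit(train_a)                   # final retraining on enlarged sets
    parser_b.fit(train_b)
    return parser_a, parser_b, pool


# Two views of the same word: its own tag, or the (previous tag, tag) pair.
view_a = lambda tags, i: tags[i]
view_b = lambda tags, i: (tags[i - 1] if i > 0 else "<s>", tags[i])

seed = [(("N", "V"), (1, 1)), (("N", "V", "N"), (1, 1, 1))]
unlabeled = [("N", "V", "N"), ("A", "B")]
a, b, leftover = co_train(OffsetParser(view_a), OffsetParser(view_b), seed, unlabeled)
```

In this toy run, only the second view is confident about the first unlabeled sentence, so its parse is added to the first learner's training set. The sentence with unseen tags stays in `leftover` because neither learner reaches the threshold.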
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61262041, 61363044 and 61472168) and the key project of National Natural Science Foundation of Yunnan province (Grant No. 2013FA030).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Qiu, G., Guo, J., Yu, Z., Xian, Y., Mao, C. (2016). Using Collaborative Training Method to Build Vietnamese Dependency Treebank. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_8
DOI: https://doi.org/10.1007/978-3-319-47674-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)