Abstract
Automatically constructing knowledge bases from free online encyclopedias is considered a crucial step in many Internet-related areas. However, current research focuses mainly on extracting knowledge facts from English resources, and there is less work on other languages. In this paper, we describe an approach to extracting entity attributes from a free Chinese online encyclopedia, HudongBaike. We first identify attribute-value pairs from HudongBaike pages that feature InfoBoxes; these pairs are then used to learn which attributes deserve attention for different HudongBaike entries. We then adopt a keyword matching approach to identify candidate sentences for each attribute in a plain HudongBaike article. Finally, we train a CRF model to extract the corresponding values from these candidate sentences. Our approach is simple but effective, and our experiments show that it is possible to produce a large number of <S,P,O> triples from free online encyclopedias, which can then be used to construct Chinese knowledge bases with less human supervision.
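The three stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frequency threshold, the keyword lists, and the character-level BIO labelling scheme are all assumptions made for illustration, and the CRF training step itself is omitted because the abstract does not specify the features or the toolkit. The toy data is in English for readability.

```python
import re
from collections import Counter


def learn_attributes(infoboxes, min_count=2):
    """Keep attribute names that occur in at least min_count infoboxes.

    infoboxes is a list of dicts (attribute -> value), one per page.
    The threshold is a hypothetical choice, not taken from the paper.
    """
    counts = Counter(attr for box in infoboxes for attr in box)
    return [attr for attr, n in counts.items() if n >= min_count]


def split_sentences(text):
    """Split on Chinese or Western sentence-ending punctuation."""
    parts = re.split(r"[。！？!?]|\.(?:\s+|$)", text)
    return [p.strip() for p in parts if p.strip()]


def candidate_sentences(text, keywords_by_attr):
    """Map each attribute to the sentences containing one of its keywords."""
    found = {}
    for sentence in split_sentences(text):
        for attr, keywords in keywords_by_attr.items():
            if any(k in sentence for k in keywords):
                found.setdefault(attr, []).append(sentence)
    return found


def bio_tags(sentence, value):
    """Character-level B/I/O labels marking the first occurrence of value.

    Labels of this kind could serve as training targets for a sequence
    labeller such as a CRF (assumed labelling scheme).
    """
    tags = ["O"] * len(sentence)
    start = sentence.find(value) if value else -1
    if start >= 0:
        tags[start] = "B"
        for i in range(start + 1, start + len(value)):
            tags[i] = "I"
    return tags


if __name__ == "__main__":
    boxes = [{"birthplace": "Beijing", "occupation": "writer"},
             {"birthplace": "Shanghai"}]
    print(learn_attributes(boxes))
    article = "Lao She was born in Beijing. He wrote Teahouse."
    print(candidate_sentences(article, {"birthplace": ["born in"]}))
    print(bio_tags("born in Beijing", "Beijing"))
```

In this sketch, infobox pages supply both the attribute inventory and the known values used to label sentences, so no manual annotation is needed; that is one plausible reading of "less human supervision", not a claim about the paper's exact procedure.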
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Y., Chen, L., Xu, K. (2012). Learning Chinese Entity Attributes from Online Encyclopedia. In: Wang, H., et al. Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29426-6_22
DOI: https://doi.org/10.1007/978-3-642-29426-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29425-9
Online ISBN: 978-3-642-29426-6
eBook Packages: Computer Science, Computer Science (R0)