Abstract
Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating a researcher with such areas enables a range of applications, such as the organization of digital repositories, expertise recommendation and the formation of research groups for complex problems. In this article, we propose a simple yet effective automatic classification model that categorizes research expertise according to a knowledge area classification scheme. Our proposal relies on discriminative evidence provided by the titles of academic works, which are the minimum information capable of relating a researcher to their knowledge area. Our experiments show that supervised machine learning methods trained with manually labeled information can produce effective classification models.
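As a rough illustration of the idea of classifying by title alone, the sketch below trains a small supervised model on labeled titles and predicts a knowledge area for an unseen title. It is a minimal sketch under stated assumptions, not the model evaluated in the article: the abstract does not name a specific learner, so a multinomial Naive Bayes classifier is used here as a stand-in, and the class name `TitleNaiveBayes`, the helper `tokenize`, the sample titles and the two area labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", title.lower())


class TitleNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over title words.

    Stand-in learner chosen for illustration; an assumption, not the
    article's method.
    """

    def fit(self, titles, areas):
        self.word_counts = defaultdict(Counter)  # area -> word frequencies
        self.area_counts = Counter(areas)        # area -> number of titles
        self.vocab = set()
        for title, area in zip(titles, areas):
            tokens = tokenize(title)
            self.word_counts[area].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in tokenize(title) if t in self.vocab]
        total_titles = sum(self.area_counts.values())
        best_area, best_score = None, -math.inf
        for area, n_titles in self.area_counts.items():
            counts = self.word_counts[area]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each title word.
            score = math.log(n_titles / total_titles)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_area, best_score = area, score
        return best_area


# Hypothetical, hand-made training data (not from the article's dataset).
train_titles = [
    "Stacking random forests for text classification",
    "Learning to rank for expert search",
    "Support vector machines for large taxonomies",
    "Gene expression analysis in cancer cells",
    "Protein folding and enzyme activity in yeast",
    "Cell signaling pathways in plant growth",
]
train_areas = ["Computer Science"] * 3 + ["Biology"] * 3

model = TitleNaiveBayes().fit(train_titles, train_areas)
prediction = model.predict("Text classification with support vector machines")
print(prediction)
```

A researcher-level label could then be derived by aggregating the predictions over all of that researcher's titles, for example by majority vote; that aggregation step is likewise an assumption rather than something stated in the abstract.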
Notes
In this article we use the terms classification and categorization interchangeably.
Literal translation from the original title in Portuguese: A UNESCO e o Mundo da Cultura.
Acknowledgements
This work was partially funded by project MASWeb (Grant FAPEMIG/PRONEX APQ-01400-14) and by the authors’ individual Grants from CAPES, CNPq and FAPEMIG.
Cite this article
de Siqueira, G.O., Canuto, S., Gonçalves, M.A. et al. A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information. Int J Digit Libr 21, 61–73 (2020). https://doi.org/10.1007/s00799-018-0260-z