Abstract
This paper presents a variable transformation strategy for enriching the information content of variables and for defining the project target in real-world data mining applications based on relational databases that hold data at different grains. In an actual solution for assessing school quality from official school survey data and student test data, variables at the student and teacher grains had to become features of the schools to which they belonged. The formal problem was how to summarize the relevant information content of the attribute distributions in a few summarizing concepts (features). Instead of the typical lowest-order moments of the distribution, the proposed transformations are based on the distribution histogram and produce a weighted score for each input variable. Following the CRISP-DM method, the problem was precisely defined as a binary decision problem on a granularly transformed student grade. The proposed granular transformation embedded additional human expert knowledge in the input variables at the school level. Logistic regression produced a classification score for good schools, and the AUC_ROC and Max_KS metrics assessed the performance of that score on statistically independent datasets. A 10-fold cross-validation experimental procedure showed that this domain-driven data mining approach produced a statistically significant improvement, at a 0.99 confidence level, over the usual approach based on the central tendency of the distribution.
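As a rough illustration of the granularity change described above, the following sketch aggregates a student-level categorical attribute into a school-level feature by first building each school's relative-frequency histogram and then collapsing it into a single weighted score. It is only a minimal sketch under stated assumptions: the records, the category names, the weight values, and the function names `school_histograms` and `weighted_scores` are all hypothetical, and the abstract does not specify how the paper actually derives its weights.

```python
from collections import Counter, defaultdict

# Hypothetical student-level records: (school_id, categorical attribute value).
# The attribute and its categories are illustrative, not taken from the paper.
students = [
    ("school_a", "high"), ("school_a", "high"),
    ("school_a", "medium"), ("school_a", "low"),
    ("school_b", "low"), ("school_b", "low"),
    ("school_b", "medium"), ("school_b", "medium"),
]

# Assumed weight per category (hypothetical values standing in for
# whatever weighting the domain expert or the data would supply).
weights = {"low": 0.0, "medium": 0.5, "high": 1.0}


def school_histograms(records):
    """Return, for each school, the relative frequency of each category."""
    counts = defaultdict(Counter)
    for school, value in records:
        counts[school][value] += 1
    return {
        school: {cat: n / sum(c.values()) for cat, n in c.items()}
        for school, c in counts.items()
    }


def weighted_scores(histograms, category_weights):
    """Collapse each school's histogram into one weighted score."""
    return {
        school: sum(category_weights[cat] * freq for cat, freq in hist.items())
        for school, hist in histograms.items()
    }


histograms = school_histograms(students)
scores = weighted_scores(histograms, weights)
```

With these made-up inputs, `scores` holds one value per school, which could then be used as a school-grain input feature in place of a plain mean or median of the student-grain attribute.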
Acknowledgments
The author would like to thank Mr. Fábio C. Pereira for running the experiments.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Adeodato, P.J.L. (2015). Variable Transformation for Granularity Change in Hierarchical Databases in Actual Data Mining Solutions. In: Jackowski, K., Burduk, R., Walkowiak, K., Wozniak, M., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2015. IDEAL 2015. Lecture Notes in Computer Science, vol 9375. Springer, Cham. https://doi.org/10.1007/978-3-319-24834-9_18
DOI: https://doi.org/10.1007/978-3-319-24834-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24833-2
Online ISBN: 978-3-319-24834-9
eBook Packages: Computer Science, Computer Science (R0)