Abstract
One of the main problems that modern e-mail systems face is managing the large volume of spam, or junk mail, they receive. These systems are expected to distinguish between legitimate mail and spam in order to present the end user with as much relevant information as possible. This study presents a novel hybrid intelligent system, using both unsupervised and supervised learning, that can be easily adapted for use in an individual or a collaborative system. The system splits the spam filtering problem into two stages. First, it divides the input data space into regions of similar messages. Then it generates several simple classifiers, each of which is used to classify the messages that fall within one of the previously determined regions. In this way the efficiency of each classifier increases, as each can specialize in separating spam from a certain type of related messages. The hybrid system has been tested on a real e-mail database, and a comparison of its results with those obtained from other common classification methods is also included. This novel hybrid technique proves to be effective for the problem under study.
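The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering method or the base classifiers, so plain k-means stands in for the unsupervised stage and a nearest-class-centroid rule stands in for each "simple classifier". The two numeric features, the toy messages, and all function names are hypothetical.

```python
from collections import defaultdict
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))


def kmeans(points, k, iterations=20):
    """Stage 1 stand-in: plain k-means, seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(groups[i]) if groups[i] else centroids[i]
                     for i in range(k)]
    return centroids


def class_means(points, labels):
    """A 'simple classifier': one mean vector per class label."""
    by_label = defaultdict(list)
    for p, y in zip(points, labels):
        by_label[y].append(p)
    return {y: mean(ps) for y, ps in by_label.items()}


def train(points, labels, k):
    """Stage 2: fit one simple classifier per region found in stage 1."""
    centroids = kmeans(points, k)
    regions = defaultdict(lambda: ([], []))
    for p, y in zip(points, labels):
        region_points, region_labels = regions[nearest(p, centroids)]
        region_points.append(p)
        region_labels.append(y)
    experts = {c: class_means(ps, ys) for c, (ps, ys) in regions.items()}
    # Fallback for a region that received no training messages.
    fallback = class_means(points, labels)
    return centroids, experts, fallback


def predict(model, p):
    """Route p to its region, then let that region's classifier decide."""
    centroids, experts, fallback = model
    expert = experts.get(nearest(p, centroids), fallback)
    return min(expert, key=lambda y: dist(p, expert[y]))


# Hypothetical 2-D features for eight toy messages, forming two groups.
points = [(0, 0), (10, 12), (0, 1), (11, 12), (2, 0), (10, 10), (2, 1), (11, 10)]
labels = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]
model = train(points, labels, k=2)

print(predict(model, (1.8, 0.3)))
print(predict(model, (10.2, 11.8)))
```

In this toy data, spam differs from legitimate mail along the first feature in one group and along the second feature in the other, which is the kind of situation where a classifier per region may have an easier task than a single global classifier.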
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Porras, S., Baruque, B., Vaquerizo, B., Corchado, E. (2011). Clustering Ensemble for Spam Filtering. In: Corchado, E., Kurzyński, M., Woźniak, M. (eds) Hybrid Artificial Intelligent Systems. HAIS 2011. Lecture Notes in Computer Science, vol 6679. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21222-2_44
DOI: https://doi.org/10.1007/978-3-642-21222-2_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21221-5
Online ISBN: 978-3-642-21222-2
eBook Packages: Computer Science, Computer Science (R0)