Abstract
The unrestrainable growth of data in many domains where machine learning could be applied has given rise to a new field, large-scale learning, which aims to develop algorithms that are efficient and scalable with respect to computation, memory, time, and communication requirements. A promising line of research in large-scale learning is distributed learning, which involves learning from data stored at different locations and, eventually, selecting and combining the “local” classifiers into a single global answer using one of three main approaches. This paper addresses a significant issue that arises when distributed data comes from several sources, each with a different distribution. The class-probability distribution of data (CPDD) is defined and its impact on the performance of the three combination approaches is analyzed. The results show the necessity of taking the CPDD into account and lead to the conclusion that combining only related knowledge is the most appropriate way to learn in a distributed setting.
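To make the setting concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of the scenario the abstract describes: data is partitioned across several nodes, each node may have a different class-probability distribution (CPDD), a "local" classifier is trained per node, and the local answers are combined into a single global prediction, here by simple majority voting as one common combination strategy. The toy data, the threshold classifier, and all function names are illustrative assumptions.

```python
import random
from collections import Counter

random.seed(0)

def make_node_data(n, p_positive):
    """Generate a toy 1-D two-class sample whose class prior is p_positive."""
    data = []
    for _ in range(n):
        label = 1 if random.random() < p_positive else 0
        x = random.gauss(1.0 if label == 1 else -1.0, 1.0)
        data.append((x, label))
    return data

def class_probability_distribution(data):
    """Empirical CPDD of a node: proportion of examples per class."""
    counts = Counter(label for _, label in data)
    total = len(data)
    return {c: counts[c] / total for c in sorted(counts)}

def train_threshold_classifier(data):
    """Trivial local learner: threshold at the midpoint of the class means."""
    means = {}
    for c in (0, 1):
        xs = [x for x, label in data if label == c]
        means[c] = sum(xs) / len(xs) if xs else 0.0
    threshold = (means[0] + means[1]) / 2.0
    return lambda x: 1 if x > threshold else 0

def majority_vote(classifiers, x):
    """Combine the local answers into one global prediction."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three nodes with deliberately different class priors, i.e. different CPDDs.
nodes = [make_node_data(200, p) for p in (0.5, 0.9, 0.1)]
for i, node in enumerate(nodes):
    print(f"node {i} CPDD: {class_probability_distribution(node)}")

local_classifiers = [train_threshold_classifier(node) for node in nodes]
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f} -> global prediction {majority_vote(local_classifiers, x)}")
```

Comparing the printed per-node CPDDs with the combined predictions illustrates the question the paper studies: whether combining classifiers trained under very different class-probability distributions is preferable to combining only those trained on related data.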
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Peteiro-Barral, D., Guijarro-Berdiñas, B., Pérez-Sánchez, B. (2011). On the Effectiveness of Distributed Learning on Different Class-Probability Distributions of Data. In: Lozano, J.A., Gámez, J.A., Moreno, J.A. (eds) Advances in Artificial Intelligence. CAEPIA 2011. Lecture Notes in Computer Science, vol. 7023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25274-7_12
DOI: https://doi.org/10.1007/978-3-642-25274-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25273-0
Online ISBN: 978-3-642-25274-7