research-article

On the detection of community smells using genetic programming-based ensemble classifier chain

Authors:

Moataz Chouchen,

Mohamed Wiem Mkaouer

ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering

Pages 43 - 54

https://doi.org/10.1145/3372787.3390439

Published: 25 September 2020

Abstract

Community smells are symptoms of organizational and social issues within the software development community that often increase project costs and degrade software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns connected to organizational-social structures in the software development community, such as the lack of communication, coordination, and collaboration. Recognizing the advantages of detecting potential community smells early in a software project, we introduce a novel approach that learns from various community organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect 8 common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; these are solved using genetic programming (GP) to find the optimal detection rules for each smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open source projects and 407 community smell instances. Statistical tests on our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, outperforming different state-of-the-art techniques. Furthermore, we found that the most influential factors that best characterize community smells include the social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
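To make the ECC idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-feature threshold rule (`train_stump`) as a stand-in for the GP-evolved detection rules, two hypothetical metrics (network density and closeness centrality) as features, and an invented six-project toy dataset with two smell labels. The per-chain data sampling that full ECC normally uses is omitted for brevity. Each chain predicts the labels in a random order and feeds earlier predictions to later classifiers; the ensemble takes a majority vote per label.

```python
import random

def train_stump(X, y):
    """Fit a one-feature threshold rule (assumed stand-in for a GP-evolved rule)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) >= 0 else 0 for row in X]
                acc = sum(p == target for p, target in zip(preds, y))
                if best is None or acc > best[0]:
                    best = (acc, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) >= 0 else 0

def train_chain(X, Y, order):
    """One classifier chain: each label sees the features plus earlier labels in `order`."""
    models = []
    for pos, label in enumerate(order):
        augmented = [list(x) + [y[l] for l in order[:pos]] for x, y in zip(X, Y)]
        models.append(train_stump(augmented, [y[label] for y in Y]))
    return models

def predict_chain(models, order, x):
    """Predict labels in chain order, appending each prediction as a new feature."""
    row, out = list(x), {}
    for model, label in zip(models, order):
        out[label] = model(row)
        row.append(out[label])
    return [out[l] for l in sorted(out)]

def train_ecc(X, Y, n_chains=5, seed=0):
    """Ensemble of chains, each with a random label order."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append((train_chain(X, Y, order), order))
    return chains

def predict_ecc(chains, x):
    """Majority vote per label across all chains."""
    votes = [predict_chain(models, order, x) for models, order in chains]
    return [1 if 2 * sum(v[l] for v in votes) >= len(votes) else 0
            for l in range(len(votes[0]))]

def f_measure(true, pred):
    """F-measure (F1) = 2TP / (2TP + FP + FN) for one binary label."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Invented toy data: [density, closeness] per project; two hypothetical smell labels.
X = [[0.1, 0.2], [0.2, 0.8], [0.7, 0.3], [0.9, 0.9], [0.15, 0.35], [0.8, 0.1]]
Y = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 1]]

chains = train_ecc(X, Y)
train_preds = [predict_ecc(chains, x) for x in X]
print(predict_ecc(chains, [0.12, 0.9]))
print(f_measure([y[0] for y in Y], [p[0] for p in train_preds]))
```

Varying the label order across chains is what lets the ensemble exploit correlations between smell types without committing to a single, possibly poor, ordering.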

Index Terms

On the detection of community smells using genetic programming-based ensemble classifier chain
1. Software and its engineering
  1. Software organization and properties


Published In

ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering

June 2020

147 pages

ISBN: 9781450370936

DOI: 10.1145/3372787

General Chair: Paolo Tell (IT University of Copenhagen, Denmark)

Program Chairs: Igor Steinmacher (Northern Arizona University), Ricardo Britto (Ericsson, Blekinge Institute of Technology, Sweden)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

In-Cooperation

IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICGSE '20

Sponsor:

SIGSOFT

ICGSE '20: 15th IEEE/ACM International Conference on Global Software Engineering

June 26 - 28, 2020

Seoul, Republic of Korea

Cited By

Lambiase S. 2024. Cultural and Socio-Technical Aspects in Software Development. Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, 482-487. Online publication date: 18-Jun-2024. https://dl.acm.org/doi/10.1145/3661167.3661230

Tahsin N, Md. Mahbubul Alam Joarder. 2023. Community smells in software engineering: A systematic literature review. Systematic Literature Review and Meta-Analysis Journal 3, 4, 127-145. Online publication date: 15-Aug-2023. https://doi.org/10.54480/slr-m.v3i4.51

Nanthaamornphong A, Boonchieng E. 2023. An Exploratory Study on Code Smells during Code Review in OSS Projects: A Case Study on OpenStack and WikiMedia. Recent Advances in Computer Science and Communications 16, 7. Online publication date: Sep-2023. https://doi.org/10.2174/2666255816666230222112313

Tahsin N, Sakib K. 2023. Exploring Community Smell Co-occurrences in the Context of Bangladesh: An Empirical Study. 2023 IEEE/ACM 11th International Workshop on Software Engineering for Systems-of-Systems and Software Ecosystems (SESoS), 22-29. Online publication date: May-2023. https://doi.org/10.1109/SESoS59159.2023.00009

Sellami K, Saied M, Ouni A. 2022. A Hierarchical DBSCAN Method for Extracting Microservices from Monolithic Applications. Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering, 201-210. Online publication date: 13-Jun-2022. https://dl.acm.org/doi/10.1145/3530019.3530040

Lambiase S, Catolino G, Tamburri D, Serebrenik A, Palomba F, Ferrucci F, Begel A, Blincoe K. 2022. Good fences make good neighbours? Proceedings of the 2022 ACM/IEEE 44th International Conference on Software Engineering: Software Engineering in Society, 67-78. Online publication date: 21-May-2022. https://dl.acm.org/doi/10.1145/3510458.3513015

Sarmento C, Massoni T, Serebrenik A, Catolino G, Tamburri D, Palomba F. 2022. Gender Diversity and Community Smells: A Double-Replication Study on Brazilian Software Teams. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 273-283. Online publication date: Mar-2022. https://doi.org/10.1109/SANER53432.2022.00043

Voria G, Pentangelo V, Porta A, Lambiase S, Catolino G, Palomba F, Ferrucci F. 2022. Community Smell Detection and Refactoring in SLACK: The CADOCS Project. 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME), 469-473. Online publication date: Oct-2022. https://doi.org/10.1109/ICSME55016.2022.00061

Lambiase S, Catolino G, Tamburri D, Serebrenik A, Palomba F, Ferrucci F. 2022. Good Fences Make Good Neighbours? On the Impact of Cultural and Geographical Dispersion on Community Smells. 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS), 67-78. Online publication date: May-2022. https://doi.org/10.1109/ICSE-SEIS55304.2022.9793992

Tahsin N, Sakib K. 2022. Refactoring Community Smells: An Empirical Study on the Software Practitioners of Bangladesh. 2022 29th Asia-Pacific Software Engineering Conference (APSEC), 422-426. Online publication date: Dec-2022. https://doi.org/10.1109/APSEC57359.2022.00055
