research-article

On the detection of community smells using genetic programming-based ensemble classifier chain

Published: 25 September 2020

Abstract

Community smells are symptoms of organizational and social issues within a software development community that often increase project costs and impact software quality. Recent studies have identified a variety of community smells and defined them as suboptimal patterns in the organizational and social structures of a software development community, such as a lack of communication, coordination, and collaboration. Recognizing the advantages of detecting potential community smells early in a software project, we introduce a novel approach that learns from various community organizational and social practices to provide automated support for detecting community smells. In particular, our approach learns from a set of interleaving organizational-social symptoms that characterize the existence of community smell instances in a software project. We build a multi-label learning model to detect eight common types of community smells. We use the ensemble classifier chain (ECC) model, which transforms a multi-label problem into several single-label problems; each of these is solved using genetic programming (GP) to find the optimal detection rules for the corresponding smell type. To evaluate the performance of our approach, we conducted an empirical study on a benchmark of 103 open source projects and 407 community smell instances. Statistical tests of our results show that our approach can detect the eight considered smell types with an average F-measure of 89%, achieving better performance than several state-of-the-art techniques. Furthermore, we found that the most influential factors characterizing community smells include social network density and closeness centrality, as well as the standard deviation of the number of developers per time zone and per community.
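To make the chaining idea concrete, the following is a minimal sketch of an ensemble of classifier chains, not the authors' implementation. It assumes a one-feature threshold rule as a stand-in for the GP-evolved detection rules, uses two made-up metrics (network density and time-zone standard deviation) with two hypothetical smell labels, and omits the per-chain training-set subsampling that ensemble classifier chains commonly use. All function names and data values are illustrative.

```python
import random
from statistics import mean


def train_stump(X, y):
    """Fit a one-feature threshold rule (illustrative stand-in for a GP-evolved rule)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[j] - t) >= 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) >= 0 else 0


def train_chain(X, Y, order):
    """One binary model per label; each sees the features plus earlier labels in the chain."""
    models = []
    for pos, label in enumerate(order):
        augmented = [row + [Y[i][prev] for prev in order[:pos]]
                     for i, row in enumerate(X)]
        models.append(train_stump(augmented, [y[label] for y in Y]))
    return models


def predict_chain(models, order, row):
    """Predict labels in chain order, feeding each prediction to the next model."""
    preds = {}
    features = list(row)
    for model, label in zip(models, order):
        preds[label] = model(features)
        features.append(preds[label])
    return [preds[label] for label in range(len(order))]


def train_ecc(X, Y, n_chains=5, seed=0):
    """Train several chains, each with its own random label order."""
    rng = random.Random(seed)
    labels = list(range(len(Y[0])))
    chains = []
    for _ in range(n_chains):
        order = labels[:]
        rng.shuffle(order)
        chains.append((order, train_chain(X, Y, order)))
    return chains


def predict_ecc(chains, row, threshold=0.5):
    """Average the per-label votes of all chains and apply a threshold."""
    votes = [predict_chain(models, order, row) for order, models in chains]
    return [1 if mean(column) >= threshold else 0 for column in zip(*votes)]


# Toy data (invented): each row is [network density, std. dev. of developers per time zone].
X = [[0.1, 0.5], [0.2, 3.0], [0.6, 0.4], [0.7, 2.5], [0.15, 2.8], [0.8, 0.3]]
# Two hypothetical smell labels per project.
Y = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 1], [0, 0]]

chains = train_ecc(X, Y, n_chains=3, seed=1)
print(predict_ecc(chains, [0.12, 2.9]))  # [1, 1]
```

Each chain turns the multi-label task into a sequence of single-label tasks, and the ensemble vote reduces sensitivity to any one label order. In the approach described above, the base learner would be a GP search over detection rules rather than the threshold rule assumed here.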

    Published In

    ICGSE '20: Proceedings of the 15th International Conference on Global Software Engineering
    June 2020
    147 pages
    ISBN:9781450370936
    DOI:10.1145/3372787

    Sponsors

    In-Cooperation

    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. community smells
    2. genetic programming
    3. multi-label learning
    4. search-based software engineering
    5. social debt
    6. socio-technical factors
