Preliminary Causal Discovery Results with Software Effort Estimation Data

Published: 09 February 2018

Abstract

Correlation does not imply causation. Though this is well known, most analyses rely on correlation as evidence of relationships that are then treated as causal. Causal discovery, also referred to as causal model search, applies statistical methods to identify causal relationships from conditional independences (and/or other statistical relationships) in the data. Although software cost estimation models use both domain knowledge and statistics, no published report to date describes the evaluation of a software dataset using causal discovery. Two of the authors previously used regression analysis to evaluate the effectiveness of the functional size measurement methods of the International Function Points User Group (IFPUG) and the Common Software Measurement International Consortium (COSMIC) in analyzing the Unified Code Count (UCC) dataset of maintenance tasks. Using the same dataset, this paper reports what types of information causal discovery provides and how they differ from the results of correlation tests. The paper introduces causal discovery to software engineering research, and its future use may affect how software effort models are built.
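The distinction the abstract draws between a correlation test and a conditional independence test can be illustrated with a minimal sketch. It is not the authors' analysis and does not use the UCC dataset: the variables `size`, `effort`, and `doc_pages`, their coefficients, and the simulated sample are all invented for illustration. It assumes linear relationships with Gaussian noise, under which a partial correlation with a Fisher z test is a common way to test conditional independence.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, n_cond=0):
    """Two-sided p-value for the null hypothesis of zero (partial) correlation."""
    stat = math.sqrt(n - n_cond - 3) * math.atanh(r)
    return math.erfc(abs(stat) / math.sqrt(2))


# Hypothetical data: size is a common cause of both effort and doc_pages.
# Neither effort nor doc_pages causes the other.
rng = random.Random(0)
n = 2000
size = [rng.gauss(0, 1) for _ in range(n)]
effort = [2.0 * s + rng.gauss(0, 1) for s in size]
doc_pages = [1.5 * s + rng.gauss(0, 1) for s in size]

# A plain correlation test sees a strong effort/doc_pages association.
r_marginal = pearson(effort, doc_pages)
p_marginal = fisher_z_pvalue(r_marginal, n)

# Conditioning on size removes it: the two are conditionally independent.
r_partial = partial_corr(effort, doc_pages, size)
p_partial = fisher_z_pvalue(r_partial, n, n_cond=1)

print(f"marginal r = {r_marginal:.3f}, p = {p_marginal:.3g}")
print(f"partial  r = {r_partial:.3f}, p = {p_partial:.3g}")
```

In this simulated setting the marginal correlation is strong, while the partial correlation given `size` is close to zero. Constraint-based search procedures use patterns of such conditional independences to rule out a direct edge between two variables.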

Published In

ISEC '18: Proceedings of the 11th Innovations in Software Engineering Conference
February 2018
154 pages
ISBN:9781450363983
DOI:10.1145/3172871
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • iSOFT: iSOFT

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CFPs
  2. COCOMO
  3. COSMIC Function Points
  4. Causal Discovery
  5. Causal Inference
  6. Cost Estimation
  7. Effort Estimation
  8. Function Point Analysis
  9. IFPUG
  10. SLOC
  11. Software Non-Functional Assessment Process
  12. Source Lines of Code

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISEC '18

Acceptance Rates

Overall Acceptance Rate 76 of 315 submissions, 24%
