Improving fault localization and program repair with deep semantic features and transferred knowledge

X Meng, X Wang, H Zhang, H Sun, X Liu - Proceedings of the 44th …, 2022 - dl.acm.org
Proceedings of the 44th International Conference on Software Engineering, 2022 - dl.acm.org
Automatic software debugging mainly comprises two tasks: fault localization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods have been proposed to achieve better performance for fault localization. However, the existing methods ignore deep semantic features or consider only simple code representations. They also do not leverage the existing bug-related knowledge from large-scale open-source projects. In addition, existing template-based program repair techniques can incorporate project-specific information better than deep-learning approaches, but they are weak at selecting fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which leverages deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on the widely used benchmark Defects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.
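The second and third steps of the abstract (combining the three feature groups in an MLP, then ranking fix templates by semantic scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-hidden-layer architecture, the toy weights, the statement identifiers, and the template names are all hypothetical, and in the real approach the semantic features would come from the trained BiLSTM classifiers rather than hand-written numbers.

```python
import math
from typing import Dict, List, Sequence, Tuple

Features = Tuple[List[float], List[float], List[float]]


def mlp_score(features: Sequence[float],
              hidden_w: Sequence[Sequence[float]],
              hidden_b: Sequence[float],
              out_w: Sequence[float],
              out_b: float) -> float:
    """Forward pass of a one-hidden-layer MLP (ReLU, then sigmoid output)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    logit = sum(w * h for w, h in zip(out_w, hidden)) + out_b
    return 1.0 / (1.0 + math.exp(-logit))


def rank_statements(statements: Dict[str, Features],
                    hidden_w, hidden_b, out_w, out_b) -> List[Tuple[str, float]]:
    """Concatenate semantic, spectrum and mutation features per statement,
    score each statement with the MLP, and sort by descending suspiciousness."""
    scored = []
    for stmt_id, (semantic, spectrum, mutation) in statements.items():
        vector = list(semantic) + list(spectrum) + list(mutation)
        scored.append((stmt_id, mlp_score(vector, hidden_w, hidden_b, out_w, out_b)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def rank_templates(template_probs: Dict[str, float]) -> List[str]:
    """Order fix templates by the (assumed) multi-classifier probability."""
    return sorted(template_probs, key=template_probs.get, reverse=True)


# Toy, untrained weights and made-up feature values (assumptions for illustration).
HIDDEN_W = [[1.0, 1.0, 1.0]]
HIDDEN_B = [0.0]
OUT_W = [1.0]
OUT_B = -1.5

statements = {
    "Foo.java:10": ([0.9], [0.8], [0.7]),  # (semantic, spectrum, mutation)
    "Foo.java:11": ([0.1], [0.2], [0.0]),
}
ranking = rank_statements(statements, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
template_order = rank_templates(
    {"template_A": 0.2, "template_B": 0.7, "template_C": 0.1}
)
```

With these toy values, the statement whose three feature groups all indicate a fault ranks first, and the templates are tried in order of predicted probability. A trained model would replace the hand-set weights and features.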