Showing 1–24 of 24 results for author: Tan, S H

  1. arXiv:2409.14610

    cs.SE

    An Empirical Study of Refactoring Engine Bugs

    Authors: Haibo Wang, Zhuolin Xu, Huaien Zhang, Nikolaos Tsantalis, Shin Hwei Tan

    Abstract: Refactoring is a critical process in software development, aiming at improving the internal structure of code while preserving its external behavior. Refactoring engines are integral components of modern Integrated Development Environments (IDEs) and can automate or semi-automate this process to enhance code readability, reduce complexity, and improve the maintainability of software products. Like…

    Submitted 22 September, 2024; originally announced September 2024.

  2. arXiv:2409.14541

    cs.SE

    Tumbling Down the Rabbit Hole: How do Assisting Exploration Strategies Facilitate Grey-box Fuzzing?

    Authors: Mingyuan Wu, Jiahong Xiang, Kunqiu Chen, Peng DI, Shin Hwei Tan, Heming Cui, Yuqun Zhang

    Abstract: Many assisting exploration strategies have been proposed to assist grey-box fuzzers in exploring program states guarded by tight and complex branch conditions such as equality constraints. Although they have shown promising results in their original papers, their evaluations seldom follow equivalent protocols, e.g., they are rarely evaluated on identical benchmarks. Moreover, there is a lack of su…

    Submitted 24 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: Accepted at ICSE 2025

  3. arXiv:2408.13855

    cs.SE

    An Empirical Study of False Negatives and Positives of Static Code Analyzers From the Perspective of Historical Issues

    Authors: Han Cui, Menglei Xie, Ting Su, Chengyu Zhang, Shin Hwei Tan

    Abstract: Static code analyzers are widely used to help find program flaws. However, in practice the effectiveness and usability of such analyzers are affected by the problems of false negatives (FNs) and false positives (FPs). This paper aims to investigate the FNs and FPs of such analyzers from a new perspective, i.e., examining the historical issues of FNs and FPs of these analyzers reported by the mainta…

    Submitted 25 August, 2024; originally announced August 2024.

  4. arXiv:2405.02213

    cs.SE cs.AI cs.LG

    Automatic Programming: Large Language Models and Beyond

    Authors: Michael R. Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, Patanamon Thongtanunam

    Abstract: Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related is…

    Submitted 15 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  5. arXiv:2404.08877

    cs.SE cs.CL cs.LG

    Aligning LLMs for FL-free Program Repair

    Authors: Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

    Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs are capable of…

    Submitted 12 April, 2024; originally announced April 2024.

  6. arXiv:2402.14366

    cs.SE

    Understanding and Detecting Annotation-Induced Faults of Static Analyzers

    Authors: Huaien Zhang, Yu Pei, Shuyun Liang, Shin Hwei Tan

    Abstract: Static analyzers can reason about the properties and behaviors of programs and detect various issues without executing them. Hence, they should extract the necessary information to understand the analyzed program well. Annotation has been a widely used feature for different purposes in Java since the introduction of Java 5. Annotations can change program structures and convey semantic information…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 23 pages, 16 figures

  7. arXiv:2401.15234

    cs.SE

    Moving beyond Deletions: Program Simplification via Diverse Program Transformations

    Authors: Haibo Wang, Zezhong Xing, Zheng Wang, Chengnian Sun, Shin Hwei Tan

    Abstract: To reduce the complexity of software, developers manually simplify programs (known as developer-induced program simplification in this paper) to reduce their code size while preserving their functionality, but manual simplification is time-consuming and error-prone. To reduce manual effort, rule-based approaches (e.g., refactoring) and deletion-based approaches (e.g., delta debugging) can be potentially a…

    Submitted 26 January, 2024; originally announced January 2024.

  8. LPR: Large Language Models-Aided Program Reduction

    Authors: Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, Chengnian Sun

    Abstract: Program reduction is a prevalent technique to facilitate compiler debugging by automatically minimizing bug-triggering programs. Existing program reduction techniques are either generic across languages (e.g., Perses and Vulcan) or customized for a specific language by employing language-specific features, like C-Reduce. However, striking the balance between generality across multi…

    Submitted 11 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by ISSTA'24. This is the preprint version

  9. arXiv:2312.05778

    cs.SE

    Guiding ChatGPT to Fix Web UI Tests via Explanation-Consistency Checking

    Authors: Zhuolin Xu, Qiushi Li, Shin Hwei Tan

    Abstract: The rapid evolution of Web UI incurs time and effort in maintaining UI tests. Existing techniques in Web UI test repair focus on finding the target elements on the new web page that match the old ones so that the corresponding broken statements can be repaired. We present the first study that investigates the feasibility of using prior Web UI repair techniques for initial local matching and then u…

    Submitted 26 January, 2024; v1 submitted 10 December, 2023; originally announced December 2023.

  10. arXiv:2305.00918

    cs.CV cs.AI cs.LG

    CORSD: Class-Oriented Relational Self Distillation

    Authors: Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma

    Abstract: Knowledge distillation is an effective model compression method but has some limitations: (1) feature-based distillation methods only focus on distilling the feature map and do not transfer the relations between data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak in inter- and int…

    Submitted 28 April, 2023; originally announced May 2023.

    Comments: 4 pages, 4 figures, accepted to ICASSP2023

  11. arXiv:2302.11985

    cs.SE

    Automatic Detecting Unethical Behavior in Open-source Software Projects

    Authors: Hsu Myat Win, Haibo Wang, Shin Hwei Tan

    Abstract: Given the rapid growth of Open-Source Software (OSS) projects, ethical considerations are becoming more important. Past studies focused on specific ethical issues (e.g., gender bias and fairness in OSS). There is little to no study on the different types of unethical behavior in OSS projects. We present the first study of unethical behavior in OSS projects from the stakeholders' perspective. Our s…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: 11 pages

  12. arXiv:2205.10583

    cs.SE

    Automated Repair of Programs from Large Language Models

    Authors: Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, Shin Hwei Tan

    Abstract: Large language models such as Codex have shown the capability to produce code for many programming tasks. However, the success rate of existing models is low, especially for complex programming tasks. One of the reasons is that language models lack awareness of program semantics, resulting in incorrect programs, or even programs which do not compile. In this paper, we systematically study whether…

    Submitted 1 January, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

    Comments: 12 pages, To appear in ICSE 2023

  13. arXiv:2111.02018

    cs.CV

    Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

    Authors: Sia Huat Tan, Runpei Dong, Kaisheng Ma

    Abstract: Most feedforward convolutional neural networks spend roughly the same effort on each pixel. Yet human visual recognition is an interaction between eye movements and spatial attention, in which we take several glimpses of an object in different regions. Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet) which aims to tackle the challenges of high comp…

    Submitted 12 April, 2023; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: Accepted at BMVC 2021

    Journal ref: The British Machine Vision Conference (BMVC) 2021

  14. Automated Conformance Testing for JavaScript Engines via Deep Compiler Fuzzing

    Authors: Guixin Ye, Zhanyong Tang, Shin Hwei Tan, Songfang Huang, Dingyi Fang, Xiaoyang Sun, Lizhong Bian, Haibo Wang, Zheng Wang

    Abstract: JavaScript (JS) is a popular, platform-independent programming language. To ensure the interoperability of JS programs across different platforms, the implementation of a JS engine should conform to the ECMAScript standard. However, doing so is challenging as there are many subtle definitions of API behaviors, and the definitions keep evolving. We present COMFORT, a new compiler fuzzing framewor…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: PLDI 2021

    ACM Class: D.3.0; I.2.5

  15. arXiv:2103.13453

    cs.SE

    CrossFix: Collaborative bug fixing by recommending similar bugs

    Authors: Shin Hwei Tan, Ziqiang Li, Lu Yan

    Abstract: Many automated program repair techniques have been proposed for fixing bugs. Some of these techniques use information beyond the given buggy program and test suite to improve the quality of generated patches. However, there are several limitations that hinder the wide adoption of these techniques, including (1) they rely on a fixed set of repair templates for patch generation or reference impl…

    Submitted 24 March, 2021; originally announced March 2021.

  16. arXiv:2011.14392

    cs.SE

    GitHub-OSS Fixit: Fixing bugs at scale in a Software Engineering Course

    Authors: Shin Hwei Tan, Chunfeng Hu, Ziqiang Li, Xiaowen Zhang, Ying Zhou

    Abstract: Many studies have shown the benefits of introducing open-source projects into teaching Software Engineering (SE) courses. However, several limitations of existing studies hinder the wide adoption of open-source projects in a classroom setting, including (1) the selected project is limited to one particular project, (2) most studies only investigated its effect on teaching a spec…

    Submitted 14 February, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: conference

  17. arXiv:1911.12796

    cs.CV cs.LG eess.IV

    Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

    Authors: Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma

    Abstract: Existing domain adaptation methods aim at learning features that can be generalized among domains. These methods commonly require updating the source classifier to adapt to the target domain and do not properly handle the trade-off between the source domain and the target domain. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data cal…

    Submitted 28 February, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

    Comments: Accepted by CVPR2020

  18. Internet-based Adaptive Distributed Simulation of Mobile Ad-hoc Networks

    Authors: Gabriele D'Angelo, Stefano Ferretti, Gary S. H. Tan

    Abstract: In this paper we focus on Internet-based simulation, a form of distributed simulation in which a set of execution units that are physically located around the globe work together to run a simulation model. This setup is very challenging because of the latency/variability of communications. Thus, clever mechanisms must be adopted in the distributed simulation, such as the adaptive partitioning of t…

    Submitted 31 March, 2020; v1 submitted 30 August, 2019; originally announced August 2019.

    Comments: Proceedings of the 2019 Winter Simulation Conference (WSC 2019)

  19. The State and Future of Genetic Improvement

    Authors: William B. Langdon, Westley Weimer, Christopher Timperley, Oliver Krauss, Zhen Yu Ding, Yiwei Lyu, Nicolas Chausseau, Eric Schulte, Shin Hwei Tan, Kevin Leach, Yu Huang, Gabin An

    Abstract: We report the discussion session at the sixth international Genetic Improvement workshop, GI-2019 @ ICSE, which was held as part of the 41st ACM/IEEE International Conference on Software Engineering on Tuesday 28th May 2019. Topics included GI representations, the maintainability of evolved code, automated software testing, future areas of GI research, such as co-evolution, and existing GI tools a…

    Submitted 27 June, 2019; originally announced July 2019.

    Comments: University College London, Computer Science

    Report number: RN/19/02

    Journal ref: SIGSOFT Software Engineering Notes, 44(3) p25-29, July 2019

  20. arXiv:1907.02124

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

    Authors: Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

    Abstract: Large deep neural network (DNN) models pose a key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than of arithmetic or SRAM operations. This motivates intensive research on model compression with two main approaches. Weight pruning leverages the redundancy in the number of weights and can be performed in a non-structured, which has high…

    Submitted 7 January, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

  21. arXiv:1906.10320

    stat.ML cs.LG stat.AP

    From Non-Paying to Premium: Predicting User Conversion in Video Games with Ensemble Learning

    Authors: Anna Guitart, Shi Hui Tan, Ana Fernández del Río, Pei Pei Chen, África Periáñez

    Abstract: Retaining premium players is key to the success of free-to-play games, but most of them do not start purchasing right after joining the game. By exploiting the exceptionally rich datasets recorded by modern video games--which provide information on the individual behavior of each and every player--survival analysis techniques can be used to predict which players are more likely to become paying (or…

    Submitted 30 June, 2019; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: social games, conversion prediction, ensemble methods, survival analysis, online games, user behavior

    Journal ref: ACM Foundations of Digital Games (FDG'2019), 97, 9, 2019

  22. arXiv:1905.12171

    cs.LG cs.AI stat.ML

    Brain-inspired reverse adversarial examples

    Authors: Shaokai Ye, Sia Huat Tan, Kaidi Xu, Yanzhi Wang, Chenglong Bao, Kaisheng Ma

    Abstract: A human does not have to see all elephants to recognize an animal as an elephant. In contrast, current state-of-the-art deep learning approaches heavily depend on the variety of training samples and the capacity of the network. In practice, the size of a network is always limited and it is impossible to access all the data samples. Under this circumstance, deep learning models are extremely fragile…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: Preprint

  23. arXiv:1707.03139

    cs.SE

    Partitioning Patches into Test-equivalence Classes for Scaling Program Repair

    Authors: Sergey Mechtaev, Xiang Gao, Shin Hwei Tan, Abhik Roychoudhury

    Abstract: Automated program repair is a problem of finding a transformation (called a patch) of a given incorrect program that eliminates the observable failures. It has important applications such as providing debugging aids, automatically grading assignments and patching security vulnerabilities. A common challenge faced by all existing repair techniques is scalability to large patch spaces, since there a…

    Submitted 15 July, 2017; v1 submitted 11 July, 2017; originally announced July 2017.

  24. arXiv:1201.6078   

    cs.SE

    @tComment: Testing Javadoc Comments to Detect Comment-Code Inconsistencies

    Authors: Shin Hwei Tan, Darko Marinov, Lin Tan, Gary T. Leavens

    Abstract: This paper has been withdrawn by the author.

    Submitted 20 February, 2012; v1 submitted 29 January, 2012; originally announced January 2012.

    Comments: This paper has been withdrawn by the author