2021
Many organizations use legacy systems as these systems contain their valuable business rules. However, these legacy systems answer the past requirements but are difficult to maintain and evolve due to old technology use. In this situation, stockholders decide to renovate the system with a minimum amount of cost and risk. Although the renovation process is a more affordable choice over redevelopment, it comes with its risks such as performance loss and failure to obtain quality goals. A proper test process can minimize risks incorporated with the renovation process. This work introduces a testing model tailored for the migration and re-engineering process and employs test automation, which results in early bug detection. Moreover, the automated tests ensure functional sameness between the old and the new system. This process enhances reliability, accuracy, and speed of testing.
Evolutionary coupling is a well investigated phenomenon in software maintenance research and practice. Association rules and two related measures, support and confidence, have been used to identify evolutionary coupling among program entities. However, these measures only emphasize the co-change (i.e., changing together) frequency of entities and cannot determine whether the entities co-evolved by experiencing related changes. Consequently, the approach reports false positives and fails to detect evolutionary coupling among infrequently co-changed entities. We propose a new measure, identifier correspondence (id-correspondence), that quantifies the extent to which changes that occurred to the co-changed entities are related based on identifier similarity. Identifiers are the names given to different program entities such as variables, methods, classes, packages, interfaces, structures, unions etc. We use Dice-Sørensen co-efficient for measuring lexical similarity between the identifiers involved in the changed lines of the co-changed entities. Our investigation on thousands of revisions from nine subject systems covering three programming languages shows that id-correspondence can considerably improve the detection accuracy of evolutionary coupling. It outperforms the existing state-of-the-art evolutionary coupling based techniques with significantly higher recall and F-score in predicting future co-change candidates.
When a programmer changes a particular code fragment, the other similar code fragments in the code-base may also need to be changed together (i.e., co-changed) consistently to ensure that the software system remains consistent. Existing studies and tools apply clone detectors to identify these similar co-change candidates for a target code fragment. However, clone detectors suffer from a confounding configuration choice problem and it affects their accuracy in retrieving co-change candidates.In our research, we propose and empirically evaluate a lightweight co-change suggestion technique that can automatically suggest fragment level similar co-change candidates for a target code fragment using WA-DiSC (Weighted Average Dice-Sørensen Co-efficient) through a context-sensitive mining of the entire code-base. We apply our technique, FLeCCS (Fragment Level Co-change Candidate Suggester), on six subject systems written in three different programming languages (Java, C, and C#) and compare its performance with the existing state-of-the-art techniques. According to our experiment, our technique outperforms not only the existing code clone based techniques but also the association rule mining based techniques in detecting co-change candidates with a significantly higher accuracy (precision and recall). We also find that File Proximity Ranking performs significantly better than Similarity Extent Ranking when ranking the co-change candidates suggested by our proposed technique.
2020
Code reuse by copying and pasting from one place to another place in a codebase is a very common scenario in software development which is also one of the most typical reasons for introducing code clones. There is a huge availability of tools to detect such cloned fragments and a lot of studies have already been done for efficient clone detection. There are also several studies for evaluating those tools considering their clone detection effectiveness. Unfortunately, we find no study which compares different clone detection tools in the perspective of detecting cloned co-change candidates during software evolution. Detecting cloned co-change candidates is essential for clone tracking. In this study, we wanted to explore this dimension of code clone research. We used six promising clone detection tools to identify cloned and non-cloned co-change candidates from six $C$ and Java-based subject systems and evaluated the performance of those clone detection tools in detecting the cloned co-change fragments. Our findings show that a good clone detector may not perform well in detecting cloned co-change candidates. The amount of unique lines covered by a clone detector and the number of detected clone fragments plays an important role in its performance. The findings of this study can enrich a new dimension of code clone research.
When a programmer makes changes to a target program entity (files, classes, methods), it is important to identify which other entities might also get impacted. These entities constitute the impact set for the target entity. Association rules have been widely used for discovering the impact sets. However, such rules only depend on the previous co-change history of the program entities ignoring the fact that similar entities might often need to be updated together consistently even if they did not co-change before. Considering this fact, we investigate whether cloning relationships among program entities can be associated with association rules to help us better identify the impact sets. In our research, we particularly investigate whether the impact set detection capability of a clone detector can be utilized to enhance the capability of the state-of-the-art association rule mining technique, Tarmaq, in discovering impact sets. We use the well known clone detector called NiCad in our investigation and consider both regular and micro-clones. Our evolutionary analysis on thousands of commit operations of eight diverse subject systems reveals that consideration of code clones can enhance the impact set detection accuracy of Tarmaq with a significantly higher precision and recall. Micro-clones of 3LOC and 4LOC and regular code clones of 5LOC to 20LOC contribute the most towards enhancing the detection accuracy.
Evolutionary coupling is a well investigated phenomenon during software evolution and maintenance. If two or more program entities co-change (i.e., change together) frequently during evolution, it is expected that the entities are coupled. This type of coupling is called evolutionary coupling or change coupling in the literature. Evolutionary coupling is realized using association rules and two measures: support and confidence. Association rules have been extensively used for predicting co-change candidates for a target program entity (i.e., an entity that a programmer attempts to change). However, association rules often predict a large number of co-change candidates with many false positives. Thus, it is important to rank the predicted co-change candidates so that the true positives get higher priorities. The predicted co-change candidates have always been ranked using the support and confidence measures of the association rules. In our research, we investigate five different ranking mechanisms on thousands of commits of ten diverse subject systems. On the basis of our findings, we propose a history-based ranking approach, HistoRank (History-based Ranking), that analyzes the previous ranking history to dynamically select the most appropriate one from those five ranking mechanisms for ranking co-change candidates of a target program entity. According to our experiment result, HistoRank outperforms each individual ranking mechanism with a significantly better MAP (mean average precision). We investigate different variants of HistoRank and realize that the variant that emphasizes the ranking in the most recent occurrence of co-change in the history performs the best.
Code clones are the same or nearly similar code fragments in a software system's code-base. While the existing studies have extensively studied regular code clones in software systems, micro-clones have been mostly ignored. Although an existing study investigated consistent changes in exact micro-clones, near-miss micro-clones have never been investigated. In our study, we investigate the importance of near-miss micro-clones in software evolution and maintenance by automatically detecting and analyzing the consistent updates that they experienced during the whole period of evolution of our subject systems. We compare the consistent co-change tendency of near-miss micro-clones with that of exact micro-clones and regular code clones. According to our investigation on thousands of revisions of six open-source subject systems written in two different programming languages, near-miss micro-clones have a significantly higher tendency of experiencing consistent updates compared to exact micro-clones and regular (both exact and near-miss) code clones. Consistent updates in near-miss micro-clones have a high tendency of being related with bug-fixes. Moreover, the percentage of commit operations where near-miss micro-clones experience consistent updates is considerably higher than that of regular clones and exact micro-clones. We finally observe that near-miss micro-clones staying in close proximity to each other have a high tendency of experiencing consistent updates. Our research implies that near-miss micro-clones should be considered equally important as of regular clones and exact micro-clones when making clone management decisions.
Abstract Code clones, identical or nearly similar code fragments in a software system’s code-base, have mixed impacts on software evolution and maintenance. Focusing on the issues of clones researchers suggest managing them through refactoring, and tracking. In this paper we present a survey on the state-of-the-art of clone refactoring and tracking techniques, and identify future research possibilities in these areas. We define the quality assessment features for the clone refactoring and tracking tools, and make a comparison among these tools considering these features. To the best of our knowledge, our survey is the first comprehensive study on clone refactoring and tracking. According to our survey on clone refactoring we realize that automatic refactoring cannot eradicate the necessity of manual effort regarding finding refactoring opportunities, and post refactoring testing of system behaviour. Post refactoring testing can require a significant amount of time and effort from the quality assurance engineers. There is a marked lack of research on the effect of clone refactoring on system performance. Future investigations in this direction will add much value to clone refactoring research. We also feel the necessity of future research towards real-time detection, and tracking of code clones in a big-data environment.
2019
Copying and pasting source code during software development is known as code cloning. Clone fragments with a minimum size of 5 LOC were usually considered in previous studies. In recent studies, clone fragments which are less than 5 LOC are referred as micro-clones. It has been established by the literature that code clones are closely related with software bugs as well as bug replication. None of the previous studies have been conducted on bug-replication of micro-clones. In this paper we investigate and compare bug-replication in between regular and micro-clones. For the purpose of our investigation, we analyze the evolutionary history of our subject systems and identify occurrences of similarity preserving co-changes (SPCOs) in both regular and micro-clones where they experienced bug-fixes. From our experiment on thousands of revisions of six diverse subject systems written in three different programming languages, C, C# and Java we find that the percentage of clone fragments that take part in bug-replication is often higher in micro-clones than in regular code clones. The percentage of bugs that get replicated in micro-clones is almost the same as the percentage in regular clones. Finally, both regular and micro-clones have similar tendencies of replicating severe bugs according to our experiment. Thus, micro-clones in a code-base should not be ignored. We should rather consider these equally important as of the regular clones when making clone management decisions.
Abstract With the era of big data approaching, the number of software systems, their dependencies, as well as the complexity of the individual system is becoming larger and more intricate. Understanding these evolving software systems is thus a primary challenge for cost-effective software management and maintenance. In this paper we perform a case study with evolving code clones. The programmers often need to manually analyze the co-evolution of clone fragments to decide about refactoring, tracking, and bug removal. However, manual analysis is time consuming, and nearly infeasible for a large number of clones, e.g., with millions of similarity pairs, where clones are evolving over hundreds of software revisions. We propose an interactive visual analytics system, Clone-World, which leverages big data visualization approach to manage code clones in large software systems. Clone-World, gives an intuitive yet powerful solution to the clone analytic problems. Clone-World combines multiple information-linked zoomable views, where users can explore and analyze clones through interactive exploration in real time. User studies and experts’ reviews suggest that Clone-World may assist developers in many real-life software development and maintenance scenarios. We believe that Clone-World will ease the management and maintenance of clones, and inspire future innovation to adapt visual analytics to manage big software systems.
Abstract Code clones are identical or nearly similar code fragments in a code-base. According to the existing studies, code clones are directly related to bugs. Code cloning, creating code clones, is suspected to propagate temporarily hidden bugs from one code fragment to another. However, there is no study on the intensity of bug-propagation through code cloning. In this paper, we define two clone evolutionary patterns that reasonably indicate bug propagation through code cloning. By analyzing software evolution history, we identify those code clones that evolved following the bug propagation patterns. According to our study on thousands of commits of seven subject systems, overall 18.42% of the clone fragments that experience bug-fixes contain propagated bugs. Type-3 clones are primarily involved with bug-propagation. Bug propagation is more likely to occur in the clone fragments that are created in the same commit rather than in different commits. Moreover, code clones residing in the same file have a higher possibility of containing propagated bugs compared to those residing in different files. Severe bugs can sometimes get propagated through code cloning. Automatic support for immediately identifying occurrences of bug-propagation can be beneficial for software maintenance. Our findings are important for prioritizing code clones for management.
The identical or nearly similar code fragments in a code-base are called code clones. There is a common belief that code cloning (copy/pasting code fragments) can introduce bugs in a software system if the copied code fragments are not properly adapted to their contexts (i.e., surrounding code). However, none of the existing studies have investigated whether such bugs are really present in code clones. We denote these bugs as Context Adaptation Bugs, or simply Context-Bugs, in our paper and investigate the extent to which they can be present in code clones. We define and automatically analyze two clone evolutionary patterns that indicate fixing of Context-Bugs. According to our analysis on thousands of revisions of six open-source subject systems written in Java, C, and C#, code cloning often introduces Context-Bugs in software systems. Around 50% of the clone related bug-fixes can occur for fixing Context-Bugs. Cloning (copy/pasting) a newly created code fragment (i.e., a code fragment that was not added in a former revision) is more likely to introduce Context-Bugs compared to cloning a preexisting fragment (i.e., a code fragment that was added in a former revision). Moreover, cloning across different files appears to have a significantly higher tendency of introducing Context-Bugs compared to cloning within the same file. Finally, Type 3 clones (gapped clones) have the highest tendency of containing Context-Bugs among the three major clone-types. Our findings can be important for early detection as well as removal of Context-Bugs in code clones.
2018
Workflows are frequently built and used to systematically process large datasets using workflow management systems (WMS). A workflow (i.e., a pipeline) is a finite set of processing modules organized as a series of steps that is applied to an input dataset to produce a desired output. In a workflow management system, users generally create workflows manually for their own investigations. However, workflows can sometimes be lengthy and the constituent processing modules might often be computationally expensive. In this situation, it would be beneficial if users could reuse intermediate stage results generated by previously executed workflows for executing their current workflow.In this paper, we propose a novel technique based on association rule mining for suggesting which intermediate stage results from a workflow that a user is going to execute should be stored for reusing in the future. We call our proposed technique, RISP (Recommending Intermediate States from Pipelines). According to our investigation on hundreds of workflows from two scientific workflow management systems, our proposed technique can efficiently suggest intermediate state results to store for future reuse. The results that are suggested to be stored have a high reuse frequency. Moreover, for creating around 51% of the entire pipelines, we can reuse results suggested by our technique. Finally, we can achieve a considerable gain (74% gain) in execution time by reusing intermediate results stored by the suggestions provided by our proposed technique. We believe that our technique (RISP) has the potential to have a significant positive impact on Big-Data systems, because it can considerably reduce execution time of the workflows through appropriate reuse of intermediate state results, and hence, can improve the performance of the systems.
If two or more program entities (such as files, classes, methods) co-change (i.e., change together) frequently during software evolution, then it is likely that these two entities are coupled (i.e., the entities are related). Such a coupling is termed as evolutionary coupling in the literature. The concept of traditional evolutionary coupling restricts us to assume coupling among only those entities that changed together in the past. The entities that did not co-change in the past might also have coupling. However, such couplings can not be retrieved using the current concept of detecting evolutionary coupling in the literature. In this paper, we investigate whether we can detect such couplings by applying transitive rules on the evolutionary couplings detected using the traditional mechanism. We call these couplings that we detect using our proposed mechanism as transitive evolutionary couplings. According to our research on thousands of revisions of four subject systems, transitive evolutionary couplings combined with the traditional ones provide us with 13.96% higher recall and 5.56% higher precision in detecting future co-change candidates when compared with a state-of-the-art technique.