Proceedings of the 40th International Conference on Software Engineering
- Anthology ID: G18-144
- Year: 2018
- Venue: GWF
- Publisher: ACM
- URL: https://gwf-uwaterloo.github.io/gwf-publications/G18-144
CCAligner
Pengcheng Wang, Jeffrey Svajlenko, Yanzhao Wu, Yun Xu, Chanchal K. Roy
Copying code and then pasting it with a large number of edits is a common activity in software development, and the pasted code is a kind of complicated Type-3 clone. Because of the large number of edits, we consider such a clone a large-gap clone. Large-gap clones can reflect the extension of code, such as changes and improvements. Existing state-of-the-art clone detectors suffer from several limitations in detecting large-gap clones. In this paper, we propose a tool, CCAligner, that detects large-gap clones by matching code windows while tolerating an edit distance of e. In our approach, a novel e-mismatch index is designed, and an asymmetric similarity coefficient is used as the similarity measure. We thoroughly evaluate CCAligner both for large-gap clone detection and for general Type-1, Type-2 and Type-3 clone detection. The results show that CCAligner performs better than other competing tools in large-gap clone detection, and that it has the best execution time on a 10 MLOC input with good precision and recall in general Type-1 to Type-3 clone detection. Compared with existing state-of-the-art tools, CCAligner is the best-performing large-gap clone detection tool, and it remains competitive with the best clone detectors in general Type-1, Type-2 and Type-3 clone detection.
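The abstract only names CCAligner's building blocks (code windows, an e-mismatch index, an asymmetric similarity coefficient), so the following is a minimal Python sketch of how such pieces could fit together, not the authors' implementation. Several details are assumptions made for illustration: the window size `Q = 6` and mismatch budget `E = 1`, treating each source line as an opaque string (no token normalization), using every order-preserving choice of q - e lines as a window's index keys, and defining the coefficient as the number of one fragment's windows that match the other fragment divided by that first fragment's window count. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumed defaults for illustration only (not taken from the paper).
Q = 6  # window size: consecutive lines per code window
E = 1  # lines each window may drop when matching (the "e" in e-mismatch)

Window = Tuple[str, ...]
Posting = Tuple[str, int]  # (fragment id, window start line)


def windows(lines: List[str], q: int = Q) -> List[Window]:
    """Slide a window of q consecutive lines over a fragment."""
    return [tuple(lines[i:i + q]) for i in range(len(lines) - q + 1)]


def e_mismatch_keys(window: Window, e: int = E) -> Set[Window]:
    """Every order-preserving way to keep len(window) - e lines of a window.

    Assumption: two windows sharing a key are identical once e lines are
    removed from each, so they match despite up to e edited lines.
    """
    return set(combinations(window, len(window) - e))


def build_index(fragments: Dict[str, List[str]], q: int = Q,
                e: int = E) -> Dict[Window, Set[Posting]]:
    """Hypothetical e-mismatch index: key -> windows that produce it."""
    if not 0 <= e < q:
        raise ValueError("need 0 <= e < q")
    index: Dict[Window, Set[Posting]] = defaultdict(set)
    for fid, lines in fragments.items():
        for start, win in enumerate(windows(lines, q)):
            for key in e_mismatch_keys(win, e):
                index[key].add((fid, start))
    return index


def asymmetric_similarity(a: str, b: str, fragments: Dict[str, List[str]],
                          index: Dict[Window, Set[Posting]],
                          q: int = Q, e: int = E) -> float:
    """Assumed coefficient: share of a's windows matching any window of b.

    Normalizing by a's window count only, so sim(a, b) != sim(b, a) in general.
    """
    wins = windows(fragments[a], q)
    if not wins:
        return 0.0  # fragment shorter than one window
    matched = sum(
        1 for win in wins
        if any(fid == b
               for key in e_mismatch_keys(win, e)
               for fid, _ in index.get(key, ()))
    )
    return matched / len(wins)
```

For example, with q = 4 and e = 1, two five-line fragments that differ only in their middle line match on both of their windows (similarity 1.0), whereas with e = 0 they share no window (similarity 0.0). If a third fragment consists of the first one followed by four new lines, the score is 1.0 in one direction and 0.5 in the other, which is the asymmetry this sketch assumes. Enumerating all C(q, e) keys per window keeps lookups exact-match, at the cost of an index that grows quickly as e increases.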