@inproceedings{Al-Omari-2020-SemanticCloneBench:,
title = "SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge",
author = "Al-Omari, Farouq and
Roy, Chanchal K. and
Chen, Tonghao",
journal = "2020 IEEE 14th International Workshop on Software Clones (IWSC)",
year = "2020",
publisher = "IEEE",
url = "https://gwf-uwaterloo.github.io/gwf-publications/G20-41001",
doi = "10.1109/iwsc50091.2020.9047643",
abstract = "Not only do newly proposed code clone detection techniques, but existing techniques and tools also need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clones pairs for the languages Java, C, C{\#} and Python.",
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="Al-Omari-2020-SemanticCloneBench:">
<titleInfo>
<title>SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge</title>
</titleInfo>
<name type="personal">
<namePart type="given">Farouq</namePart>
<namePart type="family">Al-Omari</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Chanchal</namePart>
<namePart type="given">K</namePart>
<namePart type="family">Roy</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Tonghao</namePart>
<namePart type="family">Chen</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2020</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<genre authority="bibutilsgt">journal article</genre>
<relatedItem type="host">
<titleInfo>
<title>2020 IEEE 14th International Workshop on Software Clones (IWSC)</title>
</titleInfo>
<originInfo>
<issuance>continuing</issuance>
<publisher>IEEE</publisher>
</originInfo>
<genre authority="marcgt">periodical</genre>
<genre authority="bibutilsgt">academic journal</genre>
</relatedItem>
<abstract>Not only do newly proposed code clone detection techniques, but existing techniques and tools also need to be evaluated and compared. This evaluation process could be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks include: they are restricted to one programming language; they have a limited number of clone pairs that are confined within the selected system(s); they require manual validation; they do not support all types of code clones. To overcome these limitations, we proposed a methodology to generate a wide range of semantic clone benchmark(s) for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website, Stack Overflow. We applied automatic filtering, selection and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clones pairs for the languages Java, C, C# and Python.</abstract>
<identifier type="citekey">Al-Omari-2020-SemanticCloneBench:</identifier>
<identifier type="doi">10.1109/iwsc50091.2020.9047643</identifier>
<location>
<url>https://gwf-uwaterloo.github.io/gwf-publications/G20-41001</url>
</location>
<part>
<date>2020</date>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge
%A Al-Omari, Farouq
%A Roy, Chanchal K.
%A Chen, Tonghao
%B 2020 IEEE 14th International Workshop on Software Clones (IWSC)
%D 2020
%I IEEE
%F Al-Omari-2020-SemanticCloneBench:
%X Not only newly proposed code clone detection techniques, but also existing techniques and tools, need to be evaluated and compared. This evaluation can be done by assessing the reported clones manually or by using benchmarks. The main limitations of available benchmarks are that they are restricted to one programming language, contain a limited number of clone pairs confined to the selected system(s), require manual validation, and do not support all types of code clones. To overcome these limitations, we propose a methodology for generating semantic clone benchmarks for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers on the crowd-sourced question-and-answer website Stack Overflow. We apply automatic filtering, selection, and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4000 clone pairs for the languages Java, C, C# and Python.
%R 10.1109/iwsc50091.2020.9047643
%U https://gwf-uwaterloo.github.io/gwf-publications/G20-41001
%U https://doi.org/10.1109/iwsc50091.2020.9047643
Markdown (Informal)
[SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge](https://gwf-uwaterloo.github.io/gwf-publications/G20-41001) (Al-Omari et al., GWF 2020)
ACL
Farouq Al-Omari, Chanchal K. Roy, and Tonghao Chen. 2020. SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge. In 2020 IEEE 14th International Workshop on Software Clones (IWSC). IEEE.