GPTCloneBench
Source: arXiv | Updated: 2023-09-02 | Indexed: 2024-06-21
Download link:
https://shorturl.at/jvxOV
Description:
GPTCloneBench is a benchmark dataset of semantic and cross-language code clones created by the Department of Computer Science at the University of Saskatchewan. It was built from SemanticCloneBench and OpenAI's GPT-3 model: out of 79,928 generated clone pairs, the authors retained 37,149 true semantic clone pairs, 19,288 false semantic clone pairs, and 20,770 cross-language clone pairs. Clones were first generated with GPT-3 and then passed through manual validation, functional testing, and automated validation to ensure the quality of each pair. The dataset targets software engineering research, specifically the detection of semantic and cross-language clones, and is intended as training data for machine learning models so that clone detection tools become more accurate and efficient.
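The counts quoted in the description can be tallied with a short script. This is only an arithmetic sketch over the published figures, not part of the dataset's tooling; the dictionary keys and the `summarize` helper are hypothetical names chosen for illustration. Note that the three listed categories sum to less than the 79,928 total; the description does not say what happened to the remaining pairs, so the script simply reports the difference.

```python
# Figures copied from the dataset description; key names are illustrative labels only.
TOTAL_PAIRS = 79_928
CATEGORY_COUNTS = {
    "true_semantic": 37_149,
    "false_semantic": 19_288,
    "cross_language": 20_770,
}


def summarize(total, counts):
    """Return each category's share of the total and the number of pairs
    that are not covered by any listed category."""
    shares = {name: n / total for name, n in counts.items()}
    unaccounted = total - sum(counts.values())
    return shares, unaccounted


shares, unaccounted = summarize(TOTAL_PAIRS, CATEGORY_COUNTS)
for name, share in shares.items():
    print(f"{name}: {CATEGORY_COUNTS[name]:,} pairs ({share:.1%})")
print(f"not in any listed category: {unaccounted:,} pairs")
```

A check like this is a cheap first step after downloading the data: comparing the file-level counts against the published figures shows quickly whether a copy is complete.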
Provider:
Department of Computer Science, University of Saskatchewan
Created:
2023-08-27



