CLCDSA
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/Kawser-nerd/CLCDSA/
Description:
CLCDSA is a dataset of code snippets in four programming languages (C++, C#, Java, and Python) collected from online judge and competitive programming platforms. Each programming problem is associated with multiple solutions, written in different languages. The dataset has been filtered to include only compilable source code files and is split into training, validation, and test sets in a 6:2:2 ratio. Its core task is cross-language source code matching.
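As a rough illustration of what a 6:2:2 split looks like in practice, the sketch below shuffles a list of items and cuts it into training, validation, and test portions. It is not the dataset authors' procedure: the description does not say whether the split is made per file or per problem, so a file-level split is assumed here, and the function name `split_622`, the seed, and the file names are hypothetical.

```python
import random


def split_622(items, seed=0):
    """Shuffle items and split them 6:2:2 into train, validation, and test lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible (the seed is an assumption).
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names, for illustration only.
files = [f"solution_{i}.py" for i in range(10)]
train, val, test = split_622(files)
print(len(train), len(val), len(test))
```

If the goal is to keep all solutions of one problem in the same partition, the same function can be applied to a list of problem identifiers instead of individual files.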
Provided by:
Kawser-nerd



