Distribution shift datasets for source code classification
Source: DataCite Commons. Updated: 2022-08-25. Indexed: 2024-07-29.
Download link:
https://figshare.com/articles/dataset/Distribution_shift_datasets_for_source_code_classification/19828294/1
Description:
Three collections of datasets: Python75, Java250-S, Python800-S. Each collection has the same directory structure. ID and OOD below stand for in-distribution and out-of-distribution. For example, Python75.zip:

1. raw: raw data files scraped from the online resources
   1.1. [task_name]
        1.1.1. [submission].py: source code file
   1.2. csv
        1.2.1. [task_name].csv: the description (e.g., Submission_id, Task_name, User) of each code file for this task
2. task
   2.1. pre-trained (files for pre-trained language models)
        2.1.1. train.jsonl: code files and labels in the training set
        2.1.2. id_test.jsonl: code files and labels in the ID test set
        2.1.3. ood_test.jsonl: code files and labels in the OOD test set
   2.2. token (files for the DNN models)
        2.2.1. train
               2.2.1.1. [task_name].tkn: tokens of source code files in this task for training
               2.2.1.2. info.json: information on the programming language and number of tokens
               2.2.1.3. problems.json: information on the data size of each task
        2.2.2. id_test
               2.2.2.1. [task_name].tkn: tokens of source code files in this task for ID test
               2.2.2.2. info.json: information on the programming language and number of tokens
               2.2.2.3. problems.json: information on the data size of each task
        2.2.3. ood_test
               2.2.3.1. [task_name].tkn: tokens of source code files in this task for OOD test
               2.2.3.2. info.json: information on the programming language and number of tokens
               2.2.3.3. problems.json: information on the data size of each task
3. -random: the same as task
4. -user: the same as task
5. -time: the same as task
6. -token: the same as task
7. -cst: the same as task

models.zip: trained models

1. cnns
   1.1. [DNN name]-[data name]-[distribution shift type].h5
2. oe_detectors
   2.1. [DNN name]-[data name]-[distribution shift type]-oe.h5
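The layout above suggests a simple way to read the three splits for pre-trained language models. The following is a minimal sketch, not the authors' loader. It assumes the archive has been extracted so that the files sit at `<collection_root>/<shift_dir>/pre-trained/<split>.jsonl`, and that each `.jsonl` file holds one JSON object per line. The function names `read_jsonl` and `load_pretrained_splits` are hypothetical, and the field names inside each record are not documented here, so the sketch returns the parsed records unchanged.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of parsed records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_pretrained_splits(collection_root, shift_dir="task"):
    """Load the train, ID test, and OOD test records for one shift directory.

    Assumption: files are located at
    <collection_root>/<shift_dir>/pre-trained/<split>.jsonl.
    `shift_dir` is whatever directory name appears on disk for the
    shift type of interest; it is passed in rather than guessed.
    """
    base = Path(collection_root) / shift_dir / "pre-trained"
    return {
        split: read_jsonl(base / f"{split}.jsonl")
        for split in ("train", "id_test", "ood_test")
    }
```

For instance, after extracting Python75.zip into a local `Python75` directory, `splits = load_pretrained_splits("Python75")` would return a dictionary keyed by split name. Inspecting `splits["train"][0].keys()` is a reasonable first step to find out which fields hold the code and the label.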
Provider:
figshare
Created:
2022-08-25



