Raw C Code Corpus
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3628774
Description:
A raw code corpus for the C programming language: it includes only the C source files of each repository, without any preprocessing.
The corpus was used to generate the C training, validation, testing, and BPE encoding sets for the experiments in the paper "Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code".
Created:
2020-01-29



