project-droid/Category-2-final
Hugging Face · Updated 2025-01-19 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/project-droid/Category-2-final
Description:
This dataset contains deduplicated data from two datasets, The-Vault-function and The-Vault-class. All code samples are AST-parseable, with AST depths ranging from 2.0 to 31.0. Each sample's maximum line length is between 12.0 and 400.0 characters, and its average line length is between 5.0 and 140.0 characters. The alphanumeric fraction of each sample is greater than 0.2 and less than 0.9, and the number of lines is between 6.0 and 300.0. Only samples whose docstrings are identified as English with a confidence greater than 98% are kept, and deduplication is performed with MinHash.
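The filtering criteria above can be illustrated with a minimal Python sketch. It is not the project-droid pipeline and rests on several assumptions: it handles only Python source (using the standard-library `ast` module as the parser, whereas The Vault covers many languages), defines AST depth as the longest root-to-leaf chain of nodes, treats the ranges as inclusive, takes the docstring language confidence as a precomputed input instead of running a language-identification model, and deduplicates with a simple pairwise MinHash comparison using a hypothetical Jaccard threshold of 0.85. The names `passes_filters`, `minhash`, and `dedup` are illustrative only.

```python
import ast
import hashlib
import re

# Thresholds from the dataset description. Treating the ranges as inclusive
# is an assumption; the alphanumeric bounds are strict as stated.
DEPTH_RANGE = (2, 31)
MAX_LINE_RANGE = (12, 400)
AVG_LINE_RANGE = (5, 140)
ALNUM_RANGE = (0.2, 0.9)
NUM_LINES_RANGE = (6, 300)
MIN_LANG_CONF = 0.98


def ast_depth(node: ast.AST) -> int:
    """Longest chain of nodes from `node` down to a leaf (a leaf has depth 1)."""
    return 1 + max((ast_depth(c) for c in ast.iter_child_nodes(node)), default=0)


def in_range(value: float, bounds: tuple) -> bool:
    return bounds[0] <= value <= bounds[1]


def passes_filters(code: str, docstring_en_confidence: float) -> bool:
    """Apply the quality filters to one Python sample (hypothetical helper).

    `docstring_en_confidence` is assumed to come from an external
    language-identification model; it is passed in, not computed here.
    """
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):  # not AST-parseable
        return False
    lines = code.splitlines()
    if not lines:
        return False
    lengths = [len(line) for line in lines]
    alnum_frac = sum(ch.isalnum() for ch in code) / len(code)
    return (
        in_range(ast_depth(tree), DEPTH_RANGE)
        and in_range(max(lengths), MAX_LINE_RANGE)
        and in_range(sum(lengths) / len(lengths), AVG_LINE_RANGE)
        and ALNUM_RANGE[0] < alnum_frac < ALNUM_RANGE[1]
        and in_range(len(lines), NUM_LINES_RANGE)
        and docstring_en_confidence > MIN_LANG_CONF
    )


def minhash(code: str, num_perm: int = 64, k: int = 5) -> list[int]:
    """MinHash signature over word k-shingles; each 'permutation' is
    simulated by salting a 64-bit BLAKE2b hash with its index."""
    tokens = re.findall(r"\w+", code)
    # Samples with fewer than k tokens become a single shingle.
    grams = {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{g}".encode(), digest_size=8).digest(), "big")
            for g in grams
        )
        for seed in range(num_perm)
    ]


def dedup(codes: list[str], threshold: float = 0.85) -> list[str]:
    """Keep a sample only if its estimated Jaccard similarity to every
    previously kept sample is below `threshold` (an assumed value)."""
    kept, signatures = [], []
    for code in codes:
        sig = minhash(code)
        # Fraction of matching signature slots estimates Jaccard similarity.
        if all(sum(a == b for a, b in zip(sig, s)) / len(sig) < threshold for s in signatures):
            kept.append(code)
            signatures.append(sig)
    return kept


sample = '''def add(a, b):
    """Return the sum of a and b."""
    result = a + b
    if result < 0:
        return 0
    return result
'''
print(passes_filters(sample, docstring_en_confidence=0.99))  # True
print(len(dedup([sample, sample, "x = 1\n"])))  # 2
```

Comparing each new signature against every kept signature is quadratic in the number of samples; at dataset scale, MinHash is usually combined with locality-sensitive hashing (banding) so that only candidate pairs are compared.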
Provided by:
project-droid



